name	data-source-connect
description	Connect your own data source to replace the demo unicorns data. Use when the user wants to use their own database URL or CSV file instead of the sample data. Triggers on requests to connect database, import CSV, change data source, use own data, or switch from demo data.

Data Source Connect

Replace the demo unicorns data with user's own data source. Supports two modes:

Database URL - Connect to an existing PostgreSQL database
CSV File - Import a CSV file and create a new table

Mode 1: Database URL

When user provides a PostgreSQL connection URL:

Step 1: Update Environment

Update .env file:

POSTGRES_URL="<user-provided-connection-string>"

Step 2: Discover Schema

Connect to the database and retrieve schema information:

SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;

Step 3: Update Actions

Edit app/actions.ts - update the schema in these locations:

generateQuery function (~line 18): Replace the unicorns schema in the system prompt with the new table schema. Include:
- Table name and column definitions
- Data type hints (e.g., "valuation is in billions")
- Available enum values for categorical fields
- Query patterns specific to the data
explainQuery function (~line 115): Replace the unicorns schema in the system prompt.
runGenerateSQLQuery function (~line 93): Update the table name in error handling.

Step 4: Generate Sample Prompts

Create 3-5 sample natural language queries for the new data to help users understand what they can ask.

Mode 2: CSV File

When user provides a CSV file:

Step 1: Analyze CSV Structure

Read the CSV file and analyze:

Column names (header row)
Data types (infer from sample values)
Null/empty value handling requirements

Step 2: Generate Schema

Create SQL schema based on CSV structure. Type inference rules:

Numbers with decimals -> DECIMAL(10, 2)
Integers -> INTEGER
Dates (various formats) -> DATE
Boolean values -> BOOLEAN
Everything else -> VARCHAR(255) or TEXT for long content

Step 3: Update seed.ts

Edit lib/seed.ts:

Update CREATE TABLE statement with new schema
Update CSV parsing logic for new column names
Add type conversion for each column
Handle missing/null values with appropriate defaults or skip logic

Example seed.ts structure:

export async function seed() {
  await pool.query(`
    CREATE TABLE IF NOT EXISTS <table_name> (
      id SERIAL PRIMARY KEY,
      <column1> <type1>,
      <column2> <type2>,
      ...
    );
  `);

  // Parse CSV
  const results: any[] = [];
  const csvFilePath = path.join(process.cwd(), '<filename>.csv');

  await new Promise((resolve, reject) => {
    fs.createReadStream(csvFilePath)
      .pipe(csv())
      .on('data', (data) => results.push(data))
      .on('end', resolve)
      .on('error', reject);
  });

  // Insert with null handling
  for (const row of results) {
    // Skip rows with critical missing data or use defaults
    const value1 = row['ColumnName'] || null;

    await pool.query(
      `INSERT INTO <table_name> (...) VALUES (...) ON CONFLICT DO NOTHING`,
      [value1, ...]
    );
  }
}

Step 4: Update Actions

Same as Database URL Mode Step 3 - update schema references in app/actions.ts.

Step 5: Run Seed

pnpm run seed

Step 6: Generate Sample Prompts

Create 3-5 sample natural language queries for the new data to help users understand what they can ask. Examples should cover:

Basic filtering (e.g., "Show all records where X > Y")
Aggregations (e.g., "What is the total/average X by Y?")
Time-based queries if applicable (e.g., "Show trends over time")
Top/bottom queries (e.g., "Top 10 by X")

Critical Files to Update

File	What to Update
`.env`	`POSTGRES_URL` connection string
`lib/seed.ts`	Table schema, CSV parsing, insert logic
`app/actions.ts`	Schema in `generateQuery` and `explainQuery` prompts

Schema Prompt Template

When updating the schema in app/actions.ts, use this template:

system: `You are a SQL (postgres) and data visualization expert. Your job is to help the user write a SQL query to retrieve the data they need. The table schema is as follows:

<table_name> (
  id SERIAL PRIMARY KEY,
  <column_name> <data_type> <constraints>,
  ...
);

Only retrieval queries are allowed.

[Add data-specific hints here, e.g.:]
- Use ILIKE for case-insensitive text search
- <field_name> values include: value1, value2, value3
- <numeric_field> is in <units> (e.g., thousands, millions)

EVERY QUERY SHOULD RETURN QUANTITATIVE DATA THAT CAN BE PLOTTED ON A CHART!
`,

Null Value Handling

For CSV imports with missing data:

Skip row: When critical identifier is missing
Default value: Use sensible defaults (0 for numbers, 'Unknown' for strings)
Allow null: For truly optional fields

data-source-connect

Install Skill

SKILL.md