---
name: prerequisite-validator
description: Automatically validates the Treasure Data environment and prerequisites before CDP workflow creation. Checks database connectivity, table existence, schema validation, and credential setup to prevent workflow failures.
---
# Prerequisite Validator Skill
## Purpose
Proactively validate the environment and prerequisites before users create CDP workflows. This skill runs automatically when users express intent to create ingestion, staging, unification, or hist-union workflows, catching issues early and reducing failed deployments.
## When to Activate
Automatically trigger when the user mentions or requests:
- Creating new ingestion workflows
- Adding data sources or objects
- Setting up staging transformations
- Running ID unification
- Creating hist-union workflows
- Any CDP pipeline setup or deployment
- Troubleshooting workflow failures
Keywords to watch for:
- "ingest", "ingestion", "load data", "import"
- "transform", "staging", "clean data"
- "unify", "unification", "merge identities"
- "hist-union", "combine historical"
- "create workflow", "setup pipeline"
## Validation Workflow
### Step 0: Client Name & Database Validation (NEW - CRITICAL)
ALWAYS run this check FIRST before any workflow creation:
1. Detect database names mentioned by the user:
- Parse the user's message for database references
- Identify: source database, staging database, config database
- Action: Extract client prefix from database names
2. Check for GENERIC database names (CRITICAL):
- ❌ REJECT: "client_src", "client_stg", "demo_db", "demo_db_stg"
- ❌ REJECT: Any database matching documentation examples
- ✅ ACCEPT: Real client names (e.g., "acme_src", "nike_stg")
3. If GENERIC names detected:
- STOP validation immediately
- Prompt user for REAL client name:
```
⚠️ GENERIC DATABASE NAME DETECTED
You mentioned: "{detected_database}"
This appears to be a documentation example name. For production workflows,
please provide your actual client name/prefix.
Question: What is your client name or prefix?
Examples: "acme", "nike", "walmart", "retail_co"
This will build your databases as:
- Source: {client}_src
- Staging: {client}_stg
- Config: client_config (shared)
```
- Wait for user response
- Validate the provided client name (alphanumeric characters and underscores only; see the sketch after this list)
- Build database names: {client}_src, {client}_stg
4. Verify databases exist using MCP:
- Use: mcp__demo_treasuredata__list_databases
- Check if {client}_src exists
- Check if {client}_stg exists (or inform the user that it will be created)
- Action: Report which databases exist vs need creation
5. Store client name for workflow generation:
- Pass to slash command context
- Ensure all generated files use real client name
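A minimal sketch of these Step 0 checks, assuming the reject list above is exhaustive and that client prefixes are lowercase alphanumeric with underscores:
```python
import re

# Documentation-example names that must never reach production
# (per the reject list above); extend as new examples appear in docs.
GENERIC_NAMES = {"client_src", "client_stg", "demo_db", "demo_db_stg"}

# Assumption: client prefixes are lowercase alphanumeric plus underscores.
CLIENT_NAME_RE = re.compile(r"^[a-z0-9_]+$")

def is_generic(database: str) -> bool:
    return database.lower() in GENERIC_NAMES

def build_database_names(client: str) -> dict:
    """Derive the standard database set from a validated client prefix."""
    if not CLIENT_NAME_RE.match(client):
        raise ValueError(f"invalid client name: {client!r}")
    return {
        "source": f"{client}_src",
        "staging": f"{client}_stg",
        "config": "client_config",  # shared across clients
    }

print(build_database_names("acme"))
# {'source': 'acme_src', 'staging': 'acme_stg', 'config': 'client_config'}
```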
### Step 1: Database Connectivity Check
Use MCP tools to verify the TD connection:
1. Check current database context:
- Use: mcp__demo_treasuredata__current_database
- Verify: Connection successful
- Action: If fails, inform user to check TD credentials
2. List available databases:
- Use: mcp__demo_treasuredata__list_databases
- Verify: User has access to target databases
- Action: Show available databases if target not found
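A sketch of this check, assuming a hypothetical `call_mcp(tool_name)` helper that invokes an MCP tool by name and returns its result; the tool names come from the list above:
```python
def check_connectivity(call_mcp, target_dbs: list) -> list:
    """Return human-readable findings; never raise to the caller."""
    findings = []
    try:
        current = call_mcp("mcp__demo_treasuredata__current_database")
        findings.append(f"Connection: PASS (current database: {current})")
    except Exception as exc:  # connection failure is a CRITICAL issue
        return [f"Connection: FAIL ({exc}); check your TD credentials"]
    # Assumed to return a list of database names the user can access.
    available = call_mcp("mcp__demo_treasuredata__list_databases")
    for db in target_dbs:
        if db in available:
            findings.append(f"{db}: exists")
        else:
            findings.append(f"{db}: NOT FOUND; available: {available}")
    return findings
```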
### Step 2: Source Table Validation
When the user mentions specific tables to process:
1. Verify source tables exist:
- Use: mcp__demo_treasuredata__describe_table
- For each table user mentions
- Action: Report which tables exist vs missing
2. Validate table schemas:
- Check column count > 0
- Verify 'time' column exists (required for TD)
- Check for expected identifier columns (email, user_id, etc.)
- Action: Warn if schema looks incomplete or suspicious
3. Check table data availability:
- Use: mcp__demo_treasuredata__query with LIMIT 1
- Verify: Table has data (not empty)
- Action: Warn if table is empty (may indicate upstream issue)
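A sketch of these three table checks, with `describe_table` and `query` standing in for the corresponding MCP tools; the identifier hints are assumptions, not a fixed list:
```python
IDENTIFIER_HINTS = ("email", "user_id", "customer_id", "phone")

def validate_table(describe_table, query, table: str) -> list:
    """Run the existence, schema, and data checks for one table."""
    findings = []
    # Assumed shape: describe_table returns {"columns": [{"name": ...}, ...]}.
    columns = [c["name"] for c in describe_table(table)["columns"]]
    if not columns:
        return [f"{table}: CRITICAL - 0 columns (corrupted?)"]
    if "time" not in columns:
        findings.append(f"{table}: CRITICAL - missing required 'time' column")
    ids = [c for c in columns if any(h in c for h in IDENTIFIER_HINTS)]
    if not ids:
        findings.append(f"{table}: WARNING - no identifier columns detected")
    # LIMIT 1 keeps the data-existence probe cheap.
    rows = query(f"SELECT 1 FROM {table} LIMIT 1")
    if not rows:
        findings.append(f"{table}: WARNING - table is empty")
    return findings or [f"{table}: PASS ({len(columns)} columns)"]
```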
### Step 3: Target Database Validation
1. Verify target database exists:
- Check whether the database appears in the list from Step 1
- Action: Suggest creation command if missing
2. Check write permissions (when possible):
- Attempt to query existing tables in target DB
- Action: Warn if permissions might be insufficient
### Step 4: Credential Validation
For ingestion workflows specifically:
1. Identify required credentials from source type:
- BigQuery: Service account JSON
- Klaviyo: API key
- Shopify: API token, shop URL
- OneTrust: Client ID, client secret
- SFTP: SSH key or password
2. Check whether the user has mentioned credentials:
- Parse the user's message for credential references
- Action: Remind user to set TD secrets if not mentioned
3. Provide secret setup guidance:
- Show exact td secret:set command
- Reference credential naming conventions
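The source-to-credential mapping above can be encoded as data so reminders and `td secret:set` commands are generated consistently. The secret names below are illustrative; match them to your project's actual naming conventions:
```python
REQUIRED_CREDENTIALS = {
    "bigquery": ["service_account_json"],
    "klaviyo": ["klaviyo_api_key"],
    "shopify": ["shopify_api_token", "shopify_shop_url"],
    "onetrust": ["onetrust_client_id", "onetrust_client_secret"],
    "sftp": ["sftp_ssh_key"],  # or sftp_password
}

def secret_setup_commands(source_type: str, project: str) -> list:
    """Build the exact td secret:set commands to show the user."""
    return [
        f"td secret:set {name} --project {project}"
        for name in REQUIRED_CREDENTIALS.get(source_type, [])
    ]

print(secret_setup_commands("klaviyo", "my_project"))
# ['td secret:set klaviyo_api_key --project my_project']
```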
### Step 5: Schema Compatibility Check
For staging and unification workflows:
1. Check source-target compatibility:
- Verify source table schema matches expected pattern
- Check for required columns based on workflow type
- Action: Warn about potential schema mismatches
2. Validate naming conventions:
- Check table names follow pattern: {source}_{object}_histunion
- Verify staging tables end in _staging
- Action: Suggest corrections if naming is inconsistent
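A sketch of these naming checks, assuming source and object segments are lowercase alphanumeric with underscores:
```python
import re

# The hist-union pattern from the list above: {source}_{object}_histunion.
HISTUNION_RE = re.compile(r"^[a-z0-9]+_[a-z0-9_]+_histunion$")

def check_naming(table: str, workflow: str):
    """Return a warning string if the name breaks convention, else None."""
    if workflow == "hist-union" and not HISTUNION_RE.match(table):
        return f"{table}: expected pattern {{source}}_{{object}}_histunion"
    if workflow == "staging" and not table.endswith("_staging"):
        return f"{table}: staging tables should end in _staging"
    return None

print(check_naming("klaviyo_events_histunion", "hist-union"))  # None (PASS)
print(check_naming("orders_clean", "staging"))  # warning string
```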
## Output Format
Provide a clear, structured validation report:
## 🔍 Prerequisite Validation Report
### ✅ Client Configuration
- Client name: acme
- Source database: acme_src (exists ✅)
- Staging database: acme_stg (exists ✅)
- Config database: client_config (exists ✅)
### ✅ Database Connectivity
- Connection: PASS
- Current database: {database_name}
- Available databases: {count} found
### ✅ Source Tables
- acme_src.klaviyo_events_histunion: PASS (234 columns, 1.2M rows)
- acme_src.klaviyo_profiles_histunion: PASS (156 columns, 450K rows)
- acme_src.shopify_orders_histunion: WARNING - Table empty
### ✅ Target Database
- acme_stg: PASS (exists, accessible)
### ⚠️ Credentials
- klaviyo_api_key: NOT VERIFIED (reminder to set)
Set the credential before workflow execution:
```bash
td secret:set klaviyo_api_key --project {project_name}
```
### ✅ Schema Validation
- Required 'time' column: PASS (all tables)
- Email columns detected: 2 tables
- User ID columns detected: 3 tables
Overall Status: READY with 1 warning
Recommendation: Set klaviyo_api_key credential, then proceed with workflow creation.
Next Step: You can now use /cdp-staging:transform-batch to transform these tables.
## Validation Rules Reference
### Critical Issues (BLOCK workflow creation)
- ❌ Generic database name detected (client_src, demo_db, etc.)
- ❌ Database connection fails
- ❌ Source table does not exist
- ❌ Source table has 0 columns (corrupted)
- ❌ Missing required 'time' column
### Warnings (WARN but allow to proceed)
- ⚠️ Source table is empty (0 rows)
- ⚠️ Credentials not verified
- ⚠️ Naming convention doesn't match pattern
- ⚠️ Target database not found (user might create it)
### Info (FYI only)
- ℹ️ Table has many columns (>100)
- ℹ️ Multiple databases available
- ℹ️ Schema includes JSON columns
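One way to encode these rules, using illustrative rule IDs mapped to a severity enum; the IDs are assumptions, but each maps to one severity from the lists above:
```python
from enum import Enum

class Severity(Enum):
    CRITICAL = "block"   # stop workflow creation
    WARNING = "proceed"  # surface clearly, allow the user to continue
    INFO = "note"        # FYI only

RULE_SEVERITY = {
    "generic_database_name": Severity.CRITICAL,
    "connection_failed": Severity.CRITICAL,
    "source_table_missing": Severity.CRITICAL,
    "zero_columns": Severity.CRITICAL,
    "missing_time_column": Severity.CRITICAL,
    "empty_table": Severity.WARNING,
    "credentials_unverified": Severity.WARNING,
    "naming_mismatch": Severity.WARNING,
    "target_db_missing": Severity.WARNING,
    "wide_table": Severity.INFO,
    "multiple_databases": Severity.INFO,
    "json_columns": Severity.INFO,
}
```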
## Handoff to Slash Commands
After validation completes:
1. **If CRITICAL issues found**:
- DO NOT suggest slash commands
- Provide fix instructions
- Wait for user to resolve
2. **If WARNINGS only**:
- Show warnings clearly
- Suggest slash command to proceed
- Note what needs attention
3. **If ALL PASS**:
- Confirm ready state
- Recommend appropriate slash command based on intent
- Pre-fill parameters from validation (table names, databases)
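A sketch of this decision, assuming findings arrive grouped by severity; the slash command shown is illustrative, since in practice it is chosen from the intent detected during activation:
```python
def next_step(findings: dict) -> str:
    """Pick the handoff message from severity-grouped findings.

    findings is assumed to look like {"critical": [...], "warning": [...]}.
    """
    if findings.get("critical"):
        return "Fix the critical issues above; no slash command suggested yet."
    if findings.get("warning"):
        return ("Proceed with /cdp-staging:transform-batch, noting the "
                "warnings above.")
    return "All checks passed: run /cdp-staging:transform-batch."
```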
## Example Usage Flow
**User says:** "I need to transform klaviyo_events_histunion and klaviyo_profiles_histunion to staging"
**Skill activates:**
1. Detects intent: staging transformation
2. Validates database connection ✅
3. Checks both tables exist ✅
4. Verifies schemas ✅
5. Checks target database ✅
6. Reports validation success
7. Suggests: `/cdp-staging:transform-batch` with pre-filled table names
**User can proceed** with confidence that environment is ready.
## Integration with Existing Agents
This skill **complements** (not replaces) existing agents:
- **Before agent runs**: Skill validates environment
- **Agent runs**: Focused on file generation (current behavior)
- **After agent runs**: Other skills can validate output
**No changes needed** to existing slash commands or agents. This skill adds a safety layer.
## Edge Cases to Handle
1. **User provides database.table format**: Parse correctly
2. **User mentions tables from multiple databases**: Validate all
3. **User doesn't specify database**: Use current_database from MCP
4. **Table names have special characters**: Handle quotes properly
5. **Large number of tables**: Batch validation, summarize results
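A sketch covering edge cases 1-4, assuming double quotes or backticks are the only quoting styles in play; adjust to your SQL dialect:
```python
def resolve_table(raw: str, current_database: str) -> tuple:
    """Resolve a possibly database-qualified table reference.

    Falls back to the current database when none is given.
    """
    name = raw.strip().strip('"').strip("`")
    if "." in name:
        database, table = name.split(".", 1)
        return database, table
    return current_database, name

print(resolve_table("acme_src.klaviyo_events_histunion", "acme_stg"))
# ('acme_src', 'klaviyo_events_histunion')
print(resolve_table("orders", "acme_src"))
# ('acme_src', 'orders')
```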
## Performance Considerations
- **Fast validation**: Use MCP describe_table (metadata only)
- **Smart sampling**: Query LIMIT 1 to check data existence
- **Parallel checks**: Validate multiple tables concurrently where possible
- **Caching**: Remember validation results within same conversation
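A sketch of conversation-scoped caching, assuming `describe_fn` wraps the describe_table MCP tool:
```python
from functools import lru_cache

def make_cached_describe(describe_fn):
    """Wrap the (assumed) describe_table helper so repeated mentions of a
    table within one conversation reuse the first metadata fetch."""
    @lru_cache(maxsize=256)
    def cached(table: str):
        return describe_fn(table)
    return cached
```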
## Error Handling
If validation checks fail:
- Catch MCP errors gracefully
- Provide helpful error messages
- Suggest next steps
- Don't crash or stop the conversation
## Success Metrics
Track improvement in:
- Reduced workflow failures due to missing tables
- Fewer credential-related errors
- Faster time-to-deployment
- User confidence in proceeding
---
**Remember**: This skill is about being helpful and proactive, not blocking users. Validate thoroughly, report clearly, and guide users toward success.