| name | bigconfig-generator |
| description | Use this skill when creating or updating Bigeye monitoring configurations (bigconfig.yml files) for BigQuery tables. Works with metadata-manager skill. |
Bigconfig Generator
Composable: Works with metadata-manager (for schema/metadata generation) and bigquery-etl-core (for conventions)
When to use: Creating/updating Bigeye configurations, data quality monitoring
Overview
Generate and manage Bigeye monitoring configurations for BigQuery tables in the Mozilla bigquery-etl repository. Bigeye is Mozilla's data quality monitoring platform that checks for freshness, volume anomalies, null values, uniqueness violations, and custom business logic validation.
This skill helps configure monitoring through:
- metadata.yaml - High-level monitoring settings (freshness, volume, collections)
- bigconfig.yml - Detailed metric definitions (auto-generated via bqetl CLI)
- bigeye_custom_rules.sql - Custom SQL validation rules (optional, for complex business logic)
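For orientation, these files live alongside the query in the table's directory. An illustrative layout (exact contents vary by table):

```
sql/<project>/<dataset>/<table>/
├── metadata.yaml            # monitoring settings live here
├── bigconfig.yml            # generated by ./bqetl monitoring update
└── bigeye_custom_rules.sql  # optional custom SQL checks
```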
Official Documentation:
- bigConfig Reference: https://mozilla.github.io/bigquery-etl/reference/bigconfig/ (docs/reference/bigconfig.md)
- Bigeye Intro: https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html
- Bigeye Official Docs: https://docs.bigeye.com/docs/bigconfig
🚨 REQUIRED READING - Start Here
BEFORE creating monitoring configurations, READ these resources:
Existing Collections: READ references/existing_collections.md
- Collections already in use across the repository
- Notification channels by dataset/team
- Helps maintain consistency and avoid creating duplicate collections
Monitoring Patterns: READ references/monitoring_patterns.md
- Common monitoring scenarios
- Freshness vs volume monitoring
- When to use custom rules
- Configuration workflow
📋 Templates - Copy These Structures
When adding monitoring to metadata.yaml, READ and COPY from these templates:
Basic monitoring (most tables)? → READ assets/metadata_monitoring_basic.yaml
- Standard freshness and volume checks
- Collection assignment
Critical table (high priority)? → READ assets/metadata_monitoring_critical.yaml
- More aggressive monitoring settings
- Faster alerting
View (non-partitioned)? → READ assets/metadata_monitoring_view.yaml
- Monitoring for views without partitions
For custom validation rules:
- Custom SQL checks? → READ assets/custom_rules_template.sql
- Template for bigeye_custom_rules.sql
- Shows how to write validation queries
When to Use This Skill
Use this skill when:
- Creating new tables and user wants to enable monitoring
- User explicitly requests "create a bigeye config for..."
- User asks about adding data quality monitoring
- Setting up freshness or volume checks
- Creating custom validation rules
- Troubleshooting monitoring configurations
Integration with metadata-manager: When metadata-manager creates new tables, it should ask the user: "Would you like to enable Bigeye monitoring for this table?" If yes, invoke this skill.
🚨 IMPORTANT: Deployment Safety
Manual deployment is BLOCKED for safety reasons.
If a user asks to run ./bqetl monitoring deploy, warn them:
⚠️ Manual deployment can accidentally delete existing metrics. The recommended workflow is to commit your changes and let the bqetl_artifact_deployment DAG deploy automatically. Manual deployment is disabled in this environment.
If you need to manually deploy for testing purposes, you'll need to:
- Ensure you have BIGEYE_API_KEY set
- Understand that deploying only specific tables can remove metrics from other tables
- Use --dry-run first to review changes
- Contact Data Engineering if you're unsure
Proceed with caution - this can affect production monitoring.
The standard workflow (update → validate → commit → push) is safe and recommended.
Prerequisites
- Table must have metadata.yaml file
- Table must be deployed to BigQuery
- Understanding of table's update schedule (daily, hourly, etc.)
- For manual deployment (discouraged): BIGEYE_API_KEY environment variable must be set
Staying Current with Documentation
Always prefer official documentation over this skill's references:
- For bigConfig syntax and structure: Read docs/reference/bigconfig.md or use WebFetch on https://mozilla.github.io/bigquery-etl/reference/bigconfig/
- For available saved metrics: Check sql/bigconfig.yml in the repository (source of truth)
- For Bigeye concepts: Use WebFetch on https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html
- For bqetl CLI commands: Check ./bqetl monitoring --help or the monitoring.py source code
When to use WebFetch:
- User asks about specific bigConfig features not covered in this skill
- Need to verify current syntax or available options
- References in this skill seem outdated or incomplete
- Troubleshooting issues not covered in common patterns
This skill focuses on workflow and decision-making rather than being a comprehensive bigConfig reference.
Workflow
Step 1: Determine Monitoring Requirements
Ask the user what type of monitoring they need:
For new tables created by metadata-manager: "Would you like to enable Bigeye monitoring for this table? This can check for:
- Freshness (when data was last updated)
- Volume (row count anomalies)
- Column-level validation (nulls, uniqueness, formats)
- Custom business logic validation"
For existing tables: "What type of monitoring would you like to configure?
- Basic (freshness + volume)
- Critical (freshness + volume with blocking)
- Column-level validation
- Custom SQL rules
- All of the above"
After determining monitoring type, check existing collections:
Before configuring metadata.yaml, READ references/existing_collections.md to:
- Find the dataset in "Collections by Dataset" section
- Check if there's an existing collection for this dataset/team
- Note the notification channels used by similar tables
Ask the user: "Based on existing configurations, would you like to use the [Collection Name] collection with [notification channels]? Or create a new collection?"
Step 2: Configure metadata.yaml
Add a monitoring section to metadata.yaml based on table type:
- Basic (most tables): assets/metadata_monitoring_basic.yaml - Freshness + volume, non-blocking
- Critical (production): assets/metadata_monitoring_critical.yaml - Blocking failures, collection assignment
- Views: assets/metadata_monitoring_view.yaml - Requires explicit partition_column
Key settings:
- blocking: true - Failures block deployments (use for critical tables)
- collection - Groups related tables, configures alerts
- partition_column - Required for views (or null if non-partitioned)
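A minimal sketch of what the monitoring section can look like, using only the settings named above; the nesting is assumed, so copy the exact structure from the templates in assets/ rather than from this example:

```yaml
# Hypothetical sketch -- field names come from the settings described in this
# skill; the assets/metadata_monitoring_*.yaml templates are the source of truth.
monitoring:
  blocking: false                    # true for critical tables
  collection: Operational Checks     # groups alerts with related tables
  partition_column: submission_date  # required for views; null if non-partitioned
  freshness:
    enabled: true
  volume:
    enabled: true
```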
Step 3: Generate bigconfig.yml
Use the bqetl CLI to auto-generate bigconfig.yml from metadata.yaml:
./bqetl monitoring update <dataset>.<table>
This command:
- Reads monitoring settings from metadata.yaml
- Generates appropriate metric definitions in bigconfig.yml
- Adds freshness/volume checks based on configuration
- Uses saved metrics from sql/bigconfig.yml
What gets generated:
- If freshness.enabled: true → Adds a freshness metric
- If volume.enabled: true → Adds a volume metric
- If blocking: true → Uses the freshness_fail / volume_fail variants
- If collection specified → Groups metrics under that collection
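The generated file follows Bigeye's bigConfig format. A rough sketch of the kind of output to expect (illustrative shape only; do not hand-write this, and check a generated file in the repository for the real structure):

```yaml
# Illustrative only -- produced by ./bqetl monitoring update from metadata.yaml,
# using saved metrics defined in sql/bigconfig.yml.
type: BIGCONFIG_FILE
table_deployments:
  - deployments:
      - fqn: <project>.<dataset>.<table>
        table_metrics:
          - saved_metric_id: freshness   # freshness_fail variant when blocking: true
          - saved_metric_id: volume      # volume_fail variant when blocking: true
```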
Step 4: Customize bigconfig.yml (Optional)
Manually edit the generated bigconfig.yml for advanced use cases:
Column-level validation: Add tag_deployments section with column_selectors and metrics (is_not_null, is_unique, is_valid_client_id, etc.). See sql/bigconfig.yml for all available saved metrics.
Lookback windows: Adjust how far back Bigeye scans data (0=latest partition, 7=last 7 days, 28=last 28 days). Use longer lookback for tables with sporadic updates.
When to customize: Column-specific validation, custom thresholds, infrequent updates, different notification channels per metric.
See references/monitoring_patterns.md for examples.
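As a rough illustration of the column-level structure described above (the selector path format and exact keys are assumptions; verify against docs/reference/bigconfig.md and existing bigconfig.yml files before using):

```yaml
# Illustrative only -- confirm selector syntax and available
# saved_metric_id values against sql/bigconfig.yml.
tag_deployments:
  - deployments:
      - column_selectors:
          - name: <project>.<dataset>.<table>.client_id
        metrics:
          - saved_metric_id: is_not_null
          - saved_metric_id: is_unique
```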
Step 5: Add Custom SQL Rules (Optional)
For complex business logic validation (cross-column checks, format validation, business rules), create bigeye_custom_rules.sql in the table directory.
Use template: assets/custom_rules_template.sql contains structure, JSON configuration block, and examples.
Key points:
- Query returns percentage (0-100) or count
- JSON comment block configures name, range, collections, owner, schedule
- Supports Jinja variables: {{ project_id }}, {{ dataset_id }}, {{ table_name }}
Step 6: Validate Configuration
Validate bigconfig.yml syntax and configuration:
./bqetl monitoring validate <dataset>.<table>
What it checks:
- Valid YAML syntax
- No duplicate metric deployments
- Saved metric IDs exist
- For views: partition_column is explicitly set in metadata.yaml
Common validation errors:
- "Duplicate deployments" → Consolidate metrics under single deployment
- "Invalid metric" → Check saved_metric_id exists in sql/bigconfig.yml
- "Partition column needs to be configured" → Set
partition_columnandpartition_column_set: truefor views
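For views, that usually means adding something like the following to metadata.yaml (a hedged sketch; the exact placement of these fields is defined by assets/metadata_monitoring_view.yaml):

```yaml
# Hypothetical sketch -- copy the real layout from assets/metadata_monitoring_view.yaml.
monitoring:
  partition_column: submission_date  # column Bigeye should treat as the partition
  partition_column_set: true         # marks the partition column as explicitly configured
```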
Step 7: Deploy to Bigeye
Recommended approach: Automatic deployment via Airflow DAG
After validation passes, commit and push your changes to the main branch:
git add sql/<project>/<dataset>/<table>/
git commit -m "Add Bigeye monitoring for <dataset>.<table>"
git push origin main
What happens automatically:
- The bqetl_artifact_deployment DAG detects bigconfig.yml changes
- The publish_bigeye_monitors task deploys all bigConfig files
- Custom SQL rules are deployed (if bigeye_custom_rules.sql exists)
This approach is recommended because:
- Ensures all bigconfig.yml files are deployed together (prevents accidental deletions)
- No need to manage BIGEYE_API_KEY locally
- Consistent with Mozilla's deployment practices
- Deployment history tracked in git
Alternative: Manual deployment (discouraged)
⚠️ CAUTION: Avoid running ./bqetl monitoring deploy locally unless absolutely necessary. Local deployment can accidentally delete metrics when other bigconfig.yml files are not included in the deploy. See docs/reference/bigconfig.md for details.
If you must deploy manually (e.g., for testing in non-production):
./bqetl monitoring deploy <dataset>.<table> --dry-run # Review changes first
./bqetl monitoring deploy <dataset>.<table> # Requires BIGEYE_API_KEY
Step 8: Test Monitoring (Optional)
After deployment, you can manually trigger monitoring checks to verify configuration:
./bqetl monitoring run <dataset>.<table> # Requires BIGEYE_API_KEY
What it does:
- Triggers all metric checks for the table
- Runs custom SQL rules
- Returns success/failure status
- Provides links to Bigeye UI for details
When to test:
- After automatic deployment via DAG completes
- After modifying monitoring configuration
- Debugging false positives/negatives
Alternative: Wait for Bigeye's scheduled runs or check results in the Bigeye UI
Common Monitoring Patterns
Standard workflow for all patterns:
- Add/update the monitoring section in metadata.yaml
- Run: ./bqetl monitoring update <dataset>.<table>
- Run: ./bqetl monitoring validate <dataset>.<table>
- Commit and push to main branch (automatic deployment)
Pattern 1: Basic Daily Table
Use assets/metadata_monitoring_basic.yaml template. Enables freshness and volume checks, non-blocking.
Pattern 2: Critical Production Table
Use assets/metadata_monitoring_critical.yaml template. Sets blocking: true and assigns to "Operational Checks" collection.
Pattern 3: View with Monitoring
Use assets/metadata_monitoring_view.yaml template. Must set partition_column and partition_column_set: true.
Pattern 4: Column-Level Validation
After generating basic bigconfig.yml, manually edit to add column-specific metrics. See sql/bigconfig.yml for available saved metrics (is_not_null, is_unique, is_valid_client_id, etc.).
Pattern 5: Custom Business Logic
Create bigeye_custom_rules.sql using assets/custom_rules_template.sql. Query must return percentage (0-100) or count. Configure via JSON comment block.
Integration with Other Skills
Works with metadata-manager
When metadata-manager creates new tables:
- metadata-manager should ask: "Would you like to enable Bigeye monitoring?"
- If yes, metadata-manager invokes bigconfig-generator skill
- bigconfig-generator adds monitoring configuration to metadata.yaml
- Generates bigconfig.yml via bqetl CLI
Workflow:
- metadata-manager creates schema.yaml, metadata.yaml
- metadata-manager asks about monitoring
- If yes → invoke bigconfig-generator
- bigconfig-generator adds monitoring section to metadata.yaml
- bigconfig-generator runs ./bqetl monitoring update
- User validates, commits, and pushes to main (automatic deployment via DAG)
Works with bigquery-etl-core
- Uses project structure conventions
- Follows naming patterns (dataset.table)
- References common partitioning strategies (submission_date)
Troubleshooting
Deployment Errors
Deployment delays:
- Deployment happens automatically after merge to main via the bqetl_artifact_deployment DAG
- Check DAG status in the Airflow UI if deployment seems delayed
- Typical deployment time: within 1 hour of merge
"Table does not exist in Bigeye"
- Table not yet ingested by Bigeye
- Wait for next schema sync or manually sync in Bigeye UI
- Check with Data Engineering if table is not appearing
"Partition column does not exist"
- Verify partition_column matches an actual column in schema.yaml
- Check for typos in the column name
Manual deployment errors (if using ./bqetl monitoring deploy):
"Bigeye API token needs to be set"
- Set the BIGEYE_API_KEY environment variable
- Note: Manual deployment is discouraged; prefer automatic DAG deployment
Validation Errors
"Duplicate deployments"
- Same column selector appears multiple times
- Consolidate metrics under single deployment
"Invalid metric"
- Referencing non-existent saved_metric_id
- Check sql/bigconfig.yml for available metrics
"Partition column needs to be configured"
- For views with monitoring enabled
- Add partition_column and partition_column_set: true to metadata.yaml
False Positives
Freshness checks failing:
- Verify table actually updated (query BigQuery)
- Check partition_column is correct
- Verify Bigeye's schedule aligns with table update schedule
- Consider longer lookback window
Volume checks failing:
- Normal for tables with varying row counts
- Consider disabling volume checks
- Use longer lookback window
- Adjust thresholds in bigconfig.yml
Best Practices
When to Enable Monitoring
Always enable:
- Production tables in dashboards/reports
- Tables with SLAs or freshness requirements
- Critical pipeline outputs
Consider enabling:
- Development/staging tables (for testing configs)
- Tables with known data quality issues
Skip monitoring:
- Temporary/scratch tables
- One-time analysis tables
- Tables with no consumers
Blocking vs Non-Blocking
Use blocking: true when:
- Failures must halt deployments
- Table is production-critical
- False positives are rare and quickly resolved
Use blocking: false when:
- Failures should alert but not block
- Table is still stabilizing
- False positives are expected
Collections
Use consistent naming:
- Group related tables by team/product
- Configure notification channels once per collection
- Makes alert management easier
Common collections:
- Team: "Subscription Platform", "Ads Team", "Growth Team"
- Function: "Operational Checks", "Data Quality"
- Environment: "Test", "Staging"
Custom Rules
Best practices:
- Return percentage (0-100) for "value" alert_conditions
- Return count for "count" alert_conditions
- Use descriptive rule names
- Set appropriate min/max ranges
- Document rule purpose in comments
- Test rules manually before deploying
Reference Documentation
Official Documentation (Always Preferred):
- docs/reference/bigconfig.md - Canonical reference for bigConfig in this repository
- sql/bigconfig.yml - Source of truth for available saved metrics
- https://mozilla.github.io/bigquery-etl/reference/bigconfig/ - Published docs
- https://mozilla.github.io/data-docs/cookbooks/data_monitoring/intro.html - Bigeye intro
- https://docs.bigeye.com/docs/bigconfig - Bigeye official documentation
Quick Reference (This Skill):
- references/monitoring_patterns.md - Workflow guidance and common patterns (may be outdated)
- assets/metadata_monitoring_basic.yaml - Basic monitoring config template
- assets/metadata_monitoring_critical.yaml - Critical table config template
- assets/metadata_monitoring_view.yaml - View monitoring config template
- assets/custom_rules_template.sql - Custom SQL rule template
Priority: When in doubt, read docs/reference/bigconfig.md or use WebFetch on the online docs.
Quick Reference: bqetl Monitoring Commands
# Refresh the collections reference file (run periodically to stay current)
python3 .claude/skills/bigconfig-generator/scripts/extract_collections.py
# Generate/update bigconfig.yml from metadata.yaml
./bqetl monitoring update <dataset>.<table>
# Validate bigconfig.yml syntax and configuration
./bqetl monitoring validate <dataset>.<table>
# ⚠️ DISCOURAGED: Manual deployment (prefer automatic DAG deployment)
./bqetl monitoring deploy <dataset>.<table> --dry-run # Requires BIGEYE_API_KEY
./bqetl monitoring deploy <dataset>.<table> # Requires BIGEYE_API_KEY
# Manually trigger monitoring checks (requires BIGEYE_API_KEY)
./bqetl monitoring run <dataset>.<table>
# Delete deployed monitoring (requires BIGEYE_API_KEY)
./bqetl monitoring delete <dataset>.<table> --metrics --custom-sql
Recommended workflow:
- Check references/existing_collections.md for the appropriate collection/channels
- Update/create bigconfig.yml using monitoring update
- Validate using monitoring validate
- Commit and push to main branch
- The bqetl_artifact_deployment DAG automatically deploys the changes