| name | data-quality-test-generator |
| description | Generate comprehensive dbt test suites following FF Analytics data quality standards and dbt 1.10+ syntax. This skill should be used when creating tests for new dbt models, adding tests to existing models, standardizing test coverage, or implementing data quality gates. Covers grain uniqueness, FK relationships, enum validation, and freshness tests. |
Data Quality Test Generator
Generate comprehensive dbt test suites for Fantasy Football Analytics models following Kimball patterns, implementation requirements, and dbt 1.10+ syntax.
When to Use This Skill
Use this skill proactively when:
- Creating tests for new dbt models (staging, facts, dimensions, marts)
- User asks to "add tests for {model}" or "improve test coverage"
- Implementing data quality gates per implementation requirements
- Standardizing existing tests to dbt 1.10+ syntax
- Creating tests alongside models (integration with dbt-model-builder skill)
dbt 1.10+ Syntax Requirements
CRITICAL: FF Analytics uses dbt 1.10+ which requires the new syntax:
- Use
data_tests:nottests: - Wrap test arguments in
arguments:block - Use
config:for test configuration (severity, where clauses)
Correct vs Incorrect Syntax
✅ CORRECT (dbt 1.10+):
data_tests:
- accepted_values:
arguments:
values: ['value1', 'value2']
config:
severity: error
❌ INCORRECT (old syntax):
tests:
- accepted_values:
values: ['value1', 'value2']
severity: error
Test Generation by Model Type
Task 1: Staging Model Tests
Staging models normalize raw provider data and require schema conformance tests.
Test Priorities:
- not_null on all primary key columns
- unique on single-column natural keys
- accepted_values on categorical/enum columns
- Basic data type validation
Example Pattern:
version: 2
models:
- name: stg_nflverse__weekly
description: "NFLverse weekly player stats"
columns:
- name: player_id
description: "gsis_id from nflverse"
data_tests:
- not_null
- name: season
data_tests:
- not_null
- dbt_utils.expression_is_true:
arguments:
expression: ">= 2020" # Reasonable season bounds
- name: week
data_tests:
- not_null
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
- name: position
data_tests:
- not_null
- accepted_values:
arguments:
values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF']
See: references/example_staging_tests.yml
Task 2: Fact Table Tests
Fact tables capture measurable events and require grain + FK + measure validation.
Test Priorities:
- Grain uniqueness using
dbt_utils.unique_combination_of_columns - Relationship tests for all foreign keys to dimensions
- not_null on required measures
- accepted_values on 2×2 model enums (measure_domain, stat_kind, horizon)
- Expression tests for business logic (non-negative scores, etc.)
Example Pattern:
version: 2
models:
- name: fact_player_stats
description: "Player statistics fact table (grain: player + game + stat)"
# Grain uniqueness test
data_tests:
- dbt_utils.unique_combination_of_columns:
arguments:
combination_of_columns:
- player_id
- game_id
- stat_name
- measure_domain
- stat_kind
config:
severity: error
columns:
# Foreign key to dim_player
- name: player_id
description: "Foreign key to dim_player (mfl_id)"
data_tests:
- not_null
- relationships:
arguments:
to: ref('dim_player')
field: player_id
# Foreign key to dim_game
- name: game_id
description: "Foreign key to dim_game"
data_tests:
- not_null
- relationships:
arguments:
to: ref('dim_game')
field: game_id
# 2x2 model enum: measure_domain
- name: measure_domain
description: "Real-world vs fantasy scoring dimension"
data_tests:
- not_null
- accepted_values:
arguments:
values: ['real_world', 'fantasy']
# 2x2 model enum: stat_kind
- name: stat_kind
description: "Actuals vs projections dimension"
data_tests:
- not_null
- accepted_values:
arguments:
values: ['actual', 'projection']
# Measure validation
- name: stat_value
description: "Value of the statistical measure"
data_tests:
- not_null
See: references/example_fact_tests.yml
Task 3: Dimension Table Tests
Dimensions provide descriptive context and require uniqueness + SCD validation.
Test Priorities:
- unique + not_null on surrogate key (dimension PK)
- not_null on natural key columns
- SCD Type 2 tests (if applicable):
valid_from <= valid_to(or valid_to IS NULL for current)- No overlapping date ranges for same natural key
- accepted_values on categorical attributes
Example Pattern:
version: 2
models:
- name: dim_player
description: "Player dimension with SCD Type 2 history"
columns:
# Surrogate key
- name: player_sk
description: "Surrogate key for SCD Type 2"
data_tests:
- unique
- not_null
# Natural key
- name: player_id
description: "Natural key (mfl_id canonical)"
data_tests:
- not_null
# SCD Type 2 validity
- name: valid_from
description: "Start of validity period"
data_tests:
- not_null
- name: valid_to
description: "End of validity period (NULL = current)"
# SCD Type 2 date logic test
- name: _scd_validation
description: "Ensure valid_from <= valid_to when valid_to is not null"
data_tests:
- dbt_utils.expression_is_true:
arguments:
expression: "valid_from <= COALESCE(valid_to, CURRENT_DATE + INTERVAL '100 years')"
# Categorical attributes
- name: position
description: "Player position"
data_tests:
- accepted_values:
arguments:
values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF']
- name: status
description: "Active/Inactive/Retired"
data_tests:
- accepted_values:
arguments:
values: ['active', 'inactive', 'retired', 'unknown']
See: references/example_dimension_tests.yml
Task 4: Mart Tests
Marts are analytics-ready views requiring business logic validation.
Test Priorities:
- Row count thresholds (dbt_utils.expression_is_true on table)
- Metric validation (non-negative fantasy points, reasonable ranges)
- Partition completeness (all weeks/seasons represented)
- Aggregate consistency (totals match source facts)
Example Pattern:
version: 2
models:
- name: mart_fantasy_actuals_weekly
description: "Weekly fantasy points by player (analytics-ready)"
data_tests:
# Minimum row count threshold
- dbt_utils.expression_is_true:
arguments:
expression: "(SELECT COUNT(*) FROM {{ ref('mart_fantasy_actuals_weekly') }}) > 10000"
config:
severity: warn
columns:
- name: player_name
data_tests:
- not_null
- name: fantasy_points
description: "Total fantasy points for the week"
data_tests:
- not_null
- dbt_utils.expression_is_true:
arguments:
expression: ">= 0" # Fantasy points can't be negative
- name: games_played
data_tests:
- accepted_values:
arguments:
values: [0, 1] # Weekly grain, max 1 game per week
Task 5: Source Freshness Tests
Implement freshness monitoring for critical data sources.
Example Freshness Policies (always check latest documentation for the most up to date policies)
- NFLverse: warn_after 24h, error_after 48h (during season)
- Commissioner Sheets: warn_after 12h, error_after 24h
- KTC: warn_after 48h, error_after 96h (less time-sensitive)
Example Pattern:
version: 2
sources:
- name: nflverse
description: "NFLverse data provider"
freshness:
warn_after: {count: 24, period: hour}
error_after: {count: 48, period: hour}
tables:
- name: weekly
description: "Weekly player statistics"
identifier: "dt=*/**" # Partitioned by date
# Freshness test on max snapshot_date
freshness:
warn_after: {count: 24, period: hour}
error_after: {count: 48, period: hour}
Data Quality Requirements
The following tests are required in general:
- Accepted values on enums - All categorical columns must have accepted_values tests
- Freshness tests - All source tables must have warn/error thresholds
- Grain uniqueness - All fact tables must test grain with unique_combination_of_columns
- FK relationships - All foreign keys must have relationship tests to dimensions
Best Practices
Test Coverage Targets
- Staging models: >80% column coverage (focus on PKs and enums)
- Fact tables: 100% grain + FK coverage
- Dimensions: 100% PK/NK coverage
- Marts: Business logic validation (thresholds, non-negative, etc.)
Test Naming
Tests are auto-named by dbt. For custom tests, use descriptive names:
- name: _custom_business_rule
description: "Validate that revenue >= cost for all transactions"
Severity Levels
- error: Blocking failures (grain violations, missing FKs)
- warn: Non-blocking issues (freshness delays, low row counts)
config:
severity: warn # Use for non-critical tests
Where Clauses
Use where clauses to filter test scope:
data_tests:
- relationships:
arguments:
to: ref('dim_player')
field: player_id
config:
where: "player_id IS NOT NULL AND season >= 2020"
Integration with Other Skills
- dbt-model-builder: Create tests alongside model creation
Resources
assets/
test_suite_template.yml- Base template with dbt 1.10+ syntax for common test patterns
references/
example_fact_tests.yml- Complete fact table test suite (fact_player_projections)example_staging_tests.yml- Staging model test patternsexample_dimension_tests.yml- Dimension table test patterns with SCD Type 2 validation
Common Test Patterns
Testing 2×2 Model Enums
The 2×2 stat model uses consistent enums across all facts:
# measure_domain: real_world vs fantasy
- name: measure_domain
data_tests:
- accepted_values:
arguments:
values: ['real_world', 'fantasy']
# stat_kind: actual vs projection
- name: stat_kind
data_tests:
- accepted_values:
arguments:
values: ['actual', 'projection']
# horizon: projection timeframe (only for stat_kind='projection')
- name: horizon
data_tests:
- accepted_values:
arguments:
values: ['weekly', 'full_season', 'rest_of_season']
config:
where: "stat_kind = 'projection'"
Testing Date Ranges
- name: game_date
data_tests:
- dbt_utils.expression_is_true:
arguments:
expression: ">= '2020-01-01'" # Reasonable lower bound
- dbt_utils.expression_is_true:
arguments:
expression: "<= CURRENT_DATE + INTERVAL '1 year'" # No future dates beyond 1 year
Testing Aggregations
# Ensure weekly totals match season totals
- name: _season_total_consistency
data_tests:
- dbt_utils.expression_is_true:
arguments:
expression: |
(
SELECT SUM(weekly_points)
FROM {{ ref('mart_weekly_points') }}
WHERE season = 2024
) = (
SELECT season_total
FROM {{ ref('mart_season_totals') }}
WHERE season = 2024
)