Claude Code Plugins

Community-maintained marketplace

data-quality-test-generator

@zazu-22/ff_data_analytics

Generate comprehensive dbt test suites following FF Analytics data quality standards and dbt 1.10+ syntax. This skill should be used when creating tests for new dbt models, adding tests to existing models, standardizing test coverage, or implementing data quality gates. Covers grain uniqueness, FK relationships, enum validation, and freshness tests.

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please review the skill's instructions to verify it before using it.

SKILL.md

name: data-quality-test-generator
description: Generate comprehensive dbt test suites following FF Analytics data quality standards and dbt 1.10+ syntax. This skill should be used when creating tests for new dbt models, adding tests to existing models, standardizing test coverage, or implementing data quality gates. Covers grain uniqueness, FK relationships, enum validation, and freshness tests.

Data Quality Test Generator

Generate comprehensive dbt test suites for Fantasy Football Analytics models following Kimball patterns, implementation requirements, and dbt 1.10+ syntax.

When to Use This Skill

Use this skill proactively when:

  • Creating tests for new dbt models (staging, facts, dimensions, marts)
  • User asks to "add tests for {model}" or "improve test coverage"
  • Implementing data quality gates per implementation requirements
  • Standardizing existing tests to dbt 1.10+ syntax
  • Creating tests alongside models (integration with dbt-model-builder skill)

dbt 1.10+ Syntax Requirements

CRITICAL: FF Analytics uses dbt 1.10+ which requires the new syntax:

  • Use data_tests: not tests:
  • Wrap test arguments in arguments: block
  • Use config: for test configuration (severity, where clauses)

Correct vs Incorrect Syntax

✅ CORRECT (dbt 1.10+):

data_tests:
  - accepted_values:
      arguments:
        values: ['value1', 'value2']
      config:
        severity: error

❌ INCORRECT (old syntax):

tests:
  - accepted_values:
      values: ['value1', 'value2']
      severity: error

Test Generation by Model Type

Task 1: Staging Model Tests

Staging models normalize raw provider data and require schema conformance tests.

Test Priorities:

  1. not_null on all primary key columns
  2. unique on single-column natural keys
  3. accepted_values on categorical/enum columns
  4. Basic data type validation (see the contract sketch after the example)

Example Pattern:

version: 2

models:
  - name: stg_nflverse__weekly
    description: "NFLverse weekly player stats"

    columns:
      - name: player_id
        description: "gsis_id from nflverse"
        data_tests:
          - not_null

      - name: season
        data_tests:
          - not_null
          - dbt_utils.expression_is_true:
              arguments:
                expression: ">= 2020"  # Reasonable season bounds

      - name: week
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

      - name: position
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF']

See: references/example_staging_tests.yml
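
Priority 4 (basic data type validation) has no single built-in generic test. One option, sketched below, is a dbt model contract (dbt 1.5+), which fails the build if the model's output columns drift from the declared types. Note that an enforced contract must declare every column in the model; the data types here are illustrative assumptions, and only a few columns are shown:

models:
  - name: stg_nflverse__weekly
    config:
      contract:
        enforced: true  # Build fails if column names or types drift from these declarations
    columns:
      - name: player_id
        data_type: varchar
      - name: season
        data_type: integer
      - name: week
        data_type: integer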

Task 2: Fact Table Tests

Fact tables capture measurable events and require grain + FK + measure validation.

Test Priorities:

  1. Grain uniqueness using dbt_utils.unique_combination_of_columns
  2. Relationship tests for all foreign keys to dimensions
  3. not_null on required measures
  4. accepted_values on 2×2 model enums (measure_domain, stat_kind, horizon)
  5. Expression tests for business logic (non-negative scores, etc.; see the sketch after the example)

Example Pattern:

version: 2

models:
  - name: fact_player_stats
    description: "Player statistics fact table (grain: player + game + stat)"

    # Grain uniqueness test
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments:
            combination_of_columns:
              - player_id
              - game_id
              - stat_name
              - measure_domain
              - stat_kind
          config:
            severity: error

    columns:
      # Foreign key to dim_player
      - name: player_id
        description: "Foreign key to dim_player (mfl_id)"
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('dim_player')
                field: player_id

      # Foreign key to dim_game
      - name: game_id
        description: "Foreign key to dim_game"
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('dim_game')
                field: game_id

      # 2x2 model enum: measure_domain
      - name: measure_domain
        description: "Real-world vs fantasy scoring dimension"
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['real_world', 'fantasy']

      # 2x2 model enum: stat_kind
      - name: stat_kind
        description: "Actuals vs projections dimension"
        data_tests:
          - not_null
          - accepted_values:
              arguments:
                values: ['actual', 'projection']

      # Measure validation
      - name: stat_value
        description: "Value of the statistical measure"
        data_tests:
          - not_null

See: references/example_fact_tests.yml
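
Priority 5 (expression tests for business logic) is not shown in the example above. A minimal sketch, assuming stat_value should never be negative for this fact; relax the expression or add a config: where filter if some stats can legitimately go below zero:

      - name: stat_value
        data_tests:
          - dbt_utils.expression_is_true:
              arguments:
                expression: ">= 0"
              config:
                severity: warn  # Promote to error once the non-negative assumption is confirmed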

Task 3: Dimension Table Tests

Dimensions provide descriptive context and require uniqueness + SCD validation.

Test Priorities:

  1. unique + not_null on surrogate key (dimension PK)
  2. not_null on natural key columns
  3. SCD Type 2 tests (if applicable):
    • valid_from <= valid_to (or valid_to IS NULL for current)
    • No overlapping date ranges for the same natural key (see the sketch after the example)
  4. accepted_values on categorical attributes

Example Pattern:

version: 2

models:
  - name: dim_player
    description: "Player dimension with SCD Type 2 history"

    columns:
      # Surrogate key
      - name: player_sk
        description: "Surrogate key for SCD Type 2"
        data_tests:
          - unique
          - not_null

      # Natural key
      - name: player_id
        description: "Natural key (mfl_id canonical)"
        data_tests:
          - not_null

      # SCD Type 2 validity: valid_from must not exceed valid_to (NULL valid_to = current row)
      - name: valid_from
        description: "Start of validity period"
        data_tests:
          - not_null
          - dbt_utils.expression_is_true:
              arguments:
                expression: "<= COALESCE(valid_to, CURRENT_DATE + INTERVAL '100 years')"

      - name: valid_to
        description: "End of validity period (NULL = current)"

      # Categorical attributes
      - name: position
        description: "Player position"
        data_tests:
          - accepted_values:
              arguments:
                values: ['QB', 'RB', 'WR', 'TE', 'K', 'DEF']

      - name: status
        description: "Active/Inactive/Retired"
        data_tests:
          - accepted_values:
              arguments:
                values: ['active', 'inactive', 'retired', 'unknown']

See: references/example_dimension_tests.yml
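
For priority 3's "no overlapping date ranges" check, dbt_utils provides mutually_exclusive_ranges. A minimal sketch as a model-level test on dim_player; the COALESCE mirrors the open-ended current-row handling above, and gaps: allowed permits coverage gaps between versions (use not_allowed if history must be continuous):

models:
  - name: dim_player
    data_tests:
      - dbt_utils.mutually_exclusive_ranges:
          arguments:
            lower_bound_column: valid_from
            upper_bound_column: "COALESCE(valid_to, CURRENT_DATE + INTERVAL '100 years')"
            partition_by: player_id
            gaps: allowed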

Task 4: Mart Tests

Marts are analytics-ready views requiring business logic validation.

Test Priorities:

  1. Row count thresholds (dbt_utils.expression_is_true on table)
  2. Metric validation (non-negative fantasy points, reasonable ranges)
  3. Partition completeness (all weeks/seasons represented; see the sketch after the example)
  4. Aggregate consistency (totals match source facts)

Example Pattern:

version: 2

models:
  - name: mart_fantasy_actuals_weekly
    description: "Weekly fantasy points by player (analytics-ready)"

    data_tests:
      # Minimum row count threshold
      - dbt_utils.expression_is_true:
          arguments:
            expression: "(SELECT COUNT(*) FROM {{ ref('mart_fantasy_actuals_weekly') }}) > 10000"
          config:
            severity: warn

    columns:
      - name: player_name
        data_tests:
          - not_null

      - name: fantasy_points
        description: "Total fantasy points for the week"
        data_tests:
          - not_null
          - dbt_utils.expression_is_true:
              arguments:
                expression: ">= 0"  # Fantasy points can't be negative

      - name: games_played
        data_tests:
          - accepted_values:
              arguments:
                values: [0, 1]  # Weekly grain, max 1 game per week
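
Priority 3 (partition completeness) is not shown above. A minimal sketch, assuming an 18-week regular season, that can be added to the model-level data_tests list in the example (adjust the expected count and the season filter to the data under test):

      - dbt_utils.expression_is_true:
          arguments:
            expression: |
              (
                SELECT COUNT(DISTINCT week)
                FROM {{ ref('mart_fantasy_actuals_weekly') }}
                WHERE season = 2024
              ) = 18
          config:
            severity: warn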

Task 5: Source Freshness Tests

Implement freshness monitoring for critical data sources.

Example Freshness Policies (always check the latest documentation for current thresholds):

  • NFLverse: warn_after 24h, error_after 48h (during season)
  • Commissioner Sheets: warn_after 12h, error_after 24h
  • KTC: warn_after 48h, error_after 96h (less time-sensitive)

Example Pattern:

version: 2

sources:
  - name: nflverse
    description: "NFLverse data provider"
    freshness:
      warn_after: {count: 24, period: hour}
      error_after: {count: 48, period: hour}

    tables:
      - name: weekly
        description: "Weekly player statistics"
        identifier: "dt=*/**"  # Partitioned by date

        # Freshness is checked against the max snapshot_date
        loaded_at_field: snapshot_date
        freshness:
          warn_after: {count: 24, period: hour}
          error_after: {count: 48, period: hour}
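
These thresholds are evaluated by dbt's source freshness command, typically run in CI or a scheduler ahead of model builds. For example, scoped to the nflverse source:

dbt source freshness --select source:nflverse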

Data Quality Requirements

As a general rule, the following tests are required:

  1. Accepted values on enums - All categorical columns must have accepted_values tests
  2. Freshness tests - All source tables must have warn/error thresholds
  3. Grain uniqueness - All fact tables must test grain with unique_combination_of_columns
  4. FK relationships - All foreign keys must have relationship tests to dimensions

Best Practices

Test Coverage Targets

  • Staging models: >80% column coverage (focus on PKs and enums)
  • Fact tables: 100% grain + FK coverage
  • Dimensions: 100% PK/NK coverage
  • Marts: Business logic validation (thresholds, non-negative, etc.)

Test Naming

dbt auto-generates test names from the test type and its arguments. For custom or complex tests, supply a descriptive name via the test's name property so failures are easy to trace:

- dbt_utils.expression_is_true:
    name: revenue_gte_cost  # Validate that revenue >= cost for all transactions
    arguments:
      expression: "revenue >= cost"

Severity Levels

  • error: Blocking failures (grain violations, missing FKs)
  • warn: Non-blocking issues (freshness delays, low row counts)

Set the level in the test's config block:

config:
  severity: warn  # Use for non-critical tests

Where Clauses

Use where clauses to filter test scope:

data_tests:
  - relationships:
      arguments:
        to: ref('dim_player')
        field: player_id
      config:
        where: "player_id IS NOT NULL AND season >= 2020"

Integration with Other Skills

  • dbt-model-builder: Create tests alongside model creation

Resources

assets/

  • test_suite_template.yml - Base template with dbt 1.10+ syntax for common test patterns

references/

  • example_fact_tests.yml - Complete fact table test suite (fact_player_projections)
  • example_staging_tests.yml - Staging model test patterns
  • example_dimension_tests.yml - Dimension table test patterns with SCD Type 2 validation

Common Test Patterns

Testing 2×2 Model Enums

The 2×2 stat model uses consistent enums across all facts:

# measure_domain: real_world vs fantasy
- name: measure_domain
  data_tests:
    - accepted_values:
        arguments:
          values: ['real_world', 'fantasy']

# stat_kind: actual vs projection
- name: stat_kind
  data_tests:
    - accepted_values:
        arguments:
          values: ['actual', 'projection']

# horizon: projection timeframe (only for stat_kind='projection')
- name: horizon
  data_tests:
    - accepted_values:
        arguments:
          values: ['weekly', 'full_season', 'rest_of_season']
        config:
          where: "stat_kind = 'projection'"

Testing Date Ranges

- name: game_date
  data_tests:
    - dbt_utils.expression_is_true:
        arguments:
          expression: ">= '2020-01-01'"  # Reasonable lower bound
    - dbt_utils.expression_is_true:
        arguments:
          expression: "<= CURRENT_DATE + INTERVAL '1 year'"  # No future dates beyond 1 year

Testing Aggregations

# Ensure weekly totals match season totals (defined at the model level, not on a column,
# so the full expression is evaluated as written)
models:
  - name: mart_season_totals
    data_tests:
      - dbt_utils.expression_is_true:
          name: season_total_consistency
          arguments:
            expression: |
              (
                SELECT SUM(weekly_points)
                FROM {{ ref('mart_weekly_points') }}
                WHERE season = 2024
              ) = (
                SELECT season_total
                FROM {{ ref('mart_season_totals') }}
                WHERE season = 2024
              )