Claude Code Plugins

Community-maintained marketplace


Define data quality rules, profiling strategies, validation frameworks, and quality metrics.

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: data-quality-planning
description: Define data quality rules, profiling strategies, validation frameworks, and quality metrics.
allowed-tools: Read, Write, Glob, Grep, Task

Data Quality Planning

When to Use This Skill

Use this skill when:

  • Data quality planning tasks - defining data quality rules, profiling strategies, validation frameworks, and quality metrics
  • Planning or design - you need guidance on data quality planning approaches
  • Best practices - you want to follow established patterns and standards

Overview

Data quality planning establishes rules, processes, and metrics to ensure data is fit for its intended purpose. Quality dimensions help categorize and measure data health.

Data Quality Dimensions

Six Core Dimensions

| Dimension | Definition | Example Metric |
|-----------|------------|----------------|
| Accuracy | Data correctly represents reality | % records matching source of truth |
| Completeness | All required data is present | % non-null required fields |
| Consistency | Data agrees across systems | % matching cross-system values |
| Timeliness | Data is available when needed | Avg latency from source to target |
| Uniqueness | No duplicate records | % unique on key columns |
| Validity | Data conforms to rules | % records passing validation |

Quality Dimension Matrix

# Quality Assessment: Customer Domain

| Dimension | Weight | Target | Current | Gap |
|-----------|--------|--------|---------|-----|
| Accuracy | 25% | 99% | 97% | -2% |
| Completeness | 25% | 98% | 95% | -3% |
| Consistency | 20% | 99% | 94% | -5% |
| Timeliness | 10% | 99% | 99% | 0% |
| Uniqueness | 15% | 100% | 98% | -2% |
| Validity | 5% | 99% | 96% | -3% |
| **Overall** | 100% | 98.9% | 96.2% | -2.7% |
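The overall row is the weighted average of the per-dimension values. A minimal C# sketch of that roll-up (the types and names are illustrative, not part of this skill):

using System.Collections.Generic;
using System.Linq;

// Weighted roll-up of per-dimension quality scores (illustrative sketch).
public record DimensionScore(string Name, decimal Weight, decimal Target, decimal Current);

public static class QualityScoreRollup
{
    // Weights are fractions that sum to 1.0 (e.g. 0.25 for a 25% weight).
    public static (decimal Target, decimal Current, decimal Gap) Compute(
        IReadOnlyCollection<DimensionScore> dimensions)
    {
        decimal target = dimensions.Sum(d => d.Weight * d.Target);
        decimal current = dimensions.Sum(d => d.Weight * d.Current);
        return (target, current, current - target);
    }
}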

Data Profiling

Profiling Types

| Type | Purpose | Output |
|------|---------|--------|
| Column | Understand data distribution | Min, Max, Null %, Distinct count |
| Cross-column | Find relationships | Correlations, functional dependencies |
| Cross-table | Validate referential integrity | Orphan records, FK violations |
| Cross-system | Compare across sources | Discrepancies, sync issues |

Profiling Template

# Column Profile: customers.email

## Statistics
| Metric | Value |
|--------|-------|
| Total Rows | 1,250,000 |
| Distinct Values | 1,180,000 |
| Null Count | 12,500 (1%) |
| Empty String | 2,340 (0.2%) |
| Min Length | 5 |
| Max Length | 254 |
| Avg Length | 24.3 |

## Pattern Analysis
| Pattern | Count | Example |
|---------|-------|---------|
| \w+@\w+\.\w+ | 1,185,000 | user@domain.com |
| NULL | 12,500 | NULL |
| Invalid format | 50,160 | user@domain, @domain.com |

## Value Distribution
| Domain | Count | % |
|--------|-------|---|
| gmail.com | 312,000 | 26% |
| outlook.com | 187,500 | 15% |
| company.com | 125,000 | 10% |
| Other | 612,500 | 49% |

## Issues Found
- 4% invalid email format
- 1% null values (required field)
- 5.6% duplicate emails
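The statistics in the template can be gathered with SQL or in application code. A minimal in-memory C# sketch, assuming the column values are already loaded (the ColumnProfile record is illustrative):

using System;
using System.Collections.Generic;
using System.Linq;

// Column-level profile for a string column (illustrative sketch).
public record ColumnProfile(
    long TotalRows, long DistinctValues, long NullCount, long EmptyCount,
    int MinLength, int MaxLength, double AvgLength);

public static class ColumnProfiler
{
    public static ColumnProfile Profile(IReadOnlyCollection<string?> values)
    {
        var nonNull = values.Where(v => v is not null).Select(v => v!).ToList();
        var nonEmpty = nonNull.Where(v => v.Length > 0).ToList();

        return new ColumnProfile(
            TotalRows: values.Count,
            DistinctValues: nonNull.Distinct(StringComparer.OrdinalIgnoreCase).Count(),
            NullCount: values.Count - nonNull.Count,
            EmptyCount: nonNull.Count - nonEmpty.Count,
            MinLength: nonEmpty.Count > 0 ? nonEmpty.Min(v => v.Length) : 0,
            MaxLength: nonEmpty.Count > 0 ? nonEmpty.Max(v => v.Length) : 0,
            AvgLength: nonEmpty.Count > 0 ? nonEmpty.Average(v => v.Length) : 0.0);
    }
}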

Validation Rules

Rule Categories

| Category | Description | Example |
|----------|-------------|---------|
| Format | Pattern matching | Email regex, phone format |
| Range | Value boundaries | Age 0-120, price > 0 |
| Referential | FK constraints | Order.customer_id exists |
| Business | Domain logic | Discount <= 50% |
| Cross-field | Field relationships | End date >= Start date |
| Aggregate | Group-level | Daily sales > $1000 |
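As an example of the cross-field category, a hedged sketch written against the IValidationRule<T> interface from the C# Validation Framework section below (the Subscription entity and rule ID are assumptions):

// Cross-field rule sketch: end date must not precede start date.
// Assumes the IValidationRule<T>, ValidationResult, and ValidationSeverity types
// shown later in this skill; the Subscription entity and rule ID are illustrative.
public class DateRangeRule : IValidationRule<Subscription>
{
    public string RuleName => "VR-SUB-001";
    public ValidationSeverity Severity => ValidationSeverity.Error;

    public ValidationResult Validate(Subscription entity)
    {
        return entity.EndDate >= entity.StartDate
            ? ValidationResult.Pass(RuleName)
            : ValidationResult.Fail(RuleName,
                $"End date {entity.EndDate:d} precedes start date {entity.StartDate:d}");
    }
}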

Rule Definition Template

# Validation Rule: VR-CUST-001

## Metadata
- Name: Valid Email Format
- Domain: Customer
- Severity: Error
- Owner: Customer Data Steward

## Rule Definition
- Field: email
- Type: Format
- Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

## Threshold
- Error: < 99% compliance
- Warning: < 99.5% compliance
- Target: 99.9% compliance

## Remediation
1. Flag invalid emails for review
2. Attempt auto-correction (lowercase, trim)
3. Queue for customer contact if unresolvable
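A measured compliance percentage from a rule run can be mapped to these bands. A minimal sketch (the ComplianceStatus enum and method are illustrative):

// Maps a compliance percentage to the threshold bands above (illustrative sketch).
public enum ComplianceStatus { Error, Warning, Ok }

public static class ThresholdEvaluator
{
    // For VR-CUST-001: errorBelow = 99.0m, warningBelow = 99.5m.
    public static ComplianceStatus Evaluate(
        decimal compliancePct, decimal errorBelow, decimal warningBelow)
    {
        if (compliancePct < errorBelow) return ComplianceStatus.Error;
        if (compliancePct < warningBelow) return ComplianceStatus.Warning;
        return ComplianceStatus.Ok;
    }
}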

SQL Validation Examples

-- Completeness check
SELECT
    'email' AS column_name,
    COUNT(*) AS total_rows,
    COUNT(email) AS non_null_rows,
    CAST(COUNT(email) * 100.0 / COUNT(*) AS DECIMAL(5,2)) AS completeness_pct
FROM customers;

-- Uniqueness check
SELECT
    'customer_id' AS column_name,
    COUNT(*) AS total_rows,
    COUNT(DISTINCT customer_id) AS unique_values,
    CAST(COUNT(DISTINCT customer_id) * 100.0 / COUNT(*) AS DECIMAL(5,2)) AS uniqueness_pct
FROM customers;

-- Format validation (NULL emails counted separately so the columns sum to total_rows)
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN email LIKE '%_@_%.__%' THEN 1 ELSE 0 END) AS valid_format,
    SUM(CASE WHEN email IS NOT NULL AND email NOT LIKE '%_@_%.__%' THEN 1 ELSE 0 END) AS invalid_format,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_email
FROM customers;

-- Referential integrity
SELECT
    o.order_id,
    o.customer_id
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;

-- Business rule
SELECT *
FROM orders
WHERE discount_percentage > 50
   OR total_amount < 0
   OR order_date > GETDATE();

C# Validation Framework

public interface IValidationRule<T>
{
    string RuleName { get; }
    ValidationSeverity Severity { get; }
    ValidationResult Validate(T entity);
}

public class EmailFormatRule : IValidationRule<Customer>
{
    private static readonly Regex EmailPattern = new(
        @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
        RegexOptions.Compiled);

    public string RuleName => "VR-CUST-001";
    public ValidationSeverity Severity => ValidationSeverity.Error;

    public ValidationResult Validate(Customer entity)
    {
        if (string.IsNullOrEmpty(entity.Email))
        {
            return ValidationResult.Fail(RuleName, "Email is required");
        }

        if (!EmailPattern.IsMatch(entity.Email))
        {
            return ValidationResult.Fail(RuleName,
                $"Invalid email format: {entity.Email}");
        }

        return ValidationResult.Pass(RuleName);
    }
}

public class DataQualityValidator<T>
{
    private readonly IEnumerable<IValidationRule<T>> _rules;
    private readonly ILogger _logger;

    public DataQualityValidator(IEnumerable<IValidationRule<T>> rules, ILogger logger)
    {
        _rules = rules;
        _logger = logger;
    }

    public async Task<QualityReport> ValidateAsync(
        IEnumerable<T> records,
        CancellationToken ct)
    {
        var report = new QualityReport
        {
            StartTime = DateTime.UtcNow,
            TotalRecords = 0
        };

        var ruleResults = _rules.ToDictionary(
            r => r.RuleName,
            r => new RuleResult { RuleName = r.RuleName });

        foreach (var record in records)
        {
            // Honor cancellation between records.
            ct.ThrowIfCancellationRequested();
            report.TotalRecords++;

            foreach (var rule in _rules)
            {
                var result = rule.Validate(record);
                var ruleResult = ruleResults[rule.RuleName];

                if (result.IsValid)
                {
                    ruleResult.PassCount++;
                }
                else
                {
                    ruleResult.FailCount++;
                    ruleResult.Failures.Add(result);
                }
            }
        }

        report.RuleResults = ruleResults.Values.ToList();
        report.EndTime = DateTime.UtcNow;
        report.OverallScore = CalculateOverallScore(ruleResults.Values);

        return report;
    }
}
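The validator above references several supporting types that are not shown. A minimal sketch of what they could look like, with one possible scoring rule (the average-pass-rate formula is an assumption, not prescribed by this skill):

using System;
using System.Collections.Generic;
using System.Linq;

public enum ValidationSeverity { Warning, Error }

// Result of applying a single rule to a single record.
public record ValidationResult(string RuleName, bool IsValid, string? Message = null)
{
    public static ValidationResult Pass(string ruleName) => new(ruleName, true);
    public static ValidationResult Fail(string ruleName, string message) => new(ruleName, false, message);
}

// Aggregated outcome of one rule across all records.
public class RuleResult
{
    public string RuleName { get; set; } = "";
    public long PassCount { get; set; }
    public long FailCount { get; set; }
    public List<ValidationResult> Failures { get; } = new();
}

public class QualityReport
{
    public DateTime StartTime { get; set; }
    public DateTime EndTime { get; set; }
    public long TotalRecords { get; set; }
    public decimal OverallScore { get; set; }
    public List<RuleResult> RuleResults { get; set; } = new();
}

public static class QualityScoring
{
    // One way CalculateOverallScore could work: average pass rate across rules (assumption).
    public static decimal CalculateOverallScore(IEnumerable<RuleResult> results) =>
        results.Average(r => r.PassCount + r.FailCount == 0
            ? 100m
            : r.PassCount * 100m / (r.PassCount + r.FailCount));
}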

Quality Metrics Dashboard

KPIs Template

# Data Quality Dashboard

## Overall Score
| Domain | Score | Trend | Status |
|--------|-------|-------|--------|
| Customer | 96.2% | ↑ +0.5% | 🟡 Warning |
| Product | 98.5% | ↑ +0.1% | 🟢 Healthy |
| Order | 99.1% | → 0% | 🟢 Healthy |
| Inventory | 94.8% | ↓ -1.2% | 🔴 Critical |

## Rule Compliance
| Rule ID | Rule Name | Target | Actual | Status |
|---------|-----------|--------|--------|--------|
| VR-CUST-001 | Valid Email | 99% | 95.4% | 🔴 |
| VR-CUST-002 | Unique SSN | 100% | 99.9% | 🟢 |
| VR-ORD-001 | Valid Total | 100% | 100% | 🟢 |
| VR-INV-001 | Positive Qty | 100% | 97.2% | 🔴 |

## Trend (Last 30 Days)
| Week | Customer | Product | Order | Inventory |
|------|----------|---------|-------|-----------|
| W1 | 95.7% | 98.4% | 99.1% | 96.0% |
| W2 | 95.9% | 98.5% | 99.1% | 95.5% |
| W3 | 96.0% | 98.5% | 99.1% | 95.0% |
| W4 | 96.2% | 98.5% | 99.1% | 94.8% |
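Status flags can be derived from the domain score. A minimal sketch, assuming 98% and 95% as the healthy and warning cut-offs (the thresholds are illustrative, chosen to match the sample scores above):

// Maps a domain score to a dashboard status; thresholds are assumptions.
public enum DomainStatus { Healthy, Warning, Critical }

public static class DashboardStatus
{
    public static DomainStatus FromScore(decimal scorePct) =>
        scorePct >= 98m ? DomainStatus.Healthy
        : scorePct >= 95m ? DomainStatus.Warning
        : DomainStatus.Critical;
}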

Data Cleansing Patterns

Cleansing Operations

| Operation | Description | Example |
|-----------|-------------|---------|
| Standardization | Consistent format | "USA" → "United States" |
| Deduplication | Remove duplicates | Merge customer records |
| Enrichment | Add missing data | Append geocodes |
| Correction | Fix known errors | Typo correction |
| Imputation | Fill missing values | Mean/median substitution |

Cleansing Pipeline

public class CustomerCleansingPipeline
{
    public Customer Cleanse(Customer raw)
    {
        var cleansed = new Customer
        {
            CustomerId = raw.CustomerId,

            // Standardization
            Email = raw.Email?.Trim().ToLowerInvariant(),
            Phone = StandardizePhone(raw.Phone),

            // Formatting
            FirstName = ToTitleCase(raw.FirstName?.Trim()),
            LastName = ToTitleCase(raw.LastName?.Trim()),

            // Enrichment
            State = LookupStateFromZip(raw.PostalCode) ?? raw.State,

            // Validation
            PostalCode = ValidatePostalCode(raw.PostalCode)
                ? raw.PostalCode
                : null  // Mark for review
        };

        return cleansed;
    }

    private string StandardizePhone(string phone)
    {
        if (string.IsNullOrEmpty(phone)) return null;

        var digits = new string(phone.Where(char.IsDigit).ToArray());

        return digits.Length switch
        {
            10 => $"+1-{digits[..3]}-{digits[3..6]}-{digits[6..]}",
            11 when digits[0] == '1' => $"+{digits[0]}-{digits[1..4]}-{digits[4..7]}-{digits[7..]}",
            _ => phone  // Return original if non-standard
        };
    }
}
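The pipeline calls ToTitleCase, LookupStateFromZip, and ValidatePostalCode without showing them. Hedged sketches of possible implementations as additional members of CustomerCleansingPipeline (the US-centric rules and the ZIP-prefix lookup are assumptions):

// Additional members of CustomerCleansingPipeline (illustrative sketches).
// Requires: using System.Collections.Generic; using System.Globalization; using System.Text.RegularExpressions;

private string ToTitleCase(string name) =>
    string.IsNullOrEmpty(name)
        ? name
        : CultureInfo.InvariantCulture.TextInfo.ToTitleCase(name.ToLowerInvariant());

private bool ValidatePostalCode(string postalCode) =>
    !string.IsNullOrEmpty(postalCode)
    && Regex.IsMatch(postalCode, @"^\d{5}(-\d{4})?$");  // US ZIP / ZIP+4 (assumption)

// Assumes a preloaded ZIP-prefix-to-state dictionary; returns null when the prefix is unknown.
private string LookupStateFromZip(string postalCode) =>
    postalCode is { Length: >= 3 } && ZipPrefixToState.TryGetValue(postalCode[..3], out var state)
        ? state
        : null;

private static readonly Dictionary<string, string> ZipPrefixToState = new()
{
    ["100"] = "NY", ["900"] = "CA", ["606"] = "IL"  // sample entries only
};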

Quality Monitoring

Alerting Rules

# Quality Alert Configuration

## Critical Alerts (Immediate)
| Condition | Action |
|-----------|--------|
| Overall score < 90% | Page on-call, Slack #data-critical |
| Rule failure > 10% | Email data steward, create incident |
| New duplicate rate > 1% | Slack #data-ops |

## Warning Alerts (Daily Digest)
| Condition | Action |
|-----------|--------|
| Score decrease > 2% | Include in daily report |
| Approaching threshold | Email data steward |
| Trend declining 3+ days | Escalate to owner |

## Informational (Weekly Report)
| Condition | Action |
|-----------|--------|
| All scores stable | Include in weekly summary |
| Improvements noted | Celebrate in report |
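The critical conditions can be evaluated in code after each quality run. A minimal sketch (the QualityRun record and the message strings are placeholders for whatever alerting integration is in place):

using System.Collections.Generic;

// Checks the critical alert conditions from the table above (illustrative sketch).
public record QualityRun(decimal OverallScorePct, decimal WorstRuleFailurePct, decimal NewDuplicateRatePct);

public static class CriticalAlertEvaluator
{
    public static IEnumerable<string> Evaluate(QualityRun run)
    {
        if (run.OverallScorePct < 90m)
            yield return "Overall score < 90%: page on-call, post to #data-critical";
        if (run.WorstRuleFailurePct > 10m)
            yield return "Rule failure > 10%: email data steward, create incident";
        if (run.NewDuplicateRatePct > 1m)
            yield return "New duplicate rate > 1%: post to #data-ops";
    }
}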

Validation Checklist

  • Quality dimensions defined and weighted
  • Data profiling completed for critical fields
  • Validation rules documented with thresholds
  • Severity levels assigned to rules
  • Remediation procedures defined
  • Quality metrics and KPIs established
  • Monitoring and alerting configured
  • Cleansing procedures documented
  • Data stewards assigned for remediation

Integration Points

Inputs from:

  • data-governance skill → Quality standards
  • conceptual-modeling skill → Business rules
  • er-modeling skill → Constraints and relationships

Outputs to:

  • migration-planning skill → Validation steps
  • ETL/ELT pipelines → Quality gates
  • Data catalog → Quality metadata
  • Dashboards → Quality metrics