Claude Code Plugins

Community-maintained marketplace


inspect-county-data-health

@afrojuju1/county_scraper_2

Inspect data health for a given county and tax year by checking MongoDB records and data/<county>/<year> artifacts for completeness and obvious anomalies.

Install Skill

1. Download the skill.
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section.
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file.

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: inspect-county-data-health
description: Inspect data health for a given county and tax year by checking MongoDB records and data/<county>/<year> artifacts for completeness and obvious anomalies.
license: MIT
compatibility: opencode

What I do

I provide a data health check for a given county and tax year.

For a specific county + tax_year, I:

  • Analyze MongoDB records for that slice:
    • Does data exist?
    • Rough row counts (and how they compare to expectations if available)
    • Obvious field-level anomalies (e.g. missing owners, empty mailing addresses, zero or null critical fields)
  • Inspect filesystem artifacts under data/<county>/<year>/... (the current layout)
    • Presence of any logs or auxiliary files used by the pipeline
  • Cross-check run metadata in Mongo (e.g. processing_runs) for that county/year
  • Produce a short health report:
    • Is this county/year probably OK, suspicious, or clearly broken?
    • What kinds of issues show up (coverage, schema, or value-level)?

This skill is aligned with the current design where the pipeline syncs directly to MongoDB and writes per-county/year artifacts under data/<county>/<year>/, not data_unified or data_raw.
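The filesystem side of that check can be sketched as follows. This is a hypothetical helper, not the skill's actual implementation: the `data/<county>/<tax_year>/` layout comes from this document, but treating `*.log` files as the pipeline's logs is an assumption.

```python
from pathlib import Path


def artifact_summary(county: str, tax_year: int, root: str = "data") -> dict:
    """Read-only look at data/<county>/<tax_year>/ (layout assumed from the doc)."""
    d = Path(root) / county / str(tax_year)
    if not d.is_dir():
        return {"exists": False, "files": 0, "logs": []}
    files = [p for p in d.rglob("*") if p.is_file()]
    return {
        "exists": True,
        "files": len(files),
        # Treating *.log as pipeline logs is an assumption about the layout.
        "logs": sorted(p.name for p in files if p.suffix == ".log"),
    }
```

Because the helper only reads directory listings, it stays within the skill's read-only scope.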


When to use me

Use this skill when:

  • You’ve run the pipeline for a county/year and want a quick sanity check
  • You suspect there might be silent data issues (e.g. low coverage, malformed addresses)
  • You’re deciding which counties to trust for downstream consumers (exports, APIs, analytics)

This is a read-only health check, not a fixer.


Inputs I expect

  • county (required, string)
    • e.g. "travis"
  • tax_year (required, int)
    • e.g. 2025
  • expected_min_records (optional, int)
    • If provided, I’ll compare actual record counts to this threshold.

If expected_min_records is not provided, I’ll still report counts, but I won’t assert strict pass/fail on coverage.
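A minimal sketch of how these inputs might be normalized; the parameter names come from this section, but the helper itself is hypothetical:

```python
def normalize_inputs(county, tax_year, expected_min_records=None):
    """Validate and normalize the skill's inputs (hypothetical helper)."""
    if not county or not str(county).strip():
        raise ValueError("county is required")
    county = str(county).strip().lower()   # e.g. " Travis " -> "travis"
    tax_year = int(tax_year)
    if expected_min_records is not None:
        expected_min_records = int(expected_min_records)
    return {
        "county": county,
        "tax_year": tax_year,
        "expected_min_records": expected_min_records,
    }
```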


Project assumptions

I assume:

  • This skill is run from the project root.
  • Core property data lives in MongoDB, with collections that can be filtered by county and tax_year.
  • Per-county/year artifacts live under:
    • data/<county>/<tax_year>/...
  • There is run metadata (e.g. processing_runs) that can be used to see whether this county/year has a completed or failed run.

If these assumptions do not hold (e.g. MongoDB is unreachable or the expected collections are missing), I’ll report that explicitly.
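One cheap way to report an unreachable MongoDB explicitly is a socket-level pre-check before handing off to the actual MongoDB driver. This is a sketch only: the host/port defaults are assumptions, and a full check would still use the driver (e.g. a `ping` command) since an open port does not guarantee a healthy server.

```python
import socket


def mongo_port_open(host: str = "localhost", port: int = 27017,
                    timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the (assumed) MongoDB host succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```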


How I work (high level)

  1. Normalize input

    • Confirm county and tax_year are present.
  2. Check MongoDB connectivity

    • Use MongoDB tools to ensure the DB is reachable.
  3. Gather run metadata

    • Look up processing_runs (or equivalent) entries for this county/year:
      • Has a run completed?
      • Was it marked success/failed/partial?
      • Are there any stored error messages?
  4. Inspect main data collections

    • Query the main property data collection(s) for records matching this county/year.
    • Compute simple metrics, for example:
      • Total record count
      • % of records with missing owner names
      • % of records with missing or malformed mailing addresses
      • % of records with obviously invalid numeric fields (e.g. non-positive improvement values when they should be > 0, if that pattern exists)
    • If expected_min_records is provided:
      • Compare actual count vs expected and flag if significantly lower.
  5. Inspect data/<county>/<tax_year>/ artifacts

    • Check whether data/<county>/<tax_year>/ exists.
    • Note presence of any logs or sidecar files that might indicate warnings or partial processing.
  6. Summarize health

    I’ll produce a concise report with sections like:

    • RUN STATUS
      • Based on processing_runs (if available).
    • COVERAGE
      • Record counts, and whether they meet expected_min_records (if provided).
    • FIELD ANOMALIES
      • High-level stats on missing or malformed key fields.
    • FILESYSTEM ARTIFACTS
      • Basic info on data/<county>/<tax_year>/ presence.
    • OVERALL ASSESSMENT
      • healthy, suspicious, or broken, with a short rationale.
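Steps 4 and 6 above could be sketched with a MongoDB aggregation pipeline plus a small assessment rule. Everything schema-specific here is an assumption: the field names `county`, `tax_year`, `owner_name`, and `mailing_address`, and the 5% missing-owner threshold, are illustrative rather than the project's actual shape.

```python
def anomaly_pipeline(county: str, tax_year: int) -> list:
    """Build an aggregation pipeline for coverage and missing-field counts.
    Field names are assumptions; adjust to the actual schema."""
    def missing(field):
        # Counts documents where the field is absent, null, or empty.
        return {"$sum": {"$cond": [{"$eq": [{"$ifNull": [f"${field}", ""]}, ""]}, 1, 0]}}

    return [
        {"$match": {"county": county, "tax_year": tax_year}},
        {"$group": {
            "_id": None,
            "total": {"$sum": 1},
            "missing_owner": missing("owner_name"),
            "missing_mailing": missing("mailing_address"),
        }},
    ]


def assess(total: int, expected_min=None, missing_owner: int = 0) -> str:
    """Map simple metrics onto the healthy/suspicious/broken verdict."""
    if total == 0:
        return "broken"
    if expected_min is not None and total < expected_min:
        return "suspicious"
    if missing_owner / total > 0.05:  # 5% threshold is an illustrative choice
        return "suspicious"
    return "healthy"
```

With a driver in hand this would run as something like `db.properties.aggregate(anomaly_pipeline("travis", 2025))`, where the collection name is again an assumption.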

Safety and scope rules

  • I am strictly read-only; I do not modify MongoDB or any files.
  • I do not attempt any repairs; I only surface issues and possible causes.
  • If schema details differ from expectations, I’ll adjust the analysis to what is actually present instead of forcing a specific shape.

Example usage

"Use inspect-county-data-health for county travis in 2025, expecting at least 200k records."

I will:

  • Check MongoDB for travis/2025 processing runs
  • Count travis/2025 records in the main property data collection(s)
  • Compute simple anomaly metrics on key fields
  • Look for data/travis/2025/ on disk
  • Return a short health report, including whether the ~200k expectation was met.