| name | inspect-county-data-health |
| description | Inspect data health for a given county and tax year by checking MongoDB records and data/<county>/<year> artifacts for completeness and obvious anomalies. |
| license | MIT |
| compatibility | opencode |
What I do
I provide a data health check for a given county and tax year.
For a specific county + tax_year, I:
- Analyze MongoDB records for that slice:
  - Does data exist?
  - Rough row counts (and how they compare to expectations if available)
  - Obvious field-level anomalies (e.g. missing owners, empty mailing addresses, zero or null critical fields)
- Inspect filesystem artifacts under `data/<county>/<year>/...` (the current layout)
  - Presence of any logs or auxiliary files used by the pipeline
- Cross-check run metadata in Mongo (e.g. `processing_runs`) for that county/year
- Produce a short health report:
  - Is this county/year probably OK, suspicious, or clearly broken?
  - What kinds of issues show up (coverage, schema, or value-level)?
This skill is aligned with the current design where the pipeline syncs directly to MongoDB and writes per-county/year artifacts under data/<county>/<year>/, not data_unified or data_raw.
When to use me
Use this skill when:
- You’ve run the pipeline for a county/year and want a quick sanity check
- You suspect there might be silent data issues (e.g. low coverage, malformed addresses)
- You’re deciding which counties to trust for downstream consumers (exports, APIs, analytics)
This is a read-only health check, not a fixer.
Inputs I expect
- `county` (required, string)
  - e.g. `"travis"`
- `tax_year` (required, int)
  - e.g. `2025`
- `expected_min_records` (optional, int)
  - If provided, I'll compare actual record counts to this threshold.
If `expected_min_records` is not provided, I'll still report counts, but I won't assert strict pass/fail on coverage.
Project assumptions
I assume:
- This skill is run from the project root.
- Core property data lives in MongoDB, with collections that can be filtered by county and tax_year.
- Per-county/year artifacts live under `data/<county>/<tax_year>/...`
- There is run metadata (e.g. `processing_runs`) that can be used to see whether this county/year has a completed or failed run.
If these assumptions do not hold (e.g. MongoDB is unreachable or the expected collections are missing), I’ll report that explicitly.
How I work (high level)
Normalize input
- Confirm `county` and `tax_year` are present.
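A minimal sketch of this step, assuming the skill receives a plain dict of inputs (the key names and return shape here are illustrative, not a fixed contract):

```python
def normalize_inputs(params: dict) -> tuple[str, int, int | None]:
    """Validate and normalize county / tax_year / expected_min_records."""
    county = str(params.get("county", "")).strip().lower()
    if not county:
        raise ValueError("county is required (e.g. 'travis')")

    tax_year = params.get("tax_year")
    if not isinstance(tax_year, int):
        raise ValueError("tax_year is required and must be an int (e.g. 2025)")

    expected_min = params.get("expected_min_records")  # optional threshold
    if expected_min is not None:
        expected_min = int(expected_min)

    return county, tax_year, expected_min
```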
Check MongoDB connectivity
- Use MongoDB tools to ensure the DB is reachable.
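For example, a connectivity probe along these lines (the URI default and timeout are assumptions; the real pipeline may read them from its own config):

```python
from pymongo import MongoClient
from pymongo.errors import PyMongoError

def check_mongo(uri: str = "mongodb://localhost:27017") -> MongoClient | None:
    """Return a client if MongoDB answers a ping within 5 seconds, else None."""
    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        client.admin.command("ping")  # cheap round-trip to confirm the server is up
        return client
    except PyMongoError as exc:
        print(f"MongoDB unreachable: {exc}")
        return None
```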
Gather run metadata
- Look up `processing_runs` (or equivalent) entries for this county/year:
  - Has a run completed?
  - Was it marked success/failed/partial?
  - Are there any stored error messages?
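A sketch of that lookup, assuming a `processing_runs` collection keyed by `county` and `tax_year` with `status`, `started_at`, and `error` fields (these field names are assumptions to adapt to the real schema):

```python
def latest_run(db, county: str, tax_year: int) -> dict | None:
    """Return the most recent processing_runs entry for this county/year, if any."""
    return db["processing_runs"].find_one(
        {"county": county, "tax_year": tax_year},
        sort=[("started_at", -1)],  # newest run first
    )

# Interpreting the result:
#   run is None            -> no run recorded for this slice
#   run["status"] failed   -> surface run.get("error") in the report
```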
Inspect main data collections
- Query the main property data collection(s) for records matching this county/year.
- Compute simple metrics, for example:
  - Total record count
  - % of records with missing owner names
  - % of records with missing or malformed mailing addresses
  - % of records with obviously invalid numeric fields (e.g. non-positive improvement values when they should be > 0, if that pattern exists)
- If `expected_min_records` is provided:
  - Compare the actual count against it and flag if significantly lower.
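A sketch of those metrics, assuming a `properties` collection with `owner_name`, `mailing_address`, and `improvement_value` fields (the collection and field names are assumptions; the real schema may differ):

```python
def coverage_metrics(db, county: str, tax_year: int) -> dict:
    """Record count plus rough anomaly rates for one county/year slice."""
    coll = db["properties"]  # assumed collection name
    base = {"county": county, "tax_year": tax_year}

    total = coll.count_documents(base)
    if total == 0:
        return {"total": 0}

    def pct(extra: dict) -> float:
        return 100.0 * coll.count_documents({**base, **extra}) / total

    return {
        "total": total,
        # {"$in": [None, ""]} also matches documents where the field is absent
        "pct_missing_owner": pct({"owner_name": {"$in": [None, ""]}}),
        "pct_missing_mailing": pct({"mailing_address": {"$in": [None, ""]}}),
        # only meaningful if improvement_value is expected to be > 0
        "pct_nonpositive_improvement": pct({"improvement_value": {"$lte": 0}}),
    }
```

If `expected_min_records` is provided, the coverage flag is simply whether `metrics["total"]` falls below that threshold.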
Inspect data/<county>/<tax_year>/ artifacts
- Check whether `data/<county>/<tax_year>/` exists.
- Note presence of any logs or sidecar files that might indicate warnings or partial processing.
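A sketch of the filesystem check with pathlib (the log/sidecar filename heuristics are illustrative guesses, not a guaranteed layout):

```python
from pathlib import Path

def inspect_artifacts(county: str, tax_year: int, root: Path = Path("data")) -> dict:
    """Report whether data/<county>/<tax_year>/ exists and what it contains."""
    year_dir = root / county / str(tax_year)
    if not year_dir.is_dir():
        return {"exists": False}

    files = sorted(p.name for p in year_dir.iterdir() if p.is_file())
    return {
        "exists": True,
        "file_count": len(files),
        # Surface anything that looks like a log or error sidecar worth reading.
        "log_like_files": [f for f in files if "log" in f.lower() or "error" in f.lower()],
    }
```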
Summarize health
I’ll produce a concise report with sections like:
- RUN STATUS
  - Based on `processing_runs` (if available).
- COVERAGE
  - Record counts, and whether they meet `expected_min_records` (if provided).
- FIELD ANOMALIES
  - High-level stats on missing or malformed key fields.
- FILESYSTEM ARTIFACTS
  - Basic info on `data/<county>/<tax_year>/` presence.
- OVERALL ASSESSMENT
  - `healthy`, `suspicious`, or `broken`, with a short rationale.
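Purely as an illustration of shape (placeholders in angle brackets, no real values; the exact wording is not fixed), the report might look like:

```text
RUN STATUS:           <latest processing_runs status, or "no run metadata found">
COVERAGE:             <record count> records (expected_min_records: <threshold or "not set">)
FIELD ANOMALIES:      <pct>% missing owner, <pct>% missing mailing address, ...
FILESYSTEM ARTIFACTS: data/<county>/<tax_year>/ <present | missing>, <n> files
OVERALL ASSESSMENT:   <healthy | suspicious | broken> (short rationale)
```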
Safety and scope rules
- I am strictly read-only; I do not modify MongoDB or any files.
- I do not attempt any repairs; I only surface issues and possible causes.
- If schema details differ from expectations, I’ll adjust the analysis to what is actually present instead of forcing a specific shape.
Example usage
"Use
inspect-county-data-healthfor countytravisin 2025, expecting at least 200k records."
I will:
- Check MongoDB for travis/2025 processing runs
- Count travis/2025 records in the main property data collection(s)
- Compute simple anomaly metrics on key fields
- Look for `data/travis/2025/` on disk
- Return a short health report, including whether the ~200k expectation was met.