name	validate-parser-output
description	Validate parser output after county processing completes. Compares source file counts against MongoDB, checks sub-file enrichment coverage (owners, land, improvements), and detects type mismatches. Use after running `inv core.rqp` to verify data integrity, or when debugging why enrichment data is missing.

Sub-agent

This skill uses the parser-output-validator sub-agent for validation.

Spawn via Task tool:

Task: "Use parser-output-validator to validate <county> <year>"

When to Use

After inv core.rqp <county> completes successfully
When debugging empty owners/land/improvements arrays
When triage-failed-counties finds "success_but_suspicious" status
After fixing parser code and reprocessing

Validation Checks

The sub-agent performs:

Record count comparison - Source files vs MongoDB (±1% tolerance)
Sub-file enrichment coverage - owners, land_details, improvements, mailing_address
Type consistency - tax_year is int, no Decimal128 values
Required fields - county_id and tax_year not null
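
The checks above can be sketched in plain Python. This is an illustration only, not the sub-agent's actual code: the function names are hypothetical, and Decimal128 values are detected by type name so the sketch runs without a BSON library installed.

```python
def within_tolerance(source_count, mongo_count, tolerance=0.01):
    """Return True when the MongoDB count is within +/- tolerance of the source count."""
    if source_count == 0:
        return mongo_count == 0
    return abs(mongo_count - source_count) / source_count <= tolerance


def find_decimal128_paths(value, path=""):
    """Walk a document and return the dotted paths of any Decimal128 values."""
    paths = []
    if type(value).__name__ == "Decimal128":  # assumption: match by type name
        paths.append(path)
    elif isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            paths.extend(find_decimal128_paths(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            child_path = f"{path}.{index}" if path else str(index)
            paths.extend(find_decimal128_paths(child, child_path))
    return paths


def document_problems(doc):
    """Return a list of human-readable problems found in one parcel document."""
    problems = []
    # Required fields: county_id and tax_year must not be null.
    for field in ("county_id", "tax_year"):
        if doc.get(field) is None:
            problems.append(f"{field} is null")
    # Type consistency: tax_year must be an int (bool is excluded explicitly).
    tax_year = doc.get("tax_year")
    if tax_year is not None and (isinstance(tax_year, bool) or not isinstance(tax_year, int)):
        problems.append(f"tax_year is {type(tax_year).__name__}, expected int")
    # Type consistency: no Decimal128 anywhere in the document.
    for path in find_decimal128_paths(doc):
        problems.append(f"Decimal128 at {path}")
    return problems
```

Calling document_problems() on a sample of documents per county, together with one within_tolerance() call on the totals, covers the record count, type, and required-field checks.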

Coverage Thresholds

Field	Minimum	Red Flag
owners	95%	0% = streaming failed
mailing_address	95%	0% = streaming failed
land_details	50%	0% = streaming failed
improvements	40%	0% = streaming failed

A coverage of 0% means the bulk updates matched no records. Check:

tax_year type (string vs int)
county_id format in query vs stored document
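
The thresholds above can be applied mechanically. A minimal sketch, assuming the populated and total counts per field have already been obtained; the status labels and the function name are hypothetical, only the field names and minimums come from the table.

```python
# Minimum coverage per enrichment field, taken from the thresholds table.
MINIMUM_COVERAGE = {
    "owners": 0.95,
    "mailing_address": 0.95,
    "land_details": 0.50,
    "improvements": 0.40,
}


def classify_coverage(field, populated, total):
    """Classify one field's coverage as ok, below_minimum, or streaming_failed."""
    if total == 0:
        return "no_records"  # nothing to measure against
    if populated == 0:
        return "streaming_failed"  # 0% is the red flag: no update matched
    if populated / total < MINIMUM_COVERAGE[field]:
        return "below_minimum"
    return "ok"
```

Treating 0% as its own status rather than as an ordinary shortfall keeps the red flag visible, since it points at a query mismatch and not at sparse source data.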

Common Failures

Symptom	Root Cause	Fix
0% enrichment	tax_year passed as string	`rq_tasks.py`: `tax_year = int(tax_year)`
0% enrichment	Query field mismatch	Check `_stream_*_subfile()` query
Decimal errors	Not converted to float	Wrap `_safe_decimal()` in `float()`
0% owners (Travis-like)	Parser only checks `py_owner_name`/`appr_owner_name`	Add fallback to `owner_name` field in base.py
0% owners (Harris-like)	county_id format mismatch in enrichment query	Check `zfill()` padding matches stored format
100% null valuation	Setting nested field on null parent	Set entire `valuation` object, not dot notation
Stale records with no raw_property_data	Records from failed prior runs	Delete via MongoDB before reprocess

New Patterns from Session (Dec 2024)

Empty Owners - Field Name Mismatch

Travis pattern: AppraisalInfoParser base class only checked py_owner_name and appr_owner_name, but Travis uses owner_name.

Detection:

// Check owner field names in sample record
db.parcels.findOne({county: "<county>"}, {raw_property_data: 1})
// Look for owner_name vs py_owner_name vs appr_owner_name

Fix location: etl/parsers/appraisal_info_parser/base.py lines ~299 and ~682
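
The fallback itself can be sketched as follows. This is not the code in base.py: the helper name and the lookup order are assumptions, and only the three field names come from the pattern described above.

```python
# Candidate owner-name fields, checked in order; owner_name is the added fallback.
OWNER_NAME_FIELDS = ("py_owner_name", "appr_owner_name", "owner_name")


def extract_owner_name(record):
    """Return the first non-blank owner name found in the record, or None."""
    for field in OWNER_NAME_FIELDS:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None
```

A record that only carries owner_name now yields a name instead of an empty owners array.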

Empty Owners - County ID Format Mismatch

Harris pattern: Enrichment queries used zfill(13) padding but parcels stored IDs as str(int(acct)) without padding.

Detection:

// Compare county_id formats
db.parcels.findOne({county: "harris"}, {county_id: 1})
// Check if IDs have leading zeros or not

Fix location: _build_parcel_query_from_record() in county-specific parser
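
The mismatch is easy to reproduce in isolation. In this sketch the account number is made up and build_parcel_query() is a hypothetical stand-in, not the real _build_parcel_query_from_record(); the point is that the query must normalize the ID the same way the stored document did, and coerce tax_year to int.

```python
def stored_county_id(acct):
    """Format used when the parcel was written: no leading zeros."""
    return str(int(acct))


def padded_county_id(acct, width=13):
    """Format the failing enrichment query used: zero-padded to a fixed width."""
    return str(acct).zfill(width)


def build_parcel_query(acct, tax_year):
    """Build a query that matches the stored format (hypothetical helper)."""
    return {"county_id": stored_county_id(acct), "tax_year": int(tax_year)}


acct = "0001234567890"  # made-up account number for illustration
mismatch = stored_county_id(acct) != padded_county_id(acct)  # True: formats differ
query = build_parcel_query(acct, "2025")
```

With the padded form, every enrichment update looks for an ID that no stored parcel has, which produces the 0% owners coverage.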

Null Valuation Object - MongoDB Nested Field Limitation

Dallas pattern: Two-phase processing set valuation: null in phase 1, then tried to use dot notation $set: {'valuation.market_value': X} in phase 2.

Critical: MongoDB cannot set a nested field with dot notation when the parent field is null. The update fails instead of creating the sub-document.

Detection:

db.parcels.countDocuments({county: "dallas", valuation: null})
// If > 0, valuation streaming failed

Fix: Set entire valuation object at once:

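# Build the whole sub-document so phase 2 never writes through a null parent.
valuation_obj = {
    'market_value': set_fields.get('market_value'),
    'total_value': set_fields.get('market_value'),  # mirrors market_value
    'land_value': set_fields.get('land_value'),
    'improvement_value': set_fields.get('improvement_value'),
}
# Assign the full object rather than individual 'valuation.<field>' keys.
set_fields['valuation'] = valuation_obj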

Stale Records Cleanup

Dallas pattern: Prior failed runs left records with no raw_property_data.

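Detection & Fix:

// Count the stale records first
db.parcels.countDocuments({county: "dallas", raw_property_data: {$exists: false}})
// Then delete them before reprocessing
db.parcels.deleteMany({county: "dallas", raw_property_data: {$exists: false}})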

Integration

Called by data-quality skill (Phase 6)

Task: "Use parser-output-validator to validate fort_bend 2025"

Called by triage-failed-counties

Used when diagnosing the success_but_suspicious category.

Related

parser-output-validator agent - The sub-agent
data-quality skill - Orchestrates validation in Phase 6
inspect-raw-data skill - Deep-dive when validation fails
