| name | validate-parser-output |
| description | Validate parser output after county processing completes. Compares source file counts against MongoDB, checks sub-file enrichment coverage (owners, land, improvements), and detects type mismatches. Use after running `inv core.rqp` to verify data integrity, or when debugging why enrichment data is missing. |
Sub-agent
This skill uses the parser-output-validator sub-agent for validation.
Spawn via Task tool:
Task: "Use parser-output-validator to validate <county> <year>"
When to Use
- After `inv core.rqp <county>` completes successfully
- When debugging empty owners/land/improvements arrays
- When triage-failed-counties finds "success_but_suspicious" status
- After fixing parser code and reprocessing
Validation Checks
The sub-agent performs:
- Record count comparison - Source files vs MongoDB (±1% tolerance)
- Sub-file enrichment coverage - owners, land_details, improvements, mailing_address
- Type consistency - tax_year is int, no Decimal128 values
- Required fields - county_id and tax_year not null
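The first, third, and fourth checks can be sketched in plain Python. This is a minimal illustration, not the sub-agent's actual implementation: the function names are hypothetical, and the Decimal128 check matches on the class name only so that the sketch runs without the `bson` package.

```python
from typing import Any, List


def counts_within_tolerance(source_count: int, mongo_count: int,
                            tolerance: float = 0.01) -> bool:
    """Return True when mongo_count is within +/- tolerance of source_count."""
    if source_count == 0:
        # Nothing in the source files: only an empty collection matches.
        return mongo_count == 0
    return abs(mongo_count - source_count) / source_count <= tolerance


def has_decimal128(value: Any) -> bool:
    """Recursively look for values whose class is named Decimal128."""
    # Name-based check (assumption) so no bson import is needed here.
    if type(value).__name__ == "Decimal128":
        return True
    if isinstance(value, dict):
        return any(has_decimal128(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(has_decimal128(v) for v in value)
    return False


def find_type_problems(doc: dict) -> List[str]:
    """Collect type and required-field problems for one parcel document."""
    problems = []
    tax_year = doc.get("tax_year")
    if tax_year is None:
        problems.append("tax_year is null")
    elif isinstance(tax_year, bool) or not isinstance(tax_year, int):
        # bool is a subclass of int in Python, so it is excluded explicitly.
        problems.append(f"tax_year is {type(tax_year).__name__}, expected int")
    if doc.get("county_id") is None:
        problems.append("county_id is null")
    if has_decimal128(doc):
        problems.append("document contains Decimal128 values")
    return problems
```

For example, `find_type_problems({"county_id": "123", "tax_year": "2025"})` reports the string-typed `tax_year`, which is the most common cause of 0% enrichment listed under Common Failures.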
Coverage Thresholds
| Field | Minimum | Red Flag |
|---|---|---|
| owners | 95% | 0% = streaming failed |
| mailing_address | 95% | 0% = streaming failed |
| land_details | 50% | 0% = streaming failed |
| improvements | 40% | 0% = streaming failed |
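As a rough sketch of how the table above could be applied, the snippet below classifies a field's coverage. The `classify_coverage` function and its status labels are hypothetical; only the threshold values come from the table.

```python
# Minimum coverage per enrichment field, taken from the table above.
MIN_COVERAGE = {
    "owners": 0.95,
    "mailing_address": 0.95,
    "land_details": 0.50,
    "improvements": 0.40,
}


def classify_coverage(field: str, populated: int, total: int) -> str:
    """Return 'red_flag', 'below_minimum', or 'ok' for one field."""
    if total <= 0:
        raise ValueError("total must be positive")
    if populated == 0:
        # 0% coverage: streaming most likely matched no records at all.
        return "red_flag"
    if populated / total < MIN_COVERAGE[field]:
        return "below_minimum"
    return "ok"
```

Treating 0% as its own status, rather than just a very low percentage, keeps the "streaming failed" case distinct from counties that simply have sparse sub-file data.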
0% coverage means the bulk updates did not match any records. Check:
- tax_year type (string vs int)
- county_id format in query vs stored document
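The tax_year case can be illustrated without a database, because MongoDB equality matching distinguishes the string `"2025"` from the integer `2025`. The `matches` helper below is a hypothetical stand-in for an equality-only query filter, not part of the codebase.

```python
def matches(doc: dict, query: dict) -> bool:
    """Type-sensitive equality match, mimicking a simple MongoDB filter."""
    return all(
        key in doc
        and type(doc[key]) is type(value)
        and doc[key] == value
        for key, value in query.items()
    )


stored_doc = {"county_id": "123", "tax_year": 2025}

# tax_year arrives as a string (for example, from a CLI argument).
bad_query = {"county_id": "123", "tax_year": "2025"}

# Coercing to int before building the query restores the match.
good_query = {"county_id": "123", "tax_year": int("2025")}

print(matches(stored_doc, bad_query))   # False
print(matches(stored_doc, good_query))  # True
```

Every bulk update built from `bad_query` would match zero documents, which shows up as 0% coverage across all enrichment fields at once.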
Common Failures
| Symptom | Root Cause | Fix |
|---|---|---|
| 0% enrichment | tax_year passed as string | rq_tasks.py: tax_year = int(tax_year) |
| 0% enrichment | Query field mismatch | Check _stream_*_subfile() query |
| Decimal errors | Not converted to float | Wrap _safe_decimal() in float() |
| 0% owners (Travis-like) | Parser only checks py_owner_name/appr_owner_name | Add fallback to owner_name field in base.py |
| 0% owners (Harris-like) | county_id format mismatch in enrichment query | Check zfill() padding matches stored format |
| 100% null valuation | Setting nested field on null parent | Set entire valuation object, not dot notation |
| Stale records with no raw_property_data | Records from failed prior runs | Delete via MongoDB before reprocess |
New Patterns from Session (Dec 2024)
Empty Owners - Field Name Mismatch
Travis pattern: AppraisalInfoParser base class only checked py_owner_name and appr_owner_name, but Travis uses owner_name.
Detection:
// Check owner field names in sample record
db.parcels.findOne({county: "<county>"}, {raw_property_data: 1})
// Look for owner_name vs py_owner_name vs appr_owner_name
Fix location: etl/parsers/appraisal_info_parser/base.py lines ~299 and ~682
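A minimal sketch of the fallback idea follows. The function name and the lookup order are assumptions for illustration; the actual code in base.py may be structured differently.

```python
from typing import Optional

# Field names to try, in order. owner_name is the fallback added for
# counties like Travis (the ordering here is an assumption).
OWNER_NAME_FIELDS = ("py_owner_name", "appr_owner_name", "owner_name")


def extract_owner_name(record: dict) -> Optional[str]:
    """Return the first non-empty owner name found in the record."""
    for field in OWNER_NAME_FIELDS:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None
```

With this shape, a record that only carries `owner_name` still yields an owner, while records using the older field names behave as before.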
Empty Owners - County ID Format Mismatch
Harris pattern: Enrichment queries used zfill(13) padding but parcels stored IDs as str(int(acct)) without padding.
Detection:
// Compare county_id formats
db.parcels.findOne({county: "harris"}, {county_id: 1})
// Check if IDs have leading zeros or not
Fix location: _build_parcel_query_from_record() in county-specific parser
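The mismatch is easy to reproduce with plain strings. In this sketch the helper names and the sample account number are hypothetical; the two formatting expressions are the ones described above.

```python
def padded_id(acct: str, width: int = 13) -> str:
    """Format used by the enrichment query: left-pad with zeros."""
    return acct.zfill(width)


def unpadded_id(acct: str) -> str:
    """Format used when parcels were stored: leading zeros stripped."""
    return str(int(acct))


raw_acct = "0000123456789"          # made-up account number
query_id = padded_id(raw_acct)      # "0000123456789"
stored_id = unpadded_id(raw_acct)   # "123456789"

# The two strings differ, so an equality query on county_id finds nothing.
print(query_id == stored_id)               # False
# Applying the stored format to the query side makes them agree.
print(unpadded_id(query_id) == stored_id)  # True
```

Whichever format is chosen, the query builder and the code that writes parcels need to apply the same one.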
Null Valuation Object - MongoDB Nested Field Limitation
Dallas pattern: Two-phase processing set valuation: null in phase 1, then tried to use dot notation $set: {'valuation.market_value': X} in phase 2.
Critical: MongoDB cannot set a nested field with dot notation when the parent field is null; the update fails instead of creating the parent object.
Detection:
db.parcels.countDocuments({county: "dallas", valuation: null})
// If > 0, valuation streaming failed
Fix: Set entire valuation object at once:
valuation_obj = {
    'market_value': set_fields.get('market_value'),
    'total_value': set_fields.get('market_value'),
    'land_value': set_fields.get('land_value'),
    'improvement_value': set_fields.get('improvement_value'),
}
set_fields['valuation'] = valuation_obj
Stale Records Cleanup
Dallas pattern: Prior failed runs left records with no raw_property_data.
Detection & Fix:
db.parcels.deleteMany({county: "dallas", raw_property_data: {$exists: false}})
Integration
Called by data-quality skill (Phase 6)
Task: "Use parser-output-validator to validate fort_bend 2025"
Called by triage-failed-counties
When diagnosing success_but_suspicious category.
Related
- parser-output-validator agent - The sub-agent
- data-quality skill - Orchestrates validation in Phase 6
- inspect-raw-data skill - Deep-dive when validation fails