validate-parser-output

@afrojuju1/county_scraper_2

SKILL.md

name: validate-parser-output
description: Validate parser output after county processing completes. Compares source file counts against MongoDB, checks sub-file enrichment coverage (owners, land, improvements), and detects type mismatches. Use after running `inv core.rqp` to verify data integrity, or when debugging why enrichment data is missing.

Sub-agent

This skill uses the parser-output-validator sub-agent for validation.

Spawn via Task tool:

Task: "Use parser-output-validator to validate <county> <year>"

When to Use

  • After inv core.rqp <county> completes successfully
  • When debugging empty owners/land/improvements arrays
  • When triage-failed-counties finds "success_but_suspicious" status
  • After fixing parser code and reprocessing

Validation Checks

The sub-agent performs:

  1. Record count comparison - Source files vs MongoDB (±1% tolerance)
  2. Sub-file enrichment coverage - owners, land_details, improvements, mailing_address
  3. Type consistency - tax_year is int, no Decimal128 values
  4. Required fields - county_id and tax_year not null
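
As a rough illustration of checks 1, 3, and 4, the sketch below works on plain Python dicts rather than a live MongoDB connection. The function names are hypothetical and are not taken from the sub-agent, the handling of an empty source file is an assumption, and the Decimal128 check is left out because it needs the `bson` package.

```python
def counts_within_tolerance(source_count, db_count, tolerance=0.01):
    """Return True when db_count is within +/- tolerance of source_count."""
    if source_count == 0:
        # Assumption: an empty source should produce an empty collection.
        return db_count == 0
    return abs(db_count - source_count) / source_count <= tolerance


def find_type_problems(doc):
    """List required-field and type problems for one parcel document."""
    problems = []
    for field in ("county_id", "tax_year"):
        if doc.get(field) is None:
            problems.append(f"{field} is null or missing")
    tax_year = doc.get("tax_year")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if tax_year is not None and (
        isinstance(tax_year, bool) or not isinstance(tax_year, int)
    ):
        problems.append(f"tax_year is {type(tax_year).__name__}, expected int")
    return problems
```

Calling `find_type_problems` on a sample of documents is usually enough to spot a string `tax_year` before looking at coverage numbers.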

Coverage Thresholds

| Field | Minimum | Red Flag |
|---|---|---|
| owners | 95% | 0% = streaming failed |
| mailing_address | 95% | 0% = streaming failed |
| land_details | 50% | 0% = streaming failed |
| improvements | 40% | 0% = streaming failed |
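
The thresholds above could be applied to a sample of documents along these lines. This is a minimal sketch: the function names are hypothetical, and it assumes that a field counts as covered when it is present and non-empty, which the source does not spell out.

```python
# Minimum coverage per field, copied from the thresholds above.
MINIMUM_COVERAGE = {
    "owners": 0.95,
    "mailing_address": 0.95,
    "land_details": 0.50,
    "improvements": 0.40,
}


def coverage(docs, field):
    """Fraction of docs where field is present and non-empty."""
    if not docs:
        return 0.0
    populated = sum(1 for doc in docs if doc.get(field))
    return populated / len(docs)


def classify_coverage(docs):
    """Map each field to 'ok', 'below_minimum', or 'red_flag' (0% coverage)."""
    report = {}
    for field, minimum in MINIMUM_COVERAGE.items():
        ratio = coverage(docs, field)
        if ratio == 0:
            report[field] = "red_flag"
        elif ratio < minimum:
            report[field] = "below_minimum"
        else:
            report[field] = "ok"
    return report
```

Keeping 0% as its own category mirrors the table: a field that is merely sparse and a field that never matched at all point to different causes.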

0% coverage means the bulk updates matched no records at all. Check:

  • tax_year type (string vs int)
  • county_id format in query vs stored document
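
MongoDB equality filters are type-sensitive, so a filter carrying `tax_year` as the string "2024" does not match a document that stores the integer 2024. The sketch below models that with plain Python equality on dicts; `build_parcel_filter` and `matches` are hypothetical helpers, not functions from the codebase.

```python
def build_parcel_filter(county_id, tax_year):
    """Build an enrichment filter with tax_year coerced to int."""
    return {"county_id": county_id, "tax_year": int(tax_year)}


def matches(doc, query):
    """Simplified equality match: every query field must equal the stored value."""
    return all(doc.get(key) == value for key, value in query.items())


stored = {"county_id": "12345", "tax_year": 2024}

# A string tax_year (for example, straight from a CLI argument) matches nothing.
print(matches(stored, {"county_id": "12345", "tax_year": "2024"}))

# Coercing to int before building the filter restores the match.
print(matches(stored, build_parcel_filter("12345", "2024")))
```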

Common Failures

| Symptom | Root Cause | Fix |
|---|---|---|
| 0% enrichment | tax_year passed as string | rq_tasks.py: tax_year = int(tax_year) |
| 0% enrichment | Query field mismatch | Check _stream_*_subfile() query |
| Decimal errors | Not converted to float | Wrap _safe_decimal() in float() |
| 0% owners (Travis-like) | Parser only checks py_owner_name/appr_owner_name | Add fallback to owner_name field in base.py |
| 0% owners (Harris-like) | county_id format mismatch in enrichment query | Check zfill() padding matches stored format |
| 100% null valuation | Setting nested field on null parent | Set entire valuation object, not dot notation |
| Stale records with no raw_property_data | Records from failed prior runs | Delete via MongoDB before reprocess |
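
For the "Decimal errors" row, the float wrapping might look like the sketch below. The real `_safe_decimal()` is not shown in this document, so `safe_decimal` here is a hypothetical stand-in that assumes it returns a `Decimal` or `None`. The point of interest is that `float(None)` raises `TypeError`, so the wrapper has to pass `None` through.

```python
from decimal import Decimal, InvalidOperation


def safe_decimal(value):
    """Hypothetical stand-in for _safe_decimal(): parse to Decimal, or None."""
    if value is None:
        return None
    try:
        return Decimal(str(value).replace(",", "").strip())
    except InvalidOperation:
        return None


def safe_float(value):
    """Convert to a plain float so that no Decimal ends up in the document."""
    parsed = safe_decimal(value)
    # float(None) raises TypeError, so keep None as None.
    return float(parsed) if parsed is not None else None
```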

New Patterns from Session (Dec 2024)

Empty Owners - Field Name Mismatch

Travis pattern: AppraisalInfoParser base class only checked py_owner_name and appr_owner_name, but Travis uses owner_name.

Detection:

// Check owner field names in sample record
db.parcels.findOne({county: "<county>"}, {raw_property_data: 1})
// Look for owner_name vs py_owner_name vs appr_owner_name

Fix location: etl/parsers/appraisal_info_parser/base.py lines ~299 and ~682
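
The fallback could take roughly the following shape. This is a sketch only: `extract_owner_name` and `OWNER_NAME_FIELDS` are hypothetical names, and the actual code in base.py may be structured differently.

```python
# Field names to try, in priority order. The first two are the ones the
# base class already checked; owner_name is the Travis-style fallback.
OWNER_NAME_FIELDS = ("py_owner_name", "appr_owner_name", "owner_name")


def extract_owner_name(record):
    """Return the first non-blank owner name found in record, else None."""
    for field in OWNER_NAME_FIELDS:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None
```

Putting `owner_name` last keeps the behavior unchanged for counties that already populate the other two fields.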

Empty Owners - County ID Format Mismatch

Harris pattern: Enrichment queries used zfill(13) padding but parcels stored IDs as str(int(acct)) without padding.

Detection:

// Compare county_id formats
db.parcels.findOne({county: "harris"}, {county_id: 1})
// Check if IDs have leading zeros or not

Fix location: _build_parcel_query_from_record() in county-specific parser
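
The mismatch is easy to reproduce in isolation. In the sketch below, `padded_id` and `stored_id` are hypothetical helpers that mirror the two formats described above; the sample account number is made up.

```python
def padded_id(acct, width=13):
    """Format used by the failing enrichment query: zero-padded string."""
    return str(acct).strip().zfill(width)


def stored_id(acct):
    """Format used when parcels were stored: str(int(acct)), no padding."""
    return str(int(acct))


acct = "0000123456789"  # made-up account number with leading zeros
print(padded_id(acct))                     # zero-padded to 13 characters
print(stored_id(acct))                     # leading zeros dropped
print(padded_id(acct) == stored_id(acct))  # the two formats differ
```

Whichever format is chosen, the enrichment query and the stored documents need to go through the same helper.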

Null Valuation Object - MongoDB Nested Field Limitation

Dallas pattern: Two-phase processing set valuation: null in phase 1, then tried to use dot notation $set: {'valuation.market_value': X} in phase 2.

Critical: MongoDB cannot set a nested field with dot notation when the parent field is null. The update fails instead of creating the parent object.

Detection:

db.parcels.countDocuments({county: "dallas", valuation: null})
// If > 0, valuation streaming failed

Fix: Set entire valuation object at once:

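# Build the whole valuation object and set it as a single value,
# instead of using dot notation on a parent that may be null.
valuation_obj = {
    'market_value': set_fields.get('market_value'),
    'total_value': set_fields.get('market_value'),  # total_value mirrors market_value
    'land_value': set_fields.get('land_value'),
    'improvement_value': set_fields.get('improvement_value'),
}
set_fields['valuation'] = valuation_obj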
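
The null-parent limitation can be modelled without a database. The `apply_set` function below is a simplified, hypothetical stand-in for `$set` on an in-memory dict, not MongoDB's implementation: it creates missing parents, as MongoDB does, and refuses to descend into a parent that exists but is not an object.

```python
def apply_set(doc, path, value):
    """Simplified model of $set with a dotted path on an in-memory dict."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        if key not in node:
            node[key] = {}  # missing parents are created on the way down
        elif not isinstance(node[key], dict):
            raise ValueError(
                f"cannot create field under {key!r}: parent is {node[key]!r}"
            )
        node = node[key]
    node[leaf] = value
    return doc


phase1_doc = {"valuation": None}

try:
    apply_set(phase1_doc, "valuation.market_value", 250000)
except ValueError as exc:
    print("dot notation failed:", exc)

# Replacing the whole object works regardless of the previous value.
apply_set(phase1_doc, "valuation", {"market_value": 250000})
print(phase1_doc)
```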

Stale Records Cleanup

Dallas pattern: Prior failed runs left records with no raw_property_data.

Detection & Fix:

db.parcels.deleteMany({county: "dallas", raw_property_data: {$exists: false}})
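
Because the delete is irreversible, it can help to count the stale records first. The sketch below assumes `collection` is a pymongo `Collection` (it only uses `count_documents` and `delete_many`); the function name and the `dry_run` flag are hypothetical.

```python
def delete_stale_records(collection, county, dry_run=True):
    """Count, and optionally delete, parcels that lack raw_property_data.

    `collection` is assumed to be a pymongo Collection, or any object that
    exposes count_documents and delete_many with the same signatures.
    """
    query = {"county": county, "raw_property_data": {"$exists": False}}
    stale = collection.count_documents(query)
    if dry_run or stale == 0:
        # Report only; nothing is deleted.
        return stale
    return collection.delete_many(query).deleted_count
```

Running it once with the default `dry_run=True` gives the detection number; passing `dry_run=False` performs the cleanup and returns how many documents were removed.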

Integration

Called by data-quality skill (Phase 6)

Task: "Use parser-output-validator to validate fort_bend 2025"

Called by triage-failed-counties

Used when diagnosing counties in the success_but_suspicious category.

Related