| name | data-quality |
| description | Audit and fix data quality issues across all counties. Validates against the UnifiedPropertyRecord model. Use when asked to: audit data quality, fix correctness/completeness issues, or run validation. |
References
Workflow
Phase 1: Audit
python -m county_parser.cli.validate --county all --size 500
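Rank counties by correctness and completeness, then identify the 5 worst performers.
Note: PDF counties show 5-10% completeness. This is normal, because only ~15 of the 130 fields are available in PDF sources. For these counties, check field-level coverage instead of the overall completeness score.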
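The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual output format: the `CountyScore` record, the `rank_key` rule, and the sample numbers are all hypothetical. It assumes scores are percentages and, following the note on PDF counties, ignores overall completeness for them.

```python
from dataclasses import dataclass


@dataclass
class CountyScore:
    """Hypothetical per-county summary; the real validator output may differ."""
    county: str
    correctness: float   # percent, 0-100
    completeness: float  # percent, 0-100
    is_pdf: bool = False


def rank_key(score):
    # Assumption: PDF counties are ranked on correctness only, because their
    # low overall completeness is expected.
    if score.is_pdf:
        return score.correctness
    return min(score.correctness, score.completeness)


def worst_performers(scores, n=5):
    """Return the n counties with the lowest ranking key, worst first."""
    return sorted(scores, key=rank_key)[:n]


# Made-up sample values, for illustration only.
sample = [
    CountyScore("alpha", 98.0, 91.0),
    CountyScore("bravo", 72.0, 88.0),
    CountyScore("charlie", 99.0, 7.0, is_pdf=True),
    CountyScore("delta", 95.0, 60.0),
]
for s in worst_performers(sample, n=2):
    print(s.county, rank_key(s))
```

Taking the minimum of the two scores is only one possible ranking rule; a weighted sum would work equally well if correctness matters more than completeness.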
Phase 2: Diagnose
| Diagnosis | Signal | Action |
|---|---|---|
| Parser bug | Issues in current year | Fix parser code |
| Needs reprocess | Parser fixed but data stale | `inv core.rqp <county> -r` |
| Source limitation | Field not in source | Cannot fix |
Look up the issue type in field-issues.md.
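The diagnosis table can be read as a small decision function. The sketch below is illustrative only: the three boolean inputs and the order in which they are checked are assumptions, not part of the tooling.

```python
from typing import Tuple


def diagnose(field_in_source: bool,
             parser_already_fixed: bool,
             issue_in_current_year: bool) -> Tuple[str, str]:
    """Map the signals from the diagnosis table to (diagnosis, action).

    The check order is an assumption: a field missing from the source
    cannot be fixed regardless of the other signals.
    """
    if not field_in_source:
        return ("Source limitation", "Cannot fix")
    if parser_already_fixed:
        # Parser is correct, but stored data predates the fix.
        return ("Needs reprocess", "inv core.rqp <county> -r")
    if issue_in_current_year:
        return ("Parser bug", "Fix parser code")
    return ("Undetermined", "Look up the issue type in field-issues.md")


print(diagnose(field_in_source=True,
               parser_already_fixed=False,
               issue_in_current_year=True))
```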
Phase 3: Fix
Apply the fix from fix-patterns.md. Common issues:
| Issue | Fix Location |
|---|---|
| `zip_float`, `state_case` | `csv_parser_base.py` or county parser |
| `pdf_label_pollution` | County parser `_parse_property()`: add `skip_prefixes` |
| `pdf_entity_pollution` | County parser: filter `^\d{2,3}\s*-` pattern |
| `missing_value_mapping` | County parser `map_to_unified()` |
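As a rough illustration of what these fixes might look like, here is a minimal Python sketch. It rests on several assumptions about the issue names: that `zip_float` means a ZIP code read as a float (for example `78701.0`), that `state_case` means a state code with inconsistent casing, and that `skip_prefixes` is a list of label strings to ignore. The helper names and the example prefixes are hypothetical; only the regex is taken from the table above.

```python
import re

# Pattern quoted in the table for pdf_entity_pollution.
ENTITY_PREFIX = re.compile(r"^\d{2,3}\s*-")


def fix_zip_float(value):
    """Turn a float-like ZIP ('78701.0' or 2134.0) back into a digit string."""
    text = str(value).strip()
    if re.fullmatch(r"\d+\.0+", text):
        text = text.split(".")[0]
    # Assumption: 5-digit US ZIPs; a float round-trip drops leading zeros.
    if text.isdigit() and len(text) < 5:
        text = text.zfill(5)
    return text


def fix_state_case(value):
    """Normalize a state code to upper case (assumed meaning of state_case)."""
    return value.strip().upper()


def is_entity_polluted(value):
    """True if the value starts with a 2-3 digit code followed by a dash."""
    return ENTITY_PREFIX.match(value.strip()) is not None


def has_skip_prefix(value, skip_prefixes):
    """True if the value starts with any label prefix that should be skipped."""
    return value.strip().startswith(tuple(skip_prefixes))


# Hypothetical label prefixes, for illustration only.
example_prefixes = ["Owner Name:", "Legal Description:"]
print(fix_zip_float("78701.0"), fix_state_case(" tx "))
print(is_entity_polluted("01 - COUNTY GENERAL"),
      has_skip_prefix("Owner Name: SMITH", example_prefixes))
```

The actual fixes belong in the locations listed in the table; these helpers only show the shape of each transformation.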
Phase 4: Reprocess
inv core.rqp <county> -r --tax-year 2024
docker logs parcelum-worker-1 --tail 30
Phase 5: Verify
ruff check <file>
python -m county_parser.cli.validate --county <name> --size 500
Parser Types
| Type | Identify By | Fix Location |
|---|---|---|
| explicit | `*<county>*.py` exists | `county_parser/parsers/<county>_county_parser.py` |
| pdf | In `PDF_ONLY_COUNTIES` | `county_parser/parsers/pdf_parser_base.py` |
| csv | Has `.csv` source files | `county_parser/parsers/csv_parser_base.py` |
ls county_parser/parsers/*<county>*.py
grep "<county>" county_parser/parsers/pdf_parser_registry.py
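The same identification rules can be expressed as a small Python helper. This is a sketch under stated assumptions: the function name, its parameters, the location of source files, and the precedence order (explicit, then pdf, then csv, following the table order) are all hypothetical.

```python
from pathlib import Path


def parser_type(county, parsers_dir, pdf_only_counties, source_dir):
    """Classify a county as 'explicit', 'pdf', or 'csv' (None if no rule matches).

    pdf_only_counties stands in for the PDF_ONLY_COUNTIES collection;
    source_dir is an assumed directory holding the county's source files.
    """
    # explicit: a parser module matching *<county>*.py exists
    if any(Path(parsers_dir).glob(f"*{county}*.py")):
        return "explicit"
    # pdf: the county is listed as PDF-only
    if county in pdf_only_counties:
        return "pdf"
    # csv: the county has .csv source files
    if any(Path(source_dir).glob("*.csv")):
        return "csv"
    return None
```

A call might look like `parser_type("travis", "county_parser/parsers", pdf_only, "data/travis")`, where the county name, `pdf_only`, and the `data/` path are placeholders.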
Safety
- Verify with `ruff check <file>`
- Re-validate after fix
- Do NOT commit; leave that to the user