Claude Code Plugins

Community-maintained marketplace


Install Skill

1. Download skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section.
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file.

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: inspect-raw-data
description: Inspect raw data for debugging: MongoDB records and source files. Use before fixing issues to understand the root cause. Can inspect parsed records and raw source files (CSV, TXT, PDF, JSON), compare them, and produce health assessments.

Overview

Two inspection targets:

  1. MongoDB - Parsed records in county_data.parcels collection
  2. Source Files - Raw data in data/<county>/<year>/ directory

Workflow

1. Check Source Data Availability

Before diagnosing parsing issues, verify source files exist:

ls -la data/<county>/<year>/

| Result | Diagnosis | Action |
|---|---|---|
| Directory doesn't exist | Data not downloaded | Download data first |
| Directory empty | Download failed/incomplete | Re-download |
| Has CSV/TXT/PDF files | Source available | Continue to parsing check |
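The same three checks can be scripted. This is a minimal sketch: `diagnose_source_dir` is a hypothetical helper, the extension list (including `.json`) is taken from the directory layout further down, and the fourth "no recognized source files" outcome is an assumption for a case the table does not cover.

```python
from pathlib import Path

# Extensions treated as source data (assumption, based on the documented layout).
SOURCE_EXTENSIONS = {".csv", ".txt", ".pdf", ".json"}


def diagnose_source_dir(county: str, year: str, root: str = "data") -> tuple[str, str]:
    """Return (diagnosis, action) for data/<county>/<year>/, following the table."""
    directory = Path(root) / county / str(year)
    if not directory.is_dir():
        return "Data not downloaded", "Download data first"
    entries = list(directory.iterdir())
    if not entries:
        return "Download failed/incomplete", "Re-download"
    # Compare suffixes case-insensitively so both .TXT and .txt count.
    sources = [p for p in entries if p.is_file() and p.suffix.lower() in SOURCE_EXTENSIONS]
    if sources:
        return "Source available", "Continue to parsing check"
    # Something is there (e.g. only snapshots/), but nothing that looks like source data.
    return "No recognized source files", "Inspect directory contents manually"
```

For example, `diagnose_source_dir("archer", "2024")` run from the repository root checks `data/archer/2024/`.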

2. Check MongoDB Records

Query for county/year data and compute metrics:

  • Total record count
  • % with missing owners
  • % with missing/malformed addresses
  • % with zero valuations
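For a small sample, the four metrics can be computed in plain Python over records already fetched from MongoDB. This is a sketch under assumptions: `health_metrics` is a hypothetical helper, "owner present" means the first entry of `owners` has a non-empty `name`, and "usable address" means the mailing address has a city and a 5-digit or ZIP+4 code. Adjust these rules to the county's actual schema.

```python
import re

# 5-digit ZIP or ZIP+4, the same pattern used under "Find Bad Records".
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def _has_owner(record: dict) -> bool:
    owners = record.get("owners") or []
    return bool(owners) and bool((owners[0] or {}).get("name"))


def _address_ok(address) -> bool:
    # Assumption: an address is usable if it has a city and a well-formed ZIP.
    if not address:
        return False
    zip_code = address.get("zip_code")
    return (
        bool(address.get("city"))
        and zip_code is not None
        and bool(ZIP_PATTERN.match(str(zip_code)))
    )


def _has_valuation(record: dict) -> bool:
    valuation = record.get("valuation") or {}
    return any((valuation.get(k) or 0) > 0 for k in ("market_value", "assessed_value"))


def health_metrics(records: list[dict]) -> dict:
    """Total count plus percentage of records failing each check."""
    total = len(records)

    def pct(n: int) -> float:
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "total": total,
        "pct_missing_owner": pct(sum(not _has_owner(r) for r in records)),
        "pct_bad_mailing_address": pct(sum(not _address_ok(r.get("mailing_address")) for r in records)),
        "pct_zero_valuation": pct(sum(not _has_valuation(r) for r in records)),
    }
```

Usage: `health_metrics(list(db.parcels.find({"county": "archer", "tax_year": 2024})))`. This loads every matching record into memory; for large counties the server-side `$facet` pipeline under "Data Quality Metrics" scales better.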

3. Compare Source vs Parsed

If issues are found, compare the values in the source file against the parsed MongoDB values for the same record to locate parser bugs.
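A field-by-field comparison for one CSV record can look like the sketch below. The helper name `diff_source_vs_parsed`, the ID column, and the column-to-field mapping are all hypothetical and have to be filled in per county; values are compared as stripped strings so that a CSV `"76351"` and a stored integer `76351` count as equal.

```python
import csv
import io


def _lookup(doc, dotted: str):
    """Follow a dotted path such as "mailing_address.zip_code" through nested dicts."""
    for part in dotted.split("."):
        if not isinstance(doc, dict):
            return None
        doc = doc.get(part)
    return doc


def diff_source_vs_parsed(csv_text: str, id_column: str, county_id: str,
                          parsed: dict, field_map: dict[str, str]) -> dict:
    """Return {csv_column: (source_value, parsed_value)} for fields that disagree.

    field_map maps CSV column names to dotted paths in the MongoDB record
    (hypothetical, e.g. {"SITUS_CITY": "property_address.city"}).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next((r for r in reader if r.get(id_column) == county_id), None)
    if row is None:
        raise KeyError(f"{county_id} not found in source column {id_column!r}")

    mismatches = {}
    for column, path in field_map.items():
        source_value = (row.get(column) or "").strip()
        parsed_value = _lookup(parsed, path)
        parsed_text = "" if parsed_value is None else str(parsed_value).strip()
        if source_value != parsed_text:
            mismatches[column] = (source_value, parsed_value)
    return mismatches
```

In practice, pass the file contents (e.g. `Path(...).read_text()`) and the record from `db.parcels.find_one({"county": ..., "county_id": ...})`.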

4. Summarize Health

Produce an overall assessment: healthy, suspicious, or broken.
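The skill does not define cut-offs for the three labels, so the sketch below uses illustrative thresholds (5% and 50%) that are assumptions, not project values. It expects a metrics dict with `total` and percentage keys named as in the code; `assess_health` and those key names are hypothetical.

```python
def assess_health(metrics: dict, warn_pct: float = 5.0, fail_pct: float = 50.0) -> str:
    """Map failure percentages to "healthy" / "suspicious" / "broken".

    warn_pct and fail_pct are illustrative assumptions; tune them per county.
    """
    if metrics.get("total", 0) == 0:
        return "broken"  # nothing was parsed at all
    # The single worst metric decides the label.
    worst = max(
        metrics.get(key, 0.0)
        for key in ("pct_missing_owner", "pct_bad_mailing_address", "pct_zero_valuation")
    )
    if worst >= fail_pct:
        return "broken"
    if worst >= warn_pct:
        return "suspicious"
    return "healthy"
```

Using the worst metric rather than an average keeps one badly broken field from being hidden by several healthy ones.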


Source File Inspection

Directory Structure

data/<county>/<year>/
├── *.csv              # CSV exports
├── *.TXT              # Fixed-width or tab-delimited
├── *.pdf              # PDF appraisal rolls
├── *.json             # JSON exports
└── snapshots/         # Web scrape snapshots (if applicable)

Finding Source Files

# List all files for a county/year
ls -la data/<county>/<year>/

# Find all source files (-iname is case-insensitive, so .TXT and .txt both match)
find data/<county>/<year>/ -type f \( -iname "*.csv" -o -iname "*.txt" -o -iname "*.json" -o -iname "*.pdf" \)

# Check file size
du -h data/<county>/<year>/*

Inspecting Source Files

# CSV - view headers and first rows
head -5 data/<county>/<year>/*.csv

# TXT - view structure (often fixed-width)
head -20 data/<county>/<year>/*.TXT

# Count lines
wc -l data/<county>/<year>/*

# Find specific record in source
grep "R000001" data/<county>/<year>/*.csv
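Since TXT exports may be either fixed-width or tab-delimited, a quick heuristic helps decide which parser path applies. This is a sketch only: `guess_txt_layout` is hypothetical, it samples the first 50 non-empty lines, and the 90% same-length threshold for "fixed-width" is an assumption. Latin-1 is used because it decodes any byte sequence, so the check never fails on encoding.

```python
def guess_txt_layout(path: str, sample_lines: int = 50) -> str:
    """Heuristically label a TXT source as "tab-delimited", "fixed-width", or "unknown"."""
    with open(path, encoding="latin-1") as fh:
        # Read at most sample_lines lines without loading the whole file.
        lines = [line.rstrip("\r\n") for _, line in zip(range(sample_lines), fh)]
    lines = [line for line in lines if line.strip()]
    if not lines:
        return "unknown"

    # Tab-delimited: every sampled line has the same, non-zero number of tabs.
    tab_counts = {line.count("\t") for line in lines}
    if len(tab_counts) == 1 and tab_counts.pop() > 0:
        return "tab-delimited"

    # Fixed-width: nearly all lines share one length (threshold is an assumption).
    lengths = [len(line) for line in lines]
    most_common = max(set(lengths), key=lengths.count)
    if lengths.count(most_common) / len(lengths) >= 0.9:
        return "fixed-width"
    return "unknown"
```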

Parser Type Detection

# Look for county-specific parser
ls county_parser/parsers/*<county>*.py

# Check if county has a spec file (appraisal parser)
ls county_parser/parsers/appraisal_info_parser/<county>*.json

# Check if in PDF registry
grep "<county>" county_parser/parsers/pdf_parser_registry.py

| Parser Type | How to identify | Target File |
|---|---|---|
| explicit | Has <county>_county_parser.py | county_parser/parsers/<county>_county_parser.py |
| appraisal | Has spec in appraisal_info_parser/ | county_parser/parsers/appraisal_info_parser/base.py |
| pdf | Listed in PDF_ONLY_COUNTIES | county_parser/parsers/pdf_parser_base.py |
| csv | Has .csv source files | county_parser/parsers/csv_parser_base.py |
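The shell checks and the table can be combined into one function. This sketch assumes the checks take precedence in table order (explicit, then appraisal, then pdf, then csv), which the skill does not state, and it assumes county names appear as quoted strings in `pdf_parser_registry.py`. `detect_parser_type` is a hypothetical helper run from the repository root.

```python
from pathlib import Path


def detect_parser_type(county: str, year: str, repo: str = ".") -> str:
    """Return "explicit", "appraisal", "pdf", "csv", or "unknown" for a county.

    Precedence follows the table order; that ordering is an assumption.
    """
    parsers = Path(repo) / "county_parser" / "parsers"
    if (parsers / f"{county}_county_parser.py").exists():
        return "explicit"
    # glob on a missing directory simply yields nothing.
    if any((parsers / "appraisal_info_parser").glob(f"{county}*.json")):
        return "appraisal"
    registry = parsers / "pdf_parser_registry.py"
    if registry.exists():
        text = registry.read_text()
        # Match the quoted name to avoid hits on counties whose names contain this one.
        if any(f"{q}{county}{q}" in text for q in ("'", '"')):
            return "pdf"
    source_dir = Path(repo) / "data" / county / str(year)
    if source_dir.is_dir() and any(p.suffix.lower() == ".csv" for p in source_dir.iterdir()):
        return "csv"
    return "unknown"
```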

MongoDB Inspection

Sample Records

from pymongo import MongoClient

client = MongoClient()
db = client["county_data"]

# Sample random records
records = list(db.parcels.aggregate([
    {"$match": {"county": "archer"}},
    {"$sample": {"size": 5}}
]))

# Get a specific record by county_id
record = db.parcels.find_one({"county": "archer", "county_id": "R000001"})

# View specific fields
records = list(db.parcels.find(
    {"county": "archer"},
    {"county_id": 1, "property_address": 1, "mailing_address": 1, "tax_year": 1}
).limit(10))

Tax Year Breakdown

pipeline = [
    {"$match": {"county": "archer"}},
    {"$group": {"_id": "$tax_year", "count": {"$sum": 1}}},
    {"$sort": {"_id": -1}}
]
list(db.parcels.aggregate(pipeline))

Find Bad Records

import re

ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")

# Find records with malformed zip codes. This scans every record that has a zip;
# add .limit(N) to the cursor for a quick spot check on very large counties.
cursor = db.parcels.find(
    {"county": "archer", "mailing_address.zip_code": {"$exists": True}},
    {"county_id": 1, "mailing_address.zip_code": 1}
)

for r in cursor:
    zip_code = (r.get("mailing_address") or {}).get("zip_code")
    # str() also catches integer zips that lost a leading zero (e.g. 7635)
    if zip_code and not ZIP_PATTERN.match(str(zip_code)):
        print(f"{r['county_id']}: {zip_code}")

Find Incomplete Records

# Property city missing but mailing city exists
records = list(db.parcels.find({
    "county": "archer",
    "mailing_address.city": {"$nin": [None, ""]},
    "$or": [
        {"property_address": None},
        {"property_address.city": None},
        {"property_address.city": ""}
    ]
}, {"county_id": 1, "property_address": 1, "mailing_address": 1}).limit(10))

Data Quality Metrics

# Count owners and valuations
pipeline = [
    {"$match": {"county": "<county>", "tax_year": 2024}},
    {"$facet": {
        "total": [{"$count": "count"}],
        "with_owner": [
            {"$addFields": {"first_owner": {"$arrayElemAt": ["$owners", 0]}}},
            {"$match": {"first_owner.name": {"$exists": True, "$nin": [None, ""]}}},
            {"$count": "count"}
        ],
        "with_valuation": [
            {"$match": {"$or": [
                {"valuation.market_value": {"$gt": 0}},
                {"valuation.assessed_value": {"$gt": 0}}
            ]}},
            {"$count": "count"}
        ]
    }}
]
facet = list(db.parcels.aggregate(pipeline))[0]  # $facet outputs a single document
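The `$facet` output still has to be turned into the percentages the health report asks for. One detail matters here: a `$count` stage emits no document when nothing matches, so a branch with zero matches comes back as an empty list rather than `[{"count": 0}]`. `facet_counts_to_pct` is a hypothetical helper that handles this.

```python
def facet_counts_to_pct(facet: dict) -> dict:
    """Convert the $facet result into counts and percentages of the total."""

    def count(branch: str) -> int:
        # An empty list means $count saw no matching documents.
        docs = facet.get(branch) or []
        return docs[0]["count"] if docs else 0

    total = count("total")
    summary = {"total": total}
    for branch in ("with_owner", "with_valuation"):
        n = count(branch)
        summary[branch] = n
        summary[f"{branch}_pct"] = round(100.0 * n / total, 1) if total else 0.0
    return summary
```

Usage: `facet_counts_to_pct(list(db.parcels.aggregate(pipeline))[0])` with the pipeline above.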

Useful Field Projections

| Scenario | Projection |
|---|---|
| Address issues | {"county_id": 1, "property_address": 1, "mailing_address": 1} |
| Owner issues | {"county_id": 1, "owners": 1, "january_1_owner": 1} |
| Value issues | {"county_id": 1, "valuation": 1} |
| Full record | {} (no projection) |

Common Inspection Patterns

Pattern 1: Why is this field wrong?

  1. Sample records with issues from MongoDB
  2. Note the county_id of a bad record
  3. Find it in source file: grep "<county_id>" data/<county>/<year>/*.csv
  4. Compare source value vs parsed value

Pattern 2: Is this legacy or current data?

# Record count per tax year. If bad records cluster only in older years,
# it is likely a legacy-data problem rather than a current parser bug.
pipeline = [
    {"$match": {"county": "archer"}},
    {"$group": {"_id": "$tax_year", "count": {"$sum": 1}}},
    {"$sort": {"_id": -1}}
]
years = list(db.parcels.aggregate(pipeline))
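Record counts alone do not show where the issues are. A variant of the pipeline can count problem records per year in the same pass. This sketch (hypothetical `issues_by_year_pipeline`) uses a missing first-owner name as the example issue; `$let` pulls out the first owner, and `$ifNull` maps a missing or null name to `""` so one `$eq` covers missing, null, and empty.

```python
def issues_by_year_pipeline(county: str) -> list[dict]:
    """Aggregation pipeline: per tax year, total records and records with no first-owner name."""
    first_owner_name = {
        "$let": {
            "vars": {"first": {"$arrayElemAt": ["$owners", 0]}},
            "in": "$$first.name",
        }
    }
    missing = {"$eq": [{"$ifNull": [first_owner_name, ""]}, ""]}
    return [
        {"$match": {"county": county}},
        {"$group": {
            "_id": "$tax_year",
            "count": {"$sum": 1},
            "missing_owner": {"$sum": {"$cond": [missing, 1, 0]}},
        }},
        {"$sort": {"_id": -1}},
    ]
```

Usage: `for row in db.parcels.aggregate(issues_by_year_pipeline("archer")): print(row["_id"], row["count"], row["missing_owner"])`. If `missing_owner` is high only for old years, the problem is most likely legacy data.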

Pattern 3: What fields are populated?

records = list(db.parcels.find({"county": "archer"}).limit(10))
for r in records:
    prop = r.get("property_address") or {}
    mail = r.get("mailing_address") or {}
    print(f"{r['county_id']}: prop_city={prop.get('city')}, mail_city={mail.get('city')}")
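Printing a few records shows individual values; a fill-rate summary over a larger sample answers the question directly. This is a sketch with a hypothetical `field_fill_rates` helper. It treats `None`, `""`, `[]` and `{}` as empty, so a numeric `0` counts as populated (zero valuations need their own check).

```python
from collections import Counter


def field_fill_rates(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Percentage of records in which each dotted field path is non-empty."""

    def lookup(doc, dotted):
        for part in dotted.split("."):
            if not isinstance(doc, dict):
                return None
            doc = doc.get(part)
        return doc

    filled = Counter()
    for record in records:
        for field in fields:
            if lookup(record, field) not in (None, "", [], {}):
                filled[field] += 1
    total = len(records)
    return {f: round(100.0 * filled[f] / total, 1) if total else 0.0 for f in fields}
```

Usage: `field_fill_rates(list(db.parcels.find({"county": "archer"}).limit(200)), ["property_address.city", "mailing_address.city", "owners"])`.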

Health Report Template

After inspection, summarize findings:

## <County> <Year> Health Report

### SOURCE FILES
- Directory: data/<county>/<year>/
- Files: [list files present]
- Status: Available / Missing / Empty

### RECORD COUNTS
- Total: X records
- With owners: X% 
- With valuation: X%

### FIELD ANOMALIES
- [List any issues found]

### OVERALL ASSESSMENT
- Status: healthy / suspicious / broken
- Rationale: [brief explanation]
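If the inspection is scripted, the template can be filled in programmatically. This sketch (hypothetical `render_health_report`) assumes the metrics dict carries `total`, `with_owner_pct` and `with_valuation_pct` keys; those names are assumptions, not part of the skill.

```python
def render_health_report(county: str, year: str, files: list[str], source_status: str,
                         metrics: dict, anomalies: list[str], status: str,
                         rationale: str) -> str:
    """Fill in the health report template with already-computed values."""
    lines = [
        f"## {county.title()} {year} Health Report",
        "",
        "### SOURCE FILES",
        f"- Directory: data/{county}/{year}/",
        f"- Files: {', '.join(files) if files else 'none'}",
        f"- Status: {source_status}",
        "",
        "### RECORD COUNTS",
        f"- Total: {metrics['total']} records",
        f"- With owners: {metrics['with_owner_pct']}%",
        f"- With valuation: {metrics['with_valuation_pct']}%",
        "",
        "### FIELD ANOMALIES",
        # Fall back to a single placeholder bullet when nothing was found.
        *([f"- {a}" for a in anomalies] or ["- None found"]),
        "",
        "### OVERALL ASSESSMENT",
        f"- Status: {status}",
        f"- Rationale: {rationale}",
    ]
    return "\n".join(lines)
```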
