Claude Code Plugins

Community-maintained marketplace

Feedback

check-metadata-typos

@owid/etl
129
0

Check .meta.yml and snapshot .dvc files for spelling typos using codespell. Use when user mentions typos, spelling errors, metadata quality, or wants to check metadata files for mistakes.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name check-metadata-typos
description Check .meta.yml and snapshot .dvc files for spelling typos using codespell. Use when user mentions typos, spelling errors, metadata quality, or wants to check metadata files for mistakes.

Check Metadata Typos

Check metadata files for spelling typos using comprehensive spell checking.

Scope Options

Ask the user which scope they want to check:

  1. Current step only - Ask the user to specify the step path (e.g., etl/steps/data/garden/energy/2025-06-27/electricity_mix)
  2. All ETL metadata - Check all active .meta.yml files in etl/steps/data/{garden,meadow,grapher}/ (automatically excludes ~3,570 archived steps)
  3. Snapshot metadata - Check all snapshot .dvc files in snapshots/ (~7,915 files)
  4. All metadata - Check both ETL steps and snapshot metadata files

Note: Archived steps and snapshots (defined in dag/archive/*.yml) are automatically excluded from checking as they are no longer actively maintained.


Implementation Strategy

0. Check codespell installation

IMPORTANT: Check if codespell is installed before attempting to use it. Since codespell is now a dev dependency in the project, it should already be installed, but verify first to avoid reinstalling unnecessarily.

# Check if codespell is installed
if ! .venv/bin/codespell --version &> /dev/null; then
    echo "codespell not found, installing..."
    uv add --dev codespell
else
    echo "codespell is already installed"
fi

If codespell is not installed and uv add --dev codespell fails, explain to the user how to install it manually.

1. Exclude archived steps and snapshots

IMPORTANT: Do not check archived steps and snapshots as they are no longer in use.

Archived steps and snapshots are defined in dag/archive/*.yml files:

  • ~3,570 deprecated steps (garden, meadow, grapher)
  • ~736 deprecated snapshots

To exclude them, extract their paths and create a list of active files:

# Extract archived step paths to a file
for step_type in garden meadow grapher; do
  grep -h "data://${step_type}/" dag/archive/*.yml 2>/dev/null | \
    grep -o "data://${step_type}/[^:]*" | \
    sed 's|data://|etl/steps/data/|' | \
    sed 's|$|.meta.yml|'
done > /tmp/archived_files.txt

# Extract archived snapshots
grep -rh "snapshot://" dag/archive/*.yml 2>/dev/null | \
  grep -o "snapshot://[^:]*" | \
  sed 's|snapshot://|snapshots/|' | \
  sed 's|$|.dvc|' | \
  sort -u >> /tmp/archived_files.txt

# Create list of all metadata files
find etl/steps/data/garden -name "*.meta.yml" > /tmp/all_meta_files.txt
find etl/steps/data/meadow -name "*.meta.yml" >> /tmp/all_meta_files.txt
find etl/steps/data/grapher -name "*.meta.yml" >> /tmp/all_meta_files.txt
find snapshots -name "*.dvc" >> /tmp/all_meta_files.txt

# Filter out archived files
grep -vFf /tmp/archived_files.txt /tmp/all_meta_files.txt > /tmp/active_meta_files.txt

echo "Total files to check: $(wc -l < /tmp/active_meta_files.txt)"

2. Run codespell with ignore list and exclusions

Use the existing .codespell-ignore.txt file to filter out domain-specific terms:

For option 1 (current step only):

  1. Ask the user to provide the step path (e.g., etl/steps/data/garden/energy/2025-06-27/electricity_mix)
  2. Construct the full path to the metadata file: <step_path>/*.meta.yml
  3. Run codespell on that specific path:
# For specific step (option 1)
STEP_PATH="<user_provided_path>"  # e.g., etl/steps/data/garden/energy/2025-06-27/electricity_mix
.venv/bin/codespell "${STEP_PATH}"/*.meta.yml \
  --ignore-words=.codespell-ignore.txt

For option 2 (all ETL metadata - garden, meadow, grapher):

# For all ETL step metadata (option 2)
find etl/steps/data/garden -name "*.meta.yml" > /tmp/all_step_files.txt
find etl/steps/data/meadow -name "*.meta.yml" >> /tmp/all_step_files.txt
find etl/steps/data/grapher -name "*.meta.yml" >> /tmp/all_step_files.txt
grep -vFf /tmp/archived_files.txt /tmp/all_step_files.txt > /tmp/active_step_files.txt

cat /tmp/active_step_files.txt | xargs .venv/bin/codespell \
  --ignore-words=.codespell-ignore.txt

Note: Excluding archived steps reduces the scope by ~3,570 files and focuses on actively maintained metadata.

For option 3 (snapshot metadata):

# For all snapshot metadata (option 3)
find snapshots -name "*.dvc" > /tmp/all_snapshot_files.txt
grep -vFf /tmp/archived_files.txt /tmp/all_snapshot_files.txt > /tmp/active_snapshot_files.txt

cat /tmp/active_snapshot_files.txt | xargs .venv/bin/codespell \
  --ignore-words=.codespell-ignore.txt

Note: Snapshot .dvc files contain metadata in the meta.source.description and meta.source.published_by fields. ~736 archived snapshots are excluded.

For option 4 (all metadata):

# For all metadata - ETL and snapshots (option 4)
# Use the active_meta_files.txt created in step 1
cat /tmp/active_meta_files.txt | xargs .venv/bin/codespell \
  --ignore-words=.codespell-ignore.txt

3. Parse and present results

Extract typos from codespell output and present them in a structured format:

  • Group by typo type (e.g., all instances of "seperate" → "separate")
  • Show file paths (as clickable links when possible)
  • Show line numbers
  • Show suggested corrections

Example output format:

Found 15 typos across 8 files:

Most common:
- "inmigrant" → "immigrant" (5 occurrences in 2 files)
- "seperate" → "separate" (3 occurrences in 1 file)
- "accomodation" → "accommodation" (2 occurrences in 1 file)

Detailed list:
[file.meta.yml:123] inmigrant → immigrant
[file.meta.yml:456] seperate → separate
...

4. Offer to fix typos

After presenting results, ask the user:

  • Fix all automatically? - Apply all suggested fixes
  • Review each typo? - Go through typos one by one for confirmation
  • Cancel - Exit without making changes

5. Apply fixes (if user confirms)

For automatic fixes:

# Use sed or Python script to replace typos in files
# Example: sed -i '' 's/seperate/separate/g' file.meta.yml

For reviewed fixes, confirm each change before applying.

6. Verify fixes

After applying fixes, re-run codespell to verify all typos were corrected:

.venv/bin/codespell <path> --ignore-words=.codespell-ignore.txt

Should return 0 results.

7. Clean up

IMPORTANT: Delete any temporary files created during the check:

rm -f /tmp/archived_files.txt /tmp/all_meta_files.txt /tmp/active_meta_files.txt \
      /tmp/all_step_files.txt /tmp/active_step_files.txt \
      /tmp/all_snapshot_files.txt /tmp/active_snapshot_files.txt \
      /tmp/codespell_output.txt

The only persistent files should be:

- The `.codespell-ignore.txt` whitelist (if it doesn't exist, create it)
- Modified `.meta.yml` files (if fixes were applied)

**Do NOT create new persistent files in the repo like:**

- ❌ `TYPO_CHECK_REPORT.md`
- ❌ `scripts/analyze_typos.py`
- ❌ `scripts/advanced_spell_checker.py`

All analysis logic should be embedded in this command execution, not saved as separate files.

---

## Error Handling

- Check if codespell is installed first (see step 0). If not installed and `uv add --dev codespell` fails, explain to the user how to install it manually with `uv sync` or check their Python environment
- If no `.meta.yml` or `.dvc` files are found in the specified scope, inform the user
- If codespell finds no typos, congratulate the user on clean metadata!
- If file modification fails, report which files couldn't be updated

---

## Notes

- Always use American English spelling (e.g., "combating" not "combatting")
- Technical field names (like variable names with underscores) are typically safe to ignore
- **Acronyms in ALL CAPS should be ignored** - they are almost always legitimate acronyms (e.g., TE, INE, DIEA)
- **URLs and domain names should be ignored** - codespell may flag parts of URLs (e.g., "ine.es", "corona.fo") but these are correct
- When in doubt about a flagged word, ask the user before fixing