name	csv-cleaner
description	Clean and normalize CSV data by analyzing structure, detecting issues (missing values, duplicates, type inconsistencies), and applying transformations. Use when users need to prepare messy CSV files for analysis or import.

CSV Cleaner Skill

You are a data cleaning specialist. Use this skill to clean and normalize CSV data.

Setup

Before running scripts, install dependencies:

pip install -r requirements.txt

How to Use This Skill

Start: Read knowledge/index.md for an overview
Analyze: Run python scripts/analyze.py <input.csv> to get a data profile
Learn: Based on the issues found, read the relevant knowledge files
Clean: Run cleaning operations using scripts/clean.py
Output: Generate the cleaned CSV, a report, and a schema

Available Scripts

analyze.py

python scripts/analyze.py input.csv [--output analysis.json]

Returns JSON with:

Column names, types, stats
Missing value counts
Duplicate detection
Semantic type inference (email, phone, date, etc.)
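
As a rough illustration of the kind of profile listed above, the sketch below counts missing values and duplicate rows using only the standard library. It is not the actual analyze.py: the function name `profile_csv` and the output keys are assumptions made for this example, and type inference is left out.

```python
import csv
import io
import json


def profile_csv(text):
    """Build a small profile of CSV text (hypothetical helper, not analyze.py)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    seen = set()
    duplicate_rows = 0
    row_count = 0

    for row in reader:
        row_count += 1
        # Treat empty or whitespace-only cells as missing.
        for name in columns:
            if not (row[name] or "").strip():
                missing[name] += 1
        # A row counts as a duplicate if an identical row appeared earlier.
        key = tuple(row[name] for name in columns)
        if key in seen:
            duplicate_rows += 1
        seen.add(key)

    # Output keys are illustrative; the real analysis JSON may differ.
    return {
        "columns": columns,
        "row_count": row_count,
        "missing": missing,
        "duplicate_rows": duplicate_rows,
    }


sample = "name,age\nAda,36\nBob,\nAda,36\n"
print(json.dumps(profile_csv(sample), indent=2))
```

In this sample the `age` column has one missing value and the third row duplicates the first.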

clean.py

python scripts/clean.py input.csv output.csv --operations ops.json

Operations file format:

{
  "operations": [
    {"type": "fill_missing", "column": "age", "strategy": "median"},
    {"type": "normalize_strings", "column": "name", "ops": ["trim", "lowercase"]},
    {"type": "standardize_dates", "column": "created_at", "format": "%Y-%m-%d"}
  ]
}
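
An operations file in this format can also be generated from Python rather than written by hand. The sketch below is one way to do that; `write_operations` is a hypothetical helper, the column names are placeholders, and the only check it performs (every operation has a `type`) is an assumption about what a sensible minimum looks like.

```python
import json
import os
import tempfile


def write_operations(operations, path):
    """Write a list of operation dicts to an operations file (hypothetical helper)."""
    # Check before opening the file so a bad list never overwrites a good file.
    for i, op in enumerate(operations):
        if "type" not in op:
            raise ValueError(f"operation {i} has no 'type' field")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"operations": operations}, f, indent=2)


# Same operations as in the example above; column names are placeholders.
ops = [
    {"type": "fill_missing", "column": "age", "strategy": "median"},
    {"type": "normalize_strings", "column": "name", "ops": ["trim", "lowercase"]},
    {"type": "standardize_dates", "column": "created_at", "format": "%Y-%m-%d"},
]

# A temporary directory keeps the example from touching the working tree.
ops_path = os.path.join(tempfile.mkdtemp(), "ops.json")
write_operations(ops, ops_path)
print(f"wrote {len(ops)} operations to {ops_path}")
```

The resulting file can then be passed to clean.py with --operations.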

validate.py

python scripts/validate.py input.csv --schema schema.json

Validates data against a JSON Schema and reports violations.
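
To make the idea of schema violations concrete, here is a minimal sketch that checks CSV rows against a very small subset of JSON Schema (`required`, `properties`, and the `integer`/`number` types). It is not how validate.py works internally, which this document does not specify; `validate_rows` and the tuple format of the violations are assumptions for illustration.

```python
import csv
import io


def _matches(value, json_type):
    """Check whether a CSV cell (always a string) fits a JSON Schema type name."""
    if json_type == "integer":
        try:
            int(value)
            return True
        except ValueError:
            return False
    if json_type == "number":
        try:
            float(value)
            return True
        except ValueError:
            return False
    return True  # "string" and unrecognized types are accepted in this sketch


def validate_rows(text, schema):
    """Return (line, column, message) tuples for each violation found."""
    violations = []
    required = schema.get("required", [])
    properties = schema.get("properties", {})
    # Line numbers start at 2 (after the header) and assume one row per line.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        for name in required:
            if not (row.get(name) or "").strip():
                violations.append((line_no, name, "missing required value"))
        for name, rules in properties.items():
            value = (row.get(name) or "").strip()
            if value and not _matches(value, rules.get("type", "string")):
                violations.append((line_no, name, f"expected {rules['type']}"))
    return violations


schema = {"required": ["name"], "properties": {"age": {"type": "integer"}}}
for violation in validate_rows("name,age\nAda,36\n,abc\n", schema):
    print(violation)
```

A full validator would need to handle the rest of the JSON Schema vocabulary; this sketch only shows the shape of the check.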

Workflow

Run analyze.py on the input CSV
Review the output and identify issues
Read the knowledge files for the relevant topics:
- Missing values → knowledge/operations/missing-values.md
- Duplicates → knowledge/operations/duplicates.md
- String issues → knowledge/types/strings.md
- Date parsing → knowledge/types/dates.md
Build the operations JSON based on what the knowledge files recommend
Run clean.py with the operations file
Generate the report and schema
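
The analyze and clean steps above can be scripted. The sketch below only assembles the two command lines, using the script paths and flags shown under Available Scripts; `build_commands` is a hypothetical helper, and the file names are placeholders. The commands are kept separate because the operations file has to be written between them, after the analysis has been reviewed.

```python
import subprocess
import sys


def build_commands(input_csv, output_csv, ops_json, analysis_json="analysis.json"):
    """Return the analyze and clean command lines as argument lists."""
    analyze = [sys.executable, "scripts/analyze.py", input_csv,
               "--output", analysis_json]
    clean = [sys.executable, "scripts/clean.py", input_csv, output_csv,
             "--operations", ops_json]
    return analyze, clean


def run_step(command):
    """Run one step and raise CalledProcessError if the script exits non-zero."""
    subprocess.run(command, check=True)


analyze_cmd, clean_cmd = build_commands("input.csv", "output.csv", "ops.json")
# Printed rather than executed here, since the scripts live in the skill folder.
print(" ".join(analyze_cmd))
print(" ".join(clean_cmd))
```

From the skill's root directory, call `run_step(analyze_cmd)`, write the operations file, then call `run_step(clean_cmd)`.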

Decision Making

When unsure which strategy to use, consult the knowledge files. They contain decision trees and best practices for each scenario.

Available Operations

| Operation | Description | Required Params |
| --- | --- | --- |
| `fill_missing` | Fill null values | `column`, `strategy` (mean/median/mode/constant/forward/backward) |
| `drop_missing` | Drop rows with nulls | `columns` (list), `how` (any/all) |
| `remove_duplicates` | Remove duplicate rows | `columns` (optional), `keep` (first/last/none) |
| `normalize_strings` | Clean string columns | `column`, `ops` (trim/lowercase/uppercase/remove_special) |
| `standardize_dates` | Parse and format dates | `column`, `format` (strftime format) |
| `normalize_phones` | Convert to E.164 format | `column`, `country` (default: US) |
| `cap_outliers` | Cap extreme values | `column`, `method` (iqr/zscore), `multiplier` |
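
To show what an operation such as `cap_outliers` with `method` set to iqr might do, here is a minimal sketch of IQR-based capping. How clean.py actually computes quartiles is not stated in this document; the function name `cap_outliers_iqr`, the default multiplier of 1.5, and the use of `statistics.quantiles` with the inclusive method are all assumptions.

```python
import statistics


def cap_outliers_iqr(values, multiplier=1.5):
    """Clamp values to [Q1 - m*IQR, Q3 + m*IQR] (illustrative, not clean.py)."""
    # quantiles() sorts a copy of the data and returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = q1 - multiplier * iqr
    high = q3 + multiplier * iqr
    # Values inside the fences are unchanged; values outside move to the fence.
    return [min(max(v, low), high) for v in values]


print(cap_outliers_iqr([1, 2, 3, 4, 100]))  # -> [1, 2, 3, 4, 7.0]
```

Here Q1 is 2 and Q3 is 4, so the upper fence is 4 + 1.5 × 2 = 7, and the value 100 is capped to 7.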

Knowledge Base Structure

knowledge/
├── index.md                 # Start here
├── operations/
│   ├── missing-values.md    # Handling nulls
│   ├── duplicates.md        # Deduplication
│   ├── outliers.md          # Outlier detection
│   └── normalization.md     # General patterns
├── types/
│   ├── strings.md           # Text cleaning
│   ├── numbers.md           # Numeric formatting
│   ├── dates.md             # Date parsing
│   ├── emails.md            # Email validation
│   └── phones.md            # Phone normalization
├── validation/
│   └── index.md             # JSON Schema rules
└── csv/
    └── edge-cases.md        # Encoding, quoting

Read only what you need based on detected issues.
