| name | data-cleaning |
| description | Data cleaning, preprocessing, and quality assurance techniques |
| version | 2.0.0 |
| sasmp_version | 2.0.0 |
| bonded_agent | 05-programming-expert |
| bond_type | SECONDARY_BOND |
| config | [object Object] |
| parameters | [object Object] |
| observability | [object Object] |
Data Cleaning Skill
Overview
Master data cleaning and preprocessing techniques essential for reliable analytics.
Topics Covered
- Missing value handling (imputation, deletion)
- Outlier detection and treatment
- Data type conversion and validation
- Duplicate identification and removal
- String cleaning and normalization
Learning Outcomes
- Clean messy datasets
- Handle missing data appropriately
- Detect and treat outliers
- Ensure data quality
Error Handling
| Error Type | Cause | Recovery |
|---|---|---|
| Memory error | Dataset too large | Use chunking or sampling |
| Type conversion failed | Invalid data format | Apply preprocessing first |
| Encoding issues | Wrong character encoding | Detect and specify encoding |
| Validation failure | Data doesn't meet schema | Review and adjust validation rules |
Related Skills
- programming (for automation)
- foundations (for data quality concepts)
- databases-sql (for SQL-based cleaning)