| name | eda |
| description | Exploratory Data Analysis for tabular data. Use this skill when analyzing value distributions, checking for missing data, computing correlations, examining class balance, or generating data quality reports. |
Exploratory Data Analysis (EDA)
Analyze tabular datasets to understand distributions, data quality, and relationships between variables.
When to Use
- Understanding a new dataset before modeling
- Checking data quality (missing values, outliers, duplicates)
- Analyzing target variable distribution for classification/regression
- Identifying correlations between features
- Generating summary statistics
Available Tasks
| Task | Command | Description |
|---|---|---|
| Column Distribution | eda-column-dist | Analyze value distribution for a specific column |
Task Documentation
Detailed task templates are available in tasks/:
- tasks/column_distribution.md - Full documentation for column distribution analysis
Quick Start
# Analyze distribution of a column
eda-column-dist --source <path> --column <name>
# Save report to file
eda-column-dist --source <path> --column <name> --output report.md
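To make concrete what a column distribution analysis involves, here is a minimal Python sketch using only the standard library. It is not the implementation of eda-column-dist. The function name `column_distribution`, the sample data, and the choice to treat empty strings as missing are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def column_distribution(rows, column):
    """Count values in one column; empty or absent cells count as missing (an assumption)."""
    values = [row.get(column, "") for row in rows]
    present = [v for v in values if v is not None and v.strip() != ""]
    counts = Counter(present)
    total = len(values)
    distribution = [
        {"value": value, "count": count, "percent": round(100 * count / total, 1)}
        for value, count in counts.most_common()
    ]
    return {
        "total": total,
        "missing": total - len(present),
        "distribution": distribution,
    }


# Small in-memory CSV standing in for the file that would be passed via --source.
SAMPLE = "id,color\n1,red\n2,blue\n3,red\n4,\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = column_distribution(rows, "color")

for entry in result["distribution"]:
    print(entry["value"], entry["count"], f'{entry["percent"]}%')
print("missing:", result["missing"], "of", result["total"])
```

Percentages here are taken over all rows, including missing ones, so they do not sum to 100 when values are missing. Whether the real command does the same is not stated in this document.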
Output Format
All EDA scripts produce markdown reports with:
- Task metadata (source, column, timestamp)
- Summary statistics
- Distribution tables or visualizations (as text)
- Observations and potential issues
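The four report parts listed above could be assembled roughly as follows. This is a sketch under stated assumptions, not the format the EDA scripts actually emit: the function `render_report`, the section titles, the `#` heading style, and the path `data.csv` are all hypothetical.

```python
from datetime import datetime, timezone


def render_report(source, column, counts, observations):
    """Build a markdown report: metadata, summary, distribution table, observations."""
    total = sum(counts.values())
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        f"# Column Distribution: {column}",
        "",
        f"- Source: {source}",
        f"- Column: {column}",
        f"- Timestamp: {timestamp}",
        "",
        "## Summary",
        "",
        f"- Rows: {total}",
        f"- Distinct values: {len(counts)}",
        "",
        "## Distribution",
        "",
        "| Value | Count | Percent |",
        "|---|---|---|",
    ]
    # Most frequent values first.
    for value, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        percent = 100 * count / total if total else 0.0
        lines.append(f"| {value} | {count} | {percent:.1f}% |")
    lines += ["", "## Observations", ""]
    lines += [f"- {note}" for note in observations]
    return "\n".join(lines) + "\n"


report = render_report(
    source="data.csv",  # hypothetical path, for illustration only
    column="color",
    counts={"red": 2, "blue": 1},
    observations=["Two distinct values; no obvious issues."],
)
print(report)
```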
Best Practices
- Start with data-connector - Verify data access and schema before EDA
- Check target variable first - Understand class balance for classification tasks
- Look for missing patterns - Data may not be missing completely at random; check whether the mechanism is MCAR, MAR, or MNAR
- Document findings - Save reports for reproducibility
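Two of the practices above, checking class balance and looking for patterns in missing data, can be sketched in a few lines of plain Python. The helper names `class_balance` and `missing_rate_by_class` and the field names `income` and `churned` are made up for this example; none of them belong to an existing EDA command.

```python
from collections import Counter, defaultdict


def class_balance(labels):
    """Return each class's share of the rows, largest class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


def missing_rate_by_class(rows, feature, target):
    """Fraction of rows whose feature value is missing, per target class."""
    missing = defaultdict(int)
    seen = defaultdict(int)
    for row in rows:
        seen[row[target]] += 1
        if row.get(feature) in (None, ""):
            missing[row[target]] += 1
    return {label: missing[label] / seen[label] for label in seen}


# Toy records standing in for a real dataset.
rows = [
    {"income": "52000", "churned": "no"},
    {"income": "48000", "churned": "no"},
    {"income": "61000", "churned": "no"},
    {"income": "", "churned": "yes"},
]
balance = class_balance(row["churned"] for row in rows)
rates = missing_rate_by_class(rows, "income", "churned")
print(balance)
print(rates)
```

A large gap in missing rate between classes, as in this toy data, is a hint that the values are not missing completely at random. It does not by itself distinguish MAR from MNAR.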
Future Tasks (Planned)
- Missing data analysis
- Correlation matrix
- Outlier detection
- Duplicate detection
- Target class balance
- Full EDA report (combines all tasks)
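Until these tasks exist, three of the planned checks can be approximated with the Python standard library. The sketch below is illustrative only and says nothing about how the planned tasks will be implemented. The function names are hypothetical, and Tukey's 1.5 × IQR rule is just one common choice for outlier detection.

```python
import math
import statistics
from collections import Counter


def duplicate_rows(rows):
    """Return the distinct rows (as tuples) that appear more than once."""
    counts = Counter(tuple(row) for row in rows)
    return [row for row, n in counts.items() if n > 1]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(duplicate_rows([(1, "a"), (2, "b"), (1, "a")]))
print(iqr_outliers([10, 11, 12, 13, 14, 15, 100]))
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

`pearson` raises `ZeroDivisionError` when either input is constant, since the correlation is undefined in that case. A real task would need to report that explicitly.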