| name | csv-data-visualizer |
| description | This skill should be used when working with CSV files to create interactive data visualizations, generate statistical plots, analyze data distributions, create dashboards, or perform automatic data profiling. It provides comprehensive tools for exploratory data analysis using Plotly for interactive visualizations. |
CSV Data Visualizer
Overview
This skill enables comprehensive data visualization and analysis for CSV files. It provides three main capabilities: (1) creating individual interactive visualizations using Plotly, (2) automatic data profiling with statistical summaries, and (3) generating multi-plot dashboards. The skill is optimized for exploratory data analysis, statistical reporting, and creating presentation-ready visualizations.
When to Use This Skill
Invoke this skill when users request:
- "Visualize this CSV data"
- "Create a histogram/scatter plot/box plot from this data"
- "Show me the distribution of [column]"
- "Generate a dashboard for this dataset"
- "Profile this CSV file" or "Analyze this data"
- "Create a correlation heatmap"
- "Show trends over time"
- "Compare [variable] across [categories]"
Core Capabilities
1. Individual Visualizations
Create specific chart types for detailed analysis using the visualize_csv.py script.
Available Chart Types:
Statistical Plots:
# Histogram - distribution of numeric data
python3 scripts/visualize_csv.py data.csv --histogram column_name --bins 30
# Box plot - show quartiles and outliers
python3 scripts/visualize_csv.py data.csv --boxplot column_name
# Box plot grouped by category
python3 scripts/visualize_csv.py data.csv --boxplot salary --group-by department
# Violin plot - distribution with probability density
python3 scripts/visualize_csv.py data.csv --violin column_name --group-by category
Relationship Analysis:
# Scatter plot with automatic trend line
python3 scripts/visualize_csv.py data.csv --scatter height weight
# Scatter plot with color and size encoding
python3 scripts/visualize_csv.py data.csv --scatter x y --color category --size value
# Correlation heatmap for all numeric columns
python3 scripts/visualize_csv.py data.csv --correlation
Time Series:
# Line chart for single variable
python3 scripts/visualize_csv.py data.csv --line date sales
# Multiple variables on same chart
python3 scripts/visualize_csv.py data.csv --line date "sales,revenue,profit"
Categorical Data:
# Bar chart (counts categories automatically)
python3 scripts/visualize_csv.py data.csv --bar category
# Pie chart for composition
python3 scripts/visualize_csv.py data.csv --pie region
Output Formats: Specify the output file with -o; the file extension selects the format:
# Interactive HTML (default)
python3 scripts/visualize_csv.py data.csv --histogram age -o output.html
# Static image formats
python3 scripts/visualize_csv.py data.csv --scatter x y -o plot.png
python3 scripts/visualize_csv.py data.csv --correlation -o heatmap.pdf
python3 scripts/visualize_csv.py data.csv --bar category -o chart.svg
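The extension-to-format rule above can be sketched in a few lines of Python. This is an illustration only, not the code in visualize_csv.py: the function name `output_kind` and the two extension sets are assumptions built from the examples in this section, and the real script may accept other extensions.

```python
from pathlib import Path

# Assumption: only the extensions shown in the examples above are covered.
INTERACTIVE = {".html"}
STATIC = {".png", ".pdf", ".svg"}  # static export additionally needs kaleido


def output_kind(path: str) -> str:
    """Classify an output path as 'interactive' or 'static' by its extension."""
    ext = Path(path).suffix.lower()
    if ext in INTERACTIVE:
        return "interactive"
    if ext in STATIC:
        return "static"
    raise ValueError(f"unsupported output extension: {ext or '(none)'}")
```

A check like this can be run before invoking the script, so that a typo in the extension or a missing kaleido install is caught early.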
2. Automatic Data Profiling
Generate comprehensive data quality and statistical reports using the data_profile.py script.
Text Report (default):
python3 scripts/data_profile.py data.csv
HTML Report:
python3 scripts/data_profile.py data.csv -f html -o report.html
JSON Report:
python3 scripts/data_profile.py data.csv -f json -o profile.json
What the Profiler Provides:
- File information (size, dimensions)
- Dataset overview (shape, memory usage, duplicates)
- Column-by-column analysis (types, missing data, unique values)
- Missing data patterns and completeness
- Statistical summary for numeric columns (mean, std, quartiles, skewness, kurtosis)
- Categorical column analysis (frequency counts, most/least common values)
- Data quality checks (high missing data, duplicate rows, constant columns, high cardinality)
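To make the column-by-column analysis concrete, here is a minimal standard-library sketch of the kind of per-column summary listed above (missing values, unique values, numeric or not). It is not the implementation of data_profile.py; the names `profile_columns` and `_is_number` are hypothetical, and empty strings are assumed to mean "missing".

```python
import csv
import io


def _is_number(text: str) -> bool:
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_columns(csv_text: str) -> dict:
    """Return per-column missing counts, unique counts, and a numeric flag."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    profile = {}
    for name in (rows[0].keys() if rows else []):
        values = [row[name] for row in rows]
        # Assumption: an empty cell (or a short row) counts as missing.
        present = [v for v in values if v not in ("", None)]
        profile[name] = {
            "missing": len(values) - len(present),
            "unique": len(set(present)),
            "numeric": bool(present) and all(_is_number(v) for v in present),
        }
    return profile


sample = "age,city\n30,Paris\n,Paris\n41,Oslo\n"
summary = profile_columns(sample)
```

For the sample above, `summary["age"]` reports one missing value and a numeric column, while `summary["city"]` reports two unique values and a non-numeric column.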
When to Use Profiling: Recommend running data profiling before creating visualizations when:
- The user is unfamiliar with the dataset
- Data quality is unknown
- Appropriate visualization types still need to be identified
- A new dataset is being explored for the first time
3. Multi-Plot Dashboards
Create comprehensive dashboards with multiple visualizations using the create_dashboard.py script.
Automatic Dashboard: Analyzes data types and automatically creates appropriate visualizations:
python3 scripts/create_dashboard.py data.csv
Custom output location:
python3 scripts/create_dashboard.py data.csv -o my_dashboard.html
Control number of plots:
python3 scripts/create_dashboard.py data.csv --max-plots 9
Custom Dashboard from Config: Create a JSON configuration file specifying exact plots:
python3 scripts/create_dashboard.py data.csv --config config.json
Dashboard Config Format:
{
"title": "Sales Analysis Dashboard",
"plots": [
{"type": "histogram", "column": "revenue"},
{"type": "box", "column": "revenue", "group_by": "region"},
{"type": "scatter", "column": "advertising", "group_by": "revenue"},
{"type": "bar", "column": "product_category"},
{"type": "correlation"}
]
}
Dashboard Plot Types:
- histogram: Distribution of a numeric column
- box: Box plot, optionally grouped by category
- scatter: Relationship between two numeric columns (the example config above passes the second column as group_by)
- bar: Count of categorical values
- correlation: Heatmap of numeric correlations
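A config file in this format can also be generated and checked from Python before it is passed to --config. The sketch below is a hedged example: `build_config` and `KNOWN_TYPES` are hypothetical names, and the rule that every type except correlation needs a "column" key is an assumption inferred from the sample config, not a documented requirement of create_dashboard.py.

```python
import json

# Plot types listed in this document; anything else is rejected here.
KNOWN_TYPES = {"histogram", "box", "scatter", "bar", "correlation"}


def build_config(title: str, plots: list) -> str:
    """Validate plot entries and return the dashboard config as a JSON string."""
    for plot in plots:
        if plot.get("type") not in KNOWN_TYPES:
            raise ValueError(f"unknown plot type: {plot.get('type')!r}")
        # Assumption: every type except 'correlation' needs a 'column' key.
        if plot["type"] != "correlation" and "column" not in plot:
            raise ValueError(f"plot {plot['type']!r} is missing 'column'")
    return json.dumps({"title": title, "plots": plots}, indent=2)


config_json = build_config(
    "Sales Analysis Dashboard",
    [
        {"type": "histogram", "column": "revenue"},
        {"type": "box", "column": "revenue", "group_by": "region"},
        {"type": "correlation"},
    ],
)
```

Writing `config_json` to config.json yields a file that follows the format shown above.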
Workflow Decision Tree
Use this decision tree to determine the appropriate approach:
User provides CSV file
│
├─ "Profile this data" / "Analyze this data" / Unfamiliar dataset
│ └─> Run data_profile.py first
│ Then offer visualization options based on findings
│
├─ "Create dashboard" / "Overview of the data" / Multiple visualizations needed
│ ├─ User knows exact plots wanted
│ │ └─> Create JSON config → run create_dashboard.py with config
│ └─ User wants automatic dashboard
│ └─> Run create_dashboard.py (auto mode)
│
└─ Specific visualization requested ("histogram", "scatter plot", etc.)
└─> Use visualize_csv.py with appropriate flag
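The decision tree can be expressed as a small routing function. This is only a sketch under simple assumptions: the keyword matching is deliberately naive, `choose_command` is a hypothetical helper, and for a specific chart the caller still has to append the chart flag (for example --histogram).

```python
def choose_command(request: str, csv_path: str, config_path: str = "") -> list:
    """Map a user request to a base command, following the decision tree above."""
    text = request.lower()
    # Branch 1: profiling or general analysis requests.
    if "profile" in text or "analyze" in text:
        return ["python3", "scripts/data_profile.py", csv_path]
    # Branch 2: dashboards, with or without a custom JSON config.
    if "dashboard" in text or "overview" in text:
        command = ["python3", "scripts/create_dashboard.py", csv_path]
        if config_path:
            command += ["--config", config_path]
        return command
    # Branch 3: a specific chart; the caller appends the matching chart flag.
    return ["python3", "scripts/visualize_csv.py", csv_path]
```

For example, `choose_command("Generate a dashboard for this dataset", "data.csv", "config.json")` returns the create_dashboard.py command with the --config argument attached.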
Best Practices
Starting Analysis
- Always profile first for unfamiliar datasets:
python3 scripts/data_profile.py data.csv
- Review the profiling output to understand:
- Column data types and ranges
- Missing data patterns
- Data quality issues
- Statistical distributions
Choosing Visualizations
Consult references/visualization_guide.md for detailed guidance. Quick reference:
- Distribution: Histogram, box plot, violin plot
- Relationship: Scatter plot, correlation heatmap
- Time series: Line chart
- Categories: Bar chart (preferred) or pie chart (use sparingly)
- Comparison: Box plot grouped by category
Creating Dashboards
- Automatic dashboard: Good for initial exploration
- Custom dashboard: Better for presentations or specific analysis goals
- Limit plots: Keep to 6-9 plots maximum for readability
- Logical grouping: Group related visualizations together
Output Considerations
- HTML: Best for interactive exploration (zoom, pan, hover tooltips)
- PNG/PDF: Best for reports and presentations
- SVG: Best for publications requiring vector graphics
Dependencies
The scripts require these Python packages:
pip install pandas plotly numpy
For static image export (PNG, PDF, SVG), also install:
pip install kaleido
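Whether these packages are importable can be checked from Python before running any script. The sketch below uses only the standard library; `missing_packages`, `REQUIRED`, and `OPTIONAL` are hypothetical names, and the package lists simply mirror the pip commands above.

```python
import importlib.util

REQUIRED = ["pandas", "plotly", "numpy"]
OPTIONAL = ["kaleido"]  # only needed for PNG, PDF, and SVG export


def missing_packages(names: list) -> list:
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required packages:", missing_packages(REQUIRED))
    print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list for the required packages means the scripts should at least import; it does not verify package versions.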
Example Workflows
Exploratory Data Analysis
# 1. Profile the data
python3 scripts/data_profile.py sales_data.csv -f html -o profile.html
# 2. Create automatic dashboard
python3 scripts/create_dashboard.py sales_data.csv -o dashboard.html
# 3. Dive deeper with specific plots
python3 scripts/visualize_csv.py sales_data.csv --scatter price sales --color region
python3 scripts/visualize_csv.py sales_data.csv --boxplot revenue --group-by product
Report Generation
# Create specific visualizations for report
python3 scripts/visualize_csv.py data.csv --histogram age -o fig1_distribution.png
python3 scripts/visualize_csv.py data.csv --scatter income age -o fig2_correlation.png
python3 scripts/visualize_csv.py data.csv --bar category -o fig3_categories.png
# Generate data summary
python3 scripts/data_profile.py data.csv -f html -o data_summary.html
Interactive Dashboard
# Create custom dashboard for presentation
# 1. First, create config.json with desired plots
# 2. Generate dashboard
python3 scripts/create_dashboard.py data.csv --config config.json -o presentation_dashboard.html
Troubleshooting
"Column not found" errors:
- Run data profiling to see exact column names
- CSV columns are case-sensitive
- Check for leading/trailing spaces in column names
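The case and whitespace pitfalls above can be diagnosed with a short standard-library helper that suggests the closest real column name. This is a sketch, not part of the skill's scripts: `suggest_column` is a hypothetical function and the 0.6 similarity cutoff is an arbitrary choice.

```python
import csv
import difflib
import io


def suggest_column(requested: str, header_line: str) -> list:
    """Return likely matches for a column name, ignoring case and outer spaces."""
    columns = next(csv.reader(io.StringIO(header_line)))
    # Map a normalized form of each header back to its exact original spelling.
    normalized = {col.strip().lower(): col for col in columns}
    matches = difflib.get_close_matches(
        requested.strip().lower(), list(normalized), n=3, cutoff=0.6
    )
    return [normalized[m] for m in matches]
```

The returned strings are the exact header spellings, including any stray spaces, so they can be passed to the scripts as-is or used to decide that the CSV header needs cleaning.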
Empty or incorrect visualizations:
- Verify data types (numeric vs categorical)
- Check for missing data in plotted columns
- Ensure sufficient non-null values exist
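A quick way to check whether a column has enough plottable values is to count the cells that parse as numbers. The helper below is a minimal sketch using only the standard library; `usable_numeric_count` is a hypothetical name, and treating blank cells, unparseable text, and NaN as unusable is an assumption about what counts as "sufficient" data.

```python
import csv
import io
import math


def usable_numeric_count(csv_text: str, column: str) -> int:
    """Count rows where the given column holds a parseable, non-NaN number."""
    count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        try:
            number = float(value)
        except ValueError:
            continue  # blank cell or non-numeric text
        if math.isnan(number):
            continue  # literal "nan" entries are treated as missing
        count += 1
    return count


sample = "x,y\n1,a\n,b\n3.5,c\nn/a,d\n"
```

With the sample above, column x has two usable values and column y has none, which would explain an empty histogram for y.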
Script execution errors:
- Verify dependencies are installed:
pip list | grep plotly
- Check Python version: Python 3.6+ required
- For image export issues, install kaleido:
pip install kaleido
Resources
scripts/
- visualize_csv.py: Main visualization script with all chart types
- data_profile.py: Automatic data profiling and quality analysis
- create_dashboard.py: Multi-plot dashboard generator
references/
- visualization_guide.md: Comprehensive guide for choosing appropriate chart types, best practices, and common patterns