| name | data-analysis |
| description | Analyze data patterns, create visualizations, and generate insights from datasets using statistical methods and data science techniques |
Data Analysis Skill
Transform raw data into actionable insights. This skill helps you explore datasets, identify patterns, create visualizations, and generate statistical reports.
Purpose
This skill enables you to:
- Load and explore datasets of various formats (CSV, JSON, Parquet)
- Perform exploratory data analysis (EDA)
- Create statistical summaries and distributions
- Generate data visualizations and charts
- Identify correlations and trends
- Detect anomalies and outliers
- Build predictive models
- Export analysis reports
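
As a rough illustration of the first capability above, loading could be dispatched on the file extension. This is a minimal sketch, not the skill's actual loader: `load_dataset` is a hypothetical helper name, and the small CSV written to a temporary directory exists only to make the example runnable.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load a CSV, JSON, or Parquet file into a DataFrame based on its extension.

    Hypothetical helper for illustration only.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".json":
        return pd.read_json(path)
    if suffix == ".parquet":
        # Needs an optional Parquet engine such as pyarrow or fastparquet.
        return pd.read_parquet(path)
    raise ValueError(f"Unsupported file format: {suffix!r}")


# Write a tiny made-up CSV file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_text("id,amount\n1,10.5\n2,20.0\n")
    df = load_dataset(csv_path)

# First look at the data: column types and summary statistics.
print(df.dtypes)
print(df.describe())
```

Raising on an unknown extension, rather than guessing a format, keeps loading failures visible early in the analysis.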
When to Use
Use this skill when you need to:
- Understand a new dataset
- Find trends and patterns in data
- Create reports with visualizations
- Identify data quality issues
- Compare groups or time periods
- Forecast future values
- Build summary dashboards
- Share insights with stakeholders
Key Features
- EDA Tools - Automated exploratory analysis
- Visualizations - Charts, graphs, and heatmaps
- Statistical Analysis - Descriptive stats, hypothesis testing, correlation
- Data Cleaning - Handle missing values, outliers, duplicates
- Time Series - Seasonal decomposition and forecasting
- Machine Learning - Clustering, classification, regression
- Reports - Professional analysis documents with code
- Export Options - Save to HTML, PDF, or interactive dashboards
Instructions
When using this skill:
- Load Data - Provide a dataset path (CSV, JSON, or Parquet) or paste CSV/JSON content directly
- Explore - Generate summary statistics and visualizations
- Analyze - Identify patterns, trends, and relationships
- Validate - Check data quality and handle issues
- Visualize - Create meaningful charts and graphs
- Model - Build predictive models if needed
- Report - Document findings and recommendations
Guidelines
- Start Simple: Begin with univariate analysis before multivariate
- Visualize First: Always plot the data before computing statistics
- Question Assumptions: Don't assume a pattern is statistically significant; test it
- Document Methods: Explain your analytical approach
- Consider Context: Interpret results within business context
- Validate Results: Confirm findings with domain experts
- Communicate Clearly: Use simple language and visual metaphors
Examples
Example 1: Customer Purchase Analysis
Dataset: Customer transactions with 20,000 records
Analysis Steps:
- Load purchase data (date, customer_id, amount, category)
- Calculate summary statistics (total spend, average order value)
- Visualize purchase distribution by category
- Analyze seasonal trends
- Identify top customers
- Detect purchase anomalies
Output:
# Customer Analysis Report
## Summary Statistics
- Total Revenue: $2.5M
- Average Order Value: $125
- Number of Customers: 3,450
- Date Range: 2023-01-01 to 2024-01-15
## Key Findings
1. Electronics category drives 42% of revenue
2. Top 20% of customers generate 80% of revenue (Pareto principle)
3. Strong seasonal pattern with peak in Q4
4. Average customer lifetime value: $1,200
## Recommendations
- Focus retention efforts on high-value customers
- Increase inventory for Q4 seasonal demand
- Cross-sell opportunities in Electronics + Home categories
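
The analysis steps in Example 1 might be sketched with pandas as follows. The column names (`date`, `customer_id`, `amount`, `category`) come from the step list; the eight rows are made-up sample data, and the 1.5 × IQR rule for flagging unusual purchases is an assumption, since the example does not say which anomaly method is used.

```python
import pandas as pd

# Made-up sample transactions, for illustration only.
purchases = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-01-15", "2023-04-02", "2023-07-19", "2023-11-05",
        "2023-11-20", "2023-12-01", "2023-12-15", "2023-12-20",
    ]),
    "customer_id": [1, 2, 1, 3, 2, 1, 3, 4],
    "amount": [100.0, 120.0, 90.0, 110.0, 130.0, 105.0, 95.0, 900.0],
    "category": ["Electronics", "Home", "Home", "Electronics",
                 "Electronics", "Home", "Home", "Electronics"],
})

# Summary statistics.
total_revenue = purchases["amount"].sum()
avg_order_value = purchases["amount"].mean()

# Share of revenue per category.
category_share = purchases.groupby("category")["amount"].sum() / total_revenue

# Seasonal view: revenue per calendar quarter.
quarterly_revenue = purchases.groupby(purchases["date"].dt.quarter)["amount"].sum()

# Top customers by total spend.
top_customers = (
    purchases.groupby("customer_id")["amount"].sum()
    .sort_values(ascending=False)
    .head(3)
)

# Anomalies: amounts outside 1.5 * IQR of the quartiles (assumed rule).
q1, q3 = purchases["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
anomalies = purchases[
    (purchases["amount"] < q1 - 1.5 * iqr) | (purchases["amount"] > q3 + 1.5 * iqr)
]

print(f"Total revenue: {total_revenue:.2f}, average order value: {avg_order_value:.2f}")
print(category_share)
print(quarterly_revenue)
print(top_customers)
print(anomalies)
```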
Example 2: Website Traffic Analysis
Dataset: Daily pageviews, bounce rate, session duration
Key Metrics Analyzed:
- Traffic trends over time
- Device type distribution
- Top pages and conversion rates
- User behavior funnels
- Mobile vs. desktop comparison
Visualizations Generated:
- Line chart: Daily pageviews over 12 months
- Bar chart: Traffic by device type
- Funnel chart: User conversion flow
- Heatmap: Day/hour traffic patterns
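
The data behind the day/hour heatmap and the mobile vs. desktop comparison in Example 2 could be prepared roughly like this. The session-level columns (`timestamp`, `device`, `pageviews`, `bounced`) and their values are assumptions made for the sketch; the example itself only names the metrics.

```python
import pandas as pd

# Made-up session records, for illustration only.
sessions = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 09:15", "2024-01-01 09:40", "2024-01-01 14:05",
        "2024-01-02 09:30", "2024-01-02 20:10", "2024-01-08 09:55",
    ]),
    "device": ["mobile", "desktop", "mobile", "mobile", "desktop", "mobile"],
    "pageviews": [3, 5, 2, 4, 6, 1],
    "bounced": [False, False, True, False, False, True],
})

# Day-of-week by hour-of-day grid of total pageviews (the heatmap input).
sessions["day"] = sessions["timestamp"].dt.day_name()
sessions["hour"] = sessions["timestamp"].dt.hour
traffic_grid = sessions.pivot_table(
    index="day", columns="hour", values="pageviews", aggfunc="sum", fill_value=0
)

# Per-device comparison: session count, total pageviews, bounce rate.
device_summary = sessions.groupby("device").agg(
    sessions=("pageviews", "size"),
    pageviews=("pageviews", "sum"),
    bounce_rate=("bounced", "mean"),
)

print(traffic_grid)
print(device_summary)
```

A grid shaped like `traffic_grid` can be passed to a heatmap function such as `seaborn.heatmap` to render the chart.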
Analysis Patterns
| Scenario | Analysis Type | Key Metrics |
|---|---|---|
| Sales Data | Trend & Seasonal | Growth rate, Seasonality index |
| Customer Data | Segmentation | RFM score, Cohort analysis |
| Website Data | Behavior | Bounce rate, Conversion funnel |
| Time Series | Forecasting | Trend, Seasonality, Residuals |
| A/B Testing | Hypothesis Test | P-value, Effect size |
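
For the A/B Testing row, the two key metrics could be computed as below. This is a sketch under stated assumptions: the data is simulated, Welch's t-test is one reasonable choice of hypothesis test among several, and Cohen's d with an averaged-variance pooled standard deviation is one common effect size definition.

```python
import numpy as np
from scipy import stats

# Simulated metric values for two groups (illustrative data only).
rng = np.random.default_rng(42)
control = rng.normal(loc=50.0, scale=10.0, size=500)
variant = rng.normal(loc=55.0, scale=10.0, size=500)

# Welch's t-test: does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(variant, control, equal_var=False)

# Cohen's d: mean difference in units of the pooled standard deviation
# (equal group sizes, so the pooled variance is the average of the two).
pooled_sd = np.sqrt((control.var(ddof=1) + variant.var(ddof=1)) / 2)
cohens_d = (variant.mean() - control.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.3g}, Cohen's d = {cohens_d:.2f}")
```

Reporting the effect size next to the p-value matters because a large sample can make a practically negligible difference statistically significant.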
Tools and Libraries
This skill uses:
- pandas - Data manipulation and analysis
- numpy - Numerical computations
- matplotlib/seaborn - Visualizations
- scipy - Statistical tests
- scikit-learn - Machine learning
- plotly - Interactive visualizations
Data Quality Checks
The skill automatically:
- Identifies missing values
- Detects duplicate records
- Flags outliers
- Validates data types
- Checks for referential integrity
- Reports data completeness
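
To make the checks above concrete, here is one way they might look in pandas. It is not the skill's actual implementation: `quality_report` and `orphaned_keys` are hypothetical helper names, the sample tables are made up, and the 1.5 × IQR outlier rule is an assumption.

```python
import pandas as pd


def quality_report(df, expected_dtypes=None):
    """Summarize common data quality issues in a DataFrame (hypothetical helper)."""
    expected_dtypes = expected_dtypes or {}
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    # Flag numeric values outside 1.5 * IQR of the quartiles (assumed rule).
    outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "outliers_per_column": outlier_mask.sum().to_dict(),
        "dtype_mismatches": {
            col: str(df[col].dtype)
            for col, expected in expected_dtypes.items()
            if str(df[col].dtype) != expected
        },
        # Fraction of all cells that are non-missing.
        "completeness": float(df.notna().to_numpy().mean()),
    }


def orphaned_keys(child, parent, key):
    """Referential integrity: child rows whose key has no match in the parent."""
    return child[~child[key].isin(parent[key])]


# Made-up tables, for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6, 2],
    "customer_id": [10, 11, 10, 12, 99, 11, 11],
    "amount": [20.0, 22.0, None, 21.0, 19.0, 500.0, 22.0],
})
customers = pd.DataFrame({"customer_id": [10, 11, 12]})

report = quality_report(
    orders, expected_dtypes={"amount": "float64", "order_id": "object"}
)
orphans = orphaned_keys(orders, customers, "customer_id")

print(report)
print(orphans)
```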
Common Analyses
Descriptive Analysis
- Data summaries
- Distribution analysis
- Correlation matrices
- Group comparisons
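
The descriptive analyses listed above map onto a handful of pandas calls. A minimal sketch, assuming a made-up table with hypothetical columns `group`, `ad_spend`, and `revenue`:

```python
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "revenue": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9],
})

# Data summary: count, mean, std, min, quartiles, max per numeric column.
summary = df.describe()

# Distribution analysis: quantiles and skewness of one column.
revenue_quantiles = df["revenue"].quantile([0.1, 0.5, 0.9])
revenue_skew = df["revenue"].skew()

# Correlation matrix (Pearson by default) over the numeric columns.
corr = df[["ad_spend", "revenue"]].corr()

# Group comparison: mean and standard deviation of revenue per group.
group_stats = df.groupby("group")["revenue"].agg(["mean", "std"])

print(summary)
print(revenue_quantiles)
print(f"skewness: {revenue_skew:.3f}")
print(corr)
print(group_stats)
```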
Predictive Analysis
- Trend forecasting
- Anomaly detection
- Classification models
- Regression models
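
As a deliberately simple illustration of trend forecasting and anomaly detection, the sketch below fits a straight line with `numpy.polyfit`, extrapolates three periods ahead, and flags points whose residual is far from the rest. The monthly series is synthetic, and both the linear model and the 2.5 standard deviation threshold are assumptions; real forecasting usually calls for a richer model.

```python
import numpy as np

# Synthetic monthly sales: a linear trend with one injected spike.
months = np.arange(12)
sales = 100.0 + 5.0 * months
sales[7] += 40.0

# Regression: least-squares fit of a straight line to the series.
slope, intercept = np.polyfit(months, sales, deg=1)
fitted = slope * months + intercept

# Anomaly detection: standardize the residuals and flag large deviations.
residuals = sales - fitted
z_scores = (residuals - residuals.mean()) / residuals.std()
anomalies = np.flatnonzero(np.abs(z_scores) > 2.5)

# Trend forecasting: extend the fitted line three months ahead.
future_months = np.arange(12, 15)
forecast = slope * future_months + intercept

print(f"slope per month: {slope:.3f}")
print(f"anomalous month indices: {anomalies.tolist()}")
print(f"forecast: {np.round(forecast, 1).tolist()}")
```

Note that the spike also pulls the fitted line upward, which is one reason robust fitting or a two-pass approach (fit, drop anomalies, refit) is often preferred in practice.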
Diagnostic Analysis
- Root cause analysis
- Cohort analysis
- Segmentation
- Attribution modeling
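
Cohort analysis, one of the diagnostic techniques listed above, can be sketched as a retention matrix: customers are grouped by the month of their first order, and each column shows the share of the cohort that ordered again N months later. The `orders` table and its column names are made up for the example.

```python
import pandas as pd

# Made-up order history, for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-02-10", "2023-03-03", "2023-01-20",
        "2023-03-15", "2023-02-02", "2023-03-09", "2023-02-25",
    ]),
})

# Month counter that increases by 1 per calendar month.
orders["month_index"] = (
    orders["order_date"].dt.year * 12 + orders["order_date"].dt.month
)

# Cohort = month of each customer's first order.
first_month = orders.groupby("customer_id")["month_index"].transform("min")
orders["period"] = orders["month_index"] - first_month
orders["cohort"] = (
    orders.groupby("customer_id")["order_date"].transform("min").dt.strftime("%Y-%m")
)

# Distinct active customers per cohort and months-since-first-order.
active = orders.pivot_table(
    index="cohort", columns="period", values="customer_id",
    aggfunc="nunique", fill_value=0,
)

# Retention: active customers divided by the cohort's size in period 0.
retention = active.div(active[0], axis=0)

print(retention)
```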
Related Resources
- Data Analysis Best Practices
- Python Data Science Cheatsheet
- Visualization Gallery
- Sample Datasets
- Analysis Scripts
Support
For data analysis help:
- Review the examples above
- Check sample datasets in assets/examples/datasets/
- Use helper scripts in scripts/
- Consult the detailed guide in references/