| name | data-visualization |
| description | EDA, dashboards, Matplotlib, Seaborn, Plotly, and BI tools. Use for creating visualizations, exploratory analysis, or dashboards. |
| sasmp_version | 1.3.0 |
| bonded_agent | 05-visualization-communication |
| bond_type | PRIMARY_BOND |
Data Visualization
Create compelling visualizations to explore and communicate data insights.
Quick Start
Matplotlib Basics
import matplotlib.pyplot as plt
# Line plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, marker='o', linestyle='-', color='blue', label='Series 1')
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.title('Title')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Bar chart
plt.bar(categories, values, color='skyblue', edgecolor='black')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
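The snippets above assume that `x`, `y`, `categories`, and `values` already exist. The following is a minimal self-contained sketch of the same two plots using Matplotlib's object-oriented interface. The sample data is made up for illustration, and the `Agg` backend is chosen only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical sample data standing in for x, y, categories and values
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
categories = ["A", "B", "C"]
values = [10, 25, 17]

# One figure with two axes: line plot on the left, bar chart on the right
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(12, 5))

ax_line.plot(x, y, marker="o", linestyle="-", color="blue", label="Series 1")
ax_line.set_xlabel("X Label")
ax_line.set_ylabel("Y Label")
ax_line.set_title("Line plot")
ax_line.legend()
ax_line.grid(True, alpha=0.3)

bars = ax_bar.bar(categories, values, color="skyblue", edgecolor="black")
ax_bar.set_xlabel("Categories")
ax_bar.set_ylabel("Values")
ax_bar.set_title("Bar chart")
ax_bar.tick_params(axis="x", labelrotation=45)  # rotate category labels

fig.tight_layout()
```

Working on explicit `Axes` objects instead of the implicit `plt` state tends to scale better once a figure holds more than one plot.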
Seaborn for Statistical Plots
import seaborn as sns
# Set style
sns.set_style("whitegrid")
# Distribution
sns.histplot(data=df, x='value', kde=True, bins=30)
# Box plot
sns.boxplot(data=df, x='category', y='value')
# Violin plot
sns.violinplot(data=df, x='category', y='value')
# Heatmap
corr = df.corr(numeric_only=True)  # skip non-numeric columns such as 'category' (pandas >= 1.5)
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
# Pairplot
sns.pairplot(df, hue='target', diag_kind='kde')
Exploratory Data Analysis
# Quick overview
df.info()
print(df.describe())
# Missing values
print(df.isnull().sum())
# Value counts
df['category'].value_counts().plot(kind='bar')
# Distribution
df.hist(figsize=(12, 10), bins=30)
plt.tight_layout()
plt.show()
# Correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm',
            center=0, square=True)
plt.title('Correlation Matrix')
plt.show()
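The missing-value and correlation checks above can be wrapped into a small reusable step. This is a sketch under stated assumptions: `missing_summary` is a hypothetical helper name, and the sample DataFrame is invented for illustration.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Hypothetical sample data with one text column and two numeric columns
df = pd.DataFrame({
    "category": ["a", "b", "a", None],
    "value": [1.0, None, 3.0, None],
    "score": [10, 20, 30, 40],
})

summary = missing_summary(df)
print(summary)

# Restrict the correlation matrix to numeric columns before plotting a heatmap
corr = df.select_dtypes("number").corr()
print(corr)
```

Selecting numeric columns explicitly keeps the correlation step from failing on text columns such as `category`.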
Interactive Visualizations with Plotly
import plotly.express as px
import plotly.graph_objects as go
# Interactive scatter
fig = px.scatter(df, x='feature1', y='target',
                 color='category', size='value',
                 hover_data=['name', 'date'],
                 title='Interactive Scatter Plot')
fig.show()
# Time series
fig = px.line(df, x='date', y='value', color='category',
              title='Time Series')
fig.update_xaxes(rangeslider_visible=True)
fig.show()
# 3D scatter
fig = px.scatter_3d(df, x='x', y='y', z='z',
                    color='category', size='value')
fig.show()
Dashboard with Plotly Dash
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
app = dash.Dash(__name__)
app.layout = html.Div([
    html.H1('Sales Dashboard'),
    dcc.Dropdown(
        id='category-dropdown',
        options=[{'label': cat, 'value': cat}
                 for cat in df['category'].unique()],
        value=df['category'].unique()[0]
    ),
    dcc.Graph(id='sales-graph'),
    dcc.RangeSlider(
        id='year-slider',
        min=df['year'].min(),
        max=df['year'].max(),
        value=[df['year'].min(), df['year'].max()],
        marks={str(year): str(year)
               for year in df['year'].unique()}
    )
])
@app.callback(
Output('sales-graph', 'figure'),
[Input('category-dropdown', 'value'),
Input('year-slider', 'value')]
)
def update_graph(selected_category, year_range):
    # Keep rows for the chosen category within the selected year range
    filtered_df = df[
        (df['category'] == selected_category) &
        (df['year'] >= year_range[0]) &
        (df['year'] <= year_range[1])
    ]
    fig = px.line(filtered_df, x='date', y='sales')
    return fig
if __name__ == '__main__':
    # app.run replaces the deprecated app.run_server in recent Dash releases
    app.run(debug=True)
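The filtering inside the callback can be pulled into a plain function so it can be checked without starting a server. The sketch below assumes the same `category`, `year`, and `sales` columns as the dashboard. `filter_sales` is a hypothetical helper name and the data is invented.

```python
import pandas as pd


def filter_sales(df: pd.DataFrame, category: str, year_range) -> pd.DataFrame:
    """Rows for one category whose year lies within [start, end], inclusive."""
    start, end = year_range
    mask = (df["category"] == category) & df["year"].between(start, end)
    return df[mask]


# Hypothetical data with the columns the dashboard expects
df = pd.DataFrame({
    "category": ["A", "A", "B", "A"],
    "year": [2021, 2022, 2022, 2023],
    "sales": [100, 120, 90, 150],
})

result = filter_sales(df, "A", [2022, 2023])
print(result)
```

The callback would then only need to call the helper and pass the result to `px.line`, which keeps the data logic separate from the layout.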
Subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Top left
axes[0, 0].hist(data1, bins=30)
axes[0, 0].set_title('Histogram')
# Top right
axes[0, 1].scatter(x, y)
axes[0, 1].set_title('Scatter')
# Bottom left
axes[1, 0].plot(x, y)
axes[1, 0].set_title('Line Plot')
# Bottom right
axes[1, 1].boxplot([data1, data2, data3])
axes[1, 1].set_title('Box Plot')
plt.tight_layout()
plt.show()
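The grid above relies on `data1`, `data2`, `data3`, `x`, and `y` being defined elsewhere. Here is a runnable sketch with synthetic data, which also shows how `axes.flat` can apply shared formatting to every panel in one loop. The random data and the `Agg` backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data standing in for the variables used above
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y = np.sin(x)
data1, data2, data3 = (rng.normal(loc=m, size=200) for m in (0, 1, 2))

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes[0, 0].hist(data1, bins=30)
axes[0, 1].scatter(x, y)
axes[1, 0].plot(x, y)
axes[1, 1].boxplot([data1, data2, data3])

# axes.flat iterates row by row, so titles follow reading order
titles = ["Histogram", "Scatter", "Line Plot", "Box Plot"]
for ax, title in zip(axes.flat, titles):
    ax.set_title(title)
    ax.grid(True, alpha=0.3)

fig.tight_layout()
```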
Visualization Best Practices
Choose the right chart type:
- Comparison: Bar chart
- Distribution: Histogram, box plot
- Relationship: Scatter plot
- Time series: Line chart
- Composition: Pie chart, stacked bar
Design principles:
- Clear labels and titles
- Appropriate color schemes
- Remove chart junk
- Consistent formatting
- Accessibility (color-blind friendly)
Common pitfalls to avoid:
- Misleading axes (a non-zero baseline on a bar chart exaggerates differences)
- Too many colors
- 3D charts (distort perception)
- Pie charts with many categories
- Dual y-axes (confusing)
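The non-zero baseline pitfall in the list above can be made concrete with a little arithmetic. The sketch below compares how large one bar looks relative to another when the axis starts at zero and when it is truncated. `drawn_ratio` is a hypothetical helper and the numbers are illustrative.

```python
def drawn_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights for values a and b on an axis starting at baseline."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Two values that differ by about 2%
true_ratio = drawn_ratio(98, 100)                    # axis starts at 0
truncated_ratio = drawn_ratio(98, 100, baseline=95)  # axis starts at 95

print(round(true_ratio, 2), round(truncated_ratio, 2))  # prints: 1.02 1.67
```

With the axis starting at 95, the second bar is drawn about 67% taller than the first even though the underlying values differ by roughly 2%.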
Color Palettes
# Seaborn palettes
sns.color_palette("viridis", as_cmap=True)
sns.color_palette("coolwarm", as_cmap=True)
sns.color_palette("Set2")
# Custom colors
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A']
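A palette only takes effect once it is attached to a plot. One possible way, sketched here with plain Matplotlib, is to set the custom hex list as the color cycle of an axes. The plotted series are invented, and the `Agg` backend is used only so the example runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A']

fig, ax = plt.subplots(figsize=(8, 5))
ax.set_prop_cycle(color=colors)  # lines on this axes now cycle through the custom palette

# Hypothetical series, one per palette color
for i in range(len(colors)):
    ax.plot([0, 1, 2], [i, i + 1, i + 2], label=f"Series {i + 1}")
ax.legend()

# Confirm which color each line received, in plotting order
used = [to_hex(line.get_color()) for line in ax.lines]
print(used)
```

In Seaborn, the same list can be passed as `palette=colors` to most plotting functions or set globally with `sns.set_palette(colors)`.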
Export Figures
# High-resolution PNG
plt.savefig('figure.png', dpi=300, bbox_inches='tight')
# Vector format (PDF, SVG)
plt.savefig('figure.pdf', bbox_inches='tight')
plt.savefig('figure.svg', bbox_inches='tight')
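When the same figure is needed in several formats, the `savefig` calls above can be looped. The following is a sketch, not a fixed recipe: `save_figure` is a hypothetical helper, and a temporary directory is used here only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt


def save_figure(fig, stem, out_dir, formats=("png", "pdf", "svg"), dpi=300):
    """Save fig as <out_dir>/<stem>.<ext> for each format and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for ext in formats:
        path = out_dir / f"{stem}.{ext}"
        # dpi matters for raster output (PNG); vector formats stay resolution independent
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths


fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])
ax.set_title("Example")

with tempfile.TemporaryDirectory() as tmp:
    saved = save_figure(fig, "figure", tmp)
    sizes = {p.suffix: p.stat().st_size for p in saved}

plt.close(fig)  # release the figure's memory once it has been written
print(sorted(sizes))
```

Closing figures after saving helps in batch jobs that generate many plots, since Matplotlib otherwise keeps each figure alive.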