| name | creating-visualizations |
| description | Component skill for creating effective visualizations (terminal-based and image-based) in DataPeeker analysis sessions |
Creating Visualizations
Purpose
This component skill guides creation of clear, effective visualizations for analytics documentation. Use it when:
- Presenting query results in a more visual format
- Revealing patterns that are hard to see in raw numbers
- Creating reports or documentation that will be read by stakeholders
- Documenting data workflows, lineage, or database schemas
- Referenced by process skills requiring data visualization
Supports two approaches:
- Terminal-based (plotext, sparklines, etc.) - For interactive analysis
- Image-based (Kroki: Mermaid, GraphViz, Vega-Lite) - For reports and complex diagrams
Prerequisites
- Query results obtained and interpreted
- Understanding of patterns to highlight (use the interpreting-results skill)
- Analysis documented in markdown files
- Clear communication goal for the visualization
Visualization Creation Process
Create a TodoWrite checklist for the 4-phase visualization process:
Phase 1: Choose Visualization Type
Phase 2: Structure Data for Display
Phase 3: Create Visualization
Phase 4: Annotate with Context
Mark each phase as you complete it. Include visualizations in numbered markdown files alongside queries and interpretations.
Phase 1: Choose Visualization Type
Goal: Select the right visualization format for your data and communication goal.
Visualization Selection Decision Tree
Ask these questions in order:
1. What type of data am I visualizing?
- Single summary statistic → Callout box or highlighted metric
- List of values → Table or ranked list
- Distribution across categories → Bar chart (ASCII or markdown)
- Time series → Line chart (sparkline) or time table
- Comparison between groups → Side-by-side table or grouped bars
- Part-to-whole relationship → Percentage table or ASCII pie chart
- Correlation or relationship → Scatter (character plot) or correlation matrix
2. What is my primary communication goal?
- Show exact values → Table with clear formatting
- Show relative magnitudes → Bar chart or ranked list
- Show trends over time → Sparkline or time series table
- Show distribution shape → Histogram (ASCII)
- Show ranking → Ordered list or horizontal bars
- Show proportions → Percentage table with bars
3. How many data points?
- 1-5 values → Callout boxes or simple list
- 6-20 values → Table or bar chart
- 21-50 values → Grouped table or histogram
- 50+ values → Summary statistics + histogram, or top/bottom N
4. Who is the audience?
- Technical analysts → Full tables with precision
- Business stakeholders → Simplified visuals with key takeaways
- Mixed audience → Visual summary + detailed table
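Question 3 of the decision tree can be sketched as a small lookup. This is a minimal illustration, not part of DataPeeker: the function name `suggest_format` is hypothetical, and because the ranges above overlap at exactly 50 values, the sketch assumes 50 falls in the 21-50 bucket.

```python
def suggest_format(n_points: int) -> str:
    """Map a data-point count to the format suggested by question 3."""
    if n_points < 1:
        raise ValueError("need at least one data point")
    if n_points <= 5:
        return "callout boxes or simple list"
    if n_points <= 20:
        return "table or bar chart"
    if n_points <= 50:  # assumption: exactly 50 belongs to the 21-50 bucket
        return "grouped table or histogram"
    return "summary statistics + histogram, or top/bottom N"


for n in (3, 12, 40, 500):
    print(f"{n:>4} points -> {suggest_format(n)}")
```

The count is only one of the four questions, so treat the result as a starting point and adjust for data type, communication goal, and audience.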
Available Visualization Types
DataPeeker supports two complementary approaches:
Terminal-Based Formats (Primary for analysis):
- Markdown Tables - Structured data with alignment
- ASCII Bar Charts - Visual magnitude comparison (plotext, termgraph)
- Sparklines - Compact trend indicators (sparklines library)
- ASCII Histograms - Distribution visualization (plotext)
- Callout Boxes - Highlighting key metrics
- Ranked Lists - Ordered items with context
- Comparison Tables - Side-by-side metrics
- Line Plots - Time series (plotext, asciichartpy)
Image-Based Formats (For reports and complex diagrams):
- Mermaid - Flowcharts, Gantt charts, workflows
- GraphViz - Network graphs, data lineage, hierarchies
- Vega-Lite - Statistical charts (bar, line, scatter)
- ERD/DBML - Database schemas
Choose based on:
- What pattern you want to communicate
- Where the output will be viewed (terminal vs report)
- Complexity of the visualization needed
Phase 2: Structure Data for Display
Goal: Organize and format data for effective visualization.
Data Preparation Checklist
Before creating visualization:
1. Sort appropriately:
For ranked data:
- Sort by the metric you want to emphasize (descending for "top N")
- Consider: Alphabetical only if order doesn't matter
For time series:
- Sort chronologically (oldest to newest, or newest first if recent matters)
For categorical:
- Sort by frequency, magnitude, or logical grouping
- Avoid: Random or database-default ordering
2. Round to appropriate precision:
Examples:
- Revenue: Round to thousands or whole dollars (not $1,234.56789)
- Percentages: 1-2 decimal places (14.3%, not 14.285714%)
- Counts: Whole numbers only (1,234 not 1234.0)
- Ratios: 2-3 significant figures (2.4x not 2.3567x)
Rule: Show precision that matches the certainty of your data
3. Add calculated columns:
Useful additions:
- Percentage of total
- Difference from average/baseline
- Rank or percentile
- Running totals or moving averages
- Year-over-year change
4. Consider grouping:
For large datasets:
- Show Top N + "Other" row
- Group by logical categories
- Use ranges/buckets for continuous data
- Separate outliers from main distribution
5. Format for readability:
Best practices:
- Add thousand separators (1,234 not 1234)
- Use consistent decimal places within columns
- Align numbers right, text left
- Include units in headers ($, %, units)
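The checklist above can be sketched in plain Python. This is one possible shape under stated assumptions: the helper names `prepare_rows` and `to_markdown` and the sample figures are hypothetical, and the input is assumed to be a list of (label, value) pairs from a query result.

```python
def prepare_rows(rows, top_n=3):
    """Sort descending, fold the tail into 'Other', and add percent of total.

    rows: list of (label, value) pairs, e.g. from a query result.
    Returns a list of (label, value, percent_of_total) tuples.
    """
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    head, tail = ranked[:top_n], ranked[top_n:]
    if tail:
        head.append(("Other", sum(v for _, v in tail)))
    total = sum(v for _, v in head)
    return [(label, value, 100 * value / total if total else 0.0)
            for label, value in head]


def to_markdown(prepared, unit_header="Revenue ($)"):
    """Render prepared rows as a markdown table with right-aligned numbers."""
    lines = [
        f"| Category | {unit_header} | % of Total |",
        "|:---------|--------:|-----------:|",
    ]
    for label, value, pct in prepared:
        # Thousand separators, whole units, one decimal place for percentages
        lines.append(f"| {label} | {value:,.0f} | {pct:.1f}% |")
    return "\n".join(lines)


# Hypothetical sample data, not from a real query
sample = [("Toys", 1200.4), ("Electronics", 48250.75), ("Clothing", 21980.1),
          ("Sports", 9100.0), ("Books", 4310.25)]
prepared = prepare_rows(sample)
print(to_markdown(prepared))
```

The units live in the column header rather than in every cell, which keeps the numeric column easy to scan.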
Phase 3: Create Visualization
Goal: Build the actual visualization using appropriate format and tools.
Two Visualization Approaches
DataPeeker supports two complementary visualization approaches:
1. Terminal-Based Visualizations (Primary)
Use for:
- Interactive terminal/Jupyter notebook analysis
- Quick data exploration
- Markdown documentation that stays in terminal
- Fast iteration without external dependencies
Available formats:
- Markdown Tables - Structured data with multiple columns, exact values
- ASCII Bar Charts - Visual magnitude comparison, relative sizes
- Sparklines - Compact trend indicators with Unicode characters
- ASCII Histograms - Distribution visualization, shape and spread
- Callout Boxes - Highlighting key metrics or insights
- Ranked Lists - Top/bottom N items with narrative context
- Comparison Tables - Side-by-side metrics across segments or time
- Line Plots - Time series and trends
→ See terminal-formats.md for implementation
2. Image-Based Visualizations (via Kroki)
Use for:
- Reports and presentations (embedded images)
- Complex diagrams (workflows, data lineage, relationships)
- Database schemas and architecture
- Documentation that needs to be viewed outside terminal
- High-quality charts for stakeholder communication
Available formats:
- Mermaid - Flowcharts, Gantt charts, sequence diagrams
- GraphViz - Network graphs, data lineage, hierarchies
- Vega-Lite - Statistical charts (bar, line, scatter, histograms)
- D2 - Modern diagrams, architecture, data models
- ERD/DBML - Database schemas and relationships
→ See image-formats.md for implementation
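As a rough sketch of how a text diagram becomes an image, Kroki accepts GET requests whose last path segment is the diagram source, deflate-compressed and base64url-encoded. The example below assumes the public kroki.io instance (a self-hosted base URL would differ); the helper name `kroki_url` and the lineage diagram are illustrative only.

```python
import base64
import zlib

KROKI_BASE = "https://kroki.io"  # assumption: public instance


def kroki_url(diagram_type: str, source: str, output_format: str = "svg") -> str:
    """Build a Kroki GET URL for a text diagram."""
    compressed = zlib.compress(source.encode("utf-8"), 9)
    encoded = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"{KROKI_BASE}/{diagram_type}/{output_format}/{encoded}"


# Hypothetical data-lineage diagram in GraphViz DOT syntax
lineage = "digraph G { orders -> daily_revenue -> q4_report }"
url = kroki_url("graphviz", lineage)
print(url)
```

The resulting URL can be embedded in a markdown image link, or fetched and saved as a file for reports that must work offline.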
Choosing Between Terminal and Image Formats
Use Terminal formats when:
- Working interactively in analysis session
- Output stays in markdown/terminal
- Quick iteration and exploration
- Simple charts and tables
Use Image formats when:
- Creating final reports or presentations
- Visualizing complex relationships (data lineage, workflows)
- Documenting database schemas
- Output needs to be embedded in documents/web
- Audience views outside terminal environment
Can use both:
- Terminal for exploration → Image for final report
- Tables (terminal) + Diagrams (image) in same document
⚠️ CRITICAL: Tool Usage Requirements
MANDATORY: All visualizations (bar charts, line plots, histograms, sparklines, scatter plots) MUST use established visualization tools. NEVER create these manually.
✅ ALLOWED - Manual Creation:
- Markdown tables with exact values
- Callout boxes and formatted text
- Ranked lists with exact numbers
❌ PROHIBITED - Manual Creation:
- Bar charts (no manual █ characters)
- Line plots or time series (no manual * or - characters)
- Histograms
- Sparklines (no manual ▁▂▃▄▅▆▇█ characters)
- Any visualization requiring scaling or positioning
Implementation Details
📄 For visualization implementations, use these guides:
Terminal-Based Visualizations
The terminal-formats.md guide provides:
- Mandatory tool usage principles (read this first!)
- Quick Start guide with tool installation (plotext, asciichartpy, termgraph, sparklines)
- Complete code examples for each visualization type using proper tools
- SQLite integration examples for generating visualizations from query results
The rule: If it visualizes relative magnitudes, trends, or distributions → USE A TOOL. If it's exact numbers in a table → Manual creation is fine.
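For the "exact numbers in a table" side of the rule, a query result can be turned into a markdown table with the standard library alone. This is a minimal sketch: the helper `query_to_markdown` is hypothetical, and an in-memory database with an invented `orders` table stands in for a real analytics database.

```python
import sqlite3


def query_to_markdown(conn, sql):
    """Run a query and render the result as a markdown table."""
    cur = conn.execute(sql)
    headers = [col[0] for col in cur.description]
    rows = cur.fetchall()
    # Right-align numeric columns, left-align text (judged from the first row)
    numeric = ([isinstance(v, (int, float)) for v in rows[0]]
               if rows else [False] * len(headers))
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---:" if n else ":---" for n in numeric) + "|",
    ]
    for row in rows:
        cells = [f"{v:,}" if isinstance(v, (int, float)) else str(v) for v in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# In-memory stand-in for a real database; table and values are invented
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("Electronics", 1200), ("Clothing", 450), ("Electronics", 800)])
table = query_to_markdown(
    conn,
    "SELECT category, SUM(amount) AS revenue FROM orders "
    "GROUP BY category ORDER BY revenue DESC",
)
print(table)
```

Sorting happens in the SQL (`ORDER BY`), so the table arrives already ranked by the metric being emphasized.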
Image-Based Visualizations
The image-formats.md guide provides:
- Kroki overview - Unified API for generating diagrams from text
- Quick Start guide with Python examples and API usage
- Format selection guide - When to use Mermaid vs GraphViz vs Vega-Lite
- Complete implementation guides for each format in the formats/ directory
- DataPeeker integration examples - Visualizing data workflows and schemas
Phase 4: Annotate with Context
Goal: Add context and guidance so the visualization is self-explanatory.
Annotation Checklist
Every visualization should include:
1. Title/Caption:
## [Clear, descriptive title that states what is being shown]
Example:
✓ Good: "Monthly Revenue by Product Category (Jan-Dec 2024)"
✗ Bad: "Revenue Chart"
2. Data source and date:
**Data source:** analytics.db, orders table
**Time period:** Q4 2024 (Oct 1 - Dec 31)
**Last updated:** 2025-11-18
3. Key takeaway (above or below visualization):
**Key Finding:** Electronics drove 42.5% of Q4 revenue despite representing
only 15% of order volume, indicating premium product performance.
4. Units and scale:
- Include $ or % symbols
- Clarify if values are in thousands: ($000s)
- Note if values are indexed or normalized
- Specify timezone for timestamps
5. Context for interpretation:
**Context notes:**
- Q4 includes Black Friday/Cyber Monday (Nov 29 - Dec 2)
- New product line launched Oct 15, affecting Electronics category
- Shipping delays in December may have suppressed orders
6. Limitations and caveats:
**Caveats:**
- Data excludes returns and cancellations
- International orders converted to USD at average quarterly exchange rate
- First week of October had incomplete data due to system migration
7. What to look for:
**What to notice:**
- Electronics peak in November (holiday season)
- Clothing shows consistent decline (investigate seasonality)
- Sports category smallest but growing fastest (+45% QoQ)
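The annotation checklist lends itself to a small template helper so that no element is forgotten. The sketch below is one hypothetical way to do it; the function name `annotate` and its parameters are assumptions, and the sample values are taken from the examples above.

```python
def annotate(title, source, period, key_finding, notes=(), caveats=()):
    """Assemble the annotation block that accompanies a visualization."""
    lines = [
        f"## {title}",
        "",
        f"**Data source:** {source}",
        f"**Time period:** {period}",
        "",
        f"**Key Finding:** {key_finding}",
    ]
    if notes:
        lines += ["", "**Context notes:**"] + [f"- {n}" for n in notes]
    if caveats:
        lines += ["", "**Caveats:**"] + [f"- {c}" for c in caveats]
    return "\n".join(lines)


block = annotate(
    title="Monthly Revenue by Product Category (Jan-Dec 2024)",
    source="analytics.db, orders table",
    period="Q4 2024 (Oct 1 - Dec 31)",
    key_finding="Electronics drove 42.5% of Q4 revenue.",
    caveats=["Data excludes returns and cancellations"],
)
print(block)
```

Making the title, source, period, and key finding required arguments means a missing element fails loudly instead of silently producing an under-annotated chart.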
Visualization Best Practices
DO:
Choose format based on communication goal, not convenience
- Ask: "What do I want the reader to notice first?"
- Match visualization to insight you're highlighting
Make visualizations self-contained
- Reader should understand without reading entire document
- Include title, units, source, key takeaway
Use consistent formatting within analysis
- Same bar width for all bar charts
- Same precision for similar metrics
- Consistent color/symbol conventions (if using)
Highlight what matters
- Use bold for most important values
- Put key finding at top or bottom
- Add 🔥, ⚠️, ✓ symbols sparingly for emphasis
Test readability
- View in markdown preview (not just raw markdown)
- Check alignment and spacing
- Ensure visualization works in different font sizes
Layer detail progressively
- Summary visualization first (bar chart, key metrics)
- Detailed table second (full data)
- Technical notes third (methodology, caveats)
Combine formats when helpful
- Bar chart + exact values table
- Sparkline + summary statistics
- Visualization + narrative interpretation
DON'T:
Don't create visualizations for their own sake
- If a simple table is clearer, use the table
- Visualization should reveal patterns, not obscure them
Don't use excessive precision
- Revenue in dollars, not cents ($1,234 not $1,234.56)
- Percentages to 1 decimal place (14.3% not 14.285714%)
Don't hide important caveats
- Data quality issues must be visible
- Exclusions and filters must be noted
- Sample size and time period must be clear
Don't use misleading scales
- Bar charts should start at zero (not truncated y-axis)
- Be explicit if using non-zero baseline
Don't over-format
- Too many symbols/colors creates visual noise
- Keep it simple and professional
Don't assume reader knows context
- Define abbreviations
- Explain what metrics mean
- Note if using non-standard calculations
Don't forget the "so what?"
- Every visualization needs an interpretation
- State implications, not just observations
Common Visualization Patterns
Pattern 1: Before/After Comparison
## Impact of Pricing Change (Oct 15, 2024)
### Before Pricing Change (Oct 1-14)
- Average Order Value: **$145.67**
- Daily Orders: **234**
- Daily Revenue: **$34,087**
### After Pricing Change (Oct 15-31)
- Average Order Value: **$127.23** (↓ $18.44, -12.7%)
- Daily Orders: **289** (↑ 55, +23.5%)
- Daily Revenue: **$36,769** (↑ $2,682, +7.9%)
**Net effect:** Lower prices increased volume enough to grow total revenue.
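The deltas in a before/after comparison are easy to get wrong by hand, so it can help to compute them. A minimal sketch, assuming a hypothetical helper `describe_change` that formats the absolute and percentage change the way the example above does:

```python
def describe_change(before: float, after: float,
                    prefix: str = "", decimals: int = 0) -> str:
    """Format the change from `before` to `after` as 'arrow amount, percent'."""
    if before == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    delta = after - before
    pct = 100 * delta / before
    arrow = "↑" if delta > 0 else "↓" if delta < 0 else "→"
    return f"{arrow} {prefix}{abs(delta):,.{decimals}f}, {pct:+.1f}%"


print(describe_change(145.67, 127.23, prefix="$", decimals=2))  # average order value
print(describe_change(234, 289))                                # daily orders
print(describe_change(34087, 36769, prefix="$"))                # daily revenue
```

Percentages are always relative to the "before" value, which is why a zero baseline is rejected rather than reported as an infinite change.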
Pattern 2: Distribution Summary
⚠️ Use plotext to create histograms - DO NOT create manually
Show distribution with summary statistics:
import plotext as plt
import statistics
# Customer LTV values from query
ltv_values = [423, 687, 892, 2145, ...]  # Placeholder: replace with the full list from your query (the literal ... will not run)
plt.hist(ltv_values, bins=7)
plt.title('Customer Lifetime Value Distribution')
plt.xlabel('Customer LTV ($)')
plt.ylabel('Number of Customers')
plt.show()
# Show summary statistics
print("\nSummary Statistics:")
print(f"Median LTV: ${statistics.median(ltv_values):,.0f}")
print(f"Mean LTV: ${statistics.mean(ltv_values):,.0f}")
print(f"75th percentile: ${statistics.quantiles(ltv_values, n=4)[2]:,.0f}")
See terminal-formats.md Format 4 for complete histogram examples.
Pattern 3: Segmentation Analysis
✅ Tables are fine for exact values; use plotext/termgraph for the visual breakdown
## Customer Segmentation by Purchase Behavior
| Segment | Customers | Avg Orders | Avg LTV | % of Revenue | Strategy |
|:----------------|----------:|-----------:|--------:|-------------:|:--------------|
| **Champions** | 234 | 18.3 | $2,145 | 15.8% | VIP treatment |
| **Loyal** | 1,456 | 8.7 | $892 | 40.8% | Retain & grow |
| **Potential** | 3,678 | 2.4 | $287 | 33.2% | Nurture |
| **At Risk** | 892 | 1.2 | $156 | 4.4% | Win-back |
| **Lost** | 2,134 | 1.0 | $87 | 5.8% | Low priority |
**Key insight:** The top two segments (Champions + Loyal) are only 20% of the customer
base but generate 57% of revenue. These 1,690 customers should receive the majority
of retention investment.
For visual breakdown, use plotext:
import plotext as plt
segments = ['Champions', 'Loyal', 'Potential', 'At Risk', 'Lost']
revenue = [501930, 1298752, 1055586, 139152, 185658]  # Customers x Avg LTV per segment
plt.simple_bar(segments, revenue, title='Revenue by Customer Segment')
plt.xlabel('Segment')
plt.ylabel('Revenue ($)')
plt.show()
See terminal-formats.md Format 2 for complete bar chart examples.
Pattern 4: Time Series with Annotations
⚠️ Use plotext or asciichartpy - DO NOT create manually
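import plotext as plt
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
revenue = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5,
           1.5, 1.6, 1.7, 1.7, 1.9, 2.0]  # Revenue in millions
# plotext treats string x-values as dates, so plot against month numbers
# and attach the month names as tick labels instead
positions = list(range(1, len(months) + 1))
plt.plot(positions, revenue)
plt.xticks(positions, months)
plt.title('Monthly Revenue Trend with Key Events')
plt.xlabel('Month')
plt.ylabel('Revenue ($M)')
plt.show()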
print("\nKey Events:")
print("- Oct 1: Q4 begins, seasonal uptick expected")
print("- Oct 15: Pricing change (-10% on popular items)")
print("- Nov 1: New product line launched (premium segment)")
print("- Nov 29 - Dec 2: Black Friday/Cyber Monday surge")
print("\nAnalysis: Revenue growth accelerated after new product launch (Nov),")
print("suggesting demand for premium options. Pricing change impact unclear due to")
print("seasonal overlap.")
See terminal-formats.md Format 8 for complete line plot examples.
Pattern 5: Funnel Analysis
✅ Tables are fine for exact values; use plotext for the visualization
## Purchase Funnel Conversion Rates
| Step | Count | Conversion | Drop-off | Notes |
|:------------------|--------:|-----------:|---------:|:------|
| 1. Site Visitors | 100,000 | 100.0% | — | |
| 2. Product Viewers| 45,000 | 45.0% | 55.0% | High bounce rate |
| 3. Add to Cart | 12,000 | 26.7% | 73.3% | |
| 4. Begin Checkout | 8,500 | 70.8% | 29.2% | Cart abandonment |
| 5. Complete | 3,200 | 37.6% | 62.4% | Payment issues? |
**Overall Conversion:** 3.2%
**Problem areas:**
1. **Bounce rate (55%):** More than half of visitors leave without viewing products
- Action: Improve landing page, clearer value proposition
2. **Cart abandonment (29%):** Losing 3,500 potential customers at checkout
- Action: Simplify checkout, add progress indicator
3. **Checkout failure (62%):** Massive drop-off at payment
- Action (URGENT): Investigate payment gateway, error messages
**Quick win:** Fixing checkout issues could raise conversion by up to 2.7x (3.2% → 8.5% if every started checkout completed)
For funnel visualization, use plotext:
import plotext as plt
steps = ['Visitors', 'Viewers', 'Cart', 'Checkout', 'Purchase']
counts = [100000, 45000, 12000, 8500, 3200]
plt.simple_bar(steps, counts, title='Purchase Funnel')
plt.xlabel('Funnel Step')
plt.ylabel('Count')
plt.show()
See terminal-formats.md Format 2 for complete bar chart examples.
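The Conversion and Drop-off columns in the funnel table are step-to-step rates, each computed against the previous step rather than against total visitors. A minimal sketch of that arithmetic, using a hypothetical helper `funnel_rates` and the counts from the table above:

```python
def funnel_rates(counts):
    """Return (conversion_pct, drop_off_pct) for each step.

    Each rate is relative to the previous step; the first step has no
    predecessor, so its drop-off is None.
    """
    rates = [(100.0, None)]
    for prev, cur in zip(counts, counts[1:]):
        conv = 100 * cur / prev
        rates.append((conv, 100 - conv))
    return rates


steps = ['Visitors', 'Viewers', 'Cart', 'Checkout', 'Purchase']
counts = [100000, 45000, 12000, 8500, 3200]
rates = funnel_rates(counts)
for step, (conv, drop) in zip(steps, rates):
    drop_txt = "n/a" if drop is None else f"{drop:.1f}%"
    print(f"{step:<10} {conv:5.1f}%  drop-off {drop_txt}")

overall = 100 * counts[-1] / counts[0]
print(f"Overall conversion: {overall:.1f}%")
```

Stating which denominator each column uses (previous step versus first step) belongs in the annotation, because the two readings give very different numbers.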
Integration with Process Skills
Process skills reference this component skill with:
Use the `creating-visualizations` component skill to present query results
visually, making patterns and insights more accessible to stakeholders.
When creating visualizations during analysis:
- Choose format based on communication goal (Phase 1)
- Structure data for clarity (Phase 2)
- Build visualization with appropriate terminal or image format (Phase 3)
- Annotate with context and interpretation (Phase 4)
This ensures analysis outputs are not just technically correct but also effectively communicated and actionable.
When to Visualize
Visualize when:
- Pattern is easier to see visually than in raw numbers
- Presenting to stakeholders who need quick understanding
- Comparing multiple segments, time periods, or metrics
- Distribution shape matters (histograms)
- Trend direction matters (sparklines, time series)
Use tables when:
- Exact values are critical
- Reader needs to reference specific numbers
- Data is already structured and scannable
- Audience is technical and prefers precision
Use both when:
- Visualization reveals pattern, table provides detail
- Different audiences (executive summary + appendix)
- Building progressive disclosure (overview → detail)
Quality Checklist
Before finalizing any visualization, verify:
- Visualization has clear, descriptive title
- Units are labeled ($, %, etc.)
- Data source and time period documented
- Key takeaway stated explicitly
- Appropriate precision (not over-rounded or over-precise)
- Scale is appropriate (bars from zero, etc.)
- Annotations explain what to notice
- Caveats and limitations noted
- Visualization renders correctly in markdown preview
- Numbers match source query results
- Format matches communication goal
- Audience can understand without additional context
If any checklist item fails, revise before including in analysis.