| name | architecture-documentation-creator |
| description | Create comprehensive technical documentation for code systems including data flow diagrams, architecture overviews, algorithm documentation, cheat sheets, and multi-file documentation sets. Use when documenting pipelines, algorithms, system architecture, data flow, multi-stage processes, similarity algorithms, or creating developer onboarding materials. Covers Mermaid diagrams, progressive disclosure, critical patterns, JSON schemas, Pydantic models, and print-friendly reference materials. |
Architecture Documentation Creator
Purpose
This skill provides a structured approach to creating comprehensive technical documentation for complex code systems, including data flow diagrams, algorithm documentation, architecture overviews, and quick reference materials. The approach is based on patterns from earlier documentation projects.
When to Use This Skill
Use this skill when you need to:
- Document multi-stage pipelines or data flow systems
- Create architecture documentation for complex systems
- Document algorithms with multiple phases or layers
- Create onboarding materials for new developers
- Build comprehensive documentation sets (README + detailed docs + cheat sheets)
- Document systems that bridge multiple languages or technologies
- Create visual data flow diagrams with Mermaid
- Document critical implementation patterns
- Build troubleshooting guides for complex systems
Trigger Keywords: document architecture, create documentation, data flow diagram, document pipeline, document algorithm, architecture overview, technical documentation, developer documentation, onboarding docs, cheat sheet, mermaid diagram, document system
Documentation Structure
The Three-File Pattern
For comprehensive system documentation, create three complementary files:
1. README.md (Navigation Hub)
- Overview of the system
- Quick reference guide
- Links to detailed documentation
- Getting started section
- Critical patterns at-a-glance
- Target length: 400-500 lines
2. Detailed Technical Documentation (Deep Dive)
- Complete component breakdown
- Data flow diagrams (Mermaid)
- JSON schemas and data models
- Stage-by-stage or component-by-component analysis
- Performance characteristics
- Error handling
- Target length: 800-1,200 lines
3. CHEAT-SHEET.md (Quick Reference)
- One-page print-friendly reference
- Critical patterns with ✅/❌ examples
- Quick reference tables
- Common commands
- Troubleshooting guide
- File location references
- Target length: 200-300 lines
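The target lengths above can be checked mechanically. The following is a minimal sketch, assuming the three roles are labelled `readme`, `detailed`, and `cheat_sheet`; those labels and the function names are hypothetical and chosen only for this illustration.

```python
from pathlib import Path

# Inclusive line-count targets taken from the Three-File Pattern.
# The role names used as keys are labels invented for this sketch.
TARGETS = {
    "readme": (400, 500),
    "detailed": (800, 1200),
    "cheat_sheet": (200, 300),
}


def check_length(role: str, line_count: int) -> str:
    """Return 'ok', 'short', or 'long' for a file of the given role."""
    low, high = TARGETS[role]
    if line_count < low:
        return "short"
    if line_count > high:
        return "long"
    return "ok"


def count_lines(path: Path) -> int:
    """Count the lines in a UTF-8 text file."""
    return len(path.read_text(encoding="utf-8").splitlines())
```

For example, `check_length("readme", count_lines(Path("docs/architecture/README.md")))` would report whether the navigation hub has drifted outside its target range.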
File Naming Conventions
docs/architecture/
├── README.md # Start here
├── {system-name}-data-flow.md # Detailed pipeline/flow docs
├── {algorithm-name}.md # Algorithm deep dives
├── CHEAT-SHEET.md # Print-friendly reference
└── diagrams/ # Optional: separate diagram files
Creating Data Flow Diagrams
Mermaid Diagram Best Practices
1. Use Appropriate Diagram Types:
- Flowchart: Sequential processes, decision trees
- Sequence Diagram: Component interactions over time
- State Diagram: State machines, lifecycle flows
- Graph: Data flow, dependencies (in Mermaid, `graph` is an alias for `flowchart`)
2. Progressive Detail Pattern:
## High-Level Overview
[Simple 5-10 node diagram showing major components]
## Component Breakdown
[Detailed diagrams for each component/stage]
## Complete Flow
[Comprehensive end-to-end diagram]
3. Mermaid Flowchart Example:
\`\`\`mermaid
flowchart TD
A[Stage 1: Input] --> B[Stage 2: Process]
B --> C{Decision}
C -->|Yes| D[Stage 3a: Path A]
C -->|No| E[Stage 3b: Path B]
D --> F[Stage 4: Output]
E --> F
\`\`\`
4. Labeling Best Practices:
- Use clear, concise labels
- Include data format in transitions (e.g., "JSON via stdin")
- Show error paths with different colors/styles
- Add notes for complex logic
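When a pipeline's stages are already described in code or configuration, the diagram source can be generated rather than typed by hand. The sketch below assumes a hypothetical helper, `render_flowchart`, that emits Mermaid flowchart text from node labels and labelled edges. It handles only rectangular nodes and plain arrows, and it does not escape double quotes inside labels.

```python
from typing import Dict, List, Optional, Tuple

# (source id, target id, optional edge label such as the data format)
Edge = Tuple[str, str, Optional[str]]


def render_flowchart(nodes: Dict[str, str], edges: List[Edge], direction: str = "TD") -> str:
    """Build Mermaid flowchart source from node labels and edges."""
    lines = [f"flowchart {direction}"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id}["{label}"]')
    for source, target, label in edges:
        # Put the data format on the transition when one is given.
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {source} {arrow} {target}")
    return "\n".join(lines)


diagram = render_flowchart(
    nodes={"A": "Stage 1: Input", "B": "Stage 2: Process"},
    edges=[("A", "B", "JSON via stdin")],
)
print(diagram)
```

Generating the source this way keeps edge labels (for example the data format on each transition) consistent across diagrams.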
Data Flow Documentation Pattern
For each stage or component, document:
- Purpose: What this stage does
- Input Format: JSON schema, examples
- Processing: Key logic and algorithms
- Output Format: JSON schema, examples
- Dependencies: What it requires
- Performance: Typical processing time
- Error Handling: How failures are handled
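One way to keep every stage section consistent is to hold the seven fields above in a small record and render them with a single function. This is a sketch only: the `StageDoc` class, its field names, and the placeholder values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StageDoc:
    """Documentation for one stage, using the seven fields listed above."""
    name: str
    purpose: str
    input_format: str
    processing: str
    output_format: str
    dependencies: str
    performance: str
    error_handling: str

    def to_markdown(self) -> str:
        """Render the stage as a heading followed by one bullet per field."""
        rows = [
            ("Purpose", self.purpose),
            ("Input Format", self.input_format),
            ("Processing", self.processing),
            ("Output Format", self.output_format),
            ("Dependencies", self.dependencies),
            ("Performance", self.performance),
            ("Error Handling", self.error_handling),
        ]
        bullets = [f"- **{label}**: {text}" for label, text in rows]
        return "\n".join([f"### {self.name}"] + bullets)


# Placeholder values for illustration only.
stage = StageDoc(
    name="Stage 2: Process",
    purpose="[What this stage does]",
    input_format="[JSON schema, examples]",
    processing="[Key logic and algorithms]",
    output_format="[JSON schema, examples]",
    dependencies="[What it requires]",
    performance="[Typical processing time]",
    error_handling="[How failures are handled]",
)
section = stage.to_markdown()
print(section)
```

Because every field is required by the constructor, a stage cannot be rendered with one of the seven items silently missing.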
Documenting Algorithms
Algorithm Documentation Template
## Algorithm Name
### Overview
[1-2 paragraph high-level explanation]
### Architecture
[Describe phases, layers, or steps]
### Phase-by-Phase Breakdown
#### Phase 1: [Name]
**Purpose**: [What this phase does]
**Input**: [What it receives]
**Output**: [What it produces]
**Key Logic**: [Important details]
[Repeat for each phase]
### Implementation Examples
[4+ concrete examples showing edge cases]
### Performance Characteristics
| Metric | Value | Notes |
|--------|-------|-------|
### Accuracy Metrics
[If applicable: precision, recall, F1, etc.]
### Common Pitfalls
[✅/❌ patterns showing correct vs incorrect usage]
Critical Pattern Documentation
Always document critical patterns with:
- A "Why it matters" explanation
- ✅ CORRECT code example
- ❌ WRONG code example
- File location reference (file.py:line-range)
Example:
### Pattern: Extract Before Normalize
**Why it matters**: Normalization removes formatting that contains semantic information. Extracting features first preserves original meaning.
\`\`\`python
# ✅ CORRECT: Extract features BEFORE normalization
features = extract_semantic_features(code) # Phase 1
normalized = normalize_code(code) # Phase 2
penalty = calculate_penalty(features) # Phase 3
# ❌ WRONG: Normalizing first destroys semantic info
normalized = normalize_code(code)
features = extract_semantic_features(normalized) # Too late!
\`\`\`
**Location**: `lib/algorithm.py:45-67`
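A critical-pattern entry is more convincing when the ❌ case can be run and seen to fail. The toy below is not the real algorithm: `extract_semantic_features` and `normalize_code` are simplified stand-ins, assumed for illustration, where "features" are HTTP method names in string literals and "normalization" replaces every string literal with a placeholder.

```python
import re

# Toy definitions, assumed for this sketch only.
HTTP_METHOD = re.compile(r"""["'](GET|POST|PUT|DELETE)["']""")
STRING_LITERAL = re.compile(r"""(["']).*?\1""")


def extract_semantic_features(code: str) -> set:
    """Toy stand-in: collect HTTP method names that appear as string literals."""
    return set(HTTP_METHOD.findall(code))


def normalize_code(code: str) -> str:
    """Toy stand-in: replace every string literal with the placeholder STR."""
    return STRING_LITERAL.sub("STR", code)


code = 'fetch(url, {method: "POST"})'

# Correct order: features are read from the original source.
features_first = extract_semantic_features(code)

# Wrong order: the literal has already been replaced, so nothing is found.
features_late = extract_semantic_features(normalize_code(code))

print(features_first)  # {'POST'}
print(features_late)   # set()
```

Pairing the documented pattern with a small reproduction like this also gives reviewers something to run when the referenced file location changes.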
Creating Cheat Sheets
Cheat Sheet Structure
A print-friendly one-page reference should include:
1. Header:
# System Name - Quick Reference Cheat Sheet
**Version**: 1.0 | **Last Updated**: YYYY-MM-DD | **Print This Page**
2. Visual Overview:
- ASCII diagram of system architecture
- Component relationship diagram
3. Critical Patterns (⚠️ Section):
- Top 5-7 patterns that must be followed
- ✅/❌ code comparisons
- File location references
4. Quick Reference Tables:
- Commands and their usage
- Configuration options
- Data models (condensed)
- File locations
5. Troubleshooting Quick Reference:
| Issue | Cause | Solution |
|-------|-------|----------|
6. Key Metrics (if applicable):
| Metric | Value | Meaning |
|--------|-------|---------|
Table Best Practices
Use tables for:
- Penalty/multiplier systems
- Configuration options
- Component locations
- Command references
- Performance benchmarks
- Accuracy metrics
Keep tables concise (3-5 columns max for printability).
Data Model Documentation
Documenting JSON Schemas
For each data structure, provide:
1. Schema Definition:
### DataStructureName
\`\`\`json
{
"field_name": "type", // Description
"required_field": "string", // What it contains
"optional_field?": "number" // When it's used
}
\`\`\`
2. Field Descriptions Table:
| Field | Type | Required | Description |
|-------|------|----------|-------------|
3. Example:
{
"field_name": "example_value",
"required_field": "actual data",
"optional_field": 42
}
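Documented examples drift from their schemas over time, so it can help to check them automatically. The sketch below assumes the annotated notation shown above, where a trailing `?` on a key marks an optional field and values name a type. The `TYPES` mapping and the `validate` function are hypothetical, and this is not a JSON Schema validator.

```python
# Schema in the annotated notation used above; "string" for field_name is assumed.
SCHEMA = {
    "field_name": "string",
    "required_field": "string",
    "optional_field?": "number",
}

# Mapping from schema type names to Python types (an assumption of this sketch).
# Note that Python treats bool as a subclass of int, so True would pass as a number.
TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate(example: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the example matches."""
    problems = []
    for key, type_name in schema.items():
        optional = key.endswith("?")
        name = key.rstrip("?")
        if name not in example:
            if not optional:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(example[name], TYPES[type_name]):
            problems.append(f"{name}: expected {type_name}")
    return problems


example = {
    "field_name": "example_value",
    "required_field": "actual data",
    "optional_field": 42,
}
print(validate(example, SCHEMA))  # []
```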
Documenting Pydantic Models
### ModelName (Pydantic)
**Definition**:
\`\`\`python
from typing import List

from pydantic import BaseModel


class ModelName(BaseModel):
    field_name: str
    count: int = 0
    tags: List[str] = []
\`\`\`
**Fields**:
- `field_name` (str): Description
- `count` (int): Description (default: 0)
- `tags` (List[str]): Description (default: empty list)
**Example**:
\`\`\`python
model = ModelName(
    field_name="example",
    count=5,
    tags=["tag1", "tag2"],
)
\`\`\`
Performance Documentation
Benchmark Table Pattern
### Performance Benchmarks
| Operation | Small (<100) | Medium (100-1k) | Large (1k+) |
|-----------|--------------|-----------------|-------------|
| Scan | 50ms | 500ms | 5s |
| Process | 100ms | 1s | 10s |
| Total | 150ms | 1.5s | 15s |
**Bottlenecks**:
1. [Component name] - [Why it's slow]
2. [Component name] - [Why it's slow]
**Optimization Strategies**:
- Strategy 1: [Description]
- Strategy 2: [Description]
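Benchmark tables are easier to trust when the numbers come from a script rather than memory. The sketch below uses `time.perf_counter` from the standard library to time a callable and then renders a Markdown table. The helper names and the best-of-N policy are assumptions of this example, not part of the pattern itself.

```python
import time
from typing import Callable, Dict, List


def time_call(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def format_duration(seconds: float) -> str:
    """Format as milliseconds below one second, otherwise as seconds."""
    if seconds < 1:
        return f"{seconds * 1000:.0f}ms"
    return f"{seconds:g}s"


def render_table(headers: List[str], rows: Dict[str, List[float]]) -> str:
    """Render timings (in seconds) as a Markdown table, one row per operation."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for operation, timings in rows.items():
        cells = [operation] + [format_duration(t) for t in timings]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Time a trivial workload and print a one-row table.
elapsed = time_call(lambda: sum(range(100_000)))
print(render_table(["Operation", "Small (<100)"], {"Scan": [elapsed]}))
```

Taking the best of several runs reduces noise from other processes; a median would be an equally reasonable choice.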
Troubleshooting Guide Pattern
Create troubleshooting sections with:
1. Table Format (for cheat sheets):
| Issue | Cause | Solution |
|-------|-------|----------|
| Error message | Why it happens | How to fix |
2. Detailed Format (for full docs):
### Issue: [Problem Description]
**Symptoms**:
- Observable behavior 1
- Observable behavior 2
**Root Cause**:
[Explanation of why this happens]
**Solution**:
1. Step 1
2. Step 2
3. Step 3
**Verification**:
[How to confirm it's fixed]
**Related**: See [Component Name] documentation
Cross-Referencing Strategy
Internal References
Link related sections within documentation:
See [Component Interactions](#component-interactions) for details.
For algorithm specifics, see [similarity-algorithm.md](similarity-algorithm.md).
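In-page links such as `#component-interactions` break silently when a heading is renamed. The following sketch checks that each anchor link has a matching heading. The `slugify` function only approximates GitHub-style anchors (lowercase, punctuation dropped, spaces turned into hyphens); it does not handle duplicate headings and does not skip headings inside code fences.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$", re.MULTILINE)
ANCHOR_LINK = re.compile(r"\]\(#([^)]+)\)")


def slugify(heading: str) -> str:
    """Approximate a GitHub-style heading anchor."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # drop punctuation
    return re.sub(r"\s", "-", slug)       # each whitespace character becomes a hyphen


def broken_anchors(markdown: str) -> list:
    """Return anchor targets that do not match any heading in the document."""
    anchors = {slugify(h) for h in HEADING.findall(markdown)}
    return [a for a in ANCHOR_LINK.findall(markdown) if a not in anchors]


doc = (
    "## Component Interactions\n"
    "See [Component Interactions](#component-interactions) and [Setup](#setup).\n"
)
print(broken_anchors(doc))  # ['setup']
```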
File Location References
Always include file:line references:
**Location**: `lib/extractor.py:45-67`
**See**: `config/settings.json:12-15`
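File and line references go stale as code moves. A small check can flag references whose file is missing or whose range runs past the end of the file. This sketch assumes references are written in backticks as `path:start-end`, as in the examples above; the `stale_references` function is hypothetical.

```python
import re
from pathlib import Path

# Matches references such as `lib/extractor.py:45-67` inside backticks.
FILE_REF = re.compile(r"`([^`\s:]+):(\d+)-(\d+)`")


def stale_references(markdown: str, root: Path) -> list:
    """Return references whose file is missing or whose range is out of bounds."""
    stale = []
    for path, start, end in FILE_REF.findall(markdown):
        target = root / path
        if not target.is_file():
            stale.append(f"{path}:{start}-{end} (file not found)")
            continue
        line_count = len(target.read_text(encoding="utf-8").splitlines())
        if not (1 <= int(start) <= int(end) <= line_count):
            stale.append(f"{path}:{start}-{end} (file has {line_count} lines)")
    return stale
```

Running this against the repository root in CI would catch stale references without checking whether the referenced lines still contain the documented code, which still needs a human review.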
Navigation Aids
In README.md, provide clear navigation:
## Documentation Structure
- **[README.md](README.md)** - Start here
- **[pipeline-data-flow.md](pipeline-data-flow.md)** - Pipeline details
- **[algorithm.md](algorithm.md)** - Algorithm deep dive
- **[CHEAT-SHEET.md](CHEAT-SHEET.md)** - Quick reference
Code Example Guidelines
Example Best Practices
- Show Complete Context: Include imports, setup
- Add Comments: Explain non-obvious parts
- Show Output: Include expected results
- Use Real Data: Avoid "foo", "bar" when possible
- Highlight Key Lines: Use comments to draw attention
Example Template
### Example: [What This Demonstrates]
\`\`\`python
# Setup
from module import Class
# The key pattern being demonstrated
result = Class.method(
    param1="value",  # ← This parameter is critical
    param2=42,
)
# Expected output
# {'status': 'success', 'count': 42}
\`\`\`
**Explanation**:
[What's happening and why it matters]
Documentation Metadata
Version Tracking
Add to every documentation file:
**Version**: 1.0
**Last Updated**: 2025-11-17
**Author**: [Name or "Auto-generated"]
**Related**: [Links to related docs]
Change Log (Optional)
For living documentation:
## Change Log
### 2025-11-17 - v1.0
- Initial documentation creation
- Added data flow diagrams
- Created cheat sheet
### 2025-11-18 - v1.1
- Updated algorithm section
- Fixed typos in examples
Quality Checklist
Before finalizing documentation, verify:
Content Quality
- Overview explains what the system does
- All stages/components documented
- Data flow is clear with diagrams
- Critical patterns highlighted with ✅/❌
- Code examples are tested and accurate
- File references include line numbers
- Troubleshooting guide is comprehensive
Structure Quality
- Clear hierarchy with headers
- Table of contents for files >100 lines
- Cross-references work correctly
- Navigation is intuitive
- Progressive disclosure used appropriately
Technical Quality
- JSON schemas are valid
- Code examples are syntax-correct
- Mermaid diagrams render properly
- Performance numbers are realistic
- File paths are accurate
Usability Quality
- New developers can find what they need
- Cheat sheet fits on one printed page
- Search-friendly (good keywords in headers)
- Examples cover common use cases
- Troubleshooting covers real issues
File Organization
Recommended Directory Structure
docs/
├── architecture/
│ ├── README.md # Navigation hub
│ ├── {system}-overview.md # High-level architecture
│ ├── {system}-data-flow.md # Pipeline/data flow
│ ├── {algorithm}.md # Algorithm details
│ ├── CHEAT-SHEET.md # Quick reference
│ └── diagrams/ # Optional: separate diagrams
│ ├── overview.mmd
│ └── data-flow.mmd
├── api/ # API documentation
├── guides/ # How-to guides
└── reference/ # Reference materials
Progressive Disclosure Pattern
Follow Anthropic's progressive disclosure pattern:
1. Start Simple (README.md):
- What it does (2-3 sentences)
- Key concepts (bullet list)
- How to get started (3-5 steps)
- Links to detailed docs
2. Add Detail (Detailed docs):
- Complete component breakdown
- Full data flow diagrams
- All configuration options
- Performance characteristics
3. Provide Reference (Cheat sheet):
- Print-friendly one-pager
- Critical patterns only
- Quick lookup tables
- Troubleshooting guide
Common Pitfalls to Avoid
❌ Anti-Patterns
- Too Much Detail Too Soon: Don't put everything in README
- Missing Visuals: Text-only documentation is hard to scan
- No Examples: Abstract explanations without code
- Stale References: File paths that don't exist
- No Troubleshooting: Doesn't help with real problems
- Missing "Why": Only shows "what" and "how", not "why"
- No Cheat Sheet: Developers have to search every time
- Inconsistent Structure: Each doc uses different format
✅ Best Practices
- Visual First: Start with a diagram
- Progressive Disclosure: README → Detailed → Reference
- Show, Don't Tell: Code examples for everything
- Stay Current: Reference actual file:line locations
- Solve Real Problems: Document actual troubleshooting
- Explain Rationale: Always include "why it matters"
- One-Page Reference: Create printable cheat sheet
- Consistent Templates: Use same structure across docs
Template: Complete Documentation Set
See TEMPLATES.md for ready-to-use templates for:
- README.md structure
- Detailed documentation structure
- Cheat sheet structure
- Algorithm documentation
- Data flow documentation
- Troubleshooting guide
- API reference
Examples
Example 1: Multi-Stage Pipeline
Context: 7-stage code consolidation pipeline bridging JavaScript and Python
Documentation Created:
- README.md (448 lines) - Architecture overview, critical patterns, navigation
- pipeline-data-flow.md (1,191 lines) - Complete stage-by-stage breakdown with 8 Mermaid diagrams
- similarity-algorithm.md (857 lines) - Algorithm deep dive with examples
- CHEAT-SHEET.md (250 lines) - One-page print reference
Key Features:
- Visual data flow for all 7 stages
- JSON schemas for inter-stage communication
- Critical pattern documentation (✅/❌ examples)
- Performance benchmarks and bottleneck analysis
- Troubleshooting guide for common issues
Example 2: Algorithm Documentation
Context: Two-phase similarity algorithm with penalty system
Documentation Approach:
- High-level overview (what it does)
- Architecture diagram (3 phases)
- Phase-by-phase breakdown
- 4 complete examples (identical, HTTP mismatch, operator mismatch, multiple penalties)
- Penalty multiplier table
- Common pitfalls with ✅/❌
- Performance and accuracy metrics
Related Skills
- session-report: For documenting work sessions
- backend-dev-guidelines: For backend architecture patterns
- frontend-dev-guidelines: For frontend architecture patterns
References
- Anthropic Best Practices: Progressive disclosure, 500-line rule
- Mermaid Documentation: https://mermaid.js.org/
- Markdown Guide: https://www.markdownguide.org/
**Skill Status**: COMPLETE ✅ | **Version**: 1.0 | **Last Updated**: 2025-11-17