name

system-design

description

Comprehensive system design skill for creating professional software architecture specifications. Use this skill when asked to design systems (e.g., "Design a chat application", "Design an e-commerce platform", "Create system architecture for X"). Generates complete technical specifications with architecture diagrams, database schemas, API designs, scalability plans, security considerations, and deployment strategies. Creates organized spec folders with all documentation following professional software engineering standards, from high-level overview down to detailed implementation specifications.

System Design

Overview

This skill helps you create comprehensive, production-ready system design specifications. When a user asks you to design a system, use this skill to generate a complete spec/ folder containing professional documentation covering all aspects of the system architecture.

Workflow

Step 1: Gather Requirements

Before generating the spec folder, understand the system requirements:

Key Questions:

What is the system's purpose?
Who are the users?
What are the core features?
What is the expected scale (users, requests, data)?
What are the constraints (budget, timeline, technology)?
Are there specific non-functional requirements (performance, security, compliance)?

If requirements are unclear, ask the user for clarification using specific questions based on the system type.

Step 2: Initialize Spec Folder

Use the init_spec.py script to create the specification folder structure:

python scripts/init_spec.py <system-name> --path ./spec

What this creates:

Complete folder structure with template markdown files
All standard sections (overview, requirements, architecture, data model, API design, scalability, security, monitoring, deployment)
diagrams/ folder for architecture diagrams
README with navigation and status tracking

The script generates 10 comprehensive template files:

README.md - Document overview and navigation
01-overview.md - Executive summary, problem statement, goals
02-requirements.md - Functional and non-functional requirements
03-architecture.md - System architecture and design decisions
04-data-model.md - Database schemas and data design
05-api-design.md - API specifications and contracts
06-scalability.md - Scaling strategy and performance
07-security.md - Security architecture and threat model
08-monitoring.md - Observability and operational monitoring
09-deployment.md - Deployment strategy and CI/CD

Step 3: Complete the Specification

Work through each template file systematically, filling in details based on the system requirements. Use the reference files for guidance:

3.1 Overview and Requirements (Files 01-02)

Fill in:

Problem statement and goals
Functional requirements (features, user stories)
Non-functional requirements (performance, scalability, security, availability)
Constraints and assumptions

Tip: Be specific with numbers (e.g., "Support 100,000 concurrent users" not "Support many users")

3.2 Architecture Design (File 03)

Reference: See references/architectural-patterns.md for pattern guidance

Choose appropriate architecture style:

Simple systems: Monolithic architecture
Complex systems: Microservices
Variable traffic: Serverless
Real-time systems: Event-driven

Document:

System components and responsibilities
Communication patterns (sync vs async)
Design decisions with rationale
Architecture diagrams (use Mermaid)

Example Mermaid Diagram:

graph TB
    Client[Client Apps]
    API[API Gateway]
    Auth[Auth Service]
    Core[Core Service]
    DB[(Database)]
    Cache[(Cache)]

    Client --> API
    API --> Auth
    API --> Core
    Core --> Cache
    Core --> DB

3.3 Data Model (File 04)

Design:

Database schema with tables and relationships
Entity-Relationship Diagrams (ERD)
Indexes for performance
Partitioning/sharding strategy

Include:

SQL CREATE TABLE statements
Index definitions
Relationships and foreign keys
Data access patterns

3.4 API Design (File 05)

Specify:

API style (REST, GraphQL, gRPC)
All endpoints with request/response examples
Authentication and authorization
Error handling
Rate limiting

Be comprehensive: Include actual JSON examples, error codes, and edge cases

3.5 Scalability (File 06)

Reference: See references/system-design-workflow.md for scalability planning

Plan:

Horizontal and vertical scaling strategies
Caching strategy (CDN, application cache, database cache)
Load balancing approach
Database scaling (read replicas, sharding)
Capacity planning

Include numbers: Current capacity, growth projections, scaling thresholds

3.6 Security (File 07)

Design:

Threat model (assets, actors, attack vectors)
Authentication and authorization mechanisms
Data encryption (at rest, in transit)
Network security (VPC, security groups)
Compliance requirements

Be specific: Name actual technologies (e.g., "JWT tokens with 15-minute expiry")

3.7 Monitoring (File 08)

Define:

Logging strategy (what to log, format)
Metrics to track (Golden Signals: latency, traffic, errors, saturation)
Distributed tracing setup
Alerting rules
SLIs and SLOs

3.8 Deployment (File 09)

Plan:

Deployment strategy (blue-green, canary, rolling)
CI/CD pipeline
Infrastructure as code
Rollback procedures
Disaster recovery

Step 4: Add Diagrams

Create architecture diagrams in the diagrams/ folder:

Essential diagrams:

High-level architecture
Component diagram
Data flow diagrams
Sequence diagrams for key operations
ERD (Entity-Relationship Diagram)
Deployment diagram

Use Mermaid for markdown-based diagrams (can be embedded in markdown files or saved as .mmd files)

Step 5: Technology Selection

Reference: See references/tech-stack-guide.md for technology choices

Choose technologies for:

Frontend framework
Backend language/framework
Database (relational vs NoSQL)
Cache
Message queue
Cloud provider
Container orchestration
Monitoring tools

Document rationale for each choice in the architecture section.

Step 6: Validate Completeness

Use the validation script to check for completeness:

python scripts/validate_spec.py ./spec/<system-name>

What it checks:

All required files present
Required sections in each file
No TODOs or placeholders remaining
Diagrams folder populated

Address any errors or warnings before finalizing.

Step 7: Review and Finalize

Review all sections for consistency
Ensure all design decisions have rationale
Verify numbers are realistic
Check that diagrams match text descriptions
Update README status (Draft → In Review → Approved)

Reference Files

This skill includes comprehensive reference guides to consult during system design:

`architectural-patterns.md`

When to read: Choosing architecture style (Step 3.2)

Covers:

Monolithic, Microservices, Serverless, Event-Driven architectures
Layered, Hexagonal, CQRS, Event Sourcing patterns
When to use each pattern
Pros, cons, and trade-offs
Pattern selection guidance

`tech-stack-guide.md`

When to read: Selecting technologies (Step 5)

Covers:

Frontend frameworks (React, Vue, Angular, Svelte)
Backend languages (Node.js, Python, Go, Java, Rust)
Databases (PostgreSQL, MySQL, MongoDB, DynamoDB)
Message queues (RabbitMQ, Kafka, SQS)
Cloud providers (AWS, GCP, Azure)
Technology decision framework

`system-design-workflow.md`

When to read: Understanding the overall process (Step 0)

Covers:

Complete system design workflow
Phase-by-phase guidance
Best practices and pitfalls
Checklists for completeness
Common mistakes to avoid

Example Usage

User Request:

"Design a scalable chat application system"

Your Process:

Gather Requirements (ask clarifying questions):
- How many concurrent users? (e.g., 100,000)
- What features? (e.g., 1-on-1 chat, group chat, file sharing)
- Any special requirements? (e.g., end-to-end encryption)

Initialize Spec:

python scripts/init_spec.py chat-application --path ./spec

Fill in Requirements (01-02):
- Problem: Real-time messaging for 100,000 users
- Features: 1-on-1 chat, group chat, file sharing, read receipts
- Performance: <100ms message delivery, 99.9% uptime
- Security: End-to-end encryption, OAuth authentication
Design Architecture (03):
- Event-driven architecture (WebSocket + message queue)
- Components: API Gateway, Chat Service, Message Queue (Kafka), Database (PostgreSQL), Cache (Redis)
- Diagrams: High-level architecture, message flow
Design Data Model (04):
- Tables: users, conversations, messages, participants
- Indexes: message_timestamp, conversation_id
- Sharding strategy: By conversation_id
Design APIs (05):
- WebSocket for real-time messages
- REST for user management
- Endpoints: POST /conversations, GET /messages, etc.
Plan Scalability (06):
- Horizontal scaling of chat services
- Redis for online user presence
- Kafka for message distribution
- Read replicas for message history
Design Security (07):
- OAuth 2.0 authentication
- End-to-end encryption for messages
- Rate limiting to prevent spam
Plan Monitoring (08):
- Metrics: Message delivery time, WebSocket connections
- Alerts: High message queue lag, connection drops
Plan Deployment (09):
- Kubernetes on AWS
- Blue-green deployment
- Auto-scaling based on connection count

Validate:

python scripts/validate_spec.py ./spec/chat-application

Deliver: Present the complete spec/chat-application/ folder to the user

Tips for Effective System Design

Do's

✅ Start with requirements - Understand what you're building before designing ✅ Be specific with numbers - Use actual metrics (100,000 users, <200ms latency) ✅ Document trade-offs - Explain why you chose option A over option B ✅ Use diagrams - Visual representations are clearer than text ✅ Think about failure - Design for component failures and degradation ✅ Keep it realistic - Don't over-engineer or under-estimate ✅ Reference best practices - Use the reference files for guidance ✅ Validate completeness - Use the validation script

Don'ts

❌ Don't be vague - "Handle many users" → "Support 100,000 concurrent users" ❌ Don't skip sections - Complete all 9 specification files ❌ Don't copy-paste without customization - Adapt to specific requirements ❌ Don't forget diagrams - Architecture diagrams are essential ❌ Don't ignore non-functional requirements - Performance, security, scalability matter ❌ Don't leave placeholders - Replace all TODOs with actual content ❌ Don't design in isolation - Consider the user's constraints and context

Common System Design Patterns

Small Application (MVP)

Architecture: Monolithic Stack: Next.js + PostgreSQL + Vercel Scale: <10,000 users

Medium Application (Growing Startup)

Architecture: Modular Monolith → Microservices transition Stack: Node.js/Python + PostgreSQL + Redis + AWS Scale: 10,000-500,000 users

Large Application (Enterprise)

Architecture: Microservices + Event-Driven Stack: Polyglot (Go/Java/Node.js) + PostgreSQL + Kafka + Kubernetes Scale: 500,000+ users

Real-Time Application

Architecture: Event-Driven + WebSockets Stack: Node.js + Redis + Kafka + PostgreSQL Examples: Chat, Live Dashboard, Collaborative Editing

High-Traffic Application

Architecture: Microservices + CDN + Multi-Region Stack: CDN + Load Balancer + Horizontal Services + Database Replicas Examples: E-commerce, Social Media, Video Streaming

Output Format

Always create a folder structure like this:

spec/
└── <system-name>/
    ├── README.md
    ├── 01-overview.md
    ├── 02-requirements.md
    ├── 03-architecture.md
    ├── 04-data-model.md
    ├── 05-api-design.md
    ├── 06-scalability.md
    ├── 07-security.md
    ├── 08-monitoring.md
    ├── 09-deployment.md
    └── diagrams/
        ├── architecture-overview.mmd
        ├── data-flow.mmd
        └── erd.mmd

All files should be comprehensive, professional, and production-ready. Each section should contain specific, actionable information rather than placeholders or generic descriptions.

Summary

This skill enables you to create complete, professional system design specifications covering:

Requirements (functional and non-functional)
Architecture (components, patterns, decisions)
Data modeling (schemas, relationships, indexing)
API design (endpoints, contracts, authentication)
Scalability (caching, load balancing, capacity planning)
Security (threat model, encryption, access control)
Monitoring (logging, metrics, alerting, SLOs)
Deployment (CI/CD, infrastructure, disaster recovery)

Use the scripts to initialize and validate, and reference the guides for best practices. Always tailor the design to the specific requirements and constraints provided by the user.

system-design

Install Skill

SKILL.md

System Design

Overview

Workflow

Step 1: Gather Requirements

Step 2: Initialize Spec Folder

Step 3: Complete the Specification

3.1 Overview and Requirements (Files 01-02)

3.2 Architecture Design (File 03)

3.3 Data Model (File 04)

3.4 API Design (File 05)

3.5 Scalability (File 06)

3.6 Security (File 07)

3.7 Monitoring (File 08)

3.8 Deployment (File 09)

Step 4: Add Diagrams

Step 5: Technology Selection

Step 6: Validate Completeness

Step 7: Review and Finalize

Reference Files

`architectural-patterns.md`

`tech-stack-guide.md`

`system-design-workflow.md`

Example Usage

Tips for Effective System Design

Do's

Don'ts

Common System Design Patterns

Small Application (MVP)

Medium Application (Growing Startup)

Large Application (Enterprise)

Real-Time Application

High-Traffic Application

Output Format

Summary