name	documentation-structure
description	Templates for ADRs, API docs, runbooks, READMEs - consistent, complete, findable documentation

Documentation Structure

Overview

Proven documentation patterns for common technical content. Use these templates to create consistent, complete, findable documentation.

Core Principle: Structure determines findability. Well-structured docs get used; poorly structured docs get ignored.

When to Use

Load this skill when:

Creating new documentation (ADR, API docs, runbook, README)
Choosing documentation format
Organizing existing scattered documentation
User mentions: "document decision", "API reference", "runbook", "README"

ADR (Architecture Decision Record)

When to Use ADRs

Use ADRs for:

Technology choices (database, framework, library)
Architecture patterns (microservices vs monolith, REST vs GraphQL)
Design decisions with long-term consequences
Trade-off decisions (performance vs simplicity)

Don't use ADRs for:

Implementation details (how to write a function)
Temporary decisions (which bug to fix first)
Obvious choices (use version control, write tests)

Complete ADR Template

# ADR-NNN: [Short Title of Decision]

**Status**: [Proposed | Accepted | Deprecated | Superseded by ADR-XXX]
**Date**: YYYY-MM-DD
**Deciders**: [Names or roles of people who made decision]
**Context**: [What prompted this decision]

## Summary

[One-paragraph summary of the decision and its impact]

## Context

[Describe the problem you're solving]

- What constraints exist? (technical, business, time, people)
- What requirements must be met?
- What assumptions are we making?
- What's the current state (if replacing something)?

## Decision

[State the decision clearly and concisely]

We will [decision statement].

## Alternatives Considered

### Alternative 1: [Name]

**Description**: [What this alternative involves]

**Pros**:
- [Advantage 1]
- [Advantage 2]

**Cons**:
- [Disadvantage 1]
- [Disadvantage 2]

**Why rejected**: [Specific reason this wasn't chosen]

### Alternative 2: [Name]

[Same format as Alternative 1]

## Consequences

### Positive

- [Good outcome 1]
- [Good outcome 2]

### Negative

- [Trade-off 1]
- [Trade-off 2]

### Neutral

- [Change that's neither good nor bad, just different]

## Implementation Notes

[Optional: Technical details, migration steps, timeline]

## Related Decisions

- **Supersedes**: ADR-XXX (if applicable)
- **Superseded by**: ADR-YYY (if applicable)
- **Related to**: ADR-ZZZ, ADR-AAA (decisions that interact with this one)

## References

- [Links to relevant documentation, RFCs, blog posts, research papers]

ADR Numbering Convention

Sequential numbering: ADR-001, ADR-002, etc.
Never reuse numbers (even if decision is deprecated)
Pad with zeros: ADR-007 not ADR-7 (sorts correctly)
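
The numbering rules above can be sketched in a few lines of Python. This is an illustrative helper, not part of any ADR tooling: the function names `next_adr_id` and `adr_filename` are assumptions, and it takes the highest existing number plus one so that a gap left by a removed ADR is never refilled.

```python
import re
from typing import Iterable

ADR_PATTERN = re.compile(r"^ADR-(\d+)")


def next_adr_id(existing: Iterable[str], width: int = 3) -> str:
    """Return the next sequential, zero-padded ADR id (hypothetical helper)."""
    numbers = []
    for name in existing:
        match = ADR_PATTERN.match(name)
        if match:
            numbers.append(int(match.group(1)))
    # max + 1 never reuses a number, even if an earlier ADR was removed
    return f"ADR-{max(numbers, default=0) + 1:0{width}d}"


def adr_filename(adr_id: str, title: str) -> str:
    """Turn a title into a lowercase, hyphenated file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{adr_id}-{slug}.md"


existing = [
    "README.md",
    "ADR-001-use-postgresql.md",
    "ADR-002-mls-enforcement.md",
    "ADR-004-abc-over-protocol.md",
]
new_id = next_adr_id(existing)
print(new_id)                                            # ADR-005
print(adr_filename(new_id, "Frozen Plugin Capability"))  # ADR-005-frozen-plugin-capability.md
print(sorted(["ADR-10", "ADR-7"]))                       # ['ADR-10', 'ADR-7'] (unpadded sorts wrongly)
print(sorted(["ADR-010", "ADR-007"]))                    # ['ADR-007', 'ADR-010']
```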

ADR Location

docs/architecture/decisions/
├── README.md (index of all ADRs)
├── ADR-001-use-postgresql.md
├── ADR-002-mls-enforcement.md
├── ADR-003-plugin-registry.md
└── ADR-004-abc-over-protocol.md
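
One way to keep the README.md index in step with the directory is to generate it from the file names. The sketch below assumes the `ADR-NNN-slug.md` naming shown above; `build_adr_index` is a hypothetical helper, and it derives titles from slugs only because it does not open the files.

```python
import re
from typing import Iterable, List

ADR_FILE = re.compile(r"^(ADR-\d+)-(.+)\.md$")


def build_adr_index(filenames: Iterable[str]) -> List[str]:
    """Return one Markdown list item per ADR file, in ADR order."""
    entries = []
    # Zero-padded numbers mean a plain string sort is also a numeric sort
    for name in sorted(filenames):
        match = ADR_FILE.match(name)
        if match is None:  # skips README.md and unrelated files
            continue
        adr_id, slug = match.groups()
        title = slug.replace("-", " ")
        entries.append(f"- [{adr_id}: {title}]({name})")
    return entries


files = ["ADR-002-mls-enforcement.md", "README.md", "ADR-001-use-postgresql.md"]
print("\n".join(build_adr_index(files)))
# - [ADR-001: use postgresql](ADR-001-use-postgresql.md)
# - [ADR-002: mls enforcement](ADR-002-mls-enforcement.md)
```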

Example: Real ADR (BasePlugin ABC)

# ADR-004: Use Abstract Base Class Instead of Protocol for Plugin System

**Status**: Accepted
**Date**: 2025-10-28
**Deciders**: Security Architecture Team
**Context**: Multi-Level Security enforcement requires reliable type checking

## Summary

We will use Abstract Base Class (ABC) instead of Protocol for the BasePlugin interface
to enable runtime type verification critical for security level enforcement.

## Context

The plugin system requires security level validation before plugins can execute. We need
to verify that all plugins inherit from BasePlugin to ensure they implement mandatory
security methods and properties.

Constraints:
- Security level must be immutable and verifiable at runtime
- Plugin registration must confirm plugin type before allowing execution
- Need to prevent duck-typed plugins from bypassing security checks

Python offers two approaches for defining plugin interfaces:
1. Protocol (PEP 544) - structural subtyping (duck typing)
2. Abstract Base Class - nominal typing with inheritance

## Decision

We will use Abstract Base Class (ABC) with @abstractmethod for the BasePlugin interface.

## Alternatives Considered

### Alternative 1: Protocol-based Interface

**Description**: Define BasePlugin as a Protocol, allowing any class implementing
the required methods to be considered a valid plugin.

**Pros**:
- More flexible - no inheritance required
- Easier for third-party plugins
- More "Pythonic" for general use

**Cons**:
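- isinstance() works with a Protocol only if it is marked @runtime_checkable, and even then it checks only that members exist, not that the class inherits from anything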
- Security bypass risk: attacker creates duck-typed plugin without BasePlugin
- Can't seal security-critical methods
- Type checking is structural, not nominal

**Why rejected**: Security level verification requires isinstance() to confirm plugin
inheritance. Protocol duck typing allows security bypasses (see threat model THREAT-003).

### Alternative 2: Manual Registration Without Type Checks

**Description**: Don't enforce type at all - rely on plugin registry and runtime checks.

**Pros**:
- Maximum flexibility
- No inheritance requirements

**Cons**:
- No static type safety
- Easy to bypass registration
- Higher runtime overhead for checks

**Why rejected**: Defense-in-depth principle requires type system + registry + runtime checks.
Single-layer validation is insufficient.

## Consequences

### Positive

- isinstance(plugin, BasePlugin) provides reliable runtime type checking
- Sealed methods prevent subclasses from overriding security-critical code
- Nominal typing makes security boundaries explicit
- Static type checking via mypy

### Negative

- Third-party plugins must inherit from BasePlugin (less flexible)
- Tighter coupling between plugins and framework
- Slightly more boilerplate for plugin authors

### Neutral

- Plugins must be registered AND inherit from BasePlugin (defense-in-depth)

## Implementation Notes

- BasePlugin declared as ABC with frozen dataclass
- security_level property marked as @abstractmethod
- Plugin factory verifies isinstance() before instantiation
- Mypy configured to require nominal types for plugins

## Related Decisions

- **Related to**: ADR-002 (MLS enforcement - requires type checking)
- **Related to**: ADR-003 (Plugin registry - ABC + registry = defense-in-depth)
- **Related to**: ADR-005 (Frozen plugin capability - ABC enables sealed methods)

## References

- PEP 544: Protocols - https://peps.python.org/pep-0544/
- Bell-LaPadula MLS model requirements
- Threat model THREAT-003: Type system bypass via duck typing
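
The trade-off recorded in ADR-004 can be reproduced with a small experiment. This is a minimal sketch, not the project's actual code: `PluginProtocol`, `TrustedPlugin`, `DuckTypedPlugin`, and `register_plugin` are hypothetical names, and only `BasePlugin` and `security_level` come from the ADR.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class PluginProtocol(Protocol):
    """Structural interface: any object with these members matches."""

    @property
    def security_level(self) -> int: ...

    def run(self) -> str: ...


class BasePlugin(ABC):
    """Nominal interface: only classes that inherit from it match."""

    @property
    @abstractmethod
    def security_level(self) -> int: ...

    @abstractmethod
    def run(self) -> str: ...


class TrustedPlugin(BasePlugin):
    @property
    def security_level(self) -> int:
        return 2

    def run(self) -> str:
        return "trusted"


class DuckTypedPlugin:
    """Has the right members but never inherited from BasePlugin."""

    security_level = 99

    def run(self) -> str:
        return "untrusted"


def register_plugin(plugin: object) -> BasePlugin:
    """Reject anything that is not a genuine BasePlugin subclass."""
    if not isinstance(plugin, BasePlugin):
        raise TypeError(f"{type(plugin).__name__} does not inherit from BasePlugin")
    return plugin


print(isinstance(DuckTypedPlugin(), PluginProtocol))  # True: structural match
print(isinstance(DuckTypedPlugin(), BasePlugin))      # False: no inheritance
```

Note that `BasePlugin.register(SomeClass)` can still declare a virtual subclass of any ABC, so a real registry would probably need to guard against that path as well.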

API Reference Documentation

When to Use API Reference Pattern

Use for:

REST APIs
GraphQL APIs
Library/SDK public interfaces
Internal service APIs

Complete API Documentation Structure

# [Service/API Name] API Reference

## Overview

**Base URL**: `https://api.example.com/v1`
**Protocol**: HTTPS only
**Format**: JSON

[One-paragraph description of what this API does]

## Authentication

### Method

[OAuth 2.0 | API Key | JWT | etc.]

### Obtaining Credentials

[How to get API key/token]

### Using Authentication

**Header Format**:

Authorization: Bearer {token}


**Example**:
```bash
curl -H "Authorization: Bearer abc123..." https://api.example.com/v1/users
```

Token Expiration

Access tokens: 1 hour
Refresh tokens: 30 days

Rate Limiting

Limit: 1000 requests per hour per API key
Headers:
- X-RateLimit-Limit: Your rate limit ceiling
- X-RateLimit-Remaining: Requests remaining in window
- X-RateLimit-Reset: UTC epoch seconds when limit resets

Example Response (429 Too Many Requests):

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit of 1000 requests per hour exceeded",
  "retry_after": 1800
}
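
A client can turn the headers and the 429 body above into a wait time. The following is a sketch under stated assumptions: `seconds_until_retry` is a hypothetical helper, headers are passed as a plain dictionary with the exact casing shown above, and `retry_after` is taken to be a number of seconds.

```python
import time
from typing import Any, Mapping, Optional


def seconds_until_retry(
    status: int,
    headers: Mapping[str, str],
    body: Mapping[str, Any],
    now: Optional[float] = None,
) -> float:
    """Return how long to wait before sending the next request."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if status != 429 and remaining > 0:
        return 0.0  # budget left, no need to wait
    if "retry_after" in body:
        return float(body["retry_after"])  # assumed to be seconds
    now = time.time() if now is None else now
    # X-RateLimit-Reset is documented as UTC epoch seconds
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)


print(seconds_until_retry(429, {}, {"retry_after": 1800}))  # 1800.0
print(seconds_until_retry(
    200,
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"},
    {},
    now=1000.0,
))  # 60.0
```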

Pagination

Parameters

page: Page number (default: 1)
limit: Items per page (default: 20, max: 100)

Response Format

{
  "data": [ /* items */ ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "total": 157,
    "pages": 8
  }
}

Navigation Links

{
  "links": {
    "first": "https://api.example.com/v1/users?page=1",
    "prev": null,
    "next": "https://api.example.com/v1/users?page=2",
    "last": "https://api.example.com/v1/users?page=8"
  }
}
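
The arithmetic behind the `pagination` and `links` objects is worth spelling out when documenting them. This sketch assumes the limit is clamped to the documented maximum of 100 and that links carry only the `page` parameter, as in the example above; `pagination_block` is a hypothetical name.

```python
from math import ceil
from typing import Dict, Optional
from urllib.parse import urlencode

MAX_LIMIT = 100  # documented maximum items per page


def pagination_block(base_url: str, page: int, limit: int, total: int) -> Dict[str, dict]:
    """Build the pagination metadata and navigation links for one page."""
    limit = min(max(limit, 1), MAX_LIMIT)
    pages = max(1, ceil(total / limit))  # an empty result still has page 1

    def link(n: int) -> Optional[str]:
        # Pages outside 1..pages have no link (null in the JSON response)
        return f"{base_url}?{urlencode({'page': n})}" if 1 <= n <= pages else None

    return {
        "pagination": {"page": page, "limit": limit, "total": total, "pages": pages},
        "links": {
            "first": link(1),
            "prev": link(page - 1),
            "next": link(page + 1),
            "last": link(pages),
        },
    }


result = pagination_block("https://api.example.com/v1/users", page=1, limit=20, total=157)
print(result["pagination"]["pages"])  # 8
print(result["links"]["prev"])        # None
print(result["links"]["next"])        # https://api.example.com/v1/users?page=2
```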

Versioning

URL-based versioning: /v1/, /v2/
Current version: v1
Deprecation policy: 12 months notice before version sunset

Endpoints

[Resource Name]

List [Resources]

Endpoint: GET /[resource]

Description: [What this endpoint does]

Authentication: Required

Query Parameters:

param1 (string, optional): [Description]
param2 (integer, optional): [Description]
page (integer, optional): Page number
limit (integer, optional): Items per page

Example Request:

curl -X GET "https://api.example.com/v1/users?role=admin&page=1&limit=20" \
  -H "Authorization: Bearer abc123..."

Success Response (200 OK):

{
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "username": "jdoe",
      "email": "jdoe@example.com",
      "role": "admin",
      "created_at": "2025-10-28T14:30:00Z"
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "total": 1,
    "pages": 1
  }
}

Error Responses:

401 Unauthorized: Missing or invalid authentication token
403 Forbidden: Authenticated but lacks permission
429 Too Many Requests: Rate limit exceeded

Get [Resource]

Endpoint: GET /[resource]/{id}

[Similar format as above]

Create [Resource]

Endpoint: POST /[resource]

[Similar format as above]

Update [Resource]

Endpoint: PUT /[resource]/{id} or PATCH /[resource]/{id}

[Similar format as above]

Delete [Resource]

Endpoint: DELETE /[resource]/{id}

[Similar format as above]

Error Codes

| Code | Name | Description |
|------|------|-------------|
| 400 | Bad Request | Invalid request format or parameters |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Authenticated but lacks permission |
| 404 | Not Found | Resource doesn't exist |
| 409 | Conflict | Resource already exists or version conflict |
| 422 | Unprocessable Entity | Validation failed |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error (contact support) |
| 503 | Service Unavailable | Temporary outage or maintenance |

Error Response Format

{
  "error": "error_code_identifier",
  "message": "Human-readable error message",
  "details": {
    "field": "specific_field_with_error",
    "reason": "why_it_failed"
  },
  "request_id": "req_abc123",
  "timestamp": "2025-10-28T14:30:00Z"
}
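
A typed structure helps keep every error response in the documented shape. The dataclass below is a sketch, not a prescribed implementation: `ApiError` and the sample values (`validation_failed`, the email field) are hypothetical, while the five field names come from the format above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict


def _utc_now() -> str:
    """Current time in the same ISO 8601 UTC form as the example."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass(frozen=True)
class ApiError:
    error: str       # machine-readable identifier
    message: str     # human-readable explanation
    request_id: str  # correlates the response with server logs
    details: Dict[str, str] = field(default_factory=dict)
    timestamp: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        return json.dumps(asdict(self))


err = ApiError(
    error="validation_failed",
    message="Email address is not valid",
    request_id="req_abc123",
    details={"field": "email", "reason": "invalid_format"},
    timestamp="2025-10-28T14:30:00Z",
)
payload = json.loads(err.to_json())
print(sorted(payload))  # ['details', 'error', 'message', 'request_id', 'timestamp']
```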

SDKs and Client Libraries

[Links to official SDKs for different languages]

Webhooks

[If applicable - webhook registration, event types, payload formats]

Changelog

v1.2.0 (2025-10-15)

Added: Webhook support for user events
Changed: Increased rate limit from 500 to 1000 req/hour

v1.1.0 (2025-09-01)

Added: PATCH support for partial updates
Fixed: Pagination links for empty results


---

## Runbook Pattern

### When to Use Runbooks

**Use runbooks for**:
- Deployment procedures
- Incident response playbooks
- Maintenance operations
- Recovery procedures
- Regular operational tasks

### Complete Runbook Template

# Runbook: [Operation Name]

**Purpose**: [One-sentence description of what this runbook achieves]
**Owner**: [Team or person responsible]
**Last Updated**: YYYY-MM-DD
**Frequency**: [On-demand | Weekly | Monthly | During incidents]

## Overview

[2-3 sentences describing when to use this runbook and what it accomplishes]

## Prerequisites

### Required Access

- [ ] Production database access (role: `db-operator`)
- [ ] Kubernetes cluster access (namespace: `production`)
- [ ] PagerDuty access (for incident updates)
- [ ] VPN connection to production network

### Required Tools

- [ ] `kubectl` v1.28+
- [ ] `psql` PostgreSQL client
- [ ] `aws-cli` configured with production profile
- [ ] SSH key for bastion host

### Required Knowledge

- Basic Kubernetes concepts
- SQL query syntax
- Understanding of [specific system architecture]

### Verification

Run these commands to verify prerequisites:
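```bash
# Check kubectl access
kubectl get nodes

# Check database access (name the database explicitly; psql otherwise defaults to one named after the user)
psql -h db.production.example.com -U operator -d production_db -c "SELECT 1"

# Check AWS access
aws sts get-caller-identity
```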

Safety Checks

STOP if any of these are true:

Active incident in progress (check PagerDuty)
Scheduled maintenance window not started
Change request not approved
Backup not verified (see "Step 1: Create Backup" below)
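
Teams that automate part of a runbook sometimes encode the stop conditions as an explicit gate. The sketch below is one possible shape: `PreflightState` and `blocking_reasons` are hypothetical names, and the four flags are assumed to be filled in by the operator or by separate checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreflightState:
    active_incident: bool
    maintenance_window_open: bool
    change_request_approved: bool
    backup_verified: bool


def blocking_reasons(state: PreflightState) -> List[str]:
    """Return every stop condition that currently applies (empty list = go)."""
    reasons = []
    if state.active_incident:
        reasons.append("Active incident in progress")
    if not state.maintenance_window_open:
        reasons.append("Scheduled maintenance window not started")
    if not state.change_request_approved:
        reasons.append("Change request not approved")
    if not state.backup_verified:
        reasons.append("Backup not verified")
    return reasons


state = PreflightState(
    active_incident=False,
    maintenance_window_open=True,
    change_request_approved=False,
    backup_verified=True,
)
print(blocking_reasons(state))  # ['Change request not approved']
```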

Procedure

Step 1: Create Backup

Purpose: Ensure rollback is possible if operation fails

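# Create database backup (custom format, timestamped file name)
BACKUP_FILE="backup-$(date +%Y%m%d-%H%M%S).dump"
pg_dump -h db.production.example.com -U operator \
  -Fc production_db > "$BACKUP_FILE"

# Verify backup: the file exists and its table of contents is readable
ls -lh "$BACKUP_FILE"
pg_restore --list "$BACKUP_FILE" > /dev/null && echo "backup readable"

Expected Result: Backup file created, size > 0 bytes, and pg_restore --list exits without error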

If this fails: [What to do if backup fails]

Step 2: [Operation Step]

Purpose: [What this step does]

# Commands to run
command1
command2

Expected Result: [What you should see]

If this fails: [Troubleshooting steps]

[Repeat for each step]

Final Step: Verify Operation

Purpose: Confirm operation succeeded

# Verification commands

Success Criteria:

Service responds with 200 OK
No errors in logs (last 5 minutes)
Metrics show normal traffic

Post-Operation

Update Tracking

Update change request ticket with completion time
Update runbook if procedure changed
Document any deviations from standard procedure

Monitoring

Monitor these for 30 minutes after operation:

Application logs: kubectl logs -f deployment/app -n production
Error rate: [Link to monitoring dashboard]
Response time: [Link to metrics]

Rollback Procedure

When to rollback:

Operation failed at any step
Post-operation verification failed
Unexpected behavior observed

Steps:

# Restore from backup (drop existing objects first so the restore does not collide with them)
pg_restore -h db.production.example.com -U operator \
  --clean --if-exists -d production_db backup-YYYYMMDD-HHMMSS.dump

[Additional rollback steps]

Verification:

Service restored to pre-operation state
No data loss confirmed
Application functioning normally

Troubleshooting

Problem: [Common Issue 1]

Symptoms: [What you see]

Cause: [Why this happens]

Solution:

# Commands to fix

Problem: [Common Issue 2]

[Same format]

Escalation

When to escalate:

Rollback failed
Data integrity concerns
Incident severity increases
Unsure how to proceed

Who to contact:

On-call engineer: [PagerDuty rotation or phone]
Database team: [Contact method]
Security team (if data breach suspected): [Contact method]

References

[Link to architecture diagram]
[Link to related runbooks]
[Link to incident post-mortems]
[Link to system documentation]


---

## README Patterns

### When to Use Each README Type

**Simple README** (<100 lines):
- Single-purpose utilities
- Scripts
- Small libraries

**Standard README** (100-300 lines):
- Applications
- Multi-feature libraries
- Services

**Comprehensive README** (300+ lines):
- Open-source projects
- Complex systems
- Projects with many contributors

### Simple README Template

For small utilities and scripts:

# [Project Name]

[One-sentence description of what it does]

## Installation

```bash
pip install project-name
```

Usage

# Basic example
project-name input.txt output.txt

# With options
project-name --verbose input.txt output.txt

Options

--verbose: Print detailed progress
--output FILE: Specify output file

Requirements

Python 3.8+
No external dependencies

License

MIT


### Standard README Template

For most projects:

# [Project Name]

[2-3 sentence description of what the project does and why it exists]

## Features

- Feature 1
- Feature 2
- Feature 3

## Installation

### Prerequisites

- [Dependency 1] version X.Y+
- [Dependency 2]

### Install from Source

```bash
git clone https://github.com/user/project.git
cd project
pip install -r requirements.txt
```

Install from Package Manager

pip install project-name

Quick Start

# Minimal example to get started
project-name --help

Usage

Basic Usage

# Example 1
project-name command arg1 arg2

# Example 2
project-name --option value

Advanced Usage

[More complex examples]

Configuration

Configuration file location: ~/.project/config.yml

# Example configuration
option1: value1
option2: value2

Documentation

[Links to full documentation, API reference, and guides]

Contributing

See CONTRIBUTING.md

License

[License name] - see LICENSE

Support

[How to get help: issue tracker, discussion forum, or contact address]


### Comprehensive README Template

For open-source and complex projects:

# [Project Name]

[![Build Status](badge-url)](build-url)
[![Coverage](badge-url)](coverage-url)
[![License](badge-url)](license-url)

[3-4 sentence description of the project, its purpose, and key benefits]

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage](#usage)
- [Configuration](#configuration)
- [Architecture](#architecture)
- [API Reference](#api-reference)
- [Contributing](#contributing)
- [Testing](#testing)
- [Deployment](#deployment)
- [Troubleshooting](#troubleshooting)
- [FAQ](#faq)
- [Roadmap](#roadmap)
- [License](#license)
- [Acknowledgements](#acknowledgements)

[Rest of content follows standard README template but with more depth]

## Architecture

High-level overview with diagram:

[ASCII diagram or link to docs/architecture/]


## Performance

- Benchmark results
- Scalability characteristics
- Resource requirements

## Security

See [SECURITY.md](SECURITY.md) for security policy and vulnerability reporting.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Roadmap

- [x] Feature 1 (completed)
- [ ] Feature 2 (in progress)
- [ ] Feature 3 (planned)

See [full roadmap](ROADMAP.md)

Architecture Documentation Structure

Directory Organization

docs/
├── README.md (navigation hub)
├── architecture/
│   ├── README.md (system overview)
│   ├── decisions/ (ADRs)
│   │   ├── README.md (ADR index)
│   │   └── ADR-NNN-*.md
│   ├── diagrams/
│   │   ├── system-overview.png
│   │   ├── data-flow.png
│   │   └── deployment.png
│   ├── components/
│   │   ├── authentication.md
│   │   ├── database.md
│   │   └── api-gateway.md
│   └── security/
│       ├── threat-model.md
│       ├── access-control.md
│       └── encryption.md
├── api/
│   └── reference.md
├── guides/
│   ├── getting-started.md
│   ├── contributing.md
│   └── deployment.md
└── runbooks/
    ├── deployment.md
    ├── backup-restore.md
    └── incident-response.md
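
A layout like this is easy to scaffold with a script so that new projects start from the same structure. The following sketch creates the Markdown files and the diagrams directory under a root of your choice; `scaffold_docs` is a hypothetical helper, the stub heading written into each file is an assumption, and existing files are left untouched.

```python
import tempfile
from pathlib import Path
from typing import List

DOC_DIRS = ["architecture/decisions", "architecture/diagrams"]
DOC_FILES = [
    "README.md",
    "architecture/README.md",
    "architecture/decisions/README.md",
    "architecture/components/authentication.md",
    "architecture/components/database.md",
    "architecture/components/api-gateway.md",
    "architecture/security/threat-model.md",
    "architecture/security/access-control.md",
    "architecture/security/encryption.md",
    "api/reference.md",
    "guides/getting-started.md",
    "guides/contributing.md",
    "guides/deployment.md",
    "runbooks/deployment.md",
    "runbooks/backup-restore.md",
    "runbooks/incident-response.md",
]


def scaffold_docs(root: Path) -> List[Path]:
    """Create the directory tree and stub files; return the files created."""
    created = []
    for directory in DOC_DIRS:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for relative in DOC_FILES:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing documentation
            path.write_text(f"# {path.stem}\n", encoding="utf-8")
            created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_docs(Path(tmp) / "docs")
    print(len(created))  # 16
```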

Common Mistakes

❌ Incomplete ADRs

Wrong: ADR with only "We chose X" and no alternatives/consequences

Right: Complete ADR with Context, Alternatives Considered, Consequences, Related Decisions

❌ Scattered Documentation

Wrong: Decisions in README, code comments, wiki, Slack

Right: Single source of truth - decisions in ADRs, linked from other locations

❌ Missing API Details

Wrong: API docs with only endpoints and examples

Right: API docs with auth, rate limiting, pagination, versioning, error codes

❌ Incomplete Runbooks

Wrong: Runbook with only procedure steps

Right: Runbook with prerequisites, safety checks, verification, rollback, troubleshooting

❌ Generic README

Wrong: README saying "This is a project that does things"

Right: README with concrete features, runnable examples, clear installation steps

Quick Reference

| Document Type | Use When | Key Sections |
|---------------|----------|--------------|
| ADR | Architecture/technology decisions with long-term impact | Context, Alternatives, Consequences, Related Decisions |
| API Reference | Documenting REST/GraphQL APIs | Auth, Rate Limiting, Pagination, Endpoints, Errors |
| Runbook | Operational procedures | Prerequisites, Safety, Procedure, Verification, Rollback |
| README (Simple) | Small utilities (<100 lines) | Installation, Usage, Options |
| README (Standard) | Most projects | Features, Installation, Quick Start, Usage, Config |
| README (Comprehensive) | Open-source/complex projects | All standard + Architecture, Performance, Roadmap |
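
The "Key Sections" column lends itself to an automated completeness check. The sketch below is illustrative only: `KEY_SECTIONS` maps three of the document types to the heading names used in the templates (an assumption about how the table's short labels correspond to headings), `missing_sections` is a hypothetical helper, and the sample ADR draft is invented.

```python
import re
from typing import Dict, List

KEY_SECTIONS: Dict[str, List[str]] = {
    "adr": ["Context", "Alternatives Considered", "Consequences", "Related Decisions"],
    "runbook": ["Prerequisites", "Safety Checks", "Procedure", "Rollback Procedure"],
    "readme-simple": ["Installation", "Usage", "Options"],
}

HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(doc_type: str, markdown: str) -> List[str]:
    """List the key sections that have no matching Markdown heading."""
    headings = {m.group(1).lower() for m in HEADING.finditer(markdown)}
    return [name for name in KEY_SECTIONS[doc_type] if name.lower() not in headings]


draft = """# ADR-009: Use Redis

## Context

We need a cache.

## Decision

We will use Redis.
"""
print(missing_sections("adr", draft))
# ['Alternatives Considered', 'Consequences', 'Related Decisions']
```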

Real-World Example: Elspeth Documentation Evolution

Before (Scattered narratives):

README: 8 sections explaining architecture decisions
Code comments: "# We chose ABC because..."
No traceability or findability

After (Structured with ADRs):

14 ADRs documenting key decisions
README: Quick start + links to ADRs
Code comments: # See ADR-004 for rationale
Clear decision trail: ADR-002 (MLS) → ADR-003 (Registry) → ADR-004 (ABC) → ADR-005 (Frozen)

Key Improvement: "Can't find why we chose X" → "ADR-004 documents ABC vs Protocol decision with full context"

Summary

Use the right structure for the content type:

ADRs: Architecture decisions → Complete template with alternatives and consequences
API docs: REST/GraphQL → Auth, rate limiting, pagination, versioning, errors
Runbooks: Operations → Prerequisites, safety, procedure, verification, rollback
READMEs: Project overview → Match complexity (simple/standard/comprehensive)

Meta-rule: Good structure makes docs findable. If readers can't find it, you haven't documented it.
