Claude Code Plugins

Community-maintained marketplace

Feedback

infrastructure-documenter

@eddiebe147/claude-settings
0
0

Expert guide for documenting infrastructure including architecture diagrams, runbooks, system documentation, and operational procedures. Use when creating technical documentation for systems and deployments.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name infrastructure-documenter
description Expert guide for documenting infrastructure including architecture diagrams, runbooks, system documentation, and operational procedures. Use when creating technical documentation for systems and deployments.

Infrastructure Documenter Skill

Overview

This skill helps you create clear, maintainable infrastructure documentation. Covers architecture diagrams, runbooks, system documentation, operational procedures, and documentation-as-code practices.

Documentation Philosophy

Principles

  1. Living documentation: Keep it in sync with reality
  2. Audience-aware: Different docs for different readers
  3. Actionable: Every doc should help someone do something
  4. Version-controlled: Documentation changes tracked with code

Document Types

Type Audience Purpose
Architecture Engineers Understand system design
Runbooks Ops/SRE Handle incidents
API Docs Developers Integrate with system
Onboarding New hires Get up to speed
Decision Records Future you Understand why

Architecture Documentation

System Architecture Overview

# System Architecture

## Overview

[Project Name] is a [type] application that [purpose].

## High-Level Architecture

┌─────────────────────────────────────────────────────────────┐ │ Users │ └─────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────┐ │ Vercel Edge │ │ ┌─────────────────┐ ┌─────────────────┐ │ │ │ Next.js App │ │ Edge Functions │ │ │ └─────────────────┘ └─────────────────┘ │ └─────────────────────────────────────────────────────────────┘ │ ┌───────────────┼───────────────┐ ▼ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ Supabase │ │ Redis │ │ Stripe │ │ - PostgreSQL │ │ - Session │ │ - Payments │ │ - Auth │ │ - Cache │ │ - Webhooks │ │ - Realtime │ │ │ │ │ │ - Storage │ │ │ │ │ └─────────────────┘ └─────────────────┘ └─────────────────┘


## Components

### Frontend (Next.js App)
- **Location**: Vercel Edge Network
- **Framework**: Next.js 14 (App Router)
- **Styling**: Tailwind CSS + shadcn/ui
- **State**: Zustand + React Query

### Backend Services
| Service | Provider | Purpose |
|---------|----------|---------|
| Database | Supabase | PostgreSQL with RLS |
| Auth | Supabase Auth | User authentication |
| Storage | Supabase Storage | File uploads |
| Cache | Upstash Redis | Session & API cache |
| Payments | Stripe | Subscriptions |
| Email | Resend | Transactional emails |

### Data Flow

1. User request → Vercel Edge
2. SSR/API Route processes request
3. Database queries via Supabase client
4. Response cached at edge (when applicable)
5. Response returned to user

## Security

### Authentication Flow
1. User signs in via Supabase Auth
2. JWT token issued and stored in cookie
3. Server validates token on each request
4. RLS policies enforce data access

### Data Protection
- All data encrypted at rest (AES-256)
- TLS 1.3 for data in transit
- Secrets stored in Vercel environment
- PII fields encrypted in database

Mermaid Diagrams

## Request Flow

```mermaid
sequenceDiagram
    participant U as User
    participant V as Vercel
    participant N as Next.js
    participant S as Supabase
    participant R as Redis

    U->>V: HTTPS Request
    V->>N: Route to App

    alt Cached Response
        N->>R: Check Cache
        R-->>N: Cache Hit
        N-->>U: Return Cached
    else Cache Miss
        N->>S: Query Database
        S-->>N: Data
        N->>R: Store in Cache
        N-->>U: Return Response
    end

Database Schema

erDiagram
    users ||--o{ projects : owns
    users {
        uuid id PK
        text email
        text name
        timestamp created_at
    }
    projects ||--o{ tasks : contains
    projects {
        uuid id PK
        uuid user_id FK
        text name
        text status
    }
    tasks {
        uuid id PK
        uuid project_id FK
        text title
        boolean completed
    }

## Runbooks

### Runbook Template

```markdown
# Runbook: [Service Name] - [Issue Type]

## Overview
Brief description of the issue and when this runbook applies.

## Severity
- **P1 (Critical)**: Complete outage
- **P2 (High)**: Degraded service
- **P3 (Medium)**: Minor impact
- **P4 (Low)**: No user impact

## Detection
How this issue is typically detected:
- [ ] Alert from [monitoring system]
- [ ] User report
- [ ] Automated check failure

## Impact Assessment
- **Users affected**: All / Segment / None
- **Data at risk**: Yes / No
- **Revenue impact**: High / Medium / Low / None

## Prerequisites
- [ ] Access to [system/dashboard]
- [ ] Credentials for [service]
- [ ] Contact info for [team/person]

## Resolution Steps

### Step 1: Verify the Issue
```bash
# Check service status
curl -I https://api.example.com/health

# Check logs
vercel logs --follow

Step 2: Identify Root Cause

Common causes:

  • Database connection pool exhausted
  • Memory limit reached
  • External service down
  • Bad deployment

Step 3: Apply Fix

If Database Issue:

# Check connection count
SELECT count(*) FROM pg_stat_activity;

# Kill idle connections
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle' AND query_start < now() - interval '1 hour';

If Bad Deployment:

# Rollback to previous deployment
vercel rollback

Step 4: Verify Fix

# Check service health
curl https://api.example.com/health

# Monitor error rates for 15 minutes

Escalation

If unable to resolve within 30 minutes:

  1. Page on-call engineer: [contact]
  2. Notify stakeholders in #incidents
  3. Update status page

Post-Incident

  • Create incident report
  • Schedule post-mortem (P1/P2 only)
  • Update this runbook if needed

Related Links


### Database Runbooks

```markdown
# Runbook: Database Performance Issues

## Symptoms
- Slow API responses (>1s)
- Timeout errors in logs
- High database CPU in dashboard

## Quick Checks

### 1. Check Active Connections
```sql
SELECT
  state,
  count(*),
  max(now() - query_start) as max_duration
FROM pg_stat_activity
GROUP BY state;

2. Find Long-Running Queries

SELECT
  pid,
  now() - query_start AS duration,
  query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '30 seconds'
ORDER BY duration DESC;

3. Check Table Sizes

SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;

4. Check Missing Indexes

SELECT
  relname,
  seq_scan,
  idx_scan,
  seq_scan - idx_scan AS difference
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
ORDER BY difference DESC;

Resolution

Kill Problematic Queries

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = [PID_FROM_ABOVE];

Add Missing Index

CREATE INDEX CONCURRENTLY idx_table_column
ON table_name (column_name);

## Decision Records (ADRs)

### ADR Template

```markdown
# ADR-001: Choose Supabase for Database

## Status
Accepted

## Context
We need a database solution for [Project Name] that supports:
- PostgreSQL compatibility
- Real-time subscriptions
- Built-in authentication
- Easy local development
- Generous free tier

## Decision
We will use Supabase as our primary database and auth provider.

## Alternatives Considered

### PlanetScale
**Pros:**
- Excellent scaling
- Branching for schema changes
- MySQL compatible

**Cons:**
- No built-in auth
- No real-time subscriptions
- Additional services needed

### Firebase
**Pros:**
- Real-time built-in
- Mature platform
- Good mobile SDKs

**Cons:**
- NoSQL (not ideal for our use case)
- Vendor lock-in concerns
- Complex security rules

## Consequences

### Positive
- Single provider for DB + Auth + Storage
- Great developer experience
- Row Level Security for data protection
- Local development with supabase CLI

### Negative
- PostgreSQL-specific features tie us to provider
- Supabase still maturing (some rough edges)
- Limited to their managed offering

### Risks
- Supabase scaling limitations at high traffic
- Migration cost if we need to move

## References
- [Supabase Documentation](https://supabase.com/docs)
- [Comparison: Supabase vs Firebase](https://...)

API Documentation

Endpoint Documentation

# API Reference

## Base URL

Production: https://api.example.com/v1 Staging: https://staging-api.example.com/v1


## Authentication

All API requests require authentication via Bearer token.

```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.example.com/v1/users

Endpoints

Users

Get Current User

GET /users/me

Response:

{
  "id": "usr_123",
  "email": "user@example.com",
  "name": "John Doe",
  "created_at": "2024-01-01T00:00:00Z"
}

Update User

PATCH /users/me

Request Body:

Field Type Required Description
name string No Display name
avatar_url string No Profile image URL

Example:

curl -X PATCH \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Jane Doe"}' \
  https://api.example.com/v1/users/me

Error Responses

Status Code Description
400 BAD_REQUEST Invalid request body
401 UNAUTHORIZED Missing or invalid token
403 FORBIDDEN Insufficient permissions
404 NOT_FOUND Resource not found
429 RATE_LIMITED Too many requests
500 INTERNAL_ERROR Server error

Error Response Format:

{
  "error": {
    "code": "NOT_FOUND",
    "message": "User not found"
  }
}

## Environment Documentation

### Environment Matrix

```markdown
# Environments

## Overview

| Environment | URL | Purpose | Deploy |
|-------------|-----|---------|--------|
| Production | https://myapp.com | Live users | Manual (main) |
| Staging | https://staging.myapp.com | Pre-release testing | Auto (main) |
| Preview | https://pr-*.vercel.app | PR review | Auto (PR) |
| Development | http://localhost:3000 | Local dev | Manual |

## Configuration

### Production
```env
NODE_ENV=production
DATABASE_URL=[Supabase Production]
NEXT_PUBLIC_APP_URL=https://myapp.com

Staging

NODE_ENV=production
DATABASE_URL=[Supabase Staging Branch]
NEXT_PUBLIC_APP_URL=https://staging.myapp.com

Development

NODE_ENV=development
DATABASE_URL=[Local Supabase]
NEXT_PUBLIC_APP_URL=http://localhost:3000

Access

Production

  • Vercel: Admin only
  • Database: Read-only for devs, write for admin
  • Logs: All engineers

Staging

  • Vercel: All engineers
  • Database: All engineers
  • Logs: All engineers

Secrets Rotation

Secret Rotation Last Rotated
Database password 90 days 2024-01-15
API keys 90 days 2024-01-15
JWT secret Never Initial setup

## Documentation-as-Code

### Documentation Structure

docs/ ├── README.md # Documentation index ├── architecture/ │ ├── overview.md # System architecture │ ├── data-flow.md # Data flow diagrams │ └── decisions/ # ADRs │ ├── 001-database.md │ └── 002-hosting.md ├── runbooks/ │ ├── README.md # Runbook index │ ├── database.md # Database issues │ ├── deployment.md # Deployment issues │ └── outage.md # Service outage ├── api/ │ └── reference.md # API documentation └── onboarding/ ├── setup.md # Local setup └── contributing.md # How to contribute


### Auto-Generated Documentation

```yaml
# .github/workflows/docs.yml
name: Generate Docs

on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'docs/**'

jobs:
  generate-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Generate API docs from OpenAPI
        run: |
          npx @redocly/cli build-docs openapi.yaml \
            --output docs/api/index.html

      - name: Generate TypeDoc
        run: npx typedoc --out docs/api/typescript

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs

Documentation Checklist

Architecture Docs

  • System overview diagram
  • Component descriptions
  • Data flow documentation
  • Security architecture
  • Technology decisions (ADRs)

Operational Docs

  • Runbooks for common issues
  • Deployment procedures
  • Monitoring and alerting
  • Incident response plan
  • On-call procedures

Developer Docs

  • Local setup guide
  • API reference
  • Contributing guidelines
  • Code conventions
  • Testing guide

Maintenance

  • Documentation review schedule
  • Ownership assigned
  • Change process defined
  • Versioning strategy

When to Use This Skill

Invoke this skill when:

  • Creating architecture documentation
  • Writing runbooks for operations
  • Documenting decision rationale (ADRs)
  • Setting up documentation structure
  • Creating onboarding materials
  • Building automated documentation
  • Planning incident response procedures