name	graceful-degradation
description	Build tiered fallback systems that degrade performance, not availability. Cache layers, API resilience, and automatic failover patterns for platform reliability.

Graceful Degradation

When to Use This Skill

Graceful degradation is a design principle that ensures systems continue operating when components fail. Rather than crashing or returning errors, the system automatically falls back to slower but working alternatives.

flowchart TD
    subgraph request[Request]
        A[Operation Requested]
    end

    subgraph tiers[Fallback Tiers]
        T1[Tier 1: Optimal]
        T2[Tier 2: Acceptable]
        T3[Tier 3: Guaranteed]
    end

    subgraph result[Result]
        Success[Success]
    end

    A --> T1
    T1 -->|Works| Success
    T1 -->|Fails| T2
    T2 -->|Works| Success
    T2 -->|Fails| T3
    T3 --> Success

    %% Ghostty Hardcore Theme
    style A fill:#65d9ef,color:#1b1d1e
    style T1 fill:#a7e22e,color:#1b1d1e
    style T2 fill:#fd971e,color:#1b1d1e
    style T3 fill:#f92572,color:#1b1d1e
    style Success fill:#a7e22e,color:#1b1d1e

The key insight: degrade performance, not availability.

Implementation

Cache Access Pattern

From "From 5 Seconds to 5 Milliseconds":

Volume Mount → API Call → Rebuild Cache
    1-5ms        50ms        5000ms

# Kubernetes volume mount with optional flag
volumes:
  - name: cache-volume
    configMap:
      name: deployment-cache
      optional: true  # Tier 1 can fail gracefully

See examples.md for detailed code examples.

CI/CD Dependency Resolution

Artifact Cache → Dependency Cache → Fresh Install
    seconds          minutes          minutes+

See examples.md for detailed code examples.

API Resilience

Primary Endpoint → Secondary Endpoint → Cached Response → Static Fallback

Authentication

SSO → API Token → Service Account → Anonymous (read-only)

Techniques

Anti-Patterns

1. Silent Degradation

Degrading without logging or alerting means you won't know when Tier 1 is broken.

// Bad: silent fallback
func getData() []byte {
    if data, _ := cache.Get(); data != nil {
        return data
    }
    return fetchFromAPI()  // No indication we're in degraded mode
}

// Good: observable fallback
func getData() []byte {
    if data, err := cache.Get(); err == nil {
        metrics.CacheHit()
        return data
    }
    metrics.CacheMiss()
    log.Warn("cache miss, falling back to API")
    return fetchFromAPI()
}

2. No Guaranteed Tier

Every chain needs a final tier that always succeeds.

// Bad: can fail completely
func getConfig() (*Config, error) {
    if cfg := cache.Get(); cfg != nil {
        return cfg, nil
    }
    return api.FetchConfig()  // What if API is also down?
}

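// Good: guaranteed fallback
func getConfig() *Config {
    if cfg := cache.Get(); cfg != nil {
        return cfg
    }
    if cfg, err := api.FetchConfig(); err == nil {
        return cfg
    }
    // Log the last-resort tier so the fallback is not silent
    log.Warn("cache and API unavailable, using default config")
    return DefaultConfig()  // Always works
}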

3. Expensive Default Path

Using Tier 3 as the happy path defeats the purpose.

# Bad: always does full install, then saves a cache it never restores
- run: npm ci
- uses: actions/cache/save@v4
  with:
    path: node_modules/
    key: deps-${{ hashFiles('package-lock.json') }}
# Good: cache-first approach
- uses: actions/cache@v4
  id: cache
  with:
    path: node_modules/
    key: deps-${{ hashFiles('package-lock.json') }}

- if: steps.cache.outputs.cache-hit != 'true'
  run: npm ci

4. No Observability

You need to know:

Which tier is serving traffic
How often fallbacks occur
Latency per tier

- name: Report cache tier
  run: |
    if [ "${{ steps.mount-cache.outcome }}" = "success" ]; then
      echo "cache_tier=mount" >> metrics.txt
    elif [ "${{ steps.api-cache.outcome }}" = "success" ]; then
      echo "cache_tier=api" >> metrics.txt
    else
      echo "cache_tier=rebuild" >> metrics.txt
    fi

See reference.md for additional techniques and detailed examples.

Comparison

Fail Fast and Graceful Degradation are complementary, not contradictory:

Scenario	Pattern	Reasoning
Precondition not met	Fail Fast	Don't waste resources on doomed operations
Runtime component fails	Graceful Degradation	Continue with fallback
Invalid input	Fail Fast	User error, report immediately
Network timeout	Graceful Degradation	Infrastructure issue, retry/fallback
Missing required config	Fail Fast	Can't continue safely
Cache miss	Graceful Degradation	Expensive path still works

Decision rule: Fail fast on precondition failures. Degrade gracefully on runtime failures.
