Claude Code Plugins

Community-maintained marketplace

Feedback

Graceful degradation through cascading fallback strategies - ensures system always completes while maintaining acceptable functionality

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name cascade-workflow
version 1.0.0
description Graceful degradation through cascading fallback strategies - ensures system always completes while maintaining acceptable functionality
auto_activates external API, external service, timeout handling, high availability, fallback strategy, resilient operation
explicit_triggers /amplihack:cascade
confirmation_required false
token_budget 3500

Cascade Workflow with Graceful Degradation Skill

Purpose

Implement graceful degradation through cascading fallback strategies. When optimal approaches fail or timeout, the system automatically falls back to simpler, more reliable alternatives while maintaining acceptable functionality.

When to Use This Skill

USE FOR:

  • External service dependencies (APIs, databases)
  • Time-sensitive operations with acceptable degraded modes
  • Operations where partial results are better than no results
  • High-availability requirements (system must always respond)
  • Scenarios where waiting for perfect solution is worse than good-enough solution

AVOID FOR:

  • Operations requiring exact correctness (no acceptable degradation)
  • Security-critical operations (authentication, authorization)
  • Financial transactions (no room for "approximate")
  • When failures must surface to user (diagnostic operations)
  • Simple operations with no meaningful fallback

Configuration

Core Parameters

Timeout Strategy:

  • aggressive - Fast failures, quick degradation (5s / 2s / 1s)
  • balanced - Reasonable attempts (30s / 10s / 5s) - DEFAULT
  • patient - Thorough attempts before fallback (120s / 30s / 10s)
  • custom - Define your own timeouts

Fallback Types:

  • service - External API → Cached data → Static defaults
  • quality - Comprehensive → Standard → Minimal analysis
  • freshness - Real-time → Recent → Historical data
  • completeness - Full dataset → Sample → Summary
  • accuracy - Precise → Approximate → Estimate

Degradation Notification:

  • silent - Log only, no user notification
  • warning - Inform user of degradation
  • explicit - Detailed explanation of what degraded and why

Cascade Level Requirements

PRIMARY (Optimal):

  • Best possible outcome
  • May depend on external services
  • May be slow or resource-intensive
  • Can fail or timeout

SECONDARY (Acceptable):

  • Reduced quality but functional
  • More reliable than primary
  • Faster or fewer dependencies
  • Acceptable for users

TERTIARY (Guaranteed):

  • Always succeeds, never fails
  • No external dependencies
  • Fast and reliable
  • Minimal but functional
  • CRITICAL: Must be designed to never fail

Execution Process

Step 1: Define Cascade Levels

  • Use architect agent to identify cascade levels
  • Define PRIMARY approach (optimal solution)
  • Define SECONDARY approach (acceptable degradation)
  • Define TERTIARY approach (guaranteed completion)
  • Set timeout for each level
  • Document what degrades at each level
  • CRITICAL: Ensure tertiary ALWAYS succeeds

Example Cascade Definitions:

Code Analysis with AI:

  • PRIMARY: GPT-4 comprehensive analysis (timeout: 30s)
  • SECONDARY: GPT-3.5 standard analysis (timeout: 10s)
  • TERTIARY: Static analysis with regex (timeout: 5s)

External API Data Fetch:

  • PRIMARY: Live API call (timeout: 10s)
  • SECONDARY: Cached data (timeout: 2s)
  • TERTIARY: Default values (timeout: 0s)

Test Execution:

  • PRIMARY: Full test suite (timeout: 120s)
  • SECONDARY: Critical tests only (timeout: 30s)
  • TERTIARY: Smoke tests (timeout: 10s)

Step 2: Attempt Primary Approach

  • Execute optimal solution
  • Set timeout based on strategy configuration
  • Monitor execution progress
  • If completes successfully: DONE (best outcome)
  • If fails or times out: Continue to Step 3
  • Log attempt and reason for failure
# Pseudocode for primary attempt
try:
    result = execute_primary_approach(timeout=PRIMARY_TIMEOUT)
    log_success(level="PRIMARY", result=result)
    return result  # DONE - best outcome achieved
except TimeoutError:
    log_failure(level="PRIMARY", reason="timeout")
    # Continue to Step 3
except ExternalServiceError as e:
    log_failure(level="PRIMARY", reason=f"service_error: {e}")
    # Continue to Step 3

Step 3: Attempt Secondary Approach

  • Log degradation to secondary level
  • Execute acceptable fallback solution
  • Set shorter timeout (typically 1/3 of primary)
  • Monitor execution progress
  • If completes successfully: DONE (acceptable outcome)
  • If fails or times out: Continue to Step 4
  • Log attempt and reason for failure
# Pseudocode for secondary attempt
log_degradation(from_level="PRIMARY", to_level="SECONDARY")
try:
    result = execute_secondary_approach(timeout=SECONDARY_TIMEOUT)
    log_success(level="SECONDARY", result=result, degraded=True)
    return result  # DONE - acceptable outcome
except TimeoutError:
    log_failure(level="SECONDARY", reason="timeout")
    # Continue to Step 4

Step 4: Attempt Tertiary Approach

  • Log degradation to tertiary level
  • Execute guaranteed completion approach
  • Set minimal timeout (typically 1s)
  • MUST succeed - no failures allowed
  • Return minimal but functional result
  • Log success (degraded but functional)
  • DONE (guaranteed completion)
# Pseudocode for tertiary attempt
log_degradation(from_level="SECONDARY", to_level="TERTIARY")
try:
    result = execute_tertiary_approach(timeout=TERTIARY_TIMEOUT)
    log_success(level="TERTIARY", result=result, heavily_degraded=True)
    return result  # DONE - minimal but functional
except Exception as e:
    # THIS SHOULD NEVER HAPPEN
    log_critical_failure("TERTIARY approach failed - this is a bug!")
    raise SystemError("Cascade safety violation: tertiary failed")

Step 5: Report Degradation

  • Determine notification level from configuration
  • Silent: Log only, no user message
  • Warning: Brief notification to user
  • Explicit: Detailed degradation explanation
  • Document which level succeeded
  • Explain impact of degradation
  • Log cascade path taken for analysis

Degradation Reporting Templates:

Silent Mode:

[LOG] CASCADE: PRIMARY timeout (30s) → SECONDARY success (6s)
Result: standard_analysis (degraded from comprehensive)

Warning Mode:

⚠️  Using cached data (less than 1 hour old)
Current real-time data unavailable.

Explicit Mode:

ℹ️  Analysis Quality Notice

We attempted to provide comprehensive code analysis using GPT-4,
but encountered slow response times (>30s timeout).

Fallback Applied:
- Used: GPT-3.5 standard analysis (completed in 6s)
- Quality: Standard (vs. Comprehensive)
- Impact: Advanced semantic insights not included

What You're Getting:
✓ Basic pattern detection
✓ Standard recommendations
✓ Code quality assessment

What's Missing:
✗ Complex architectural insights
✗ Deep semantic analysis
✗ Advanced refactoring suggestions

Step 6: Log Cascade Metrics

  • Record cascade path taken
  • Document level reached (primary/secondary/tertiary)
  • Log timing for each level attempted
  • Track degradation frequency
  • Identify patterns in failures
  • Update cascade strategy if needed

Metrics to Track:

  • Success rate by level
  • Average response times
  • Degradation frequency
  • User impact assessment

Step 7: Continuous Optimization

  • Use analyzer agent to review cascade metrics
  • Identify optimization opportunities
  • Adjust timeouts based on success rates
  • Improve secondary approaches if frequently used
  • Update tertiary if inadequate
  • Document learnings in .claude/context/DISCOVERIES.md

Optimization Criteria:

  • If PRIMARY succeeds < 50%: Timeout too aggressive → Increase timeout
  • If SECONDARY used > 40%: Secondary is really the "normal" case → Swap primary and secondary
  • If TERTIARY used > 10%: Secondary not reliable enough → Improve secondary

Trade-Offs

Benefit: System always completes, never fully fails Cost: Users may receive degraded responses Best For: User-facing features where responsiveness matters

Examples

Example 1: Weather API Integration

Configuration:

  • Strategy: Balanced (30s / 10s / 5s)
  • Type: Service fallback
  • Notification: Warning

Implementation:

async def get_weather(location: str) -> WeatherData:
    """Get weather data with cascade fallback"""

    # PRIMARY: Live weather API
    try:
        return await fetch_weather_api(location, timeout=30)
    except (TimeoutError, APIError):
        log.warning("PRIMARY weather API failed, trying cache")

    # SECONDARY: Cached weather data
    try:
        cached = await get_cached_weather(location, max_age=3600)
        if cached:
            notify_user("Using weather data from cache (< 1 hour old)")
            return cached
    except CacheError:
        log.warning("SECONDARY cache failed, using defaults")

    # TERTIARY: Default weather data
    return get_default_weather(location)  # Never fails

Outcome: System always returns weather data, quality degrades gracefully

Example 2: Code Review with AI

Configuration:

  • Strategy: Patient (120s / 30s / 10s)
  • Type: Quality fallback
  • Notification: Explicit

Cascade Path:

  1. PRIMARY: GPT-4 comprehensive review - TIMEOUT after 120s
  2. SECONDARY: GPT-3.5 standard review - SUCCESS in 18s
  3. TERTIARY: Not attempted

Example 3: Search Results Ranking

Configuration:

  • Strategy: Aggressive (5s / 2s / 1s)
  • Type: Accuracy fallback
  • Notification: Silent

Implementation:

def search_and_rank(query: str) -> List[Result]:
    """Search with ML ranking, fallback to simple ranking"""

    results = fetch_results(query)

    # PRIMARY: ML-based ranking (sophisticated)
    try:
        return ml_rank(results, timeout=5)
    except TimeoutError:
        pass  # Silent fallback

    # SECONDARY: Heuristic ranking (good enough)
    try:
        return heuristic_rank(results, timeout=2)
    except TimeoutError:
        pass

    # TERTIARY: Simple text match ranking (basic)
    return simple_rank(results)  # Always fast

Philosophy Alignment

This workflow enforces:

  • Resilience: System always completes, never completely fails
  • User Experience: Better degraded service than error message
  • Transparency: Users understand what they're getting (if explicit mode)
  • Progressive Enhancement: Optimal by default, degrade when necessary
  • Measurable Quality: Clear definition of what degrades at each level
  • Continuous Improvement: Metrics drive timeout optimization
  • Guaranteed Completion: Tertiary level must never fail

Key Principle

Better to deliver degraded service than no service