| name | cascade-workflow |
| version | 1.0.0 |
| description | Graceful degradation through cascading fallback strategies - ensures system always completes while maintaining acceptable functionality |
| auto_activates | external API, external service, timeout handling, high availability, fallback strategy, resilient operation |
| explicit_triggers | /amplihack:cascade |
| confirmation_required | false |
| token_budget | 3500 |
Cascade Workflow with Graceful Degradation Skill
Purpose
Implement graceful degradation through cascading fallback strategies. When optimal approaches fail or timeout, the system automatically falls back to simpler, more reliable alternatives while maintaining acceptable functionality.
When to Use This Skill
USE FOR:
- External service dependencies (APIs, databases)
- Time-sensitive operations with acceptable degraded modes
- Operations where partial results are better than no results
- High-availability requirements (system must always respond)
- Scenarios where waiting for perfect solution is worse than good-enough solution
AVOID FOR:
- Operations requiring exact correctness (no acceptable degradation)
- Security-critical operations (authentication, authorization)
- Financial transactions (no room for "approximate")
- When failures must surface to user (diagnostic operations)
- Simple operations with no meaningful fallback
Configuration
Core Parameters
Timeout Strategy:
aggressive- Fast failures, quick degradation (5s / 2s / 1s)balanced- Reasonable attempts (30s / 10s / 5s) - DEFAULTpatient- Thorough attempts before fallback (120s / 30s / 10s)custom- Define your own timeouts
Fallback Types:
service- External API → Cached data → Static defaultsquality- Comprehensive → Standard → Minimal analysisfreshness- Real-time → Recent → Historical datacompleteness- Full dataset → Sample → Summaryaccuracy- Precise → Approximate → Estimate
Degradation Notification:
silent- Log only, no user notificationwarning- Inform user of degradationexplicit- Detailed explanation of what degraded and why
Cascade Level Requirements
PRIMARY (Optimal):
- Best possible outcome
- May depend on external services
- May be slow or resource-intensive
- Can fail or timeout
SECONDARY (Acceptable):
- Reduced quality but functional
- More reliable than primary
- Faster or fewer dependencies
- Acceptable for users
TERTIARY (Guaranteed):
- Always succeeds, never fails
- No external dependencies
- Fast and reliable
- Minimal but functional
- CRITICAL: Must be designed to never fail
Execution Process
Step 1: Define Cascade Levels
- Use architect agent to identify cascade levels
- Define PRIMARY approach (optimal solution)
- Define SECONDARY approach (acceptable degradation)
- Define TERTIARY approach (guaranteed completion)
- Set timeout for each level
- Document what degrades at each level
- CRITICAL: Ensure tertiary ALWAYS succeeds
Example Cascade Definitions:
Code Analysis with AI:
- PRIMARY: GPT-4 comprehensive analysis (timeout: 30s)
- SECONDARY: GPT-3.5 standard analysis (timeout: 10s)
- TERTIARY: Static analysis with regex (timeout: 5s)
External API Data Fetch:
- PRIMARY: Live API call (timeout: 10s)
- SECONDARY: Cached data (timeout: 2s)
- TERTIARY: Default values (timeout: 0s)
Test Execution:
- PRIMARY: Full test suite (timeout: 120s)
- SECONDARY: Critical tests only (timeout: 30s)
- TERTIARY: Smoke tests (timeout: 10s)
Step 2: Attempt Primary Approach
- Execute optimal solution
- Set timeout based on strategy configuration
- Monitor execution progress
- If completes successfully: DONE (best outcome)
- If fails or times out: Continue to Step 3
- Log attempt and reason for failure
# Pseudocode for primary attempt
try:
result = execute_primary_approach(timeout=PRIMARY_TIMEOUT)
log_success(level="PRIMARY", result=result)
return result # DONE - best outcome achieved
except TimeoutError:
log_failure(level="PRIMARY", reason="timeout")
# Continue to Step 3
except ExternalServiceError as e:
log_failure(level="PRIMARY", reason=f"service_error: {e}")
# Continue to Step 3
Step 3: Attempt Secondary Approach
- Log degradation to secondary level
- Execute acceptable fallback solution
- Set shorter timeout (typically 1/3 of primary)
- Monitor execution progress
- If completes successfully: DONE (acceptable outcome)
- If fails or times out: Continue to Step 4
- Log attempt and reason for failure
# Pseudocode for secondary attempt
log_degradation(from_level="PRIMARY", to_level="SECONDARY")
try:
result = execute_secondary_approach(timeout=SECONDARY_TIMEOUT)
log_success(level="SECONDARY", result=result, degraded=True)
return result # DONE - acceptable outcome
except TimeoutError:
log_failure(level="SECONDARY", reason="timeout")
# Continue to Step 4
Step 4: Attempt Tertiary Approach
- Log degradation to tertiary level
- Execute guaranteed completion approach
- Set minimal timeout (typically 1s)
- MUST succeed - no failures allowed
- Return minimal but functional result
- Log success (degraded but functional)
- DONE (guaranteed completion)
# Pseudocode for tertiary attempt
log_degradation(from_level="SECONDARY", to_level="TERTIARY")
try:
result = execute_tertiary_approach(timeout=TERTIARY_TIMEOUT)
log_success(level="TERTIARY", result=result, heavily_degraded=True)
return result # DONE - minimal but functional
except Exception as e:
# THIS SHOULD NEVER HAPPEN
log_critical_failure("TERTIARY approach failed - this is a bug!")
raise SystemError("Cascade safety violation: tertiary failed")
Step 5: Report Degradation
- Determine notification level from configuration
- Silent: Log only, no user message
- Warning: Brief notification to user
- Explicit: Detailed degradation explanation
- Document which level succeeded
- Explain impact of degradation
- Log cascade path taken for analysis
Degradation Reporting Templates:
Silent Mode:
[LOG] CASCADE: PRIMARY timeout (30s) → SECONDARY success (6s)
Result: standard_analysis (degraded from comprehensive)
Warning Mode:
⚠️ Using cached data (less than 1 hour old)
Current real-time data unavailable.
Explicit Mode:
ℹ️ Analysis Quality Notice
We attempted to provide comprehensive code analysis using GPT-4,
but encountered slow response times (>30s timeout).
Fallback Applied:
- Used: GPT-3.5 standard analysis (completed in 6s)
- Quality: Standard (vs. Comprehensive)
- Impact: Advanced semantic insights not included
What You're Getting:
✓ Basic pattern detection
✓ Standard recommendations
✓ Code quality assessment
What's Missing:
✗ Complex architectural insights
✗ Deep semantic analysis
✗ Advanced refactoring suggestions
Step 6: Log Cascade Metrics
- Record cascade path taken
- Document level reached (primary/secondary/tertiary)
- Log timing for each level attempted
- Track degradation frequency
- Identify patterns in failures
- Update cascade strategy if needed
Metrics to Track:
- Success rate by level
- Average response times
- Degradation frequency
- User impact assessment
Step 7: Continuous Optimization
- Use analyzer agent to review cascade metrics
- Identify optimization opportunities
- Adjust timeouts based on success rates
- Improve secondary approaches if frequently used
- Update tertiary if inadequate
- Document learnings in
.claude/context/DISCOVERIES.md
Optimization Criteria:
- If PRIMARY succeeds < 50%: Timeout too aggressive → Increase timeout
- If SECONDARY used > 40%: Secondary is really the "normal" case → Swap primary and secondary
- If TERTIARY used > 10%: Secondary not reliable enough → Improve secondary
Trade-Offs
Benefit: System always completes, never fully fails Cost: Users may receive degraded responses Best For: User-facing features where responsiveness matters
Examples
Example 1: Weather API Integration
Configuration:
- Strategy: Balanced (30s / 10s / 5s)
- Type: Service fallback
- Notification: Warning
Implementation:
async def get_weather(location: str) -> WeatherData:
"""Get weather data with cascade fallback"""
# PRIMARY: Live weather API
try:
return await fetch_weather_api(location, timeout=30)
except (TimeoutError, APIError):
log.warning("PRIMARY weather API failed, trying cache")
# SECONDARY: Cached weather data
try:
cached = await get_cached_weather(location, max_age=3600)
if cached:
notify_user("Using weather data from cache (< 1 hour old)")
return cached
except CacheError:
log.warning("SECONDARY cache failed, using defaults")
# TERTIARY: Default weather data
return get_default_weather(location) # Never fails
Outcome: System always returns weather data, quality degrades gracefully
Example 2: Code Review with AI
Configuration:
- Strategy: Patient (120s / 30s / 10s)
- Type: Quality fallback
- Notification: Explicit
Cascade Path:
- PRIMARY: GPT-4 comprehensive review - TIMEOUT after 120s
- SECONDARY: GPT-3.5 standard review - SUCCESS in 18s
- TERTIARY: Not attempted
Example 3: Search Results Ranking
Configuration:
- Strategy: Aggressive (5s / 2s / 1s)
- Type: Accuracy fallback
- Notification: Silent
Implementation:
def search_and_rank(query: str) -> List[Result]:
"""Search with ML ranking, fallback to simple ranking"""
results = fetch_results(query)
# PRIMARY: ML-based ranking (sophisticated)
try:
return ml_rank(results, timeout=5)
except TimeoutError:
pass # Silent fallback
# SECONDARY: Heuristic ranking (good enough)
try:
return heuristic_rank(results, timeout=2)
except TimeoutError:
pass
# TERTIARY: Simple text match ranking (basic)
return simple_rank(results) # Always fast
Philosophy Alignment
This workflow enforces:
- Resilience: System always completes, never completely fails
- User Experience: Better degraded service than error message
- Transparency: Users understand what they're getting (if explicit mode)
- Progressive Enhancement: Optimal by default, degrade when necessary
- Measurable Quality: Clear definition of what degrades at each level
- Continuous Improvement: Metrics drive timeout optimization
- Guaranteed Completion: Tertiary level must never fail
Key Principle
Better to deliver degraded service than no service