Claude Code Plugins

Community-maintained marketplace


Analyze AppWorld task failures to extract specific API patterns and generate actionable playbook bullets with concrete code examples

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please review the skill's instructions before using it.

SKILL.md

name: reflect-appworld-failure
description: Analyze AppWorld task failures to extract specific API patterns and generate actionable playbook bullets with concrete code examples
allowed-tools: Read

Reflect on AppWorld Failure

Analyze failed AppWorld tasks to extract specific, actionable learnings that can be added to the playbook.

Purpose

When an AppWorld task fails, the Reflector calls this Skill with error details and failed code. You analyze the failure semantically and generate a high-quality bullet with:

  1. Specific title describing the pattern
  2. Detailed content with working code examples
  3. Relevant tags for retrieval
  4. Appropriate confidence level

Input Format

The input will be a text description with sections:

# Task
<task instruction>

## Apps
<comma-separated list of apps used>

## Error Type
<error_type: api_misuse, logic_error, timeout, etc.>

## Error Messages
<list of error messages from execution>

## Failed Code Snippet
<relevant code that failed>

## Missing Patterns (from heuristics)
<list of patterns the old system identified>

## Suggested Fixes (from heuristics)
<list of fix suggestions>
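To work with this input programmatically, a Reflector harness might first split it into its sections. The sketch below is a minimal Python example that assumes the headings appear exactly as in the template above; the function name parse_reflection_input is hypothetical and not part of the skill. Only the known section names are treated as headings, so a comment line inside the failed code snippet does not start a new section.

```python
import re

# Section names from the input template; "(from heuristics)" is optional.
SECTION_NAMES = ("Task", "Apps", "Error Type", "Error Messages",
                 "Failed Code Snippet", "Missing Patterns", "Suggested Fixes")

# Only known names count as headings, so a "# comment" line inside the
# failed code snippet is not mistaken for the start of a new section.
HEADING = re.compile(
    r"^#{1,2} (" + "|".join(re.escape(n) for n in SECTION_NAMES) + r")"
    r"(?: \(from heuristics\))?[ \t]*$",
    re.MULTILINE,
)


def parse_reflection_input(text):
    """Split the Reflector's text input into a {section name: body} dict."""
    matches = list(HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # Each body runs until the next recognized heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections


sample = """# Task
What is the title of the most-liked song in my Spotify playlists

## Apps
spotify

## Failed Code Snippet
# fetch the tracks
songs = spotify.get_tracks(playlist_id=pid)
"""
parsed = parse_reflection_input(sample)
apps = [name.strip() for name in parsed["Apps"].split(",")]
```

Sections absent from a given input (here, Error Type) simply do not appear in the result, so callers should use .get() when reading optional sections.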

Your Analysis Process

  1. Identify Root Cause: What was the fundamental mistake?

    • Wrong API method name?
    • Missing authentication?
    • Incorrect data structure access?
    • Logic error?
  2. Extract Pattern: What general pattern does this represent?

    • Is this specific to one app or applies to multiple?
    • Is this about API order (login first)?
    • Is this about method naming conventions?
    • Is this about data validation?
  3. Generate Concrete Example: Create working code that demonstrates the CORRECT pattern

  4. Write Actionable Bullet: Make it specific enough that the Generator can apply it
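Step 1 can often be narrowed down from the error messages alone before reading the code. A hypothetical first-pass classifier is sketched below; the regular expressions and category names are assumptions chosen to mirror the questions in step 1, not an AppWorld error taxonomy, and anything they miss falls through to a logic error that needs manual analysis.

```python
import re

# Hypothetical first-pass rules mapping error text to the root-cause questions
# in step 1. Illustrative patterns only, not an official AppWorld taxonomy.
RULES = [
    (re.compile(r"has no attribute '(\w+)'"), "wrong_method_name"),
    (re.compile(r"access[_ ]token|unauthori[sz]ed|\b401\b", re.IGNORECASE),
     "missing_authentication"),
    (re.compile(r"KeyError|IndexError|not subscriptable"), "data_structure_access"),
]


def classify_root_cause(error_messages):
    """Return (category, detail) for the first matching rule, else a logic error.

    `detail` is the captured method name when the rule has a capture group,
    otherwise the full message, so the bullet can quote it as evidence.
    """
    for message in error_messages:
        for pattern, category in RULES:
            match = pattern.search(message)
            if match:
                return category, (match.group(1) if pattern.groups else message)
    return "logic_error", None


category, detail = classify_root_cause(
    ["AttributeError: 'Spotify' object has no attribute 'get_tracks'"]
)
```

Rules are tried in order, so a message that mentions both a missing attribute and a token is reported as a wrong method name first.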

Output Format

Return a JSON object with this structure:

{
  "bullet": {
    "id": "bullet-YYYY-MM-DD-HHMMSS",
    "title": "<Specific pattern title>",
    "content": "<Detailed explanation with working code example>",
    "tags": ["app.<app_name>", "<error_category>", "<pattern_type>"],
    "evidence": [
      {
        "type": "execution",
        "ref": "<task_id>",
        "note": "<brief note about failure>"
      }
    ],
    "confidence": "high|medium|low",
    "scope": "app|global"
  }
}
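A minimal sketch of assembling this structure in Python, assuming the id is derived from the local time in the bullet-YYYY-MM-DD-HHMMSS shape shown above. make_bullet is a hypothetical helper; it rejects confidence and scope values outside the allowed sets so a malformed bullet fails early instead of reaching the playbook.

```python
import json
from datetime import datetime

CONFIDENCE_LEVELS = ("high", "medium", "low")
SCOPES = ("app", "global")


def make_bullet(title, content, tags, task_id, note,
                confidence="high", scope="app", now=None):
    """Build a bullet in the output format above.

    `now` is injectable so the timestamp-based id is reproducible in tests.
    """
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
    if scope not in SCOPES:
        raise ValueError(f"scope must be one of {SCOPES}")
    now = now or datetime.now()
    return {
        "bullet": {
            "id": now.strftime("bullet-%Y-%m-%d-%H%M%S"),  # bullet-YYYY-MM-DD-HHMMSS
            "title": title,
            "content": content,
            "tags": list(tags),
            "evidence": [{"type": "execution", "ref": task_id, "note": note}],
            "confidence": confidence,
            "scope": scope,
        }
    }


bullet = make_bullet(
    title="Spotify: Use show_playlist_songs() to get tracks from a playlist",
    content="Use show_playlist_songs(access_token, playlist_id); get_tracks() does not exist.",
    tags=["app.spotify", "api_misuse", "method_names"],
    task_id="spotify_task_001",
    note="AttributeError: 'Spotify' object has no attribute 'get_tracks'",
    now=datetime(2025, 10, 27, 12, 34, 56),
)
print(json.dumps(bullet, indent=2))
```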

Bullet Quality Guidelines

GOOD Bullets (Specific and Actionable)

Title: "Spotify: Use show_playlist_songs() not get_tracks()"
Content: "Spotify API uses show_playlist_songs(access_token, playlist_id) to retrieve tracks. The method get_tracks() does not exist. Example: songs = apis.spotify.show_playlist_songs(access_token=token, playlist_id=playlist['id'])"
Tags: ["app.spotify", "api_misuse", "method_names", "playlists"]

Title: "Venmo: Call login() before search_transactions()"
Content: "The Venmo API requires an authentication token for all operations. Always call venmo.login() first to get the access_token, then pass it to other methods. Example: response = apis.venmo.login(username='user', password='pass'); token = response['access_token']; results = apis.venmo.search_transactions(access_token=token, query={'friend': 'Alice'})"
Tags: ["app.venmo", "authentication", "api_order", "search"]

BAD Bullets (Too Generic)

Title: "Verify venmo API logic and requirements"
Content: "When implementing venmo operations: Check task logic and requirements; Missing login() call for venmo"
Tags: ["logic", "debugging", "api", "app.venmo"]

Why Bad: No concrete code example, vague guidance, doesn't teach the specific pattern
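The contrast between the two groups can be turned into a rough automatic check. The sketch below is a hypothetical lint, assuming that a specific bullet names a method with () in its title, shows at least one apis.<app>.<method>(...) call in its content, and carries an app.<app_name> tag when its scope is app. Passing it does not prove a bullet is good; failing it is a strong hint that the bullet is too generic.

```python
import re

# A concrete call such as apis.spotify.show_playlist_songs(...)
API_CALL = re.compile(r"apis\.\w+\.\w+\(")


def bullet_problems(bullet):
    """Return reasons a bullet looks too generic (empty list = passes).

    Hypothetical lint derived from the GOOD/BAD examples; not a complete
    definition of bullet quality.
    """
    problems = []
    if "()" not in bullet["title"]:
        problems.append("title does not name a specific method")
    if not API_CALL.search(bullet["content"]):
        problems.append("content has no concrete apis.<app>.<method>(...) example")
    # Only app-scoped bullets need an app.<app_name> tag.
    if bullet.get("scope", "app") == "app" and not any(
        tag.startswith("app.") for tag in bullet["tags"]
    ):
        problems.append("missing app.<app_name> tag")
    return problems


good = {
    "title": "Spotify: Use show_playlist_songs() not get_tracks()",
    "content": "Example: songs = apis.spotify.show_playlist_songs("
               "access_token=token, playlist_id=playlist['id'])",
    "tags": ["app.spotify", "api_misuse", "method_names", "playlists"],
}
bad = {
    "title": "Verify venmo API logic and requirements",
    "content": "When implementing venmo operations: Check task logic and "
               "requirements; Missing login() call for venmo",
    "tags": ["logic", "debugging", "api", "app.venmo"],
}
```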

Example Analysis

Input:

# Task
What is the title of the most-liked song in my Spotify playlists

## Apps
spotify

## Error Type
api_misuse

## Error Messages
AttributeError: 'Spotify' object has no attribute 'get_tracks'

## Failed Code Snippet
songs = spotify.get_tracks(playlist_id=pid)

## Missing Patterns (from heuristics)
- Use correct Spotify API methods

## Suggested Fixes (from heuristics)
- Check Spotify API documentation for available methods

Your Analysis:

  1. Root Cause: Code used the non-existent method get_tracks() instead of the correct show_playlist_songs()

  2. Pattern: Spotify uses the show_* naming convention for retrieval methods

  3. Scope: App-specific (Spotify)

Output:

{
  "bullet": {
    "id": "bullet-2025-10-27-123456",
    "title": "Spotify: Use show_playlist_songs() to get tracks from a playlist",
    "content": "To retrieve songs from a Spotify playlist, use show_playlist_songs(access_token, playlist_id). Don't use get_tracks(); it doesn't exist. To cover all of the user's playlists, loop over show_playlist_library() before taking the max. Example: `token = apis.spotify.login(username=username, password=password)['access_token']; playlists = apis.spotify.show_playlist_library(access_token=token); songs = [s for p in playlists for s in apis.spotify.show_playlist_songs(access_token=token, playlist_id=p['id'])]; most_liked = max(songs, key=lambda s: s['likes'])`",
    "tags": ["app.spotify", "api_misuse", "method_names", "playlists", "retrieval"],
    "evidence": [
      {
        "type": "execution",
        "ref": "spotify_task_001",
        "note": "AttributeError: 'Spotify' object has no attribute 'get_tracks'"
      }
    ],
    "confidence": "high",
    "scope": "app"
  }
}

Common AppWorld Patterns to Look For

Authentication Order

  • Most apps require login() first to get access_token
  • Token must be passed to subsequent API calls
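One way to apply the login-first rule consistently is to wrap an app so the token is obtained once and injected into every call. This is a sketch under assumptions: AuthenticatedApp and FakeVenmo are hypothetical, and the fake only mimics the login() response shape ({'access_token': ...}) used in the examples in this document so that the code runs without AppWorld.

```python
class AuthenticatedApp:
    """Hypothetical wrapper: log in once, then add access_token to every call."""

    def __init__(self, app, username, password):
        self._app = app
        # Authentication comes first; every later call reuses this token.
        self._token = app.login(username=username, password=password)["access_token"]

    def __getattr__(self, name):
        # Only reached for names not defined on the wrapper itself. A wrong
        # method name still raises AttributeError from the underlying app.
        method = getattr(self._app, name)

        def call(**kwargs):
            kwargs.setdefault("access_token", self._token)
            return method(**kwargs)

        return call


class FakeVenmo:
    """Stand-in for apis.venmo so the sketch runs without AppWorld."""

    def login(self, username, password):
        return {"access_token": f"token-for-{username}"}

    def show_friends(self, access_token):
        if access_token != "token-for-user":
            raise PermissionError("missing or invalid access_token")
        return [{"name": "Alice"}]


venmo = AuthenticatedApp(FakeVenmo(), username="user", password="pass")
friends = venmo.show_friends()  # access_token is injected by the wrapper
```

The design keeps the token out of every call site, which removes the most common form of the "missing authentication" failure; an explicitly passed access_token still takes precedence.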

Method Naming Conventions

  • Spotify: show_* for retrieval (show_playlist_songs, show_album_library)
  • Venmo: show_friends, send_payment, search_transactions
  • Gmail: fetch_emails, send_email
  • Contacts: show_contacts, add_contact
  • Calendar: show_events, create_event
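When the root cause is a wrong method name, it helps to check the name against what the app object actually exposes. A minimal sketch using Python's standard difflib module; check_api_method and FakeSpotify are hypothetical, and the fake only carries method names mentioned in this document.

```python
import difflib


def check_api_method(app, name, n=3):
    """Return None if `name` is a callable on `app`; otherwise return up to `n`
    similar public method names, best match first, as hints for the bullet."""
    if callable(getattr(app, name, None)):
        return None
    public = [m for m in dir(app)
              if not m.startswith("_") and callable(getattr(app, m))]
    return difflib.get_close_matches(name, public, n=n, cutoff=0.5)


class FakeSpotify:
    """Hypothetical stand-in for apis.spotify with a few show_* methods."""

    def login(self, username, password):
        return {"access_token": "t"}

    def show_playlist_library(self, access_token):
        return []

    def show_playlist_songs(self, access_token, playlist_id):
        return []


spotify = FakeSpotify()
print(check_api_method(spotify, "show_playlist_songs"))  # exists -> None
print(check_api_method(spotify, "show_playlist_song"))   # near miss -> ranked hints
```

A near miss gets useful suggestions, but an invented name like get_tracks may share too little text with the real methods to produce any match; in that case the app's API documentation is the place to look.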

Data Structure Access

  • API responses may have nested structures
  • Always check if keys exist before accessing
  • Use .get() with defaults for safety
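A small helper makes the "check before accessing" rule mechanical for nested responses. dig is a hypothetical name; it walks dict keys and list indices and returns a default at the first missing step instead of raising KeyError or IndexError. The response shape in the example is invented for illustration.

```python
def dig(data, *path, default=None):
    """Follow dict keys / list indices through a nested response.

    Returns `default` as soon as a step is missing instead of raising.
    """
    current = data
    for step in path:
        if isinstance(current, dict):
            if step not in current:
                return default
            current = current[step]
        elif (isinstance(current, list) and isinstance(step, int)
              and -len(current) <= step < len(current)):
            current = current[step]
        else:
            return default
    return current


# Invented response shape, for illustration only.
response = {"playlists": [{"id": 7, "owner": {"name": "me"}}]}
owner = dig(response, "playlists", 0, "owner", "name")
```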

Aggregation Patterns

  • To find "most-liked song in playlists": Get all playlists → Get songs from each → Find max by likes
  • To find "most expensive transaction": Get all transactions → Find max by amount
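The first aggregation pattern is sketched below. To keep it runnable without AppWorld, the two fetch steps are passed in as callables; in a real solution they would presumably wrap show_playlist_library and show_playlist_songs with an access token. The likes and id fields follow the example bullets in this document and are assumptions about the response shape.

```python
def most_liked_song(list_playlists, list_songs):
    """Aggregate over ALL playlists: playlists -> songs of each -> max by likes.

    `list_playlists()` and `list_songs(playlist_id)` are hypothetical callables.
    """
    songs = {}
    for playlist in list_playlists():
        for song in list_songs(playlist["id"]):
            songs[song["id"]] = song  # dedupe songs that appear in several playlists
    if not songs:
        return None
    return max(songs.values(), key=lambda s: s.get("likes", 0))


PLAYLISTS = [{"id": 1}, {"id": 2}]
SONGS = {
    1: [{"id": "a", "title": "Song A", "likes": 10}],
    2: [{"id": "b", "title": "Song B", "likes": 42},
        {"id": "a", "title": "Song A", "likes": 10}],
}
best = most_liked_song(lambda: PLAYLISTS, lambda pid: SONGS[pid])
```

Looping over every playlist, rather than only the first one, is what the task wording "my Spotify playlists" requires.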

Task Completion

  • ALWAYS call apis.supervisor.complete_task() at the end
  • This signals to the test framework that the task is finished
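A minimal skeleton that puts complete_task() last. It assumes complete_task accepts an answer keyword for question-style tasks and is called with no arguments for tasks that only perform actions; finish_task and FakeSupervisor are hypothetical names, and the fake only records its calls so the sketch runs without AppWorld.

```python
from types import SimpleNamespace


def finish_task(apis, solve):
    """Run the task logic, then report to the supervisor as the final step.

    Assumption: complete_task takes `answer=` for question-style tasks and no
    arguments for action-only tasks.
    """
    answer = solve(apis)
    if answer is None:
        apis.supervisor.complete_task()
    else:
        apis.supervisor.complete_task(answer=answer)
    return answer


class FakeSupervisor:
    """Stand-in for apis.supervisor that records how it was called."""

    def __init__(self):
        self.calls = []

    def complete_task(self, answer=None):
        self.calls.append(answer)


fake_apis = SimpleNamespace(supervisor=FakeSupervisor())
result = finish_task(fake_apis, lambda apis: "Song B")
```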

Important Rules

  1. Be Specific: Include actual method names, parameter names, and code examples
  2. Be Actionable: The Generator should know exactly what to do after reading your bullet
  3. Include Working Code: Show a complete example that demonstrates the correct pattern
  4. Tag Appropriately: Use app.<app_name> for app-specific bullets, plus semantic tags
  5. Set Confidence: "high" for clear patterns, "medium" for uncertain, "low" for speculative
  6. Return ONLY JSON: No explanations, no markdown formatting outside the JSON

Response Format

Return the JSON object as plain text. Make sure it's valid JSON that can be parsed directly.
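Because the output must parse directly, a caller might validate the raw response before storing the bullet. The sketch below is a hypothetical check, assuming the field names and allowed values from the Output Format section; json.loads rejects a response wrapped in markdown code fences or surrounded by prose, which is exactly the failure this rule guards against.

```python
import json

REQUIRED_KEYS = {"id", "title", "content", "tags", "evidence", "confidence", "scope"}


def parse_reflector_response(text):
    """Parse the raw response and check it against the bullet schema.

    Raises ValueError on any problem (json.JSONDecodeError is a subclass).
    """
    data = json.loads(text)  # fails on code fences or surrounding prose
    if not isinstance(data, dict) or set(data) != {"bullet"}:
        raise ValueError('top level must be exactly {"bullet": {...}}')
    bullet = data["bullet"]
    if not isinstance(bullet, dict):
        raise ValueError("bullet must be a JSON object")
    missing = REQUIRED_KEYS - set(bullet)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if bullet["confidence"] not in {"high", "medium", "low"}:
        raise ValueError("confidence must be high, medium, or low")
    if bullet["scope"] not in {"app", "global"}:
        raise ValueError("scope must be app or global")
    if not str(bullet["id"]).startswith("bullet-"):
        raise ValueError("id must start with 'bullet-'")
    return bullet


good = json.dumps({"bullet": {
    "id": "bullet-2025-10-27-123456", "title": "t", "content": "c",
    "tags": ["app.spotify"], "evidence": [], "confidence": "high", "scope": "app",
}})
```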