name	reflect-appworld-failure
description	Analyze AppWorld task failures to extract specific API patterns and generate actionable playbook bullets with concrete code examples
allowed-tools	Read

Reflect on AppWorld Failure

Analyze failed AppWorld tasks to extract specific, actionable learnings that can be added to the playbook.

Purpose

When an AppWorld task fails, the Reflector calls this Skill with error details and failed code. You analyze the failure semantically and generate a high-quality bullet with:

Specific title describing the pattern
Detailed content with working code examples
Relevant tags for retrieval
Appropriate confidence level

Input Format

The input will be a text description with sections:

# Task
<task instruction>

## Apps
<comma-separated list of apps used>

## Error Type
<error_type: api_misuse, logic_error, timeout, etc.>

## Error Messages
<list of error messages from execution>

## Failed Code Snippet
<relevant code that failed>

## Missing Patterns (from heuristics)
<list of patterns the old system identified>

## Suggested Fixes (from heuristics)
<list of fix suggestions>
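A minimal sketch of how the sections above could be split into a dictionary before analysis. The function name `parse_failure_report`, the snake_case section keys, and the decision to drop the "(from heuristics)" suffix are illustrative assumptions, not part of the Skill's contract.

```python
import re

# Section names taken from the input format above, normalized to snake_case.
KNOWN_SECTIONS = {
    "task", "apps", "error_type", "error_messages",
    "failed_code_snippet", "missing_patterns", "suggested_fixes",
}

def parse_failure_report(text: str) -> dict:
    """Split a failure report into {section_name: body} using its '#'/'##' headers."""
    sections = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^#{1,2}\s+(.*)$", line)
        name = None
        if header:
            # "Missing Patterns (from heuristics)" -> "missing_patterns"
            stripped = re.sub(r"\s*\(.*\)\s*$", "", header.group(1)).strip()
            name = stripped.lower().replace(" ", "_")
        if name in KNOWN_SECTIONS:
            current = name
            sections[current] = ""
        elif current is not None:
            # Unknown '#' lines (e.g. Python comments in the snippet) stay in the body.
            sections[current] += line + "\n"
    return {key: body.strip() for key, body in sections.items()}


report = """# Task
What is the title of the most-liked song in my Spotify playlists

## Apps
spotify

## Error Type
api_misuse

## Error Messages
AttributeError: 'Spotify' object has no attribute 'get_tracks'
"""

parsed = parse_failure_report(report)
apps = [app.strip() for app in parsed["apps"].split(",")]
```

Restricting headers to the known section names keeps `#` comments inside the failed code snippet from being mistaken for a new section.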

Your Analysis Process

1. Identify Root Cause: What was the fundamental mistake?
   - Wrong API method name?
   - Missing authentication?
   - Incorrect data structure access?
   - Logic error?
2. Extract Pattern: What general pattern does this represent?
   - Is this specific to one app, or does it apply to multiple apps?
   - Is this about API order (login first)?
   - Is this about method naming conventions?
   - Is this about data validation?
3. Generate Concrete Example: Create working code that demonstrates the CORRECT pattern
4. Write Actionable Bullet: Make it specific enough that the Generator can apply it

Output Format

Return a JSON object with this structure:

{
  "bullet": {
    "id": "bullet-YYYY-MM-DD-HHMMSS",
    "title": "<Specific pattern title>",
    "content": "<Detailed explanation with working code example>",
    "tags": ["app.<app_name>", "<error_category>", "<pattern_type>"],
    "evidence": [
      {
        "type": "execution",
        "ref": "<task_id>",
        "note": "<brief note about failure>"
      }
    ],
    "confidence": "high|medium|low",
    "scope": "app|global"
  }
}
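One possible way to assemble an object with this structure is sketched below. The helper name `make_bullet` and the use of UTC for the timestamp are assumptions; the document only fixes the `bullet-YYYY-MM-DD-HHMMSS` shape of the id and the allowed `confidence` and `scope` values.

```python
import json
from datetime import datetime, timezone

def make_bullet(title, content, tags, task_id, note,
                confidence="medium", scope="app", now=None):
    """Assemble a bullet dict in the structure shown above (hypothetical helper)."""
    if confidence not in {"high", "medium", "low"}:
        raise ValueError(f"invalid confidence: {confidence!r}")
    if scope not in {"app", "global"}:
        raise ValueError(f"invalid scope: {scope!r}")
    now = now or datetime.now(timezone.utc)  # assumption: ids use UTC time
    return {
        "bullet": {
            "id": now.strftime("bullet-%Y-%m-%d-%H%M%S"),
            "title": title,
            "content": content,
            "tags": list(tags),
            "evidence": [{"type": "execution", "ref": task_id, "note": note}],
            "confidence": confidence,
            "scope": scope,
        }
    }


example = make_bullet(
    title="Spotify: Use show_playlist_songs() not get_tracks()",
    content="Use show_playlist_songs(access_token, playlist_id).",
    tags=["app.spotify", "api_misuse"],
    task_id="spotify_task_001",
    note="AttributeError on get_tracks",
    confidence="high",
    now=datetime(2025, 10, 27, 12, 34, 56, tzinfo=timezone.utc),
)
serialized = json.dumps(example, indent=2)
```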

Bullet Quality Guidelines

GOOD Bullets (Specific and Actionable)

Title: "Spotify: Use show_playlist_songs() not get_tracks()"
Content: "Spotify API uses show_playlist_songs(access_token, playlist_id) to retrieve tracks. The method get_tracks() does not exist. Example: songs = apis.spotify.show_playlist_songs(access_token=token, playlist_id=playlist['id'])"
Tags: ["app.spotify", "api_misuse", "method_names", "playlists"]

Title: "Venmo: Call login() before search_transactions()"
Content: "Venmo API requires an authentication token for all operations. Always call venmo.login() first to get an access_token, then pass it to other methods. Example: response = apis.venmo.login(username='user', password='pass'); token = response['access_token']; results = apis.venmo.search_transactions(access_token=token, query={'friend': 'Alice'})"
Tags: ["app.venmo", "authentication", "api_order", "search"]

BAD Bullets (Too Generic)

Title: "Verify venmo API logic and requirements"
Content: "When implementing venmo operations: Check task logic and requirements; Missing login() call for venmo"
Tags: ["logic", "debugging", "api", "app.venmo"]

Why Bad: No concrete code example, vague guidance, doesn't teach the specific pattern
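The difference between the good and bad examples can be approximated with a rough self-check. This is only a heuristic sketch: the function name `bullet_problems` and its three criteria are assumptions read off the examples above, not rules the Skill enforces, and a global-scope bullet would legitimately fail the tag check.

```python
import re

def bullet_problems(title, content, tags):
    """Return a list of reasons a bullet looks too generic (hypothetical check)."""
    problems = []
    # A concrete call such as apis.spotify.show_playlist_songs(...) in the content.
    if not re.search(r"\bapis\.\w+\.\w+\(", content):
        problems.append("content has no concrete apis.<app>.<method>(...) call")
    # A method name such as login() in the title.
    if not re.search(r"\w+\(\)", title):
        problems.append("title does not name a specific method")
    if not any(tag.startswith("app.") for tag in tags):
        problems.append("no app.<app_name> tag")
    return problems


good = bullet_problems(
    "Spotify: Use show_playlist_songs() not get_tracks()",
    "Example: songs = apis.spotify.show_playlist_songs(access_token=token, "
    "playlist_id=playlist['id'])",
    ["app.spotify", "api_misuse", "method_names", "playlists"],
)
bad = bullet_problems(
    "Verify venmo API logic and requirements",
    "When implementing venmo operations: Check task logic and requirements; "
    "Missing login() call for venmo",
    ["logic", "debugging", "api", "app.venmo"],
)
```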

Example Analysis

Input:

# Task
What is the title of the most-liked song in my Spotify playlists

## Apps
spotify

## Error Type
api_misuse

## Error Messages
AttributeError: 'Spotify' object has no attribute 'get_tracks'

## Failed Code Snippet
songs = spotify.get_tracks(playlist_id=pid)

## Missing Patterns (from heuristics)
- Use correct Spotify API methods

## Suggested Fixes (from heuristics)
- Check Spotify API documentation for available methods

Your Analysis:

Root Cause: The code called the nonexistent method get_tracks() instead of show_playlist_songs()
Pattern: Spotify uses show_* naming convention for retrieval methods
Scope: App-specific (Spotify)

Output:

{
  "bullet": {
    "id": "bullet-2025-10-27-123456",
    "title": "Spotify: Use show_playlist_songs() to get tracks from playlist",
    "content": "To retrieve songs from a Spotify playlist, use show_playlist_songs(access_token, playlist_id). Don't use get_tracks() - it doesn't exist. Example: `token = apis.spotify.login(username='user', password='pass')['access_token']; playlists = apis.spotify.show_playlist_library(access_token=token); songs = [s for p in playlists for s in apis.spotify.show_playlist_songs(access_token=token, playlist_id=p['id'])]; most_liked = max(songs, key=lambda s: s['likes'])`",
    "tags": ["app.spotify", "api_misuse", "method_names", "playlists", "retrieval"],
    "evidence": [
      {
        "type": "execution",
        "ref": "spotify_task_001",
        "note": "AttributeError: 'Spotify' object has no attribute 'get_tracks'"
      }
    ],
    "confidence": "high",
    "scope": "app"
  }
}

Common AppWorld Patterns to Look For

Authentication Order

Most apps require a login() call first to obtain an access_token
That token must be passed to every subsequent API call
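The ordering can be illustrated with a small stand-in client. `FakeVenmo` is a hypothetical in-memory class written for this sketch, not the AppWorld client; only the method names `login` and `search_transactions` are taken from the examples in this document, and the error type is an assumption.

```python
class FakeVenmo:
    """Hypothetical stand-in for an app client; not the real AppWorld API."""

    def login(self, username, password):
        return {"access_token": f"token-for-{username}"}

    def search_transactions(self, access_token=None, query=None):
        if not access_token:
            raise PermissionError("access_token is required; call login() first")
        return [{"friend": query["friend"], "amount": 12.5}]


venmo = FakeVenmo()

# Wrong order: calling a protected method without a token fails.
try:
    venmo.search_transactions(query={"friend": "Alice"})
    failed_without_login = False
except PermissionError:
    failed_without_login = True

# Correct order: log in, keep the token, pass it to every later call.
token = venmo.login(username="user", password="pass")["access_token"]
results = venmo.search_transactions(access_token=token, query={"friend": "Alice"})
```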

Method Naming Conventions

Spotify: show_* for retrieval (show_playlist_songs, show_album_library)
Venmo: show_friends, send_payment, search_transactions
Gmail: fetch_emails, send_email
Contacts: show_contacts, add_contact
Calendar: show_events, create_event

Data Structure Access

API responses may have nested structures
Always check if keys exist before accessing
Use .get() with defaults for safety
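A short sketch of defensive access on a nested response. The response shape here (`playlist`, `songs`, `likes`, `owner`) is invented for illustration and is not a documented AppWorld payload.

```python
# Assumed example payload; the second song has no "likes" key.
response = {"playlist": {"songs": [{"title": "A", "likes": 3}, {"title": "B"}]}}

# Chain .get() with empty defaults so a missing level yields an empty result
# instead of raising KeyError.
songs = response.get("playlist", {}).get("songs", [])
likes = [song.get("likes", 0) for song in songs]
owner = response.get("owner", {}).get("name", "unknown")
```

Note that `.get()` defaults only cover missing keys; a key that is present but set to `None` still needs its own check.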

Aggregation Patterns

To find "most-liked song in playlists": Get all playlists → Get songs from each → Find max by likes
To find "most expensive transaction": Get all transactions → Find max by amount
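Both aggregations reduce to collecting every item first and then taking a maximum by one field. The sketch below uses in-memory sample data, and `show_playlist_songs` is a local stand-in function, not the AppWorld call; the field names `likes` and `amount` follow the wording above and are assumptions about the real payloads.

```python
playlists = [{"id": 1}, {"id": 2}]
songs_by_playlist = {
    1: [{"title": "A", "likes": 3}, {"title": "B", "likes": 7}],
    2: [{"title": "C", "likes": 9}],
}

def show_playlist_songs(playlist_id):
    """Local stand-in that returns the sample songs for one playlist."""
    return songs_by_playlist[playlist_id]

# Get all playlists -> get songs from each -> find max by likes.
all_songs = [song for p in playlists for song in show_playlist_songs(p["id"])]
most_liked = max(all_songs, key=lambda song: song["likes"], default=None)

# Get all transactions -> find max by amount.
transactions = [{"id": "t1", "amount": 20.0}, {"id": "t2", "amount": 75.5}]
most_expensive = max(transactions, key=lambda t: t["amount"], default=None)
```

Passing `default=None` keeps `max()` from raising `ValueError` when there are no songs or transactions at all.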

Task Completion

ALWAYS call apis.supervisor.complete_task() at the end
This signals successful completion to the test framework

Important Rules

Be Specific: Include actual method names, parameter names, and code examples
Be Actionable: The Generator should know exactly what to do after reading your bullet
Include Working Code: Show a complete example that demonstrates the correct pattern
Tag Appropriately: Use app.<app_name> for app-specific bullets, plus semantic tags
Set Confidence: "high" for clear patterns, "medium" for uncertain ones, "low" for speculative ones
Return ONLY JSON: No explanations, no markdown formatting outside the JSON

Response Format

Return the JSON object as plain text. Make sure it's valid JSON that can be parsed directly.
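A caller might verify the response along these lines before adding the bullet to the playbook. The function name `parse_bullet_response` is hypothetical, and the required keys are simply those listed in the Output Format section.

```python
import json

REQUIRED_KEYS = {"id", "title", "content", "tags", "evidence", "confidence", "scope"}

def parse_bullet_response(raw):
    """Parse the Skill's reply and check that the bullet has every expected key."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the reply is not valid JSON
    bullet = data["bullet"]
    missing = REQUIRED_KEYS - set(bullet)
    if missing:
        raise ValueError(f"bullet is missing keys: {sorted(missing)}")
    return bullet


raw_reply = json.dumps({
    "bullet": {
        "id": "bullet-2025-10-27-123456",
        "title": "Spotify: Use show_playlist_songs() to get tracks from playlist",
        "content": "Use show_playlist_songs(access_token, playlist_id).",
        "tags": ["app.spotify", "api_misuse"],
        "evidence": [{"type": "execution", "ref": "spotify_task_001", "note": "n/a"}],
        "confidence": "high",
        "scope": "app",
    }
})
bullet = parse_bullet_response(raw_reply)
```

A reply wrapped in explanations or code fences fails at `json.loads`, which is why the response must contain the JSON object and nothing else.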
