| name | langfuse-prompt-and-trace-debugger |
| description | MANDATORY skill when KeyError or schema errors occur. Fetch actual prompt schemas instead of guessing. Use for debugging traces and understanding AI model behavior. |
Langfuse Prompt & Trace Debugger - MANDATORY FOR KEYERRORS
🔥 CRITICAL: Use This Skill Immediately When:
MANDATORY triggers (you MUST use this skill):
- ❗ Tests fail with `KeyError` (e.g., `KeyError: 'therapist_response'`)
- ❗ Schema validation errors
- ❗ Unexpected prompt response structure
- ❗ Need to understand what fields a prompt actually returns
Common triggers:
- User asks to view a specific prompt
- Code references a prompt but logic is unclear
- Investigating why AI behavior doesn't match expectations
- Debugging Langfuse traces
- Analyzing model output in production
🚀 PROACTIVE TRIGGERS (non-obvious scenarios where you should automatically use this skill):
Production Debugging Queries:
- "Why didn't user X get a message?"
- "Why did this intervention not fire?"
- "What errors happened in production [timeframe]?"
- "Debug this trace ID from Slack alert"
- User reports: "AI didn't respond" or "got wrong response"
- → Use `fetch_error_traces.py` to find error traces automatically
Performance & Cost Investigation:
- "Why are OpenAI costs high this week?"
- "Which prompts are slowest?"
- "Show me what happened during [time window]"
- Job timeout errors in CloudWatch/logs
- → Use `fetch_traces_by_time.py` to analyze patterns
Response Validation Issues:
- Logs show `_validation_error` metadata
- "LLM returned unexpected structure"
- Pydantic validation errors on AI responses
- → Use `fetch_trace.py` with a trace ID to see actual vs. expected output
Intervention Logic Questions:
- "How does X intervention condition work?"
- "What fields does cronjobs_yaml expect?"
- "Show me actual intervention logic from production"
- → Use `refresh_prompt_cache.py` to fetch YAML configs stored as prompts
Error Pattern Analysis:
- "Are users hitting a specific error frequently?"
- "Find all traces with 'timeout' errors"
- "Search for traces containing [error message]"
- → Use `search_trace_errors.py` to grep across traces
❌ ANTI-PATTERNS (violations of this skill):
- Saying "I need to check production" without actually fetching traces
- Debugging "why didn't user get X" by reading code instead of checking actual traces
- Investigating costs/performance by guessing instead of analyzing real trace data
- Answering "how does X work?" about prompts without fetching the actual prompt
🚨 VIOLATION: Guessing at Schemas
WRONG: "The prompt probably returns {field_name}, let me add that to the code"
RIGHT: Use this skill to fetch the actual prompt and read its actual schema
DO NOT:
- ❌ Assume field names without checking
- ❌ Guess at optional vs required fields
- ❌ Try multiple field names hoping one works
- ❌ Look at old code and assume it's current
DO THIS:
- ✅ `cd` to `.claude/skills/langfuse-prompt-and-trace-debugger`
- ✅ Run `uv run python refresh_prompt_cache.py PROMPT_NAME`
- ✅ Read `docs/cached_prompts/PROMPT_NAME_production.txt`
- ✅ Read `docs/cached_prompts/PROMPT_NAME_production_config.json`
- ✅ Use the ACTUAL schema you just read
🏢 Understanding Langfuse Servers vs Labels
CRITICAL: We have TWO separate Langfuse servers:
Staging Langfuse Server (`https://langfuse.staging.cncorp.io`)
- Separate database/instance on the staging ECS cluster
- Used for development and testing
- Has prompts tagged with the "production" label - these are DEFAULT prompts for staging tests
- ⚠️ The "production" label here does NOT mean real user-facing prompts

Production Langfuse Server (`https://langfuse.prod.cncorp.io`)
- Separate database/instance on the production ECS cluster
- Used for the real user-facing application
- Has prompts tagged with the "production" label - these ARE the real prompts shown to users
- ✅ The "production" label here means actual live prompts
Key Points:
- The two servers are completely independent - no automatic sync between them
- Both servers use the same label system (`production`, `development`, `staging`, etc.)
- A prompt with the "production" label on the staging server ≠ the prompt with the "production" label on the prod server
- Labels control which prompt version is served within each server
- Server selection is controlled by the `LANGFUSE_HOST` environment variable
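In code, server selection reduces to which credentials the Langfuse client is constructed with. A minimal sketch, assuming the `_STAGING`/`_PROD` variable naming from the setup below (the real scripts read these from `arsenal/.env`):

```python
import os

from langfuse import Langfuse  # pinned: langfuse==2.60.3

# Illustrative: choose the staging or production SERVER based on
# LANGFUSE_ENVIRONMENT; the "production" LABEL is a separate concept.
suffix = "PROD" if os.environ.get("LANGFUSE_ENVIRONMENT") == "production" else "STAGING"

client = Langfuse(
    public_key=os.environ[f"LANGFUSE_PUBLIC_KEY_{suffix}"],
    secret_key=os.environ[f"LANGFUSE_SECRET_KEY_{suffix}"],
    host=os.environ[f"LANGFUSE_HOST_{suffix}"],  # which server, not which label
)
```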
Environment Setup
Required environment variables:
- `LANGFUSE_PUBLIC_KEY` - Langfuse API public key
- `LANGFUSE_SECRET_KEY` - Langfuse API secret key
- `LANGFUSE_HOST` - Langfuse server URL
  - Staging: `https://langfuse.staging.cncorp.io`
  - Production: `https://langfuse.prod.cncorp.io`

Optional:
- `ENVIRONMENT` - Label to fetch within the server (defaults to "production")
  - This is the LABEL/TAG within whichever server you're connected to
  - NOT the same as which Langfuse server you're querying
Setup:
```bash
# Add to arsenal/.env:

# For STAGING Langfuse server (default for development):
LANGFUSE_PUBLIC_KEY_STAGING=pk-lf-...  # pragma: allowlist-secret
LANGFUSE_SECRET_KEY_STAGING=sk-lf-...  # pragma: allowlist-secret
LANGFUSE_HOST_STAGING=https://langfuse.staging.cncorp.io

# For PRODUCTION Langfuse server (real user-facing prompts):
LANGFUSE_PUBLIC_KEY_PROD=pk-lf-...  # pragma: allowlist-secret
LANGFUSE_SECRET_KEY_PROD=sk-lf-...  # pragma: allowlist-secret
LANGFUSE_HOST_PROD=https://langfuse.prod.cncorp.io

# Select which server to use:
LANGFUSE_ENVIRONMENT=staging  # or 'production'
```
No manual environment loading needed! The scripts automatically find and load arsenal/.env from anywhere in the project.
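For reference, the auto-discovery presumably works like the sketch below - walk up from the current directory until an `arsenal/.env` is found, then load it (an assumption about the implementation, using `python-dotenv`):

```python
from pathlib import Path

from dotenv import load_dotenv  # python-dotenv; assumed helper, not confirmed


def load_arsenal_env() -> bool:
    """Walk up from the CWD looking for arsenal/.env and load it if found."""
    for directory in (Path.cwd(), *Path.cwd().parents):
        env_file = directory / "arsenal" / ".env"
        if env_file.is_file():
            load_dotenv(env_file)
            return True
    return False
```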
Available Scripts
1. refresh_prompt_cache.py - Download Prompts Locally
Downloads Langfuse prompts to `docs/cached_prompts/` for offline viewing.
IMPORTANT: Can fetch from BOTH staging and production servers
Usage:
```bash
# Navigate to the skill directory
cd .claude/skills/langfuse-prompt-and-trace-debugger

# Fetch from STAGING (default)
uv run python refresh_prompt_cache.py PROMPT_NAME

# Fetch from PRODUCTION (explicit flag)
uv run python refresh_prompt_cache.py PROMPT_NAME --production

# Fetch all prompts from staging
uv run python refresh_prompt_cache.py

# Fetch all prompts from production
uv run python refresh_prompt_cache.py --production

# Fetch multiple prompts from production
uv run python refresh_prompt_cache.py prompt1 prompt2 prompt3 --production
```
Environment Selection:
- Default: Fetches from STAGING server (safe)
- With the `--production` flag: Fetches from the PRODUCTION server
- Clearly indicates which server is being used in the output
Cached Location:
- `docs/cached_prompts/{prompt_name}_production.txt` - Prompt content + version
- `docs/cached_prompts/{prompt_name}_production_config.json` - Configuration
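Under the hood this maps onto the SDK's `get_prompt` call against whichever server the client points at; a minimal sketch (`client` constructed as in the earlier server-selection sketch):

```python
# The "production" label selects a version WITHIN the current server;
# it does not choose between the staging and production servers.
prompt = client.get_prompt("PROMPT_NAME", label="production")

print(prompt.version)  # version recorded in the cached .txt header
print(prompt.prompt)   # raw template text
print(prompt.config)   # dict cached as *_config.json
```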
2. check_prompts.py - List Available Prompts
Lists all prompts available in Langfuse and checks their availability in the current environment.
Usage:
```bash
# Navigate to the skill directory
cd .claude/skills/langfuse-prompt-and-trace-debugger

# Check all prompts
uv run python check_prompts.py
```
Output:
- Lists all prompt names in Langfuse
- Shows which prompts are available in the specified environment (from the `ENVIRONMENT` variable)
- Color-coded indicators (✓ green for available, ✗ red for missing)
- Summary statistics
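If you need the same listing programmatically, the Langfuse public API exposes a paginated prompts endpoint; a rough httpx sketch (endpoint shape per the v2 public API - verify against your server version):

```python
import os

import httpx  # pinned: httpx==0.27.2

resp = httpx.get(
    f"{os.environ['LANGFUSE_HOST']}/api/public/v2/prompts",
    auth=(os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"]),
)
resp.raise_for_status()
for meta in resp.json()["data"]:
    print(meta["name"], meta.get("labels", []))
```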
3. fetch_trace.py - View Langfuse Traces
Fetch and display Langfuse traces for debugging AI model behavior.
Usage:
```bash
# Navigate to the skill directory
cd .claude/skills/langfuse-prompt-and-trace-debugger

# Fetch specific trace by ID
uv run python fetch_trace.py db29520b-9acb-4af9-a7a0-1aa005eb7b24

# Fetch trace from Langfuse URL
uv run python fetch_trace.py "https://langfuse.example.com/project/.../traces?peek=db29520b..."

# List recent traces
uv run python fetch_trace.py --list --limit 5

# View help
uv run python fetch_trace.py --help
```
What it shows:
- Trace ID and metadata
- All observations (LLM calls, tool uses, etc.)
- Input/output for each step
- Timing information
- Hierarchical display of nested observations
- Useful for debugging AI workflows
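The underlying SDK call is `fetch_trace`; if you want to walk a trace yourself, a minimal sketch (`client` as above) looks like:

```python
# Fetch one trace with full details and walk its observations.
trace = client.fetch_trace("db29520b-9acb-4af9-a7a0-1aa005eb7b24").data

print(trace.name, trace.timestamp)
for obs in trace.observations:
    # Each observation is an LLM call, tool use, span, etc. (flat list here;
    # the script reconstructs the hierarchy via parent observation IDs).
    print(obs.type, obs.name, obs.level)
    print("  input: ", obs.input)
    print("  output:", obs.output)
```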
4. fetch_error_traces.py - Find Traces with Errors
Fetch traces that contain ERROR-level observations from a specified time range. Useful for investigating production issues and error patterns.
Usage:
```bash
# Navigate to the skill directory
cd .claude/skills/langfuse-prompt-and-trace-debugger

# Fetch error traces from last 24 hours (default)
uv run python fetch_error_traces.py

# Fetch error traces from last 48 hours
uv run python fetch_error_traces.py --hours 48

# Fetch error traces from last 7 days
uv run python fetch_error_traces.py --days 7

# Limit results to 5 traces
uv run python fetch_error_traces.py --limit 5

# Query production server for errors
uv run python fetch_error_traces.py --env production

# View help
uv run python fetch_error_traces.py --help
```
What it shows:
- Traces that contain observations with ERROR level
- Trace metadata (ID, name, timestamp, user)
- Error messages from failed observations
- Direct links to view traces in Langfuse UI
- Time-filtered results (last N hours/days)
Common use cases:
- Monitor production errors from the last day
- Investigate error patterns across multiple traces
- Find traces related to specific failure modes
- Debug issues reported by users
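Conceptually the script does something like the sketch below - pull traces from a time window, then keep those containing ERROR-level observations (a simplification of the real filtering, `client` as above):

```python
from datetime import datetime, timedelta, timezone

# Pull recent traces, then keep those with ERROR-level observations.
since = datetime.now(timezone.utc) - timedelta(hours=24)
traces = client.fetch_traces(from_timestamp=since, limit=50).data

for t in traces:
    detail = client.fetch_trace(t.id).data  # full details incl. observations
    errors = [o for o in detail.observations if o.level == "ERROR"]
    if errors:
        print(t.id, t.name, t.timestamp)
        for o in errors:
            print("  ", o.status_message)
```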
5. reconstruct_compiled_prompt.py - Prompt Reconstruction
Reconstructs full prompts from database + Langfuse template. Uses the same compilation logic as the admin panel prompt playground.
⚠️ Lives in the CODEBASE at `api/src/cli/` - uses the shared `compile_prompt()` service.
Requires:
- `--message-id` or `-m`: Message ID from the database
- `--prompt-name` or `-p`: Langfuse prompt name (e.g., `daily_question_summary`)
```bash
# Basic usage (from project root) - defaults to production database
cd api && PYTHONPATH=src python src/cli/reconstruct_compiled_prompt.py -m 91245 -p daily_question_summary

# A/B test with specific prompt version
cd api && PYTHONPATH=src python src/cli/reconstruct_compiled_prompt.py -m 91245 -p daily_question_summary --version 5

# Save to file
cd api && PYTHONPATH=src python src/cli/reconstruct_compiled_prompt.py -m 91245 -p daily_question_summary -o docs/debug/full_prompt.md

# Output as JSON
cd api && PYTHONPATH=src python src/cli/reconstruct_compiled_prompt.py -m 91245 -p daily_question_summary --json

# Use local docker database (for development)
cd api && PYTHONPATH=src python src/cli/reconstruct_compiled_prompt.py --local -m 12345 -p group_msg_intervention_needed

# List available prompts in Langfuse
cd api && PYTHONPATH=src python src/cli/reconstruct_compiled_prompt.py --list-prompts
```
When to use:
| Use Case | Command |
|---|---|
| Prompt reconstruction from message | reconstruct_compiled_prompt.py -m MESSAGE_ID -p PROMPT_NAME |
| A/B testing prompt versions | reconstruct_compiled_prompt.py -m MESSAGE_ID -p PROMPT_NAME --version N |
| List available prompts | reconstruct_compiled_prompt.py --list-prompts |
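The Langfuse-side half of `compile_prompt()` is essentially the SDK's template compilation, sketched below with placeholder values (the real CLI pulls the variables from the message row in the database):

```python
prompt = client.get_prompt("daily_question_summary", label="production")

# compile() substitutes {{variables}} in the template with concrete values.
compiled = prompt.compile(
    sender_name="Alice",             # placeholder; really from the DB message
    current_message="How are you?",  # placeholder; really from the DB message
)
print(compiled)
```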
Understanding Prompt Configs
Prompt Text File
- Instructions: What AI should do
- Output format: JSON schema, required fields
- Variables: `{{sender_name}}`, `{{current_message}}`, etc.
- Allowed values: Enumerated options for fields
- Version: Header shows version
Config JSON File
```json
{
  "model_config": {
    "model": "gpt-4.1",
    "temperature": 0.7,
    "response_format": {
      "type": "json_schema",  // or "json_object"
      "json_schema": { ... }
    }
  }
}
```
`response_format` types:
- `json_object` - Unstructured (the model decides the fields)
- `json_schema` - Strict validation (fields are enforced)
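This distinction is why `json_object` prompts can produce the Pydantic validation errors mentioned earlier: nothing forces the model to emit a given field. A minimal sketch of that failure mode (model and field names are illustrative):

```python
from pydantic import BaseModel, ValidationError


class TherapistReply(BaseModel):  # illustrative response model
    therapist_response: str


raw = {"response": "Hello"}  # json_object output: the model picked its own field name

try:
    TherapistReply.model_validate(raw)
except ValidationError as exc:
    print(exc)  # "therapist_response: Field required" - the KeyError-style failure
```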
Debugging Workflows
KeyError in Tests
- Fetch the prompt using `refresh_prompt_cache.py`
- Check whether the field is optional/conditional in the prompt text
- Check the config: `json_object` vs `json_schema`
- Fix the test to handle the optional field OR update the prompt (see the sketch below)
Schema Validation Fails
- Fetch the prompt using `refresh_prompt_cache.py`
- Read the config's `json_schema` section
- Check the `required` array (sketch below)
- Verify the code provides all required parameters
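The `required` array lives in the cached config JSON; a quick way to inspect it (path follows the cached-location convention above - the exact nesting depends on the schema shape, so eyeball the dump if it differs):

```python
import json
from pathlib import Path

config = json.loads(
    Path("docs/cached_prompts/PROMPT_NAME_production_config.json").read_text()
)

# Dump the response_format block and read its "required" array directly.
print(json.dumps(config["model_config"]["response_format"], indent=2))
```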
Understanding AI Behavior
- Get trace ID from logs or Langfuse UI
- Use `fetch_trace.py` to view the full trace
- Examine inputs, outputs, and intermediate steps
- Check for unexpected model responses
Investigating Production Errors
- Use `fetch_error_traces.py` to find recent error traces
- Review error messages and trace metadata
- Use `fetch_trace.py` with a specific trace ID for detailed analysis
- Identify patterns across multiple error traces
- Check for common error causes (API failures, schema issues, etc.)
Quick Reference
```bash
# Setup (one-time)
# Add LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST to arsenal/.env
# Make sure to add "# pragma: allowlist-secret" comments after secrets

# Navigate to skill directory
cd .claude/skills/langfuse-prompt-and-trace-debugger

# List all available prompts
uv run python check_prompts.py

# Fetch specific prompt
uv run python refresh_prompt_cache.py PROMPT_NAME

# View cached prompt
cat ../../docs/cached_prompts/PROMPT_NAME_production.txt
cat ../../docs/cached_prompts/PROMPT_NAME_production_config.json

# List recent traces
uv run python fetch_trace.py --list --limit 5

# Fetch specific trace
uv run python fetch_trace.py TRACE_ID

# Find error traces from last 24 hours
uv run python fetch_error_traces.py

# Find error traces from last 7 days
uv run python fetch_error_traces.py --days 7

# Find error traces in production
uv run python fetch_error_traces.py --env production

# Reconstruct prompt from message (requires message ID and prompt name)
# Run from project root - uses database + Langfuse template
cd api && PYTHONPATH=src python src/cli/reconstruct_compiled_prompt.py -m MESSAGE_ID -p PROMPT_NAME

# Or via docker compose
docker compose exec api python src/cli/reconstruct_compiled_prompt.py -m MESSAGE_ID -p PROMPT_NAME

# List available prompts in Langfuse
docker compose exec api python src/cli/reconstruct_compiled_prompt.py --list-prompts
```
Important Notes
READ-ONLY Operations:
- These scripts are for viewing and debugging only
- DO NOT use to modify or delete prompts in Langfuse
- DO NOT push changes to Langfuse
- Always verify you're looking at the correct environment
Portability:
- Scripts are fully standalone with their own virtual environment via UV
- Automatically find and load `arsenal/.env` from anywhere in the project
- No manual environment loading needed
- Dependencies (`langfuse==2.60.3`, `httpx==0.27.2`) are pinned for compatibility
- Work from any directory - the scripts locate project root automatically
🔐 Before Responding With Fetched Data
You fetched prompts or traces. Now before you respond to the user:
- ✅ Did you actually fetch the data (not guess at schema)?
- ✅ Are you showing real field names from the cached prompt?
- ✅ Did you fetch from the correct server (staging vs production)?
CRITICAL: Get manager approval before responding.
Your response with Langfuse data MUST include: ✅ approve_7f3d8a2e********
Get the full approval token from .claude/skills/manager-review/SKILL.md
Common Mistake #5 in the manager's table is "Assumed Langfuse schema" - describing fields without fetching the actual prompt. The manager WILL catch this.