| name | production-api-tester |
| description | Live testing and validation of production research API for strategy optimization loops |
| allowed-tools | * |
Production API Tester
Skill for testing the production research API in live environments. Enables optimization loops: create strategy → test in production → analyze with Langfuse → iterate.
When to Use This Skill
- "Test the daily_news_briefing strategy in production"
- "Create a test task and execute it"
- "Run the full optimization loop for my new legal research strategy"
- "Check if the production API is healthy"
- "Clean up test tasks after experimentation"
- "Monitor execution status and retrieve results"
What This Skill Does
5 Core Functions:
- Create Test Tasks - Subscribe test emails to research topics
- Execute Research - Trigger batch or single task execution
- Monitor Status - Poll for completion and retrieve results
- Analyze Results - Link to Langfuse traces for performance analysis
- Cleanup - Delete test tasks after validation
Required Setup
Environment Variables
# Production API Configuration
export PROD_API_URL="https://webresearchagent.replit.app" # Production URL
export PROD_API_KEY="your_api_key_here" # Your API secret key
export CALLBACK_URL="https://webhook.site/your-unique-url" # Webhook receiver URL
# Optional: Override for local testing
export PROD_API_URL="http://localhost:8000" # Test locally
Getting Your API Key:
- User will provide this
- Store it in the environment or pass it via the --api-key flag
Setting Up Webhook Receiver:
- Option 1: Use webhook.site (https://webhook.site) - get instant unique URL
- Option 2: Use Langdock webhook (if using Langdock integration)
- Option 3: Run local webhook receiver (provided in helpers)
Workflow
Use Case 1: Test Single Strategy (Quick Test)
Goal: Execute a strategy once and check results
Step 1: Create Test Task
cd /home/user/web_research_agent/.claude/skills/production-api-tester/helpers
# Create test task
python3 create_test_task.py \
--api-key "$PROD_API_KEY" \
--email "test@example.com" \
--topic "AI developments in healthcare" \
--frequency "daily" \
--output /tmp/api_test/task.json
Output:
{
"id": "abc123",
"email": "test@example.com",
"research_topic": "AI developments in healthcare",
"frequency": "daily",
"schedule_time": "09:00",
"is_active": true,
"created_at": "2025-11-09T10:00:00Z",
"last_run_at": null
}
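For reference, the helper's request amounts to a single authenticated POST. A minimal sketch of what that call might look like follows; the /tasks endpoint path and payload field names are assumptions inferred from the response above, not confirmed API details.
# Hypothetical sketch of the create-task call; the /tasks path is an assumption
import os
import requests

def create_task(email: str, topic: str, frequency: str = "daily") -> dict:
    """POST a new research task; fields mirror the response shown above."""
    resp = requests.post(
        f"{os.environ['PROD_API_URL']}/tasks",  # assumed endpoint
        headers={"X-API-Key": os.environ["PROD_API_KEY"]},
        json={"email": email, "research_topic": topic, "frequency": frequency},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # contains "id", "created_at", etc.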
Step 2: Execute Research
# Execute batch (all daily tasks)
python3 execute_batch.py \
--api-key "$PROD_API_KEY" \
--frequency "daily" \
--callback-url "$CALLBACK_URL" \
--output /tmp/api_test/execution.json
Output:
{
"status": "running",
"frequency": "daily",
"tasks_found": 1,
"started_at": "2025-11-09T10:01:00Z"
}
Step 3: Monitor Results
# Option A: Poll webhook endpoint
python3 check_webhook.py \
--webhook-url "$CALLBACK_URL" \
--wait-for-results \
--timeout 300
# Option B: Check task status
python3 get_task.py \
--api-key "$PROD_API_KEY" \
--task-id "abc123" \
--output /tmp/api_test/task_status.json
Step 4: Link to Langfuse
# Get Langfuse trace for this execution
python3 link_to_langfuse.py \
--task-id "abc123" \
--email "test@example.com" \
--output /tmp/api_test/langfuse_link.json
Output:
{
"task_id": "abc123",
"trace_query": {
"metadata_filter": {
"user_email": "test@example.com",
"research_topic": "AI developments in healthcare"
},
"time_range": "last_1_hour"
},
"langfuse_url": "https://cloud.langfuse.com/project/.../traces?filter=...",
"trace_count": 1,
"latest_trace_id": "xyz789"
}
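If you prefer to query traces directly instead of going through the helper, Langfuse exposes a public traces endpoint authenticated with your public/secret key pair. A minimal sketch, assuming Langfuse Cloud and standard LANGFUSE_PUBLIC_KEY/LANGFUSE_SECRET_KEY environment variables:
# Sketch: list recent traces via the Langfuse public API, then filter client-side
import os
import requests

resp = requests.get(
    "https://cloud.langfuse.com/api/public/traces",
    auth=(os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"]),
    params={"limit": 10},
    timeout=30,
)
resp.raise_for_status()
for trace in resp.json()["data"]:
    # match on the metadata fields the agent records (user_email, research_topic)
    print(trace["id"], trace.get("metadata"))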
Step 5: Cleanup
# Delete test task
python3 delete_task.py \
--api-key "$PROD_API_KEY" \
--task-id "abc123"
Use Case 2: Full Optimization Loop (Strategy Development)
Goal: Test new strategy → analyze performance → iterate
This combines strategy-builder skill with production-api-tester skill.
Loop Iteration:
cd /home/user/web_research_agent/.claude/skills/production-api-tester/helpers
# 1. Generate/update strategy (using strategy-builder skill)
python3 ../../strategy-builder/helpers/generate_strategy.py \
--slug "legal/court_cases_de" \
--category "legal" \
--time-window "month" \
--depth "comprehensive" \
--output /tmp/optimization_loop/strategy_v1.yaml
# 2. Validate strategy
python3 ../../strategy-builder/helpers/validate_strategy.py \
--strategy /tmp/optimization_loop/strategy_v1.yaml
# 3. Deploy strategy to database
# (Manual: save to strategies/, add to index.yaml, migrate to DB)
# 4. Create test task for this strategy
python3 create_test_task.py \
--api-key "$PROD_API_KEY" \
--email "legal-test@example.com" \
--topic "Datenschutz DSGVO Verstoß" \
--frequency "daily" \
--output /tmp/optimization_loop/task.json
# 5. Execute research
TASK_ID=$(jq -r '.id' /tmp/optimization_loop/task.json)
python3 execute_batch.py \
--api-key "$PROD_API_KEY" \
--frequency "daily" \
--callback-url "$CALLBACK_URL" \
--output /tmp/optimization_loop/execution.json
# 6. Wait for completion and get results
python3 wait_for_completion.py \
--task-id "$TASK_ID" \
--api-key "$PROD_API_KEY" \
--timeout 600 \
--output /tmp/optimization_loop/results.json
# 7. Link to Langfuse trace
python3 link_to_langfuse.py \
--task-id "$TASK_ID" \
--email "legal-test@example.com" \
--output /tmp/optimization_loop/langfuse.json
# 8. Retrieve and analyze Langfuse trace
TRACE_ID=$(jq -r '.latest_trace_id' /tmp/optimization_loop/langfuse.json)
python3 ../../langfuse-optimization/helpers/retrieve_single_trace.py \
"$TRACE_ID" \
--filter-essential \
--output /tmp/optimization_loop/trace.json
# 9. Analyze performance
python3 ../../strategy-builder/helpers/analyze_strategy_performance.py \
--traces /tmp/optimization_loop/trace.json \
--strategy "legal/court_cases_de" \
--output /tmp/optimization_loop/performance.json
# 10. Review recommendations and iterate
cat /tmp/optimization_loop/performance.json
# Apply fixes to strategy YAML, repeat from step 1
Use Case 3: Batch Testing (Multiple Strategies)
Goal: Test multiple strategies in parallel
Step 1: Create Multiple Test Tasks
cd /home/user/web_research_agent/.claude/skills/production-api-tester/helpers
# Create tasks for different strategies
python3 batch_create_tasks.py \
--api-key "$PROD_API_KEY" \
--tasks-file /tmp/batch_test/tasks_config.json \
--output /tmp/batch_test/created_tasks.json
tasks_config.json:
[
{
"email": "test-news@example.com",
"research_topic": "AI regulation updates",
"frequency": "daily",
"strategy_hint": "daily_news_briefing"
},
{
"email": "test-financial@example.com",
"research_topic": "Tesla stock analysis",
"frequency": "daily",
"strategy_hint": "financial_research"
},
{
"email": "test-legal@example.com",
"research_topic": "GDPR compliance updates",
"frequency": "daily",
"strategy_hint": "legal/court_cases_de"
}
]
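Conceptually, batch_create_tasks.py iterates this config and creates each task, collecting the returned task objects into created_tasks.json. A self-contained sketch under the same endpoint assumptions as earlier (strategy_hint is treated as informational, since strategies are selected by topic classification):
# Sketch: create every task in the config and record the returned task objects
import json
import os
import requests

with open("/tmp/batch_test/tasks_config.json") as f:
    configs = json.load(f)

created = []
for c in configs:
    resp = requests.post(
        f"{os.environ['PROD_API_URL']}/tasks",  # assumed endpoint, as above
        headers={"X-API-Key": os.environ["PROD_API_KEY"]},
        json={"email": c["email"], "research_topic": c["research_topic"],
              "frequency": c["frequency"]},  # strategy_hint not sent; it's a label
        timeout=30,
    )
    resp.raise_for_status()
    created.append(resp.json())

with open("/tmp/batch_test/created_tasks.json", "w") as f:
    json.dump(created, f, indent=2)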
Step 2: Execute All
# Execute daily batch
python3 execute_batch.py \
--api-key "$PROD_API_KEY" \
--frequency "daily" \
--callback-url "$CALLBACK_URL"
Step 3: Monitor All
# Wait for all tasks to complete
python3 monitor_batch.py \
--api-key "$PROD_API_KEY" \
--tasks-file /tmp/batch_test/created_tasks.json \
--timeout 900 \
--output /tmp/batch_test/batch_results.json
Step 4: Analyze All
# Generate comparison report
python3 compare_strategies.py \
--results /tmp/batch_test/batch_results.json \
--output /tmp/batch_test/comparison.json
Output: A comparison of latency, success rate, and error types across strategies
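The report format depends on the helper, but the comparison reduces to aggregating a few metrics per strategy. An illustrative sketch, assuming batch_results.json holds one record per task with strategy, status, and latency_ms fields (these names are assumptions):
# Illustrative per-strategy aggregation; field names are assumptions
import json
from collections import defaultdict

with open("/tmp/batch_test/batch_results.json") as f:
    results = json.load(f)

stats = defaultdict(lambda: {"latencies": [], "ok": 0, "total": 0})
for r in results:
    s = stats[r["strategy"]]
    s["total"] += 1
    s["ok"] += r["status"] == "completed"
    s["latencies"].append(r["latency_ms"])

for strategy, s in stats.items():
    avg = sum(s["latencies"]) / len(s["latencies"])
    print(f"{strategy}: success {s['ok']}/{s['total']}, avg latency {avg:.0f}ms")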
Step 5: Cleanup All
# Delete all test tasks
python3 batch_delete_tasks.py \
--api-key "$PROD_API_KEY" \
--tasks-file /tmp/batch_test/created_tasks.json
Use Case 4: Health Check & Monitoring
Goal: Verify production API is healthy
cd /home/user/web_research_agent/.claude/skills/production-api-tester/helpers
# Quick health check
python3 health_check.py \
--api-url "$PROD_API_URL"
# Extended monitoring
python3 health_check.py \
--api-url "$PROD_API_URL" \
--continuous \
--interval 60 \
--duration 3600
Output:
✓ API is healthy
Status: online
Database: connected
Langfuse: enabled
Response time: 234ms
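Under the hood a health probe is a single GET against the API's health endpoint. A minimal sketch; the /health path is an assumption based on the status fields shown above:
# Minimal health probe sketch; the /health path is an assumption
import os
import time
import requests

start = time.monotonic()
resp = requests.get(f"{os.environ['PROD_API_URL']}/health", timeout=10)
elapsed_ms = (time.monotonic() - start) * 1000
resp.raise_for_status()
print(f"Status: {resp.json().get('status', 'unknown')} ({elapsed_ms:.0f}ms)")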
Helper Tools Reference
1. create_test_task.py
Purpose: Create research task subscription
Usage:
python3 create_test_task.py \
--api-key "$PROD_API_KEY" \
[--api-url "$PROD_API_URL"] \
--email "test@example.com" \
--topic "Research topic" \
--frequency daily|weekly|monthly \
[--schedule-time "09:00"] \
--output /tmp/task.json
Output: Task object with ID for tracking
2. execute_batch.py
Purpose: Trigger batch research execution
Usage:
python3 execute_batch.py \
--api-key "$PROD_API_KEY" \
[--api-url "$PROD_API_URL"] \
--frequency daily|weekly|monthly \
--callback-url "https://webhook.site/..." \
--output /tmp/execution.json
Output: Execution status (running, tasks_found, started_at)
3. get_task.py
Purpose: Retrieve task details
Usage:
python3 get_task.py \
--api-key "$PROD_API_KEY" \
[--api-url "$PROD_API_URL"] \
--task-id "abc123" \
--output /tmp/task.json
4. list_tasks.py
Purpose: List all research tasks
Usage:
python3 list_tasks.py \
--api-key "$PROD_API_KEY" \
[--api-url "$PROD_API_URL"] \
[--email "filter@example.com"] \
[--frequency daily] \
--output /tmp/tasks.json
5. delete_task.py
Purpose: Delete research task
Usage:
python3 delete_task.py \
--api-key "$PROD_API_KEY" \
[--api-url "$PROD_API_URL"] \
--task-id "abc123"
6. wait_for_completion.py
Purpose: Poll task until completion
Usage:
python3 wait_for_completion.py \
--api-key "$PROD_API_KEY" \
--task-id "abc123" \
[--timeout 600] \
[--poll-interval 10] \
--output /tmp/results.json
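Internally this is a standard poll loop: fetch the task, check whether a completion marker such as last_run_at has been set (an assumption based on the task object shown earlier), and sleep between attempts. A sketch:
# Poll-loop sketch; completion inferred from last_run_at (an assumption)
import os
import time
import requests

def wait_for_completion(task_id: str, timeout: int = 600, interval: int = 10) -> dict:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{os.environ['PROD_API_URL']}/tasks/{task_id}",  # assumed endpoint
            headers={"X-API-Key": os.environ["PROD_API_KEY"]},
            timeout=30,
        )
        resp.raise_for_status()
        task = resp.json()
        if task.get("last_run_at"):  # set once the run has finished
            return task
        time.sleep(interval)
    raise TimeoutError(f"Task {task_id} did not complete within {timeout}s")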
7. link_to_langfuse.py
Purpose: Find Langfuse trace for task execution
Usage:
python3 link_to_langfuse.py \
--task-id "abc123" \
--email "test@example.com" \
[--time-range "last_1_hour"] \
--output /tmp/langfuse_link.json
Output:
- Trace query parameters
- Langfuse dashboard URL
- Latest trace ID
8. health_check.py
Purpose: Check API health
Usage:
python3 health_check.py \
[--api-url "$PROD_API_URL"] \
[--continuous] \
[--interval 60] \
[--duration 3600]
9. webhook_receiver.py (Local Testing)
Purpose: Run local webhook receiver for testing
Usage:
# Start local webhook receiver
python3 webhook_receiver.py \
--port 8080 \
--output-dir /tmp/webhooks
# Use as callback URL
export CALLBACK_URL="http://localhost:8080/webhook"
Features:
- Logs all incoming webhooks
- Saves payloads to disk
- Provides an ngrok-style public URL (if tunneling is configured)
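For reference, a receiver of this kind can be only a few lines. A minimal stdlib sketch (not the helper itself, which adds logging and disk persistence):
# Minimal local webhook receiver sketch (stdlib only)
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        print(json.loads(body or b"{}"))  # inspect the callback payload
        self.send_response(200)
        self.end_headers()

HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()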
Integration with Other Skills
With strategy-builder Skill
Optimization Loop:
1. strategy-builder: analyze_research_query.py
→ Determine if new strategy needed
2. strategy-builder: generate_strategy.py
→ Create strategy YAML
3. strategy-builder: validate_strategy.py
→ Validate structure
4. [Manual: Deploy strategy to database]
5. production-api-tester: create_test_task.py
→ Create test subscription
6. production-api-tester: execute_batch.py
→ Run research
7. production-api-tester: wait_for_completion.py
→ Get results
8. production-api-tester: link_to_langfuse.py
→ Find trace
9. strategy-builder: analyze_strategy_performance.py
→ Analyze performance
10. [Iterate: Apply fixes and repeat]
With langfuse-optimization Skill
Performance Deep Dive:
1. production-api-tester: execute_batch.py
→ Generate new traces
2. langfuse-optimization: retrieve_traces_and_observations.py
→ Get detailed trace data
3. langfuse-optimization: [analyze and fix configs]
→ Optimize style.yaml, template.yaml, tools.yaml
With langfuse-advanced-filters Skill
Targeted Analysis:
1. production-api-tester: Create multiple test tasks
2. production-api-tester: Execute batch
3. langfuse-advanced-filters: query_with_filters.py
→ Filter by specific criteria (e.g., latency > 10s)
4. langfuse-advanced-filters: analyze_filtered_results.py
→ Identify patterns
Common Patterns
Pattern 1: Rapid Iteration Testing
# Loop for quick iterations
for i in {1..5}; do
echo "Iteration $i"
# Modify strategy (manual or automated)
# Test
python3 create_test_task.py ... --output /tmp/test_$i/task.json
TASK_ID=$(jq -r '.id' /tmp/test_$i/task.json)
python3 execute_batch.py ...
python3 wait_for_completion.py --task-id "$TASK_ID" --output /tmp/test_$i/results.json
# Analyze
python3 link_to_langfuse.py --task-id "$TASK_ID" --output /tmp/test_$i/trace.json
# Cleanup
python3 delete_task.py --task-id "$TASK_ID"
echo "Iteration $i complete. Review /tmp/test_$i/"
sleep 5
done
Pattern 2: A/B Testing Strategies
# Test strategy A
python3 create_test_task.py --email "test-a@example.com" --topic "AI news" --output /tmp/ab_test/task_a.json
# Test strategy B (different strategy slug via topic classification)
python3 create_test_task.py --email "test-b@example.com" --topic "AI regulation detailed analysis" --output /tmp/ab_test/task_b.json
# Execute both
python3 execute_batch.py --frequency daily
# Compare results
python3 compare_tasks.py \
--task-a /tmp/ab_test/task_a.json \
--task-b /tmp/ab_test/task_b.json \
--output /tmp/ab_test/comparison.json
Pattern 3: Regression Testing
# Before deploying changes to production, test current vs new
# 1. Baseline (current production strategy)
python3 create_test_task.py --email "baseline@example.com" --topic "Test topic" --output /tmp/regression/baseline_task.json
python3 execute_batch.py ...
# Save results
# 2. Make changes to strategy
# 3. Test new version
python3 create_test_task.py --email "new@example.com" --topic "Test topic" --output /tmp/regression/new_task.json
python3 execute_batch.py ...
# Compare results
# 4. Validate no regressions
python3 validate_regression.py \
--baseline /tmp/regression/baseline_results.json \
--new /tmp/regression/new_results.json \
--output /tmp/regression/regression_report.json
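What counts as a regression is a judgment call, but a simple gate compares a couple of headline metrics with a tolerance. An illustrative sketch, assuming both results files expose latency_ms and success fields (these names are assumptions):
# Illustrative regression gate; metric field names are assumptions
import json

def load(path: str) -> dict:
    with open(path) as f:
        return json.load(f)

baseline = load("/tmp/regression/baseline_results.json")
new = load("/tmp/regression/new_results.json")

# Regression if latency grows more than 20% or success flips from True to False
latency_ok = new["latency_ms"] <= baseline["latency_ms"] * 1.2
success_ok = new["success"] or not baseline["success"]
print("PASS" if (latency_ok and success_ok) else "FAIL")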
Tips & Best Practices
1. Use Unique Test Emails
Always use identifiable test email addresses:
- test-strategy-{strategy_name}@example.com
- dev-{your_name}@example.com
- Never use real user emails for testing
2. Clean Up Test Tasks
Always delete test tasks after validation:
# List all test tasks
python3 list_tasks.py --api-key "$PROD_API_KEY" | grep "test-"
# Bulk delete
python3 batch_delete_tasks.py --pattern "test-*"
3. Use Webhook.site for Quick Tests
For quick tests without setting up infrastructure:
- Go to https://webhook.site
- Copy your unique URL
- Use it as CALLBACK_URL
- View results in the browser
4. Tag Test Executions
When creating test tasks, use descriptive topics:
# Good
--topic "[TEST] AI news - strategy_v2_iteration_3"
# Bad
--topic "test"
This makes Langfuse traces easier to find and filter.
5. Automate Full Loop
Create a script for the full optimization loop:
#!/bin/bash
# optimize_strategy.sh
STRATEGY_SLUG=$1
TEST_TOPIC=$2
# Generate → Validate → Deploy → Test → Analyze → Report
...
6. Monitor API Rate Limits
Production API may have rate limits:
- Wait between batch executions
- Use --poll-interval to avoid overwhelming the API
- Check API response headers for rate limit information (see the sketch below)
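If the API does rate-limit, the conventional signal is a 429 response, often with a Retry-After header; whether this API sends one is an assumption. A small retry wrapper handles both cases:
# Retry-on-429 sketch; assumes the API follows the Retry-After convention
import time
import requests

def request_with_backoff(method: str, url: str, retries: int = 3, **kwargs):
    for attempt in range(retries):
        resp = requests.request(method, url, **kwargs)
        if resp.status_code != 429:
            return resp
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)  # honor the server's hint, else back off exponentially
    return resp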
Troubleshooting
"Authentication failed":
- Verify PROD_API_KEY is set correctly
- Check that the API key is active in the production database
- Ensure the X-API-Key header is sent
"Webhook not receiving results":
- Verify CALLBACK_URL is publicly accessible
- Check webhook receiver logs
- Use webhook.site for debugging
- Ensure URL doesn't have trailing slash inconsistencies
"Task execution times out":
- Increase the --timeout parameter
- Check production logs for errors
- Verify strategy is valid and doesn't have infinite loops
- Check Langfuse for ERROR level traces
"Cannot find Langfuse trace":
- Wait 30-60 seconds for trace to be indexed
- Verify metadata fields match (email, topic)
- Check time range is wide enough
- Use --time-range "last_1_day" for safety
"Health check fails":
- Verify PROD_API_URL is correct
- Check that the API is deployed and running
- Verify network connectivity
- Check API logs for startup errors
Security Considerations
1. API Key Management
DO:
- Store API key in environment variables
- Use separate API keys for testing vs production
- Rotate keys regularly
DON'T:
- Commit API keys to git
- Share API keys in logs or screenshots
- Use production API key for automated testing
2. Test Data
DO:
- Use fake/test email addresses
- Use non-sensitive research topics
- Mark test tasks clearly
DON'T:
- Use real user data for testing
- Test with sensitive/confidential topics
- Leave test tasks in production database
3. Webhook Security
DO:
- Use HTTPS for webhook URLs
- Validate webhook payloads (see the sketch after this section)
- Log webhook failures
DON'T:
- Expose webhook receiver without authentication
- Trust webhook data without validation
- Store sensitive data in webhook logs
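Validating webhook payloads can be as simple as rejecting anything that lacks the expected fields before acting on it. A minimal sketch; the required field set is an assumption:
# Minimal payload validation sketch; the required fields are assumptions
REQUIRED_FIELDS = {"task_id", "status", "completed_at"}

def validate_payload(payload: object) -> bool:
    """Reject payloads that are not dicts or are missing expected keys."""
    return isinstance(payload, dict) and REQUIRED_FIELDS <= payload.keys()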
Success Criteria
Good production testing should:
- ✅ Use isolated test tasks (identifiable emails)
- ✅ Clean up after completion
- ✅ Link to Langfuse traces for analysis
- ✅ Document results for comparison
- ✅ Enable rapid iteration (< 5 min per cycle)
- ✅ Validate before deploying to real users
Remember: This skill is about safe production testing, not replacing proper staging environments. Use it for:
- Strategy validation
- Performance profiling
- Regression testing
- Optimization loops
For high-risk changes, always test locally first using run_daily_briefing.py.