| name | temporal |
| description | Manage Temporal workflows: server lifecycle, worker processes, workflow execution, monitoring, and troubleshooting for the Python SDK using the temporal server start-dev local development server. |
| version | 1.0.0 |
Temporal Skill
Manage Temporal workflows using local development server. This skill focuses on the execution, validation, and troubleshooting lifecycle of workflows.
| Property | Value |
|---|---|
| Target SDK | Python only |
| Server Type | temporal server start-dev (local development) |
| gRPC Port | 7233 |
Critical Concepts
Understanding how Temporal components interact is essential for troubleshooting:
How Workers, Workflows, and Tasks Relate
┌─────────────────────────────────────────────────────────────────┐
│ TEMPORAL SERVER │
│ Stores workflow history, manages task queues, coordinates work │
└─────────────────────────────────────────────────────────────────┘
│
Task Queue (named queue)
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ WORKER │
│ Long-running process that polls task queue for work │
│ Contains: Workflow definitions + Activity implementations │
│ │
│ When work arrives: │
│ - Workflow Task → Execute workflow code decisions │
│ - Activity Task → Execute activity code (business logic) │
└─────────────────────────────────────────────────────────────────┘
Key Insight: The workflow code runs inside the worker. If worker code is outdated or buggy, workflow execution fails.
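For orientation, a minimal worker sketch in the Python SDK; `GreetingWorkflow`, `compose_greeting`, and `my-task-queue` are illustrative placeholders, not this project's actual names:

```python
# Minimal worker sketch. GreetingWorkflow, compose_greeting, and
# "my-task-queue" are illustrative placeholders, not this project's names.
import asyncio
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.client import Client
from temporalio.worker import Worker


@activity.defn
async def compose_greeting(name: str) -> str:
    # Activity Task: business logic, retried per retry policy on failure.
    return f"Hello, {name}!"


@workflow.defn
class GreetingWorkflow:
    @workflow.run
    async def run(self, name: str) -> str:
        # Workflow Task: decision logic, executed inside the worker.
        return await workflow.execute_activity(
            compose_greeting, name, start_to_close_timeout=timedelta(seconds=10)
        )


async def main() -> None:
    client = await Client.connect("localhost:7233")
    # The worker polls the named task queue and hosts both definitions.
    worker = Worker(
        client,
        task_queue="my-task-queue",
        workflows=[GreetingWorkflow],
        activities=[compose_greeting],
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
```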
Workflow Task vs Activity Task
| Task Type | What It Does | Where It Runs | On Failure |
|---|---|---|---|
| Workflow Task | Makes workflow decisions (what to do next) | Worker | Stalls the workflow until fixed |
| Activity Task | Executes business logic | Worker | Retries per retry policy |
CRITICAL: Workflow Task errors are fundamentally different from Activity Task errors:
- Workflow Task Failure → Workflow stops making progress entirely
- Activity Task Failure → Workflow retries the activity (workflow still progressing)
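A sketch of how this distinction surfaces in Python SDK code, assuming the SDK's default failure semantics (`ExampleWorkflow` and the payload fields are illustrative):

```python
# Sketch of the default failure semantics in the Python SDK.
# ExampleWorkflow and the payload fields are illustrative.
from temporalio import workflow
from temporalio.exceptions import ApplicationError


@workflow.defn
class ExampleWorkflow:
    @workflow.run
    async def run(self, data: dict) -> str:
        if "customer_id" not in data:
            # ApplicationError (a FailureError) fails the WORKFLOW:
            # status becomes FAILED and the run ends.
            raise ApplicationError("missing customer_id", non_retryable=True)

        # Any other uncaught exception here (KeyError, AttributeError, ...)
        # fails only the Workflow TASK: the workflow stays RUNNING but
        # stalls, retrying the task until the worker code is fixed.
        return data["customer_id"]
```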
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `CLAUDE_TEMPORAL_LOG_DIR` | `/tmp/claude-temporal-logs` | Directory for worker log files |
| `CLAUDE_TEMPORAL_PID_DIR` | `/tmp/claude-temporal-pids` | Directory for worker PID files |
| `CLAUDE_TEMPORAL_PROJECT_DIR` | `$(pwd)` | Project root directory |
| `CLAUDE_TEMPORAL_PROJECT_NAME` | `$(basename "$PWD")` | Project name (used for log/PID naming) |
| `CLAUDE_TEMPORAL_NAMESPACE` | `default` | Temporal namespace |
| `TEMPORAL_ADDRESS` | `localhost:7233` | Temporal server gRPC address |
| `TEMPORAL_CLI` | `temporal` | Path to Temporal CLI binary |
| `TEMPORAL_WORKER_CMD` | `uv run worker` | Command to start worker |
Quick Start
# 1. Start server
./tools/ensure-server.sh
# 2. Start worker (ensures no old workers, starts fresh one)
./tools/ensure-worker.sh
# 3. Execute workflow
uv run starter # Capture workflow_id from output
# 4. Wait for completion
./tools/wait-for-workflow-status.sh --workflow-id <id> --status COMPLETED
# 5. Get result (IMPORTANT: verify result is correct, not an error message)
./tools/get-workflow-result.sh --workflow-id <id>
# 6. CLEANUP: Kill workers when done
./tools/kill-worker.sh
Worker Management
The Golden Rule
Ensure no old workers are running. Stale workers with outdated code cause:
- Non-determinism errors (history mismatch)
- Executing old buggy code
- Confusing behavior
Best practice: Run only ONE worker instance with the latest code.
Starting Workers
# PREFERRED: Smart restart (kills old, starts fresh)
./tools/ensure-worker.sh
This command:
- Finds ALL existing workers for the project
- Kills them
- Starts a new worker with fresh code
- Waits for worker to be ready
Verifying Workers
# List all running workers
./tools/list-workers.sh
# Check specific worker health
./tools/monitor-worker-health.sh
# View worker logs
tail -f $CLAUDE_TEMPORAL_LOG_DIR/worker-$(basename "$(pwd)").log
What to look for in logs:
- `Worker started, listening on task queue: ...` → Worker is ready
- `Worker process died during startup` → Startup failure, check logs for error
Cleanup (REQUIRED)
Always kill workers when done. Don't leave workers running.
# Kill current project's worker
./tools/kill-worker.sh
# Kill ALL workers (full cleanup)
./tools/kill-all-workers.sh
# Kill all workers AND server
./tools/kill-all-workers.sh --include-server
Workflow Execution
Starting Workflows
# Execute workflow via starter script
uv run starter
CRITICAL: Capture the Workflow ID from output. You need it for all monitoring/troubleshooting.
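A minimal sketch of what a starter script typically does; the workflow type, task queue, and ID format below are illustrative assumptions. The key point is printing the workflow ID so it can be captured:

```python
# starter.py sketch; "GreetingWorkflow", "my-task-queue", and the ID
# format are illustrative assumptions about this project.
import asyncio
import uuid

from temporalio.client import Client


async def main() -> None:
    client = await Client.connect("localhost:7233")
    handle = await client.start_workflow(
        "GreetingWorkflow",             # workflow type, referenced by name
        "Temporal",                     # workflow argument
        id=f"greeting-{uuid.uuid4()}",
        task_queue="my-task-queue",
    )
    # Capture this ID: every monitoring/troubleshooting step needs it.
    print(f"workflow_id: {handle.id}")


if __name__ == "__main__":
    asyncio.run(main())
```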
Checking Status
# Get workflow status
temporal workflow describe --workflow-id <id>
# Wait for specific status
./tools/wait-for-workflow-status.sh \
--workflow-id <id> \
--status COMPLETED \
--timeout 60
Workflow Status Reference
| Status | Meaning | Action |
|---|---|---|
| `RUNNING` | Workflow in progress | Wait, or check if stalled |
| `COMPLETED` | Successfully finished | Get result, verify correctness |
| `FAILED` | Error during execution | Analyze error |
| `CANCELED` | Explicitly canceled | Review reason |
| `TERMINATED` | Force-stopped | Review reason |
| `TIMED_OUT` | Exceeded timeout | Increase timeout |
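Status can also be read programmatically; a sketch using the Python client (the workflow ID is a placeholder):

```python
# Sketch: read a workflow's status with the Python client.
# "greeting-1234" is a placeholder workflow ID.
import asyncio

from temporalio.client import Client, WorkflowExecutionStatus


async def main() -> None:
    client = await Client.connect("localhost:7233")
    handle = client.get_workflow_handle("greeting-1234")
    description = await handle.describe()
    print(description.status)  # e.g. WorkflowExecutionStatus.RUNNING
    if description.status == WorkflowExecutionStatus.COMPLETED:
        print("workflow finished; fetch and verify its result next")


asyncio.run(main())
```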
Getting Results
./tools/get-workflow-result.sh --workflow-id <id>
IMPORTANT - False Positive Detection:
Workflows may COMPLETE but return undesired results (e.g., error messages in the result payload).
// This workflow COMPLETED but the result is an ERROR!
{"status": "error", "message": "Failed to process request"}
Always verify the result content is correct, not just that the status is COMPLETED.
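A sketch of that verification in Python, assuming the result payload is a dict with a `status` field as in the example above (the field names and workflow ID are assumptions about this project's result shape):

```python
# Sketch: a COMPLETED workflow can still carry an error payload.
# The "status" field and the placeholder workflow ID are assumptions
# about this project's result shape.
import asyncio

from temporalio.client import Client


async def main() -> None:
    client = await Client.connect("localhost:7233")
    handle = client.get_workflow_handle("greeting-1234")  # placeholder ID

    result = await handle.result()  # raises only if the workflow FAILED
    if isinstance(result, dict) and result.get("status") == "error":
        raise RuntimeError(f"Completed with error payload: {result}")
    print("Result OK:", result)


asyncio.run(main())
```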
Troubleshooting
Step 1: Identify the Problem
# Check workflow status
temporal workflow describe --workflow-id <id>
# Check for stalled workflows (workflows stuck in RUNNING)
./tools/find-stalled-workflows.sh
# Analyze specific workflow errors
./tools/analyze-workflow-error.sh --workflow-id <id>
Step 2: Diagnose Using This Decision Tree
Workflow not behaving as expected?
│
├── Status: RUNNING but no progress (STALLED)
│ │
│ ├── Is it an interactive workflow waiting for signal/update?
│ │ └── YES → Send the required interaction
│ │
│ └── NO → Run: ./tools/find-stalled-workflows.sh
│ │
│ ├── WorkflowTaskFailed detected
│ │ │
│ │ ├── Non-determinism error (history mismatch)?
│ │ │ └── See: "Fixing Non-Determinism Errors" below
│ │ │
│ │ └── Other workflow task error (code bug, missing registration)?
│ │ └── See: "Fixing Other Workflow Task Errors" below
│ │
│ └── ActivityTaskFailed (excessive retries)
│ └── Activity is retrying. Fix activity code, restart worker.
│ Workflow will auto-retry with new code.
│
├── Status: COMPLETED but wrong result
│ └── Check result: ./tools/get-workflow-result.sh --workflow-id <id>
│ Is result an error message? → Fix workflow/activity logic
│
├── Status: FAILED
│ └── Run: ./tools/analyze-workflow-error.sh --workflow-id <id>
│ Fix code → ./tools/ensure-worker.sh → Start NEW workflow
│
├── Status: TIMED_OUT
│ └── Increase timeouts → ./tools/ensure-worker.sh → Start NEW workflow
│
└── Workflow never starts
└── Check: Worker running? Task queue matches? Workflow registered?
Fixing Workflow Task Errors
Workflow task errors STALL the workflow - it stops making progress entirely until the issue is fixed.
Fixing Non-Determinism Errors
Non-determinism occurs when workflow code changes while a workflow is running, causing history mismatch.
Symptoms:
- `WorkflowTaskFailed` events in history
- "Non-deterministic error" or "history mismatch" in logs
Fix procedure:
# 1. TERMINATE affected workflows (they cannot recover)
temporal workflow terminate --workflow-id <id>
# 2. Kill existing workers
./tools/kill-worker.sh
# 3. Fix the workflow code if needed
# 4. Restart worker with corrected code
./tools/ensure-worker.sh
# 5. Verify workflow logic is correct
# 6. Start NEW workflow execution
uv run starter
Key point: A workflow hit by non-determinism cannot replay its history against the new code and cannot recover. You MUST terminate it and start fresh.
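For reference, a sketch of the usual Python SDK culprits and the `workflow.patched()` mechanism for shipping code changes safely; the workflow, activity, and patch names are illustrative:

```python
# Sketch: common non-determinism sources and the patched() escape hatch.
# ReportWorkflow, audit_report, and the patch ID are illustrative.
from datetime import timedelta

from temporalio import workflow


@workflow.defn
class ReportWorkflow:
    @workflow.run
    async def run(self) -> str:
        # BAD: datetime.now(), random.random(), uuid4() differ on replay.
        # GOOD: the SDK's deterministic, replay-safe equivalents.
        ts = workflow.now()
        jitter = workflow.random().uniform(1, 5)

        # Changing which commands run (or their order) while executions are
        # in flight causes history mismatch. Gate new behavior instead:
        if workflow.patched("add-audit-step"):
            await workflow.execute_activity(
                "audit_report", start_to_close_timeout=timedelta(seconds=10)
            )
        return f"report at {ts.isoformat()} (jitter {jitter:.1f}s)"
```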
Fixing Other Workflow Task Errors
For workflow task errors that are NOT non-determinism (code bugs, missing registration, etc.):
Symptoms:
- `WorkflowTaskFailed` events
- Error is NOT "history mismatch" or "non-deterministic"
Fix procedure:
# 1. Identify the error
./tools/analyze-workflow-error.sh --workflow-id <id>
# 2. Fix the root cause (code bug, worker config, etc.)
# 3. Kill and restart worker with fixed code
./tools/ensure-worker.sh
# 4. NO NEED TO TERMINATE - the workflow will automatically resume
# The new worker picks up where it left off and continues execution
Key point: Unlike non-determinism, the workflow can recover once you fix the code.
Fixing Activity Task Errors
Activity task errors cause retries, not immediate workflow failure.
Workflow Stalling Due to Retries
Workflows can appear stalled because an activity keeps failing and retrying.
Diagnosis:
# Check for excessive activity retries
./tools/find-stalled-workflows.sh
# Look for ActivityTaskFailed count
# Check worker logs for retry messages
tail -100 $CLAUDE_TEMPORAL_LOG_DIR/worker-$(basename "$(pwd)").log
Fix procedure:
# 1. Fix the activity code
# 2. Restart worker with fixed code
./tools/ensure-worker.sh
# 3. Worker auto-retries with new code
# No need to terminate or restart workflow
Activity Failure (Retries Exhausted)
When all retries are exhausted, the activity fails permanently.
Fix procedure:
# 1. Analyze the error
./tools/analyze-workflow-error.sh --workflow-id <id>
# 2. Fix activity code
# 3. Restart worker
./tools/ensure-worker.sh
# 4. Start NEW workflow (old one has failed)
uv run starter
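Retry exhaustion is governed by the retry policy on the activity call; a sketch of bounding retries so a persistent bug fails fast instead of retrying indefinitely (workflow and activity names are illustrative):

```python
# Sketch: bound activity retries so a persistent bug surfaces as a failure
# instead of retrying indefinitely. Workflow/activity names are illustrative.
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy


@workflow.defn
class PaymentWorkflow:
    @workflow.run
    async def run(self, payment_id: str) -> str:
        return await workflow.execute_activity(
            "charge_card",
            payment_id,
            start_to_close_timeout=timedelta(seconds=30),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=1),
                backoff_coefficient=2.0,
                maximum_interval=timedelta(seconds=30),
                maximum_attempts=5,  # fail permanently after 5 attempts
            ),
        )
```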
Common Error Types Reference
| Error Type | Where to Find | What Happened | Recovery |
|---|---|---|---|
| Non-determinism | `WorkflowTaskFailed` in history | Code changed during execution | Terminate workflow → Fix → Restart worker → NEW workflow |
| Workflow code bug | `WorkflowTaskFailed` in history | Bug in workflow logic | Fix code → Restart worker → Workflow auto-resumes |
| Missing workflow | Worker logs | Workflow not registered | Add to worker.py → Restart worker |
| Missing activity | Worker logs | Activity not registered | Add to worker.py → Restart worker |
| Activity bug | `ActivityTaskFailed` in history | Bug in activity code | Fix code → Restart worker → Auto-retries |
| Activity retries | `ActivityTaskFailed` (count > 2) | Repeated failures | Fix code → Restart worker → Auto-retries |
| Sandbox violation | Worker logs | Bad imports in workflow | Fix workflow.py imports → Restart worker |
| Task queue mismatch | Workflow never starts | Different queues in starter/worker | Align task queue names |
| Timeout | Status = TIMED_OUT | Operation too slow | Increase timeout config |
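For the "Sandbox violation" row: the Python SDK runs workflow code in a sandbox that restricts imports. A sketch of the usual fix, marking problematic imports as pass-through (`requests` stands in for whatever module the sandbox rejects; workflow and activity names are illustrative):

```python
# Sketch: avoiding workflow sandbox violations in the Python SDK.
# "requests" stands in for any third-party module the sandbox rejects;
# FetchWorkflow and fetch_url are illustrative names.
from datetime import timedelta

from temporalio import workflow

with workflow.unsafe.imports_passed_through():
    import requests  # noqa: F401  (used by activities defined elsewhere)


@workflow.defn
class FetchWorkflow:
    @workflow.run
    async def run(self, url: str) -> str:
        # Network calls belong in activities; the workflow only orchestrates.
        return await workflow.execute_activity(
            "fetch_url", url, start_to_close_timeout=timedelta(seconds=30)
        )
```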
Interactive Workflows
Interactive workflows pause and wait for external input (signals or updates).
Signals
# Send signal to workflow
temporal workflow signal \
--workflow-id <id> \
--name "signal_name" \
--input '{"key": "value"}'
# Or via interact script (if available)
uv run interact --workflow-id <id> --signal-name "signal_name" --data '{"key": "value"}'
Updates
# Send update to workflow
temporal workflow update \
--workflow-id <id> \
--name "update_name" \
--input '{"approved": true}'
Queries
# Query workflow state (read-only)
temporal workflow query \
--workflow-id <id> \
--name "get_status"
Common Recipes
Recipe 1: Clean Start (Fresh Environment)
./tools/kill-all-workers.sh
./tools/ensure-server.sh
./tools/ensure-worker.sh
uv run starter
Recipe 2: Debug Stalled Workflow
# 1. Find what's wrong
./tools/find-stalled-workflows.sh
./tools/analyze-workflow-error.sh --workflow-id <id>
# 2. Check worker logs
tail -100 $CLAUDE_TEMPORAL_LOG_DIR/worker-$(basename "$(pwd)").log
# 3. Fix based on error type (see decision tree above)
Recipe 3: Clear Stalled Environment
./tools/find-stalled-workflows.sh
./tools/bulk-cancel-workflows.sh
./tools/kill-worker.sh
./tools/ensure-worker.sh
Recipe 4: Test Interactive Workflow
./tools/ensure-worker.sh
uv run starter # Get workflow_id
./tools/wait-for-workflow-status.sh --workflow-id $workflow_id --status RUNNING
uv run interact --workflow-id $workflow_id --signal-name "approval" --data '{"approved": true}'
./tools/wait-for-workflow-status.sh --workflow-id $workflow_id --status COMPLETED
./tools/get-workflow-result.sh --workflow-id $workflow_id
./tools/kill-worker.sh # CLEANUP
Recipe 5: Check Recent Workflow Results
# List recent workflows
./tools/list-recent-workflows.sh --minutes 30
# Check results (verify they're correct, not error messages!)
./tools/get-workflow-result.sh --workflow-id <id1>
./tools/get-workflow-result.sh --workflow-id <id2>
Tool Reference
Lifecycle Tools
| Tool | Description | Key Options |
|---|---|---|
| `ensure-server.sh` | Start dev server if not running | - |
| `ensure-worker.sh` | Kill old workers, start fresh one | Uses `$TEMPORAL_WORKER_CMD` |
| `kill-worker.sh` | Kill current project's worker | - |
| `kill-all-workers.sh` | Kill all workers | `--include-server` |
| `list-workers.sh` | List running workers | - |
Monitoring Tools
| Tool | Description | Key Options |
|---|---|---|
| `list-recent-workflows.sh` | Show recent executions | `--minutes N` (default: 5) |
| `find-stalled-workflows.sh` | Detect stalled workflows | `--query "..."` |
| `monitor-worker-health.sh` | Check worker status | - |
| `wait-for-workflow-status.sh` | Block until status | `--workflow-id`, `--status`, `--timeout` |
Debugging Tools
| Tool | Description | Key Options |
|---|---|---|
| `analyze-workflow-error.sh` | Extract errors from history | `--workflow-id`, `--run-id` |
| `get-workflow-result.sh` | Get workflow output | `--workflow-id`, `--raw` |
| `bulk-cancel-workflows.sh` | Mass cancellation | `--pattern "..."` |
Log Files
| Log | Location | Content |
|---|---|---|
| Worker logs | `$CLAUDE_TEMPORAL_LOG_DIR/worker-{project}.log` | Worker output, activity logs, errors |
Useful searches:
# Find errors
grep -i "error" $CLAUDE_TEMPORAL_LOG_DIR/worker-*.log
# Check worker startup
grep -i "started" $CLAUDE_TEMPORAL_LOG_DIR/worker-*.log
# Find activity issues
grep -i "activity" $CLAUDE_TEMPORAL_LOG_DIR/worker-*.log