name	agent-observability
description	Production tracing and metrics for multi-agent workflows. Track agent decisions, tool calls, and performance without monitoring conversation content.
type	infrastructure
priority	medium

Agent Observability

Purpose

Enable systematic diagnosis of multi-agent workflow failures by tracking:

Agent decision patterns
Interaction structures
Performance metrics
Error patterns

Important: Track behavior, not content. Respect user privacy.

Trace Events

Event Types

Event	Trigger	Data Captured
`agent_spawned`	Task tool called	agent_type, model, task_summary
`task_assigned`	Delegation created	task_id, agent, complexity
`tool_called`	Any tool invocation	tool_name, duration_ms
`result_received`	Agent completion	agent_id, status, findings_count
`decision_made`	Branching point	decision_type, choice, reasoning_length
`checkpoint_saved`	Memory save	checkpoint_type, location
`error_occurred`	Failure detected	error_type, agent, recoverable
`iteration_started`	Gap-filling loop	iteration_number, gaps_count

Event Format

{
  "event": "agent_spawned",
  "timestamp": "2025-01-04T12:00:00.000Z",
  "session_id": "sess_abc123",
  "task_id": "task_xyz",
  "data": {
    "agent_type": "mobile-ui-specialist",
    "model": "sonnet",
    "task_summary": "Create StationCard component",
    "complexity": "simple",
    "parent_agent": "lead-orchestrator"
  }
}

Trace Storage

Directory Structure

.temp/traces/
├── sessions/
│   └── sess_{id}/
│       ├── events.jsonl      # Append-only event log
│       ├── metrics.json      # Aggregated metrics
│       └── summary.md        # Human-readable summary
└── archive/
    └── {date}/
        └── sess_{id}.tar.gz

Event Log Format (JSONL)

{"event":"session_started","timestamp":"...","session_id":"sess_abc"}
{"event":"agent_spawned","timestamp":"...","session_id":"sess_abc","data":{...}}
{"event":"tool_called","timestamp":"...","session_id":"sess_abc","data":{...}}

Metrics

Per-Session Metrics

{
  "session_id": "sess_abc123",
  "started_at": "2025-01-04T12:00:00Z",
  "ended_at": "2025-01-04T12:15:00Z",
  "duration_ms": 900000,

  "agents": {
    "spawned": 4,
    "succeeded": 3,
    "failed": 1,
    "by_type": {
      "mobile-ui-specialist": 1,
      "backend-integration-specialist": 1,
      "test-automation-specialist": 1,
      "quality-validator": 1
    }
  },

  "tools": {
    "total_calls": 47,
    "by_tool": {
      "read": 15,
      "edit": 12,
      "grep": 8,
      "bash": 7,
      "write": 5
    }
  },

  "tokens": {
    "estimated_input": 45000,
    "estimated_output": 12000,
    "total": 57000
  },

  "iterations": 2,
  "checkpoints": 3,
  "errors": 1
}

Key Performance Indicators

Metric	Target	Warning	Critical
Agent success rate	>90%	<80%	<60%
Avg tools per agent	10-20	>30	>50
Iteration count	1-2	3	>4
Token efficiency	<100K	>150K	>200K
Time to completion	<15min	>30min	>60min

Tracing Operations

Start Session

Event: session_started
Data:
  - session_id: generated UUID
  - task_description: summary (no content)
  - complexity_assessment: trivial|simple|moderate|complex
  - planned_agents: count

Track Agent Spawn

Event: agent_spawned
Data:
  - agent_id: unique ID
  - agent_type: specialist name
  - model: sonnet|opus|haiku
  - task_summary: 10-20 words max
  - dependencies: list of agent_ids to wait for

Track Tool Call

Event: tool_called
Data:
  - tool_name: read|edit|grep|etc
  - duration_ms: execution time
  - success: boolean
  - file_count: for file operations

Track Decision

Event: decision_made
Data:
  - decision_type: effort_scaling|delegation|iteration|completion
  - choice: what was decided
  - alternatives_considered: count
  - confidence: high|medium|low

Track Error

Event: error_occurred
Data:
  - error_type: timeout|validation|integration|tool_failure
  - agent_id: where it occurred
  - recoverable: boolean
  - recovery_action: retry|skip|abort

End Session

Event: session_ended
Data:
  - status: completed|partial|failed|aborted
  - deliverables_count: files created/modified
  - quality_gates_passed: boolean

Analysis Patterns

Failure Diagnosis

When a session fails, analyze:

Error Clustering
- Are errors concentrated in one agent?
- Are they at specific phases?
- What tool calls preceded failures?
Decision Path
- Was complexity correctly assessed?
- Were agent boundaries clear?
- Were iterations excessive?
Performance Anomalies
- Unusually high tool calls?
- Long durations for simple tasks?
- Token usage spikes?

Success Patterns

Track what works:

Optimal agent combinations for task types
Effective delegation patterns
Successful iteration counts

Privacy Considerations

NEVER LOG:

Actual file contents
User messages (beyond classification)
Code snippets
Personal information
API keys or secrets

ALWAYS LOG:

Structural information (file counts, not files)
Timing information
Success/failure states
Tool names (not arguments)
Agent types (not outputs)

Integration with Orchestrator

During Execution

Lead Orchestrator responsibilities:
1. Generate session_id at start
2. Log agent_spawned for each Task call
3. Track decision points
4. Log errors with context
5. Save metrics at session end

Post-Execution

Analysis workflow:
1. Read session metrics
2. Compare to KPIs
3. Identify anomalies
4. Feed to agent-improvement skill

Example Trace Summary

# Session Summary: sess_abc123

## Overview
- Task: Add station favorites feature
- Complexity: Moderate
- Duration: 12m 34s
- Status: COMPLETED

## Agent Activity
| Agent | Tools | Duration | Status |
|-------|-------|----------|--------|
| backend-integration | 15 | 4m 12s | Success |
| mobile-ui | 18 | 5m 45s | Success |
| test-automation | 12 | 2m 15s | Success |
| quality-validator | 4 | 22s | Success |

## Iterations
- Round 1: 3 agents, 2 gaps found
- Round 2: 1 follow-up agent, completed

## Metrics
- Total tool calls: 49
- Estimated tokens: 67,000
- Checkpoints saved: 2
- Errors: 0

## Performance
- Agent success rate: 100%
- Token efficiency: Good (<100K)
- Iteration count: Normal (2)

Quick Commands

# View recent sessions
ls -lt .temp/traces/sessions/

# Read latest session events
tail -100 .temp/traces/sessions/sess_latest/events.jsonl

# View session metrics
cat .temp/traces/sessions/sess_latest/metrics.json

# Archive old sessions
./scripts/archive-traces.sh 7  # Archive sessions older than 7 days

Version: 1.0 | Last Updated: 2025-01-04

agent-observability

Install Skill

SKILL.md