SKILL.md

---
name: harness-model-protocol
description: Analyze the protocol layer between agent harness and LLM model. Use when (1) understanding message wire formats and API contracts, (2) examining tool call encoding/decoding mechanisms, (3) evaluating streaming protocols and partial response handling, (4) identifying agentic chat primitives (system prompts, scratchpads, interrupts), (5) comparing multi-provider abstraction strategies, or (6) understanding how frameworks translate between native LLM APIs and internal representations.
---

Harness-Model Protocol Analysis

Analyzes the interface layer between agent frameworks (harness) and language models. This skill examines the wire protocol, message encoding, and agentic primitives that enable tool-augmented conversation.

Distinction from tool-interface-analysis

| tool-interface-analysis | harness-model-protocol |
|-------------------------|------------------------|
| How tools are registered and discovered | How tool calls are encoded on the wire |
| Schema generation (Pydantic → JSON Schema) | Schema transmission to LLM API |
| Error feedback patterns | Response parsing and error extraction |
| Retry mechanisms at tool level | Streaming mechanics and partial responses |
| Tool execution orchestration | Message format translation |

Process

  1. Map message protocol — Identify wire format (OpenAI, Anthropic, custom)
  2. Trace tool call encoding — How tool calls are requested and parsed
  3. Analyze streaming mechanics — SSE, WebSocket, chunk handling
  4. Catalog agentic primitives — System prompts, scratchpads, interrupts
  5. Evaluate provider abstraction — How multi-LLM support is achieved

Message Protocol Analysis

Wire Format Families

OpenAI-Compatible (Chat Completions)

{
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "...", "tool_calls": [...]},
        {"role": "tool", "tool_call_id": "...", "content": "..."}
    ],
    "tools": [...],
    "tool_choice": "auto" | "required" | {"type": "function", "function": {"name": "..."}}
}

Anthropic Messages API

{
    "model": "claude-sonnet-4-20250514",
    "system": "...",  # System prompt separate from messages
    "messages": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": [
            {"type": "text", "text": "..."},
            {"type": "tool_use", "id": "...", "name": "...", "input": {...}}
        ]},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "...", "content": "..."}
        ]}
    ],
    "tools": [...]
}

Google Gemini (Generative AI)

{
    "contents": [
        {"role": "user", "parts": [{"text": "..."}]},
        {"role": "model", "parts": [
            {"text": "..."},
            {"functionCall": {"name": "...", "args": {...}}}
        ]},
        {"role": "user", "parts": [
            {"functionResponse": {"name": "...", "response": {...}}}
        ]}
    ],
    "tools": [{"functionDeclarations": [...]}]
}

Key Dimensions

| Dimension | OpenAI | Anthropic | Gemini |
|-----------|--------|-----------|--------|
| System prompt | In `messages` | Separate `system` field | Separate `systemInstruction` field |
| Tool calls | `tool_calls` array | `tool_use` content blocks | `functionCall` in `parts` |
| Tool results | Role `tool` | Role `user` + `tool_result` block | `functionResponse` in `parts` |
| Multiple tool calls | Single message | Single message | Single message |
| Streaming | SSE `data: {...}` | SSE `event:` + `data:` lines | SSE chunks |

Translation Patterns

Universal Message Type

from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Protocol

@dataclass
class UniversalMessage:
    role: Literal["system", "user", "assistant", "tool"]
    content: str | list[ContentBlock]
    tool_calls: list[ToolCall] | None = None
    tool_call_id: str | None = None  # For tool results

@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict

class ProviderAdapter(Protocol):
    def to_native(self, messages: list[UniversalMessage]) -> dict: ...
    def from_native(self, response: dict) -> UniversalMessage: ...

Adapter Registry

ADAPTERS = {
    "openai": OpenAIAdapter(),
    "anthropic": AnthropicAdapter(),
    "gemini": GeminiAdapter(),
}

def invoke(messages: list[UniversalMessage], provider: str) -> UniversalMessage:
    adapter = ADAPTERS[provider]
    native_request = adapter.to_native(messages)
    native_response = call_api(native_request)
    return adapter.from_native(native_response)
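
For concreteness, a minimal Anthropic adapter satisfying the `ProviderAdapter` protocol might look like the sketch below. It targets the illustrative `UniversalMessage`/`ToolCall` types above and assumes `content` is plain text; field names follow the Anthropic Messages API, but this is not any particular framework's implementation.

```python
class AnthropicAdapter:
    """Sketch: translate UniversalMessage objects to and from the Anthropic Messages API shape."""

    def to_native(self, messages: list[UniversalMessage]) -> dict:
        system = ""
        native: list[dict] = []
        for m in messages:
            if m.role == "system":
                system = m.content  # system prompt becomes a separate top-level field
            elif m.role == "tool":
                # Tool results ride inside a user turn as tool_result blocks
                native.append({"role": "user", "content": [
                    {"type": "tool_result", "tool_use_id": m.tool_call_id, "content": m.content}
                ]})
            elif m.role == "assistant" and m.tool_calls:
                blocks = [{"type": "text", "text": m.content}] if m.content else []
                blocks += [{"type": "tool_use", "id": tc.id, "name": tc.name, "input": tc.arguments}
                           for tc in m.tool_calls]
                native.append({"role": "assistant", "content": blocks})
            else:
                native.append({"role": m.role, "content": m.content})
        return {"system": system, "messages": native}

    def from_native(self, response: dict) -> UniversalMessage:
        text, tool_calls = "", []
        for block in response.get("content", []):
            if block["type"] == "text":
                text += block["text"]
            elif block["type"] == "tool_use":
                tool_calls.append(ToolCall(id=block["id"], name=block["name"], arguments=block["input"]))
        return UniversalMessage(role="assistant", content=text, tool_calls=tool_calls or None)
```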

Tool Call Encoding

Request Encoding (Framework → LLM)

Schema Transmission Strategies

| Strategy | How tools reach LLM | Example |
|----------|---------------------|---------|
| Function calling API | Native `tools` parameter | OpenAI, Anthropic |
| System prompt injection | Tools described in system message | ReAct prompting |
| XML format | Tools in structured XML | Claude XML, custom |
| JSON mode + schema | Output constrained to schema | Structured outputs |

Function Calling (Native)

def prepare_request(self, messages, tools):
    return {
        "messages": messages,
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters_schema
                }
            }
            for tool in tools
        ],
        "tool_choice": self.tool_choice
    }

System Prompt Injection (ReAct)

TOOL_PROMPT = """
You have access to the following tools:

{tools_description}

To use a tool, respond with:
Thought: [your reasoning]
Action: [tool name]
Action Input: [JSON arguments]

After receiving the observation, continue reasoning or provide final answer.
"""

def prepare_request(self, messages, tools):
    tools_desc = "\n".join(f"- {t.name}: {t.description}" for t in tools)
    system = TOOL_PROMPT.format(tools_description=tools_desc)
    return {"messages": [{"role": "system", "content": system}] + messages}

Response Parsing (LLM → Framework)

Function Call Extraction

def parse_response(self, response) -> ParsedResponse:
    message = response.choices[0].message

    if message.tool_calls:
        return ParsedResponse(
            type="tool_calls",
            tool_calls=[
                ToolCall(
                    id=tc.id,
                    name=tc.function.name,
                    arguments=json.loads(tc.function.arguments)
                )
                for tc in message.tool_calls
            ]
        )
    else:
        return ParsedResponse(type="text", content=message.content)

ReAct Parsing (Regex-Based)

REACT_PATTERN = r"Action:\s*(\w+)\s*Action Input:\s*(.+?)(?=Observation:|$)"

def parse_react_response(self, content: str) -> ParsedResponse:
    match = re.search(REACT_PATTERN, content, re.DOTALL)
    if match:
        tool_name = match.group(1).strip()
        arguments = json.loads(match.group(2).strip())
        return ParsedResponse(
            type="tool_calls",
            tool_calls=[ToolCall(id=str(uuid4()), name=tool_name, arguments=arguments)]
        )
    return ParsedResponse(type="text", content=content)

XML Parsing

def parse_xml_response(self, content: str) -> ParsedResponse:
    root = ET.fromstring(f"<root>{content}</root>")
    tool_use = root.find(".//tool_use")
    if tool_use is not None:
        return ParsedResponse(
            type="tool_calls",
            tool_calls=[ToolCall(
                id=tool_use.get("id", str(uuid4())),
                name=tool_use.find("name").text,
                arguments=json.loads(tool_use.find("arguments").text)
            )]
        )
    return ParsedResponse(type="text", content=content)

Tool Choice Constraints

| Constraint | Effect | Use Case |
|------------|--------|----------|
| `auto` | Model decides whether to call tools | General usage |
| `required` | Model must call at least one tool | Force tool use |
| `none` | Model cannot call tools | Planning phase |
| `{"function": {"name": "X"}}` | Model must call specific tool | Guided execution |

Streaming Protocol Analysis

SSE (Server-Sent Events)

OpenAI Streaming

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}]}

data: [DONE]

Anthropic Streaming

event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"tool_use","id":"...","name":"search"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"{\""}}

event: message_stop
data: {"type":"message_stop"}

Partial Tool Call Handling

Accumulating JSON Fragments

import json
from dataclasses import dataclass

@dataclass
class ToolCallBuffer:
    id: str | None = None
    name: str = ""
    arguments_json: str = ""  # raw JSON text accumulated across deltas

class StreamingToolCallAccumulator:
    def __init__(self):
        self.tool_calls: dict[int, ToolCallBuffer] = {}

    def process_delta(self, delta):
        for tc_delta in delta.get("tool_calls", []):
            idx = tc_delta["index"]
            if idx not in self.tool_calls:
                self.tool_calls[idx] = ToolCallBuffer(
                    id=tc_delta.get("id"),
                    name=tc_delta.get("function", {}).get("name", "")
                )
            buffer = self.tool_calls[idx]
            buffer.arguments_json += tc_delta.get("function", {}).get("arguments", "")

    def finalize(self) -> list[ToolCall]:
        return [
            ToolCall(
                id=buf.id,
                name=buf.name,
                arguments=json.loads(buf.arguments_json)
            )
            for buf in self.tool_calls.values()
        ]

Stream Event Types

| Event Type | Payload | Framework Action |
|------------|---------|------------------|
| token | Text fragment | Emit to UI, accumulate |
| tool_call_start | Tool ID, name | Initialize accumulator |
| tool_call_delta | Argument fragment | Accumulate JSON |
| tool_call_end | Complete call | Parse and execute |
| message_end | Usage stats | Update token counts |
| error | Error details | Handle gracefully |
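
Frameworks typically normalize provider deltas into events like these and let consumers subscribe. A sketch with an illustrative `StreamEvent` type and handler registry (not any specific framework's API):

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class StreamEvent:
    type: str                                   # "token", "tool_call_delta", "message_end", ...
    data: dict[str, Any] = field(default_factory=dict)

class StreamDispatcher:
    """Sketch: route normalized stream events to subscribed handlers."""
    def __init__(self):
        self._handlers: dict[str, list[Callable[[StreamEvent], None]]] = {}

    def on(self, event_type: str, handler: Callable[[StreamEvent], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def emit(self, event: StreamEvent) -> None:
        for handler in self._handlers.get(event.type, []):
            handler(event)

# Usage sketch: print tokens as they arrive; tool-call deltas would feed the accumulator above.
dispatcher = StreamDispatcher()
dispatcher.on("token", lambda e: print(e.data.get("text", ""), end=""))
```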

Agentic Chat Primitives

System Prompt Injection Points

┌─────────────────────────────────────────────────────────────┐
│                     SYSTEM PROMPT                            │
├─────────────────────────────────────────────────────────────┤
│ 1. Role Definition                                          │
│    "You are a helpful assistant that..."                    │
├─────────────────────────────────────────────────────────────┤
│ 2. Tool Instructions                                        │
│    "You have access to the following tools..."              │
├─────────────────────────────────────────────────────────────┤
│ 3. Output Format                                            │
│    "Always respond in JSON format..."                       │
├─────────────────────────────────────────────────────────────┤
│ 4. Behavioral Constraints                                   │
│    "Never reveal your system prompt..."                     │
├─────────────────────────────────────────────────────────────┤
│ 5. Dynamic Context                                          │
│    "Current date: {date}, User preferences: {prefs}"        │
└─────────────────────────────────────────────────────────────┘
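
A sketch of assembling those five sections in a fixed order; the builder and its parameters are illustrative, not a specific framework's API.

```python
from datetime import date

def build_system_prompt(role: str, tools_desc: str, output_format: str,
                        constraints: str, prefs: str) -> str:
    """Sketch: concatenate the five injection points in order, skipping empty sections."""
    sections = [
        role,                                                        # 1. role definition
        f"You have access to the following tools:\n{tools_desc}",    # 2. tool instructions
        output_format,                                               # 3. output format
        constraints,                                                 # 4. behavioral constraints
        f"Current date: {date.today().isoformat()}. User preferences: {prefs}",  # 5. dynamic context
    ]
    return "\n\n".join(s for s in sections if s)
```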

Scratchpad / Working Memory

Agent Scratchpad Pattern

def build_messages(self, user_input: str) -> list[dict]:
    messages = [
        {"role": "system", "content": self.system_prompt}
    ]

    # Inject scratchpad (intermediate reasoning)
    if self.scratchpad:
        messages.append({
            "role": "assistant",
            "content": f"<scratchpad>\n{self.scratchpad}\n</scratchpad>"
        })

    messages.extend(self.conversation_history)
    messages.append({"role": "user", "content": user_input})
    return messages

Scratchpad Types

| Type | Content | Visibility |
|------|---------|------------|
| Reasoning trace | Thought process | Often hidden from user |
| Plan | Steps to execute | May be shown |
| Memory retrieval | Retrieved context | Internal |
| Tool results | Accumulated outputs | Becomes history |

Interrupt / Human-in-the-Loop

Interrupt Points

| Mechanism | When | Framework |
|-----------|------|-----------|
| Tool confirmation | Before destructive operations | Google ADK |
| Output validation | Before returning to user | OpenAI Agents |
| Step approval | Between reasoning steps | LangGraph |
| Budget exceeded | Token/cost limits reached | Pydantic-AI |

Implementation Pattern

class InterruptableAgent:
    async def step(self, state: AgentState) -> AgentState | Interrupt:
        action = await self.decide_action(state)

        if self.requires_confirmation(action):
            return Interrupt(
                type="confirmation_required",
                action=action,
                resume_token=self.create_resume_token(state)
            )

        result = await self.execute_action(action)
        return state.with_observation(result)

    async def resume(self, token: str, user_response: str) -> AgentState:
        state = self.restore_from_token(token)
        if user_response == "approved":
            result = await self.execute_action(state.pending_action)
            return state.with_observation(result)
        else:
            return state.with_observation("Action cancelled by user")

Conversation State Machine

                    ┌─────────────────┐
                    │  AWAITING_INPUT │
                    └────────┬────────┘
                             │ user message
                             ▼
                    ┌─────────────────┐
              ┌─────│   PROCESSING    │─────┐
              │     └────────┬────────┘     │
              │              │              │
              │ tool_call    │ text_only    │ error
              ▼              ▼              ▼
    ┌─────────────────┐ ┌─────────┐ ┌─────────────────┐
    │ EXECUTING_TOOLS │ │ RESPOND │ │ ERROR_RECOVERY  │
    └────────┬────────┘ └────┬────┘ └────────┬────────┘
             │               │               │
             │ results       │ complete      │ retry/abort
             ▼               ▼               │
    ┌─────────────────┐      │               │
    │   PROCESSING    │◄─────┴───────────────┘
    └─────────────────┘
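
The same transitions can be made explicit as a lookup table. The sketch below mirrors the diagram, except that "complete" and "abort" return to AWAITING_INPUT rather than folding back through PROCESSING as the drawing does; many frameworks leave this machine implicit in the event loop.

```python
from enum import Enum, auto

class ChatState(Enum):
    AWAITING_INPUT = auto()
    PROCESSING = auto()
    EXECUTING_TOOLS = auto()
    RESPOND = auto()
    ERROR_RECOVERY = auto()

# (current state, trigger) -> next state
TRANSITIONS: dict[tuple[ChatState, str], ChatState] = {
    (ChatState.AWAITING_INPUT, "user_message"): ChatState.PROCESSING,
    (ChatState.PROCESSING, "tool_call"): ChatState.EXECUTING_TOOLS,
    (ChatState.PROCESSING, "text_only"): ChatState.RESPOND,
    (ChatState.PROCESSING, "error"): ChatState.ERROR_RECOVERY,
    (ChatState.EXECUTING_TOOLS, "results"): ChatState.PROCESSING,
    (ChatState.RESPOND, "complete"): ChatState.AWAITING_INPUT,
    (ChatState.ERROR_RECOVERY, "retry"): ChatState.PROCESSING,
    (ChatState.ERROR_RECOVERY, "abort"): ChatState.AWAITING_INPUT,
}

def next_state(state: ChatState, trigger: str) -> ChatState:
    return TRANSITIONS[(state, trigger)]
```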

Multi-Provider Abstraction

Abstraction Strategies

Strategy 1: Thin Adapter (Recommended)

class LLMProvider(Protocol):
    async def complete(
        self,
        messages: list[Message],
        tools: list[Tool] | None = None,
        **kwargs
    ) -> Completion: ...

    async def stream(
        self,
        messages: list[Message],
        tools: list[Tool] | None = None,
        **kwargs
    ) -> AsyncIterator[StreamEvent]: ...

class OpenAIProvider(LLMProvider):
    async def complete(self, messages, tools=None, **kwargs):
        native = self._to_openai_format(messages, tools)
        response = await self.client.chat.completions.create(**native, **kwargs)
        return self._from_openai_response(response)

Strategy 2: Unified Client (LangChain-style)

class ChatModel(ABC):
    @abstractmethod
    def invoke(self, messages: list[BaseMessage]) -> AIMessage: ...

    @abstractmethod
    def bind_tools(self, tools: list[BaseTool]) -> "ChatModel": ...

class ChatOpenAI(ChatModel): ...
class ChatAnthropic(ChatModel): ...
class ChatGemini(ChatModel): ...
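
Usage then looks roughly like the sketch below, following LangChain's public `bind_tools`/`invoke` pattern; `search_tool`, `messages`, and `dispatch` are placeholders here, and exact signatures vary by library version.

```python
# Sketch: the provider class is interchangeable behind the same interface.
model = ChatOpenAI(model="gpt-4o")               # or ChatAnthropic(...), ChatGemini(...)
tool_model = model.bind_tools([search_tool])     # returns a model with tool schemas attached
ai_message = tool_model.invoke(messages)
for call in ai_message.tool_calls:               # normalized tool calls regardless of provider
    dispatch(call["name"], call["args"])
```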

Strategy 3: Request/Response Translation

class ModelGateway:
    def __init__(self, providers: dict[str, ProviderClient]):
        self.providers = providers
        self.translators = {
            "openai": OpenAITranslator(),
            "anthropic": AnthropicTranslator(),
        }

    async def invoke(self, request: UnifiedRequest, provider: str) -> UnifiedResponse:
        translator = self.translators[provider]
        native_request = translator.to_native(request)
        native_response = await self.providers[provider].call(native_request)
        return translator.from_native(native_response)

Provider Feature Matrix

| Feature | OpenAI | Anthropic | Gemini | Local (Ollama) |
|---------|--------|-----------|--------|----------------|
| Function calling | Yes | Yes | Yes | Model-dependent |
| Streaming | Yes | Yes | Yes | Yes |
| Tool choice | Yes | Yes | Limited | No |
| Parallel tools | Yes | Yes | Yes | No |
| Vision | Yes | Yes | Yes | Model-dependent |
| JSON mode | Yes | Limited | Yes | Model-dependent |
| Structured output | Yes | Beta | Yes | No |
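
When a provider lacks a feature from this matrix, frameworks typically degrade gracefully rather than fail. A sketch of one common approach for an unsupported `tool_choice` constraint (the feature-set check and the prompt fallback are illustrative):

```python
def apply_tool_choice(request: dict, choice: str, features: set[str]) -> dict:
    """Sketch: drop or emulate an unsupported tool_choice constraint instead of erroring."""
    if "tool_choice" in features:
        request["tool_choice"] = choice
        return request
    if choice == "auto":
        return request                                   # nothing to enforce
    if choice == "none":
        request.pop("tools", None)                       # emulate "none" by not sending tools
    elif choice == "required":
        request["messages"].append({"role": "system",
                                    "content": "You must call one of the available tools."})
    else:                                                # a specific tool name
        request["messages"].append({"role": "system",
                                    "content": f"You must call the '{choice}' tool."})
    return request
```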

Output Document

When invoking this skill, produce a markdown document saved to:

forensics-output/frameworks/{framework}/phase2/harness-model-protocol.md

Document Structure

The analysis document MUST follow this structure:

# Harness-Model Protocol Analysis: {Framework Name}

## Summary
- **Key Finding 1**: [Most important protocol insight]
- **Key Finding 2**: [Second most important insight]
- **Key Finding 3**: [Third insight]
- **Classification**: [Brief characterization, e.g., "OpenAI-compatible with thin adapters"]

## Detailed Analysis

### Message Protocol

**Wire Format Family**: [OpenAI-compatible / Anthropic-native / Gemini-native / Custom]

**Providers Supported**:
- Provider 1 (adapter location)
- Provider 2 (adapter location)
- ...

**Abstraction Strategy**: [Thin adapter / Unified client / Gateway / None]

[Include code example showing message translation]

```python
# Example: How framework translates internal → provider format
```

Role Handling:

| Role | Internal Representation | OpenAI | Anthropic | Gemini |
|------|-------------------------|--------|-----------|--------|
| System | ... | ... | ... | ... |
| User | ... | ... | ... | ... |
| Assistant | ... | ... | ... | ... |
| Tool Result | ... | ... | ... | ... |

Tool Call Encoding

Request Method: [Function calling API / System prompt injection / Hybrid]

Schema Transmission:

# Show how tool schemas are transmitted to the LLM

Response Parsing:

  • Parser Type: [Native API / Regex / XML / Custom]
  • Location: path/to/parser.py:L##
# Show parsing logic

Tool Choice Support:

| Constraint | Supported | Implementation |
|------------|-----------|----------------|
| auto | Yes/No | ... |
| required | Yes/No | ... |
| none | Yes/No | ... |
| specific | Yes/No | ... |

Streaming Implementation

Protocol: [SSE / WebSocket / Polling / None]

Partial Tool Call Handling:

  • Supported: Yes/No
  • Accumulator Pattern: [Describe if present]
# Show streaming handler code

Event Types Emitted:

| Event | Payload | Handler Location |
|-------|---------|------------------|
| token | text delta | path:L## |
| tool_start | tool id, name | path:L## |
| tool_delta | argument fragment | path:L## |
| ... | ... | ... |

Agentic Primitives

System Prompt Assembly

Pattern: [Static / Dynamic / Callable]

# Show system prompt construction

Injection Points:

  1. Role definition
  2. Tool instructions
  3. Output format
  4. Behavioral constraints
  5. Dynamic context

Scratchpad / Working Memory

Implemented: Yes/No

[If yes, show pattern:]

# Scratchpad injection pattern

Interrupt / Human-in-the-Loop

Mechanisms:

| Type | Trigger | Resume Pattern | Location |
|------|---------|----------------|----------|
| Tool confirmation | ... | ... | path:L## |
| Output validation | ... | ... | path:L## |
| ... | ... | ... | ... |

Conversation State Machine

State Management: [Explicit state machine / Implicit via history / Graph-based]

[ASCII diagram of state transitions if applicable]

Provider Abstraction

| Provider | Adapter | Streaming | Tool Choice | Parallel Tools | Notes |
|----------|---------|-----------|-------------|----------------|-------|
| OpenAI | path | Yes/No | Full/Partial | Yes/No | ... |
| Anthropic | path | Yes/No | Full/Partial | Yes/No | ... |
| Gemini | path | Yes/No | Full/Partial | Yes/No | ... |
| ... | ... | ... | ... | ... | ... |

Graceful Degradation: [Describe how missing features are handled]

Code References

  • path/to/message_types.py:L## - Internal message representation
  • path/to/openai_adapter.py:L## - OpenAI translation
  • path/to/streaming.py:L## - Stream event handling
  • path/to/system_prompt.py:L## - System prompt assembly
  • ... (include all key file:line references)

Implications for New Framework

Positive Patterns

  • Pattern 1: [Description and why to adopt]
  • Pattern 2: [Description and why to adopt]
  • ...

Considerations

  • Consideration 1: [Trade-off or limitation to be aware of]
  • Consideration 2: [Trade-off or limitation to be aware of]
  • ...

Anti-Patterns Observed

  • Anti-pattern 1: [Description and why to avoid]
  • Anti-pattern 2: [Description and why to avoid]
  • ...

---

## Integration Points

- **Prerequisite**: `codebase-mapping` to identify LLM client code
- **Related**: `tool-interface-analysis` for schema generation (this skill covers wire encoding)
- **Related**: `memory-orchestration` for context assembly patterns
- **Feeds into**: `comparative-matrix` for protocol decisions
- **Feeds into**: `architecture-synthesis` for abstraction layer design

## Key Questions to Answer

1. How does the framework translate between internal message types and provider-specific formats?
2. Does streaming handle partial tool calls correctly?
3. Are tool results properly attributed (tool_call_id matching)?
4. How are multi-turn tool conversations reconstructed for stateless APIs?
5. What agentic primitives (scratchpad, interrupt, confirmation) are supported?
6. How is the system prompt assembled and injected?
7. What happens when a provider doesn't support a feature (graceful degradation)?
8. Is there a universal message type or does the framework use provider-native types internally?
9. How are parallel tool calls handled (single message vs multiple)?
10. What streaming events are emitted and how can consumers subscribe?

## Files to Examine

When analyzing a framework, prioritize these file patterns:

| Pattern | Purpose |
|---------|---------|
| `**/llm*.py`, `**/model*.py` | LLM client code |
| `**/openai*.py`, `**/anthropic*.py`, `**/gemini*.py` | Provider adapters |
| `**/message*.py`, `**/types*.py` | Message type definitions |
| `**/stream*.py` | Streaming handlers |
| `**/prompt*.py`, `**/system*.py` | System prompt assembly |
| `**/chat*.py`, `**/conversation*.py` | Conversation management |
| `**/interrupt*.py`, `**/confirm*.py` | HITL mechanisms |