---
name: response-rater
version: 1.0.0
description: Run headless AI CLIs (Claude Code, Gemini, OpenAI Codex, Cursor Agent, GitHub Copilot) to rate an assistant response against a rubric and return actionable feedback plus a rewritten improved response; use for response quality audits, prompt/docs reviews, and "have another AI critique this answer" workflows.
allowed-tools: bash
---
# Response Rater
Use this skill to get independent critiques of an assistant response by calling external AI CLIs in headless mode and aggregating the results.
## Supported Providers

| Provider | CLI Command | Auth | Default Model |
|---|---|---|---|
| Claude Code | `claude` | Session or `ANTHROPIC_API_KEY` | (CLI default) |
| Gemini CLI | `gemini` | Session or `GEMINI_API_KEY`/`GOOGLE_API_KEY` | `gemini-3-pro-preview` |
| OpenAI Codex | `codex` | `OPENAI_API_KEY` or `CODEX_API_KEY` | `gpt-5.1-codex-max` |
| Cursor Agent | `cursor-agent` (via WSL on Windows) | Session or `CURSOR_API_KEY` | `auto` |
| GitHub Copilot | `copilot` | GitHub auth (`gh auth login`) | `claude-sonnet-4.5` |
## Quick Start

1. Save the response you want reviewed to a file (or pipe it via stdin).
2. Ensure you have at least one provider authenticated (see table above).
3. Run:

```bash
node .claude/skills/response-rater/scripts/rate.cjs --response-file <path> --providers claude,gemini
```

Or via stdin (PowerShell):

```powershell
Get-Content response.txt | node .claude/skills/response-rater/scripts/rate.cjs --providers claude,gemini
```

Or via stdin (Bash):

```bash
cat response.txt | node .claude/skills/response-rater/scripts/rate.cjs --providers claude,gemini
```
## All Providers Example

```bash
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file response.txt \
  --providers claude,gemini,codex,cursor,copilot
```
## Model Selection

Each provider accepts a configurable model:

```bash
# Use specific models for each provider
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file response.txt \
  --providers gemini,codex,copilot \
  --gemini-model gemini-2.5-pro \
  --codex-model gpt-5.1-codex \
  --copilot-model gpt-5
```
## Available Models by Provider

**Gemini CLI:**
- `gemini-3-pro-preview` (default, latest flagship)
- `gemini-3-flash-preview` (fast, latest)
- `gemini-2.5-pro` (stable pro)
- `gemini-2.5-flash` (stable fast)
- `gemini-2.5-flash-lite` (lightweight)

**OpenAI Codex:**
- `gpt-5.1-codex-max` (default, flagship)
- `gpt-5.1-codex` (standard)
- `gpt-5.1-codex-mini` (faster/cheaper)

**Cursor Agent:**
- `auto` (default, smart selection)
- `gpt-5.1-high`, `gpt-5.1-codex-high`
- `opus-4.5`, `sonnet-4.5`
- `gemini-3-pro`

**GitHub Copilot:**
- `claude-sonnet-4.5` (default)
- `claude-opus-4.5`, `claude-haiku-4.5`, `claude-sonnet-4`
- `gpt-5.1-codex-max`, `gpt-5.1-codex`, `gpt-5.1-codex-mini`
- `gpt-5.2`, `gpt-5.1`, `gpt-5`, `gpt-5-mini`, `gpt-4.1`
- `gemini-3-pro-preview`
## Templates

```bash
# Response review (default) - critique against rubric
--template response-review

# Vocabulary review - security audit for LLM vocabulary files
--template vocab-review
```
### Vocabulary Review Example

```powershell
Get-Content vocabulary.json | node .claude/skills/response-rater/scripts/rate.cjs `
  --providers claude,gemini `
  --template vocab-review
```
## Auth Behavior

By default the runner uses `--auth-mode session-first`:

1. Try the CLI using your existing logged-in session/subscription
2. If that fails and env keys exist, retry using env keys

To flip the order:

```bash
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file response.txt \
  --auth-mode env-first \
  --providers claude,gemini
```
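The fallback ordering above can be sketched as follows. This is an illustrative sketch, not the actual internals of `rate.cjs`; the helper names `trySession` and `tryEnvKeys` are hypothetical:

```javascript
// Sketch of session-first / env-first auth ordering (hypothetical helpers;
// the real runner in rate.cjs may differ). Each attempt is recorded so it
// can surface in the `attempts` history of the output.
function runWithAuth(provider, authMode, { trySession, tryEnvKeys }) {
  const order = authMode === "env-first"
    ? [["env", tryEnvKeys], ["session", trySession]]
    : [["session", trySession], ["env", tryEnvKeys]];
  const attempts = [];
  for (const [authUsed, attempt] of order) {
    const result = attempt(provider);
    attempts.push({ authUsed, ok: result.ok });
    if (result.ok) return { ok: true, authUsed, attempts };
  }
  return { ok: false, attempts };
}
```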
## Direct Headless Commands (No Script)

Claude Code:

```powershell
Get-Content response.txt | claude -p --output-format json --permission-mode bypassPermissions
```

Gemini CLI:

```powershell
Get-Content response.txt | gemini --output-format json --model gemini-3-pro-preview
```

OpenAI Codex:

```bash
codex exec --json --color never --model gpt-5.1-codex-max --skip-git-repo-check "Your prompt"
```

Cursor Agent (via WSL on Windows):

```bash
wsl bash -lc "cursor-agent -p 'Your prompt' --output-format json --model auto"
```

GitHub Copilot:

```bash
copilot -p --silent --no-color --model claude-sonnet-4.5 "Your prompt"
```
## Output Format

JSON to stdout with:

- `promptVersion`: Schema version
- `template`: Template used
- `authMode`: Auth mode used
- `providers`: Object with per-provider results:
  - `ok`: Boolean success
  - `authUsed`: Which auth method worked
  - `raw`: Truncated raw output
  - `parsed`: Extracted JSON with `scores`, `summary`, `improvements`, `rewrite`
  - `attempts`: Auth attempt history
### Sample Output

```json
{
  "promptVersion": 3,
  "template": "response-review",
  "authMode": "session-first",
  "providers": {
    "claude": {
      "ok": true,
      "authUsed": "session",
      "parsed": {
        "scores": {
          "correctness": 8,
          "completeness": 7,
          "clarity": 9,
          "actionability": 8,
          "risk_management": 6,
          "constraint_alignment": 8,
          "brevity": 7
        },
        "summary": "The response is well-structured...",
        "improvements": ["Add error handling for...", "..."],
        "rewrite": "Improved version..."
      }
    },
    "gemini": { "ok": true, "..." },
    "codex": { "ok": true, "..." }
  }
}
```
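A consumer can aggregate the per-provider rubric scores from this output. The sketch below assumes only the JSON shape shown above; anything beyond that (the function itself, the averaging policy) is ours:

```javascript
// Sketch: average each rubric dimension across providers that returned
// parsed scores, skipping failed providers. Assumes the output shape
// documented above; the aggregation policy itself is illustrative.
function averageScores(output) {
  const totals = {};
  const counts = {};
  for (const result of Object.values(output.providers)) {
    if (!result.ok || !result.parsed) continue; // skip failed providers
    for (const [dim, score] of Object.entries(result.parsed.scores)) {
      totals[dim] = (totals[dim] ?? 0) + score;
      counts[dim] = (counts[dim] ?? 0) + 1;
    }
  }
  const avg = {};
  for (const dim of Object.keys(totals)) avg[dim] = totals[dim] / counts[dim];
  return avg;
}
```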
## Notes / Constraints
- This skill requires network access to contact the providers.
- If a provider is missing credentials, it will be skipped with a clear error.
- Keep the reviewed response reasonably sized; start with the exact section you want critiqued.
- Timeout is 180 seconds per provider (3 minutes).
## Installation Requirements

Install the CLIs you want to use:

```bash
# Claude Code (via npm)
npm install -g @anthropic-ai/claude-code

# Gemini CLI (via npm)
npm install -g @google/gemini-cli

# OpenAI Codex (via npm)
npm install -g @openai/codex

# Cursor Agent (via curl, inside WSL on Windows)
wsl bash -lc "curl https://cursor.com/install -fsS | bash"

# GitHub Copilot (via npm)
npm install -g @github/copilot
```
## Skill Invocation

Natural language (recommended):

- "Rate this response against the rubric"
- "Have another AI critique this answer"
- "Review the vocabulary file for security issues"
- "Get feedback from Claude, Gemini, and Codex on this response"

Direct CLI:

```bash
node .claude/skills/response-rater/scripts/rate.cjs --response-file response.txt --providers claude,gemini,codex,cursor,copilot
```
## CLI Reference

```text
Usage:
  node .claude/skills/response-rater/scripts/rate.cjs --response-file <path> [options]
  cat response.txt | node .claude/skills/response-rater/scripts/rate.cjs [options]

Options:
  --response-file <path>    # file containing the response to review
  --question-file <path>    # optional; original question/request for context
  --providers <list>        # comma-separated: claude,gemini,codex,cursor,copilot

Models:
  --gemini-model <model>    # default: gemini-3-pro-preview
  --codex-model <model>     # default: gpt-5.1-codex-max
  --cursor-model <model>    # default: auto
  --copilot-model <model>   # default: claude-sonnet-4.5

Templates:
  --template response-review   # default
  --template vocab-review      # security audit

Auth:
  --auth-mode session-first    # default
  --auth-mode env-first        # try env keys first
```
## Plan Rating for Orchestration

### Overview

The response-rater skill is mandatory for orchestrators to validate plan quality before workflow execution. This ensures all plans meet minimum quality standards and reduces execution failures.

### Plan Rating Usage for Orchestrators

**When to Use:** After the Planner creates a plan, before workflow execution starts.

**Workflow Pattern:**

1. Planner creates `plan-{{workflow_id}}.json`
2. Orchestrator invokes the response-rater skill with the plan content
3. Response-rater evaluates the plan using the rubric
4. If score >= 7/10: proceed with execution
5. If score < 7/10: return to the Planner with feedback (max 3 attempts)

**Command:**

```bash
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file .claude/context/artifacts/plan-{{workflow_id}}.json \
  --providers claude,gemini \
  --template plan-review
```
### Rubric Dimensions

Plans are evaluated on 5 key dimensions (equal weight):

| Dimension | Weight | Description |
|---|---|---|
| completeness | 20% | All required information present, no gaps |
| feasibility | 20% | Plan is realistic and achievable |
| risk_mitigation | 20% | Risks identified and mitigation strategies defined |
| agent_coverage | 20% | Appropriate agents assigned to each task |
| integration | 20% | Plan integrates properly with existing systems |

**Overall Score:** Average of all 5 dimensions
**Minimum Score:** 7/10 (standard), 8/10 (enterprise), 9/10 (critical)
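Since the weights are equal, the overall score is just the unweighted mean of the five dimensions. A minimal sketch (the function name is ours):

```javascript
// Sketch: overall plan score as the unweighted mean of the five rubric
// dimensions listed above. Throws if a dimension is missing.
function overallPlanScore(scores) {
  const dims = ["completeness", "feasibility", "risk_mitigation",
                "agent_coverage", "integration"];
  const sum = dims.reduce((acc, d) => {
    if (typeof scores[d] !== "number") throw new Error(`missing score: ${d}`);
    return acc + scores[d];
  }, 0);
  return sum / dims.length;
}
```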
### Feedback Format

Response-rater returns structured JSON with:

```json
{
  "scores": {
    "completeness": 8,
    "feasibility": 7,
    "risk_mitigation": 6,
    "agent_coverage": 9,
    "integration": 7
  },
  "overall_score": 7.4,
  "summary": "Plan is generally solid with strong agent coverage...",
  "improvements": [
    "Add explicit error handling strategy for each phase",
    "Define fallback agents for critical steps",
    "Include rollback procedures for failed steps"
  ],
  "rewrite": "Improved plan with enhanced risk mitigation..."
}
```
### Minimum Scores by Task Type
| Task Type | Minimum Score | Example |
|---|---|---|
| Emergency | 5/10 | Production outages, critical hotfixes |
| Standard | 7/10 | Regular features, bug fixes (default) |
| Enterprise | 8/10 | Enterprise integrations, migrations |
| Critical | 9/10 | Security, compliance, data protection |
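The table above maps directly to a pass/fail check. A minimal sketch (the map and function names are ours; the threshold values come from the table):

```javascript
// Sketch: minimum passing score by task type (values from the table above).
const MIN_SCORE = { emergency: 5, standard: 7, enterprise: 8, critical: 9 };

function planPasses(overallScore, taskType = "standard") {
  const min = MIN_SCORE[taskType];
  if (min === undefined) throw new Error(`unknown task type: ${taskType}`);
  return overallScore >= min;
}
```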
## Provider Configuration for Plan Rating

### Default Providers

Standard plan rating uses 2 providers for consensus:

```bash
--providers claude,gemini
```

Enterprise plan rating uses 3 providers:

```bash
--providers claude,gemini,codex
```

Critical plan rating uses all 5 providers:

```bash
--providers claude,gemini,codex,cursor,copilot
```
### Provider Selection Strategy
| Scenario | Providers | Rationale |
|---|---|---|
| Standard plans | claude,gemini | Fast, reliable, good consensus |
| Enterprise plans | claude,gemini,codex | Additional perspective for complex plans |
| Critical plans | All 5 | Maximum validation for mission-critical work |
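The tier-to-provider mapping above can be turned into the `--providers` flag programmatically. A sketch (the helper name and map constant are ours):

```javascript
// Sketch: provider lists per validation tier (from the table above),
// formatted for the rate.cjs --providers flag.
const TIER_PROVIDERS = {
  standard: ["claude", "gemini"],
  enterprise: ["claude", "gemini", "codex"],
  critical: ["claude", "gemini", "codex", "cursor", "copilot"],
};

function providersFlag(tier) {
  const list = TIER_PROVIDERS[tier];
  if (!list) throw new Error(`unknown tier: ${tier}`);
  return `--providers ${list.join(",")}`;
}
```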
## Timeout and Failure Handling

### Timeout Configuration

Default timeout: 180 seconds per provider (3 minutes).

Rationale: plan rating requires deep analysis, so allow sufficient time.

Increase the timeout (in milliseconds) for complex plans:

```bash
# Enterprise migration plan (5 minutes per provider)
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file plan-enterprise-migration.json \
  --providers claude,gemini,codex \
  --timeout 300000

# Critical security audit plan (10 minutes per provider)
node .claude/skills/response-rater/scripts/rate.cjs \
  --response-file plan-security-audit.json \
  --providers claude,gemini,codex,cursor,copilot \
  --timeout 600000
```
### Failure Handling

**Scenario 1: One provider fails**
- Action: Continue with successful providers
- Minimum: Require at least 1 successful provider
- Log: Record failure in reasoning file

**Scenario 2: All providers fail**
- Action: Retry with exponential backoff (3 attempts)
- If still failing: Escalate to user
- Options: Manual review, skip rating (force-proceed), cancel workflow

**Scenario 3: Provider timeout**
- Action: Retry provider once
- If timeout persists: Skip provider, continue with others
- Log: Record timeout in reasoning file

**Scenario 4: Authentication failure**
- Action: Retry with environment-based auth (if available)
- If auth still fails: Skip provider, log error
- Minimum: Require at least 1 authenticated provider
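Scenario 2's "retry with exponential backoff" can be sketched as below. The delays and function names are illustrative assumptions, not the runner's actual values:

```javascript
// Sketch: retry a full rating run with exponential backoff when all
// providers fail (Scenario 2). `rateOnce` is a hypothetical callback;
// delays double each attempt (base, 2x, 4x).
async function rateWithBackoff(rateOnce, maxAttempts = 3, baseDelayMs = 1000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await rateOnce();
    if (result.ok) return result;
    if (attempt < maxAttempts) {
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
  return { ok: false, escalate: "user" }; // all attempts failed: escalate
}
```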
## Workflow Integration for Plan Rating

### Step 0.1: Plan Rating Gate (Standard Pattern)

All workflows include a plan rating gate after planning:

```yaml
steps:
  - step: 0.1
    name: "Plan Rating Gate"
    agent: orchestrator
    type: validation
    skill: response-rater
    inputs:
      - plan-{{workflow_id}}.json  # from step 0
    outputs:
      - .claude/context/runs/<run_id>/plans/<plan_id>-rating.json
      - reasoning: .claude/context/history/reasoning/{{workflow_id}}/00.1-orchestrator.json
    validation:
      minimum_score: 7
      rubric_file: .claude/context/artifacts/standard-plan-rubric.json
      gate: .claude/context/history/gates/{{workflow_id}}/00.1-orchestrator.json
    retry:
      max_attempts: 3
      on_failure: escalate_to_human
    description: |
      Rate plan quality using response-rater skill.
      - Minimum passing score: 7/10
      - If score < 7: Return to Planner with feedback
      - If score >= 7: Proceed with workflow execution
```
### Rating File Location

Standard path: `.claude/context/runs/<run_id>/plans/<plan_id>-rating.json`

Example: `.claude/context/runs/run-001/plans/plan-greenfield-2025-01-06-rating.json`
### Orchestrator Responsibilities

1. **After Planner completes:** Invoke the response-rater skill with the plan content
2. **Process rating results:** Extract the overall score and per-dimension rubric scores
3. **Save rating:** Persist to the standard path for the audit trail
4. **Decision logic:**
   - If score >= minimum: Proceed to next step
   - If score < minimum: Return to Planner with feedback (max 3 attempts)
   - If 3 failures: Escalate to user for manual review
### Retry Logic

1. **Attempt 1:** Rate → if < 7 → return to Planner with feedback
2. **Attempt 2:** Planner revises → re-rate → if < 7 → return again
3. **Attempt 3:** Final revision → re-rate → if < 7 → escalate to user

**Force-proceed:** The user can approve execution despite a low score (requires explicit acknowledgment of risks).
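The rate/revise/escalate loop can be sketched as one gate function. `ratePlan` and `revisePlan` are hypothetical callbacks standing in for the skill invocation and the Planner round-trip:

```javascript
// Sketch of the orchestrator's plan rating gate: rate the plan, send
// feedback back to the Planner while the score is below minimum, and
// escalate to the user after three failed attempts.
async function planRatingGate(plan, ratePlan, revisePlan, minScore = 7) {
  for (let attempt = 1; attempt <= 3; attempt++) {
    const rating = await ratePlan(plan);
    if (rating.overall_score >= minScore) {
      return { decision: "proceed", attempt, rating };
    }
    if (attempt < 3) {
      // Feed the rater's improvement list back to the Planner.
      plan = await revisePlan(plan, rating.improvements);
    }
  }
  return { decision: "escalate_to_user" }; // 3 failures: manual review
}
```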
## Timeout Escalation

When a provider times out, the response-rater follows this escalation pattern:

### Escalation Flow

**Primary provider timeout (after 180s):**
1. Log timeout with provider name
2. Mark provider as failed for this request
3. Immediately try the next provider in the list

**Secondary provider timeout:**
1. Log cumulative timeout
2. If more providers are available, try the next
3. If total time > 600s, stop and use available results

**All providers timeout:**
1. Check for a cached rating (< 1 hour old)
2. If cached: use the cached rating with a warning
3. If no cache: escalate to manual review
### Configuration

Timeouts are configured in `.claude/config/response-rater.yaml`:

```yaml
timeouts:
  per_provider: 180  # Per-provider timeout (seconds)
  total_max: 600     # Max total time (seconds)
  connection: 30     # Connection timeout (seconds)
```
### Example Timeout Scenario

```text
Provider: claude (timeout after 180s) → FAILED
Provider: gemini (success in 45s)     → Score: 7.2
Provider: codex (timeout after 180s)  → FAILED

Result: Score 7.2 from gemini (single provider)
Warning: "2 of 3 providers timed out"
```
### Fallback Behavior

```yaml
fallback:
  on_all_fail: use_cached_or_manual_review
  cache_ttl: 3600  # Use cache if < 1 hour old
  manual_review_enabled: true
```

When all providers fail:

1. Check the rating cache for an identical plan hash
2. If a valid cache entry exists, return the cached rating with a `source: cache` flag
3. If no cache, return `{ status: "manual_review_required" }`
4. Log the escalation for monitoring
## Provider Selection by Workflow
Response-rater automatically selects providers based on workflow criticality:
| Workflow Type | Tier | Providers | Rationale |
|---|---|---|---|
| incident-flow | standard | claude, gemini | Fast response for emergencies |
| quick-flow | standard | claude, gemini | Quick validation for minor changes |
| enterprise-track | enterprise | claude, gemini, codex | Enhanced validation for enterprise |
| automated-enterprise-flow | enterprise | claude, gemini, codex | Enterprise compliance requirements |
| legacy-modernization-flow | enterprise | claude, gemini, codex | Complex migration validation |
| ai-system-flow | critical | all 5 providers | Maximum validation for AI systems |
| security-flow | critical | all 5 providers | Critical security validation |
| compliance-flow | critical | all 5 providers | Regulatory compliance requirements |
Provider selection tool:

```bash
node .claude/tools/response-rater-provider-selector.mjs --workflow <name>
```
## References

- Plan Rating Guide: `.claude/docs/PLAN_RATING_GUIDE.md` (comprehensive documentation)
- Enforcement Gate: `.claude/tools/enforcement-gate.mjs` (validation CLI)
- Plan Review Matrix: `.claude/context/plan-review-matrix.json` (minimum scores by task type)
- Workflow Guide: `.claude/workflows/WORKFLOW-GUIDE.md` (workflow integration)
- Provider Config: `.claude/config/response-rater.yaml` (provider selection and timeouts)