SKILL.md

name: evaluator
description: Independent assessment without debate context — adversarial loop prevents gaming
allowed-tools: read_file, write_file
tier: 1
protocol: INDEPENDENT-EVALUATOR
tags: evaluation, adversarial, scoring, review
credits: Mike Gallaher — independent evaluator pattern
related: adversarial-committee, rubric, roberts-rules, room

Evaluator

"Fresh eyes, no bias, just the rubric."

Committee output goes to a separate model instance that receives NO debate context.

The Separation

evaluation:
  evaluator_has_no_access_to:
    - debate_transcript
    - speaker_identities
    - amendment_history
    - voting_patterns
    - minority_dissents
    
  evaluator_sees_only:
    - final_output
    - rubric_criteria
    - subject_matter_context
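
A minimal sketch of how this blinding can be enforced in code (Python; committee_record is a hypothetical dict holding the committee room's full state). The payload is built by whitelisting rather than deleting, so a newly added debate field cannot leak through by default:

# Only these fields may cross from the committee to the evaluator.
EVALUATOR_VISIBLE = {"final_output", "rubric_criteria", "subject_matter_context"}

def blind_for_evaluator(committee_record: dict) -> dict:
    """Return only the fields the evaluator is allowed to see."""
    return {k: v for k, v in committee_record.items() if k in EVALUATOR_VISIBLE}

Whitelisting is the safer direction: if the committee record grows a new field, it stays invisible until someone deliberately adds it to the allow-list.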

Room Architecture

# committee-room/
#   ROOM.yml
#   debate.yml
#   output.yml
#   outbox/
#     evaluation-request-001.yml

# evaluation-room/
#   ROOM.yml
#   rubric.yml
#   inbox/
#     evaluation-request-001.yml  # Landed here
#   evaluations/
#     eval-001.yml
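
The throw itself is just a file move between room directories. A sketch against the layout above, using only the Python standard library (paths are illustrative):

import shutil
from pathlib import Path

def throw(filename: str,
          from_room: str = "committee-room",
          to_room: str = "evaluation-room") -> Path:
    """Move a request from one room's outbox into another room's inbox."""
    src = Path(from_room) / "outbox" / filename
    dest_dir = Path(to_room) / "inbox"
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(src, dest_dir / filename))

# throw("evaluation-request-001.yml")  # lands in evaluation-room/inbox/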

Evaluation Request

# Thrown from committee to evaluator
evaluation_request:
  id: eval-req-001
  from: committee-room
  timestamp: "2026-01-05T15:00:00Z"
  
  subject: "Client X Engagement Decision"
  
  output_only: |
    Recommendation: Accept Client X with:
    - Explicit scope boundaries
    - Milestone-based billing
    - Quarterly scope review
    
    Confidence: 0.65
    
    Key considerations:
    - Revenue opportunity aligns with growth goals
    - Risk mitigated by contractual protections
    - Capacity impact manageable
    
  rubric: client-evaluation-v1
  
  # Note: NO debate context included
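
Before a request is thrown, it can be checked against the blinding rule. A sketch, assuming PyYAML for parsing (an assumption about tooling); the forbidden field names come from The Separation above:

import yaml

FORBIDDEN = {"debate_transcript", "speaker_identities", "amendment_history",
             "voting_patterns", "minority_dissents"}

def assert_blind(request_path: str) -> None:
    """Refuse to throw a request that carries any debate context."""
    with open(request_path) as f:
        request = yaml.safe_load(f)
    leaked = FORBIDDEN & set(request.get("evaluation_request", {}))
    if leaked:
        raise ValueError(f"Debate context would leak to evaluator: {sorted(leaked)}")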

Evaluation Process

evaluation:
  request: eval-req-001
  evaluator: "fresh model instance"
  context_loaded: false  # Critical!
  
  steps:
    1. load_rubric: client-evaluation-v1
    2. read_output: evaluation_request.output_only
    3. score_each_criterion: independently
    4. calculate_weighted_total: true
    5. generate_critique: if score < threshold
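
Steps 3 and 4 reduce to a weighted mean of the per-criterion scores. A sketch with hypothetical weights (the source does not give the client-evaluation-v1 weights; these were chosen so the result matches the 3.45 in the Evaluation Output below):

def weighted_total(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted mean of criterion scores, normalized by total weight."""
    total_weight = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total_weight

# Hypothetical weights (not from the source rubric):
weights = {"resource_efficiency": 0.20, "risk_level": 0.35,
           "strategic_alignment": 0.25, "stakeholder_impact": 0.20}
scores = {"resource_efficiency": 4, "risk_level": 3,
          "strategic_alignment": 4, "stakeholder_impact": 3}
print(round(weighted_total(scores, weights), 2))  # 3.45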

Evaluation Output

# evaluation-room/evaluations/eval-001.yml
evaluation:
  id: eval-001
  request: eval-req-001
  timestamp: "2026-01-05T15:05:00Z"
  
  rubric: client-evaluation-v1
  
  scores:
    resource_efficiency:
      score: 4
      rationale: "Output indicates capacity is manageable"
      
    risk_level:
      score: 3
      rationale: "Mitigations proposed but not detailed"
      confidence: "Would score higher with specific terms"
      
    strategic_alignment:
      score: 4
      rationale: "Growth goals mentioned, seems aligned"
      
    stakeholder_impact:
      score: 3
      rationale: "Not explicitly addressed in output"
      flag: "Committee should consider stakeholder effects"
      
  weighted_total: 3.45
  threshold: 3.5
  
  result: REVIEW  # Just below accept
  
  critique:
    summary: "Close to acceptance threshold"
    
    gaps:
      - "Risk mitigation lacks specifics"
      - "Stakeholder impact not addressed"
      - "Confidence of 0.65 seems low for recommendation"
      
    suggestions:
      - "Detail the milestone structure"
      - "Explain how scope boundaries will be enforced"
      - "Address impact on existing clients and team"
      
    if_addressed: "Score could reach 3.7+ (accept)"
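
Turning a finished evaluation into a verdict is a threshold comparison plus a route. A sketch with an assumed REVIEW band (the 0.2 width is not in the source; it is chosen so that 3.45 against a 3.5 threshold lands in REVIEW, as in eval-001, rather than rejecting outright):

def verdict(weighted_total: float, threshold: float, review_band: float = 0.2) -> str:
    """ACCEPT at or above threshold, REVIEW on a close miss, REJECT otherwise."""
    if weighted_total >= threshold:
        return "ACCEPT"
    if weighted_total >= threshold - review_band:
        return "REVIEW"  # close miss: critique goes back to the committee
    return "REJECT"

print(verdict(3.45, 3.5))  # REVIEW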

Revision Loop

revision_loop:
  max_iterations: 3
  
  flow:
    1. committee_outputs: recommendation
    2. evaluator_scores: against rubric
    3. if score >= threshold: ACCEPT
    4. if score < threshold:
         - evaluator_generates: critique
         - critique_thrown_to: committee inbox
         - committee_revises: based on critique
         - goto: step 1
    5. if max_iterations reached: ESCALATE to human
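
The same flow as a driver function, sketched in Python. run_committee and run_evaluator are hypothetical callables standing in for the two rooms; each returns a plain dict:

def revision_loop(run_committee, run_evaluator, threshold: float,
                  max_iterations: int = 3) -> dict:
    """Iterate committee -> evaluator until the score clears the threshold."""
    critique = None
    for _ in range(max_iterations):
        output = run_committee(critique)    # step 1 (revision on later passes)
        evaluation = run_evaluator(output)  # step 2: blind scoring
        if evaluation["weighted_total"] >= threshold:
            return {"result": "ACCEPT", "output": output}  # step 3
        critique = evaluation["critique"]   # step 4: thrown back to committee
    return {"result": "ESCALATE", "reason": "max_iterations reached"}  # step 5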

Adversarial Properties

adversarial_separation:
  why: "Prevents committee gaming metrics"
  
  committee_cannot:
    - see_evaluator_reasoning
    - predict_exact_scores
    - optimize_for_rubric_loopholes
    
  evaluator_cannot:
    - be_influenced_by_debate_dynamics
    - favor_particular_speakers
    - weight_majority_over_minority
    
  result: "Genuine quality signal"

Commands

Command               Action
EVALUATE [output]     Send to independent evaluator
APPLY RUBRIC [name]   Score against criteria
CRITIQUE              Generate improvement suggestions
REVISE                Committee addresses critique
ESCALATE              Send to human decision maker

Integration

graph TD
    C[Committee] -->|output| T[THROW to outbox]
    T -->|lands in| I[Evaluator inbox]
    I --> E[Evaluate]
    R[RUBRIC.yml] --> E
    E --> S{Score}
    S -->|≥ threshold| A[✅ Accept]
    S -->|< threshold| CR[Generate Critique]
    CR -->|THROW back| CI[Committee inbox]
    CI --> REV[Revise]
    REV --> C
    
    subgraph "No Context Crossing"
    I
    E
    R
    end

Model Instance Separation

For true independence:

implementation:
  option_1:
    name: "Fresh conversation"
    method: "New chat with no history"
    
  option_2:
    name: "Separate model"
    method: "Different API call, different instance"
    
  option_3:
    name: "System prompt separation"
    method: "Explicit instruction: 'You have no prior context'"
    
  key_principle: |
    The evaluator must NOT have access to:
    - How the committee reached the conclusion
    - Who said what
    - What alternatives were considered
    - Why certain risks were dismissed
    
    Only the final output matters.
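
Option 2 sketched with the Anthropic Python SDK (the model name and prompt wording are illustrative assumptions, not part of the skill). The only content in the call is the blinded payload, so the fresh instance has nothing else to draw on:

import anthropic

def evaluate_blind(output_only: str, rubric_text: str) -> str:
    """Score a committee output in a fresh instance with no prior context."""
    client = anthropic.Anthropic()  # new client, no conversation history
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model name
        max_tokens=1024,
        system="You are an independent evaluator with no prior context. "
               "Score the output strictly against the rubric provided.",
        messages=[{
            "role": "user",
            "content": f"RUBRIC:\n{rubric_text}\n\nOUTPUT TO SCORE:\n{output_only}",
        }],
    )
    return response.content[0].text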