
SKILL.md

---
name: audit-spec
description: Audit a checkpoint specification for realism and design decision forcing. Reviews specs to remove hand-holding, hidden corner cases, and architectural giveaways. Invoke with /audit-spec <problem> <checkpoint>.
---

Specification Audit

You are an expert software engineer reviewing specifications for a take-home hiring task. Your goal is to propose fixes that make the spec more "realistic" while still forcing the candidate to make design decisions.

CRITICAL: The purpose of these specs is to force the candidate to make design decisions, NOT catch them with some weird hidden corner case that only a deity could find.

Usage: /audit-spec execution_server checkpoint_2


Your Mindset

Think carefully and pay attention to details. You are looking for:

  1. Redundant information - Specs that repeat themselves or over-explain
  2. Architecture giveaways - Parts that tell candidates exactly how to structure their code
  3. Hand-holding - Basic information that any competent developer should know
  4. Hidden details in examples - Information buried in examples that should be in the main body (or removed entirely)
  5. Brute-force enablers - Examples that let candidates trial-and-error their way to a solution without understanding

Remember: Candidates see ONLY the specification and static assets from config.yaml. They do NOT see the tests.


Step 1: Gather All Context

Read these files for the specified problem/checkpoint:

problems/{problem}/checkpoint_N.md          # The specification
problems/{problem}/tests/test_checkpoint_N.py   # The tests (candidates DON'T see this)
problems/{problem}/tests/conftest.py        # Test fixtures and setup
problems/{problem}/config.yaml              # What assets candidates CAN see

Also check for static assets referenced in config.yaml that candidates receive.
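
To see at a glance what candidates receive, one option is a small helper that reads the config and lists the visible files. This is a sketch under assumptions: the per-checkpoint `static_assets` key and the overall schema are invented here, so adapt the keys to the real config.yaml.

```python
# Sketch: list the files candidates can see for a checkpoint.
# ASSUMPTION: config.yaml is keyed by checkpoint and has a "static_assets"
# list per checkpoint; adjust the keys to the actual schema.
import yaml

def candidate_visible_files(problem: str, checkpoint: str) -> list[str]:
    with open(f"problems/{problem}/config.yaml") as f:
        config = yaml.safe_load(f)
    spec = f"problems/{problem}/{checkpoint}.md"
    assets = config.get(checkpoint, {}).get("static_assets", [])
    return [spec] + [f"problems/{problem}/{path}" for path in assets]
```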


Step 2: Parse the Tests Deeply

Understand exactly what the tests are checking. For each test:

  1. What behavior does it verify?
  2. What inputs does it use?
  3. What outputs does it expect?
  4. What edge cases does it cover?

Create a mental map of: Spec requirement → Test coverage → Gap analysis

Ask yourself:

  • Are tests checking things NOT in the spec? (Hidden requirements)
  • Are tests checking things that ARE in examples but not main spec? (Buried details)
  • Are tests checking exact values that could only be known from trial-and-error? (Brute-force enablers; see the worked example below)
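
To make the mapping concrete, here is a hypothetical test with the analysis written inline. The endpoint, fixture, payload, and error code are all invented for illustration:

```python
# Hypothetical test used to practice the mapping (all names invented):
def test_duplicate_job_rejected(client):
    payload = {"id": "job-1", "cmd": "echo hi"}
    client.post("/jobs", json=payload)              # 1. behavior: duplicate detection
    response = client.post("/jobs", json=payload)   # 2. input: the same job twice
    assert response.status_code == 409              # 3. output: HTTP 409 Conflict
    assert response.json()["error"] == "DUPLICATE"  # 4. edge case: exact error code

# Mapping: spec requirement ("reject duplicate jobs")
#   -> test coverage (second POST yields 409 with code "DUPLICATE")
#   -> gap analysis (does the spec ever name "DUPLICATE" or the 409 status?)
```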

Step 3: Analyze the Specification

For each section of the spec, evaluate:

Redundancy Check

  • Is this information stated elsewhere?
  • Could this be combined with another section?
  • Is this just restating what any programmer would know?

Architecture Giveaway Check

  • Does this tell them exactly which data structures to use?
  • Does this dictate the class/function structure?
  • Does this prescribe implementation details that should be design decisions?

Hand-Holding Check

  • Is this basic programming knowledge being explained?
  • Are we explaining standard library functions?
  • Are we over-explaining error handling patterns?

Example Analysis

  • Do examples contain details NOT in the main spec?
  • Could someone brute-force the answer by matching example output? (See the sketch after this list.)
  • Are examples showing implementation hints they should figure out?
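
A quick litmus test for the brute-force question: would hardcoding the spec's example output pass the tests? A deliberately trivial sketch, with the function name and values invented:

```python
# If the only test input mirrors the spec's worked example, this no-op
# "implementation" passes without any real logic: a brute-force enabler.
def handle_totals(request: dict) -> dict:
    return {"total": 6}  # hardcoded straight from the spec's example output

# The fix usually lives in the tests: vary inputs beyond the worked example.
```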

Step 4: Generate the Report

Output format - one entry per issue found:

## {short summary of the issue}

> {quote from the spec or tests - be specific}

**RECOMMENDATION:** ADD-DETAIL | REMOVE | SIMPLIFY | COMBINE

**RATIONALE:** {2-3 sentences explaining why this is a problem and how it affects candidate evaluation}

**PROPOSAL:** {Specific suggestion for how to fix this issue}

Recommendation Types

ADD-DETAIL

Use when: The spec is ambiguous but tests expect specific behavior. The candidate would have to guess or brute-force.

Example: Tests expect a specific error message format, but spec just says "return an error"

REMOVE

Use when: Information is unnecessary, gives away architecture, or hand-holds too much.

Example: Spec explains what a hash map is before suggesting to use one

SIMPLIFY

Use when: The spec is overly verbose or complex for what it's describing.

Example: Three paragraphs explaining a simple validation rule

COMBINE

Use when: Related information is scattered across multiple sections.

Example: Error handling described in three different places


Anti-Patterns to Flag

1. The Hidden Oracle

Tests check exact values not derivable from spec.

## Hidden magic number in error response

> Tests expect: `{"error": "E001", "message": "..."}`
> Spec says: "Return an appropriate error"

**RECOMMENDATION:** ADD-DETAIL
**RATIONALE:** Candidates cannot know "E001" is expected without seeing tests. This becomes trial-and-error.
**PROPOSAL:** Either specify error codes in spec, or make tests accept any reasonable error format.
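
On the test side, relaxing the assertion is often the cheaper fix. A hedged sketch of both shapes, with the endpoint and fixture invented:

```python
# Hidden oracle (before): only someone who has read the tests knows "E001".
def test_missing_file_strict(client):
    body = client.get("/files/missing.txt").json()
    assert body == {"error": "E001", "message": "file not found"}

# Relaxed (after): accepts any reasonable error shape the spec implies.
def test_missing_file_relaxed(client):
    response = client.get("/files/missing.txt")
    assert response.status_code == 404
    assert "error" in response.json()  # any string error code passes
```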

2. The Architecture Blueprint

Spec dictates exact structure.

## Over-specified class structure

> "Create a `RequestHandler` class with methods `parse()`, `validate()`, and `execute()`"

**RECOMMENDATION:** REMOVE
**RATIONALE:** This eliminates design decision-making. Let candidates decide their own architecture.
**PROPOSAL:** Describe the behavior needed, not the class structure. "The system should parse, validate, and execute requests."
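
For contrast, this is the skeleton such a spec effectively hands over. The class and method names come straight from the quoted giveaway; the signatures are invented:

```python
# The skeleton the quoted spec dictates; nothing is left to design.
class RequestHandler:
    def parse(self, raw: bytes) -> dict: ...
    def validate(self, request: dict) -> None: ...
    def execute(self, request: dict) -> dict: ...
```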

3. The Buried Requirement

Critical detail hidden in an example.

## Timezone handling buried in example

> Example output shows: `"2024-01-15T10:30:00Z"`
> Main spec doesn't mention timezone handling

**RECOMMENDATION:** ADD-DETAIL
**RATIONALE:** UTC requirement is only visible in example. Should be explicit.
**PROPOSAL:** Add to main spec: "All timestamps must be in UTC with 'Z' suffix."
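
The test behind this anti-pattern usually looks innocuous but quietly enforces the buried detail. A hypothetical example, with the endpoint and field name invented:

```python
# Hypothetical test enforcing the buried requirement: the "Z" suffix only
# ever appeared in the spec's example output, never in the prose.
def test_event_timestamp_is_utc(client):
    created_at = client.get("/events/1").json()["created_at"]
    assert created_at.endswith("Z")  # fails for "+00:00" or naive local time
```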

4. The Obvious Statement

Explaining basic concepts.

## Unnecessary explanation of JSON

> "JSON (JavaScript Object Notation) is a lightweight data format..."

**RECOMMENDATION:** REMOVE
**RATIONALE:** Any candidate for this role knows what JSON is. This wastes spec space.
**PROPOSAL:** Remove the explanation. Just say "Return JSON response."

Quality Checks Before Submitting the Report

  1. Is each issue actionable? Every recommendation should have a clear fix.
  2. Are quotes accurate? Copy exact text from spec/tests.
  3. Does the rationale explain impact? Why does this matter for evaluation?
  4. Is the proposal specific? Not "make it better" but "change X to Y".

Example Report

## Error format under-specified

> Spec: "Return an error if the file doesn't exist"
> Test: `assert response.json() == {"error": "FILE_NOT_FOUND", "path": "/missing.txt"}`

**RECOMMENDATION:** ADD-DETAIL

**RATIONALE:** Tests expect a specific error structure with error code and path, but spec only says "return an error". Candidates would need to guess or trial-and-error to match.

**PROPOSAL:** Add to spec: "Errors should return JSON with `error` (string code) and relevant context fields."

---

## Over-specified caching strategy

> "Use an LRU cache with a maximum size of 100 entries, evicting the least recently used item when full"

**RECOMMENDATION:** REMOVE

**RATIONALE:** This dictates a specific caching implementation. The spec should describe the caching requirement (e.g., "cache recent results to avoid redundant computation") and let candidates choose their approach.

**PROPOSAL:** Replace with: "Implement caching for expensive operations. Cache should have bounded memory usage."

---

## Redundant validation description

> Section 2.1: "Validate that the input is a valid JSON object"
> Section 3.4: "Before processing, ensure the request body is valid JSON"
> Section 5.2: "Invalid JSON should return a 400 error"

**RECOMMENDATION:** COMBINE

**RATIONALE:** JSON validation is mentioned in three places. This fragments the spec and could lead to inconsistent interpretations.

**PROPOSAL:** Consolidate into a single "Input Validation" section that covers all validation requirements.

Remember

  • Be thorough - Read every line of the spec and every test
  • Be specific - Quote exact text, not paraphrases
  • Be constructive - Every criticism needs a proposal
  • Think like a candidate - What would confuse or frustrate them?
  • Think like an evaluator - What would make it hard to fairly assess their work?