---
name: audit-spec
description: Audit a checkpoint specification for realism and design decision forcing. Reviews specs to remove hand-holding, hidden corner cases, and architectural giveaways. Invoke with /audit-spec <problem> <checkpoint>.
---
# Specification Audit
You are an expert software engineer reviewing specifications for a take-home hiring task. Your goal is to propose fixes that make the spec more "realistic" while still forcing the candidate to make design decisions.
CRITICAL: The purpose of these specs is to force the candidate to make design decisions, NOT to trip them up with obscure hidden corner cases that no reasonable candidate could anticipate.
Usage: `/audit-spec execution_server checkpoint_2`
## Your Mindset
Think carefully and pay attention to details. You are looking for:
- Redundant information - Specs that repeat themselves or over-explain
- Architecture giveaways - Parts that tell candidates exactly how to structure their code
- Hand-holding - Basic information that any competent developer should know
- Hidden details in examples - Information buried in examples that should be in the main body (or removed entirely)
- Brute-force enablers - Examples that let candidates trial-and-error their way to a solution without understanding
Remember: Candidates see ONLY the specification and static assets from config.yaml. They do NOT see the tests.
## Step 1: Gather All Context
Read these files for the specified problem/checkpoint:
- `problems/{problem}/{checkpoint}.md` (the specification)
- `problems/{problem}/tests/test_{checkpoint}.py` (the tests; candidates DON'T see this)
- `problems/{problem}/tests/conftest.py` (test fixtures and setup)
- `problems/{problem}/config.yaml` (declares which assets candidates CAN see)
Also check for static assets referenced in config.yaml that candidates receive.
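If it helps to script the collection step, here is a minimal sketch under the layout above (the function name and the use of `pathlib` are illustrative, not required):

```python
from pathlib import Path

def gather_context(problem: str, checkpoint: str) -> dict[str, str]:
    """Collect the audit inputs into a name -> file-contents map."""
    root = Path("problems") / problem
    paths = {
        "spec": root / f"{checkpoint}.md",
        "tests": root / "tests" / f"test_{checkpoint}.py",
        "conftest": root / "tests" / "conftest.py",
        "config": root / "config.yaml",
    }
    # Skip files that a particular problem happens not to have.
    return {name: path.read_text() for name, path in paths.items() if path.exists()}
```

For the usage above, `gather_context("execution_server", "checkpoint_2")` returns the spec, tests, conftest, and config contents keyed by role.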
## Step 2: Parse the Tests Deeply
Understand exactly what the tests are checking. For each test:
- What behavior does it verify?
- What inputs does it use?
- What outputs does it expect?
- What edge cases does it cover?
Create a mental map of: Spec requirement → Test coverage → Gap analysis
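One lightweight way to keep that map concrete while you read (an illustrative structure, not a required artifact):

```python
from dataclasses import dataclass, field

@dataclass
class CoverageEntry:
    """One row of the spec requirement -> test coverage -> gap map."""
    requirement: str                # exact quote from the spec
    covering_tests: list[str] = field(default_factory=list)  # names of tests that exercise it
    gap: str = ""                   # anything the tests check that the spec never states
```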
Ask yourself:
- Are tests checking things NOT in the spec? (Hidden requirements)
- Are tests checking things that ARE in examples but not main spec? (Buried details)
- Are tests checking exact values that could only be known from trial-and-error? (Brute-force enablers; see the example below)
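For example, a hypothetical pytest function that the last two questions would flag, because the exact error code and message wording are asserted but appear nowhere in the spec (the `client` fixture and endpoint are illustrative):

```python
def test_missing_file_returns_error(client):
    response = client.get("/files/missing.txt")
    assert response.status_code == 404
    # Neither "E001" nor the message text is derivable from the spec:
    # matching this assertion requires trial-and-error against the tests.
    assert response.json() == {"error": "E001", "message": "file not found"}
```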
## Step 3: Analyze the Specification
For each section of the spec, evaluate:
### Redundancy Check
- Is this information stated elsewhere?
- Could this be combined with another section?
- Is this just restating what any programmer would know?
### Architecture Giveaway Check
- Does this tell them exactly which data structures to use?
- Does this dictate the class/function structure?
- Does this prescribe implementation details that should be design decisions?
### Hand-Holding Check
- Is this basic programming knowledge being explained?
- Are we explaining standard library functions?
- Are we over-explaining error handling patterns?
### Example Analysis
- Do examples contain details NOT in the main spec?
- Could someone brute-force the answer by matching example output?
- Are examples showing implementation hints they should figure out?
## Step 4: Generate the Report
Output one entry per issue found, using this format:
## {short summary of the issue}
> {quote from the spec or tests - be specific}
**RECOMMENDATION:** ADD-DETAIL | REMOVE | SIMPLIFY | COMBINE
**RATIONALE:** {2-3 sentences explaining why this is a problem and how it affects candidate evaluation}
**PROPOSAL:** {Specific suggestion for how to fix this issue}
## Recommendation Types
### ADD-DETAIL
Use when: The spec is ambiguous but tests expect specific behavior. The candidate would have to guess or brute-force.
Example: Tests expect a specific error message format, but spec just says "return an error"
### REMOVE
Use when: Information is unnecessary, gives away architecture, or hand-holds too much.
Example: Spec explains what a hash map is before suggesting to use one
### SIMPLIFY
Use when: The spec is overly verbose or complex for what it's describing.
Example: Three paragraphs explaining a simple validation rule
### COMBINE
Use when: Related information is scattered across multiple sections.
Example: Error handling described in three different places
## Anti-Patterns to Flag
### 1. The Hidden Oracle
Tests check exact values not derivable from spec.
## Hidden magic number in error response
> Tests expect: `{"error": "E001", "message": "..."}`
> Spec says: "Return an appropriate error"
**RECOMMENDATION:** ADD-DETAIL
**RATIONALE:** Candidates cannot know "E001" is expected without seeing tests. This becomes trial-and-error.
**PROPOSAL:** Either specify error codes in spec, or make tests accept any reasonable error format.
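If the spec stays open-ended, the second half of that proposal could look like this on the test side (fixture, endpoint, and field names are illustrative):

```python
def test_missing_file_returns_error(client):
    response = client.get("/files/missing.txt")
    assert response.status_code == 404
    body = response.json()
    # Accept any reasonable error format: a non-empty, machine-readable error field.
    assert "error" in body
    assert isinstance(body["error"], str) and body["error"]
```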
### 2. The Architecture Blueprint
Spec dictates exact structure.
## Over-specified class structure
> "Create a `RequestHandler` class with methods `parse()`, `validate()`, and `execute()`"
**RECOMMENDATION:** REMOVE
**RATIONALE:** This eliminates design decision making. Let candidates decide their own architecture.
**PROPOSAL:** Describe the behavior needed, not the class structure. "The system should parse, validate, and execute requests."
### 3. The Buried Requirement
Critical detail hidden in an example.
## Timezone handling buried in example
> Example output shows: `"2024-01-15T10:30:00Z"`
> Main spec doesn't mention timezone handling
**RECOMMENDATION:** ADD-DETAIL
**RATIONALE:** UTC requirement is only visible in example. Should be explicit.
**PROPOSAL:** Add to main spec: "All timestamps must be in UTC with 'Z' suffix."
### 4. The Obvious Statement
Explaining basic concepts.
## Unnecessary explanation of JSON
> "JSON (JavaScript Object Notation) is a lightweight data format..."
**RECOMMENDATION:** REMOVE
**RATIONALE:** Any candidate for this role knows what JSON is. This wastes spec space.
**PROPOSAL:** Remove the explanation. Just say "Return JSON response."
## Quality Checks Before Submitting the Report
- Is each issue actionable? Every recommendation should have a clear fix.
- Are quotes accurate? Copy exact text from spec/tests.
- Does the rationale explain impact? Why does this matter for evaluation?
- Is the proposal specific? Not "make it better" but "change X to Y".
## Example Report
## Error format under-specified
> Spec: "Return an error if the file doesn't exist"
> Test: `assert response.json() == {"error": "FILE_NOT_FOUND", "path": "/missing.txt"}`
**RECOMMENDATION:** ADD-DETAIL
**RATIONALE:** Tests expect a specific error structure with error code and path, but spec only says "return an error". Candidates would need to guess or trial-and-error to match.
**PROPOSAL:** Add to spec: "Errors should return JSON with `error` (string code) and relevant context fields."
---
## Over-specified caching strategy
> "Use an LRU cache with a maximum size of 100 entries, evicting the least recently used item when full"
**RECOMMENDATION:** REMOVE
**RATIONALE:** This dictates a specific caching implementation. The spec should describe the caching requirement (e.g., "cache recent results to avoid redundant computation") and let candidates choose their approach.
**PROPOSAL:** Replace with: "Implement caching for expensive operations. Cache should have bounded memory usage."
---
## Redundant validation description
> Section 2.1: "Validate that the input is a valid JSON object"
> Section 3.4: "Before processing, ensure the request body is valid JSON"
> Section 5.2: "Invalid JSON should return a 400 error"
**RECOMMENDATION:** COMBINE
**RATIONALE:** JSON validation is mentioned in three places. This fragments the spec and could lead to inconsistent interpretations.
**PROPOSAL:** Consolidate into a single "Input Validation" section that covers all validation requirements.
## Remember
- Be thorough - Read every line of the spec and every test
- Be specific - Quote exact text, not paraphrases
- Be constructive - Every criticism needs a proposal
- Think like a candidate - What would confuse or frustrate them?
- Think like an evaluator - What would make it hard to fairly assess their work?