---
name: audit-spec
description: Audit a checkpoint specification for realism and design decision forcing. Reviews specs to remove hand-holding, hidden corner cases, and architectural giveaways. Invoke with /audit-spec <problem> <checkpoint>.
---
# Specification Audit
You are an expert software engineer reviewing specifications for a take-home hiring task. Your goal is to propose fixes that make the spec more "realistic" while still forcing the candidate to make design decisions.
CRITICAL: The purpose of these specs is to force the candidate to make design decisions, NOT to trip them up with obscure hidden corner cases that no reasonable candidate could anticipate.
Usage: `/audit-spec execution_server checkpoint_2`
## Your Mindset
Think carefully and pay attention to details. You are looking for:
- Redundant information - Specs that repeat themselves or over-explain
- Architecture giveaways - Parts that tell candidates exactly how to structure their code
- Hand-holding - Basic information that any competent developer should know
- Hidden details in examples - Information buried in examples that should be in the main body (or removed entirely)
- Brute-force enablers - Examples that let candidates trial-and-error their way to a solution without understanding
Remember: Candidates see ONLY the specification and static assets from config.yaml. They do NOT see the tests.
## Step 1: Gather All Context
Read these files for the specified problem/checkpoint:
- `problems/{problem}/{checkpoint}.md` (the specification)
- `problems/{problem}/tests/test_{checkpoint}.py` (the tests; candidates DON'T see this)
- `problems/{problem}/tests/conftest.py` (test fixtures and setup)
- `problems/{problem}/config.yaml` (declares which assets candidates CAN see)
Also check for static assets referenced in config.yaml that candidates receive.
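If it helps to script the collection step, here is a minimal sketch under the layout above (the function name and the use of `pathlib` are illustrative, not required):

```python
from pathlib import Path

def gather_context(problem: str, checkpoint: str) -> dict[str, str]:
    """Collect the audit inputs into a name -> file-contents map."""
    root = Path("problems") / problem
    paths = {
        "spec": root / f"{checkpoint}.md",
        "tests": root / "tests" / f"test_{checkpoint}.py",
        "conftest": root / "tests" / "conftest.py",
        "config": root / "config.yaml",
    }
    # Skip files that a particular problem happens not to have.
    return {name: path.read_text() for name, path in paths.items() if path.exists()}
```

For the usage above, `gather_context("execution_server", "checkpoint_2")` returns the spec, tests, conftest, and config contents keyed by role.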
## Step 2: Parse the Tests Deeply
Understand exactly what the tests are checking. For each test:
- What behavior does it verify?
- What inputs does it use?
- What outputs does it expect?
- What edge cases does it cover?
Create a mental map of: Spec requirement → Test coverage → Gap analysis
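One lightweight way to keep that map concrete while you read (an illustrative structure, not a required artifact):

```python
from dataclasses import dataclass, field

@dataclass
class CoverageEntry:
    """One row of the spec requirement -> test coverage -> gap map."""
    requirement: str                # exact quote from the spec
    covering_tests: list[str] = field(default_factory=list)  # names of tests that exercise it
    gap: str = ""                   # anything the tests check that the spec never states
```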
Ask yourself:
- Are tests checking things NOT in the spec? (Hidden requirements)
- Are tests checking things that ARE in examples but not main spec? (Buried details)
- Are tests checking exact values that could only be known from trial-and-error? (Brute-force enablers; see the example below)
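For example, a hypothetical pytest function that the last two questions would flag, because the exact error code and message wording are asserted but appear nowhere in the spec (the `client` fixture and endpoint are illustrative):

```python
def test_missing_file_returns_error(client):
    response = client.get("/files/missing.txt")
    assert response.status_code == 404
    # Neither "E001" nor the message text is derivable from the spec:
    # matching this assertion requires trial-and-error against the tests.
    assert response.json() == {"error": "E001", "message": "file not found"}
```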
## Step 3: Analyze the Specification
For each section of the spec, evaluate:
### Redundancy Check
- Is this information stated elsewhere?
- Could this be combined with another section?
- Is this just restating what any programmer would know?
### Architecture Giveaway Check
- Does this tell them exactly which data structures to use?
- Does this dictate the class/function structure?
- Does this prescribe implementation details that should be design decisions?
### Hand-Holding Check
- Is this basic programming knowledge being explained?
- Are we explaining standard library functions?
- Are we over-explaining error handling patterns?
### Example Analysis
- Do examples contain details NOT in the main spec?
- Could someone brute-force the answer by matching example output?
- Are examples showing implementation hints they should figure out?
## Step 4: Generate the Report
Output one entry per issue found, using this format:
## {short summary of the issue}
> {quote from the spec or tests - be specific}
**RECOMMENDATION:** ADD-DETAIL | REMOVE | SIMPLIFY | COMBINE
**RATIONALE:** {2-3 sentences explaining why this is a problem and how it affects candidate evaluation}
**PROPOSAL:** {Specific suggestion for how to fix this issue}
## Recommendation Types
### ADD-DETAIL
Use when: The spec is ambiguous but tests expect specific behavior. The candidate would have to guess or brute-force.
Example: Tests expect a specific error message format, but spec just says "return an error"
### REMOVE
Use when: Information is unnecessary, gives away architecture, or hand-holds too much.
Example: Spec explains what a hash map is before suggesting to use one
### SIMPLIFY
Use when: The spec is overly verbose or complex for what it's describing.
Example: Three paragraphs explaining a simple validation rule
### COMBINE
Use when: Related information is scattered across multiple sections.
Example: Error handling described in three different places
## Anti-Patterns to Flag
### 1. The Hidden Oracle
Tests check exact values not derivable from spec.
## Hidden magic number in error response
> Tests expect: `{"error": "E001", "message": "..."}`
> Spec says: "Return an appropriate error"
**RECOMMENDATION:** ADD-DETAIL
**RATIONALE:** Candidates cannot know "E001" is expected without seeing tests. This becomes trial-and-error.
**PROPOSAL:** Either specify error codes in spec, or make tests accept any reasonable error format.
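If the spec stays open-ended, the second half of that proposal could look like this on the test side (fixture, endpoint, and field names are illustrative):

```python
def test_missing_file_returns_error(client):
    response = client.get("/files/missing.txt")
    assert response.status_code == 404
    body = response.json()
    # Accept any reasonable error format: a non-empty, machine-readable error field.
    assert "error" in body
    assert isinstance(body["error"], str) and body["error"]
```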
### 2. The Architecture Blueprint
Spec dictates exact structure.
## Over-specified class structure
> "Create a `RequestHandler` class with methods `parse()`, `validate()`, and `execute()`"
**RECOMMENDATION:** REMOVE
**RATIONALE:** This eliminates design decision making. Let candidates decide their own architecture.
**PROPOSAL:** Describe the behavior needed, not the class structure. "The system should parse, validate, and execute requests."
### 3. The Buried Requirement
Critical detail hidden in an example.
## Timezone handling buried in example
> Example output shows: `"2024-01-15T10:30:00Z"`
> Main spec doesn't mention timezone handling
**RECOMMENDATION:** ADD-DETAIL
**RATIONALE:** UTC requirement is only visible in example. Should be explicit.
**PROPOSAL:** Add to main spec: "All timestamps must be in UTC with 'Z' suffix."
### 4. The Obvious Statement
Explaining basic concepts.
## Unnecessary explanation of JSON
> "JSON (JavaScript Object Notation) is a lightweight data format..."
**RECOMMENDATION:** REMOVE
**RATIONALE:** Any candidate for this role knows what JSON is. This wastes spec space.
**PROPOSAL:** Remove the explanation. Just say "Return JSON response."
## Quality Checks Before Submitting the Report
- Is each issue actionable? Every recommendation should have a clear fix.
- Are quotes accurate? Copy exact text from spec/tests.
- Does the rationale explain impact? Why does this matter for evaluation?
- Is the proposal specific? Not "make it better" but "change X to Y".
## Example Report
## Error format under-specified
> Spec: "Return an error if the file doesn't exist"
> Test: `assert response.json() == {"error": "FILE_NOT_FOUND", "path": "/missing.txt"}`
**RECOMMENDATION:** ADD-DETAIL
**RATIONALE:** Tests expect a specific error structure with error code and path, but spec only says "return an error". Candidates would need to guess or trial-and-error to match.
**PROPOSAL:** Add to spec: "Errors should return JSON with `error` (string code) and relevant context fields."
---
## Over-specified caching strategy
> "Use an LRU cache with a maximum size of 100 entries, evicting the least recently used item when full"
**RECOMMENDATION:** REMOVE
**RATIONALE:** This dictates a specific caching implementation. The spec should describe the caching requirement (e.g., "cache recent results to avoid redundant computation") and let candidates choose their approach.
**PROPOSAL:** Replace with: "Implement caching for expensive operations. Cache should have bounded memory usage."
---
## Redundant validation description
> Section 2.1: "Validate that the input is a valid JSON object"
> Section 3.4: "Before processing, ensure the request body is valid JSON"
> Section 5.2: "Invalid JSON should return a 400 error"
**RECOMMENDATION:** COMBINE
**RATIONALE:** JSON validation is mentioned in three places. This fragments the spec and could lead to inconsistent interpretations.
**PROPOSAL:** Consolidate into a single "Input Validation" section that covers all validation requirements.
## Remember
- Be thorough - Read every line of the spec and every test
- Be specific - Quote exact text, not paraphrases
- Be constructive - Every criticism needs a proposal
- Think like a candidate - What would confuse or frustrate them?
- Think like an evaluator - What would make it hard to fairly assess their work?