name	self_consistency_checker
description	Ensures cross-section consistency by verifying theoretical claims match code, benchmarks reflect analysis, and report accurately cites all stages. Use for final validation.
allowed-tools	Read, Grep, Glob

Self-Consistency Checker

Purpose

Validate that all artifacts (spec, theory, code, benchmarks, report) are mutually consistent with no contradictions.

Consistency Checks

1. Spec ↔ Theory

Verify:

Complexity analysis uses correct input constraints from spec
Algorithm satisfies spec invariants
Postconditions guaranteed by algorithm

Example Check:

Spec says: "Output sorted if inputs sorted"
Analysis proves: "Heap extracts elements in order"
✓ Consistent

2. Theory ↔ Code

Verify:

Chosen data structure (e.g., heap) appears in code
Algorithm complexity matches implementation (O(log k) operations)
Code comments reference theoretical analysis

Example Check:

Analysis: "Array-based min-heap"
Java code: Uses PriorityQueue (array-backed) ✓
C++ code: Uses std::priority_queue (array-backed) ✓
Rust code: Uses BinaryHeap (array-backed) ✓

3. Code ↔ Tests

Verify:

Tests validate all spec invariants
Edge cases from spec appear in tests
Test names reference specification

Example Check:

Spec invariant: "Output size = sum of input sizes"
Test: testOutputSizeMatchesInputSum() ✓
Spec edge case: "Empty iterators"
Test: testEmptyIterators() ✓

4. Code ↔ Benchmarks

Verify:

Benchmarks measure the implementations that exist
File references in benchmark setup match actual code locations
Benchmark configurations reflect code capabilities

Example Check:

Benchmark imports: "com.example.MergeIterator"
Code exists: src/main/java/.../MergeIterator.java ✓
Benchmark tests: k=10,100,1000
Code supports: Arbitrary k ✓

5. Theory ↔ Benchmarks

Verify:

Measured complexity matches theoretical prediction (within tolerance)
Bottleneck identified in analysis appears in measurement
Crossover points (if predicted) validated

Example Check:

Theory: O(N log k), predicts 60 cycles/element on M2
Measurement: 58 cycles/element (±4)
Deviation: 3.4% < 20% threshold ✓

Theory: Bottleneck is branches (60%)
Measurement: Profile shows 58% branch cost ✓

6. Benchmarks ↔ Report

Verify:

All benchmark results cited in report are real (not fabricated)
Numbers in tables match raw benchmark output
Analysis claims supported by measurements

Example Check:

Report Table 2: "Rust: 62M elements/sec at k=100"
Benchmark log: "rust_k100: 62.3M ops/sec" ✓

Report: "Rust 38% faster than Java"
Calculation: (62-45)/45 = 37.8% ≈ 38% ✓

7. Report Internal Consistency

Verify:

Abstract matches conclusions
Cross-references valid (Section X exists)
Figures/tables referenced in text
Terminology consistent (don't switch between "iterator" and "stream")

Checker Output

Report Format

## Consistency Check Report

**Generated**: 2024-01-15T12:00:00Z
**Pipeline**: k-way-merge-research

### Summary
- Total checks: 47
- Passed: 45 ✓
- Warnings: 2 ⚠
- Failures: 0 ✗

### Warnings

⚠ **Theory ↔ Benchmark** (Minor)
- Predicted 60 cycles/elem, measured 58 cycles
- Deviation: 3.4% (within tolerance, but model slightly pessimistic)
- Action: Document model assumptions in report

⚠ **Report Cross-Refs** (Minor)
- Section 6.3 referenced but titled "6.2" in document
- Action: Fix section numbering

### Passed Checks
✓ Spec invariants → Code implementation (3/3)
✓ Theory complexity → Code data structures (3/3 languages)
✓ Code → Tests coverage (invariants: 5/5, edges: 8/8)
✓ Benchmarks → Report citations (all 9 results cited correctly)
✓ Report abstract → Conclusions (consistent)

### Details

**Check 1: Spec Invariant "Heap Property"**
- Specified in: docs/spec.md:12
- Implemented in: Java:23, C++:34, Rust:28
- Tested in: Java test:45, C++ test:67, Rust test:52
- Status: ✓ Verified across all implementations

**Check 2: Complexity O(N log k)**
- Analyzed in: docs/analysis.md:23
- Realized in: PriorityQueue (Java), priority_queue (C++), BinaryHeap (Rust)
- Measured: Java 78 cycles (log 100 ≈ 6.6 → ~12 cycles/op ✓)
- Status: ✓ Matches theoretical prediction

[... 45 more checks ...]

Automated Checks

File Existence

# Verify all referenced files exist
grep -r 'src/.*\.java' report.md | check_files_exist

Number Consistency

# Extract numbers from benchmark logs and report
bench_nums = extract_numbers("benchmark_results.log")
report_nums = extract_numbers("report.md")
assert all(n in report_nums for n in bench_nums)

Cross-Reference Validation

# Verify section references
sections = extract_sections("report.md")
references = extract_references("report.md")
assert all(ref in sections for ref in references)

Usage in Pipeline

Invoked as final stage:

Read all artifacts (spec, analysis, code, benchmarks, report)
Extract claims, data, references
Cross-check consistency
Generate report
If failures: Block publication, require fixes
If warnings: Log but allow publication

Tolerance Thresholds

Theory vs Measurement: ±20% acceptable (systems variability)
Report vs Raw Data: ±1% acceptable (rounding)
Cross-references: 0% tolerance (must be exact)

Cross-Skill Integration

Requires: All pipeline artifacts Uses: skill_context_cache for cross-references Feeds into: Final approval for publication

self_consistency_checker

Install Skill

SKILL.md

Self-Consistency Checker

Purpose

Consistency Checks

1. Spec ↔ Theory

2. Theory ↔ Code

3. Code ↔ Tests

4. Code ↔ Benchmarks

5. Theory ↔ Benchmarks

6. Benchmarks ↔ Report

7. Report Internal Consistency

Checker Output

Report Format

Automated Checks

File Existence

Number Consistency

Cross-Reference Validation

Usage in Pipeline

Tolerance Thresholds

Cross-Skill Integration