| name | run-golden-test |
| description | Execute golden tests that compare the extracted OCaml interpreter against MLIR toolchain output. Use when validating semantics implementation or debugging differences between formal semantics and MLIR behavior. |
Run Golden Tests
This skill guides you through running and maintaining golden tests that validate the extracted interpreter against MLIR's official toolchain.
Overview
Golden tests ensure that:
- The extracted interpreter matches MLIR's behavior
- Semantics definitions are faithful to MLIR specification
- Changes don't introduce regressions
The workflow involves:
- Building the extracted interpreter
- Running test cases through both interpreters
- Comparing outputs
- Analyzing and resolving differences
Workflow Steps
Step 1: Ensure Prerequisites
Check that required tools are available:
# Check MLIR tools
which mlir-opt
which mlir-translate
# Check build system
dune --version
# Check OCaml
ocaml --version
If tools are missing, guide user to install them.
Step 2: Build the Extracted Interpreter
Extract and build the OCaml interpreter:
# Build Coq project
dune build
# Extract to OCaml
dune build @extract
# Build extracted interpreter
dune build driver/mlir_interp.exe
If extraction fails:
- Check extraction configuration in
src/Extraction/ - Verify all Coq modules compile successfully
- Look for non-computational definitions that can't be extracted
Step 3: Identify Test Cases
Determine which tests to run:
- All tests:
test/oracle/**/*.mlir - Specific dialect:
test/oracle/arith/*.mlir - Specific test:
test/oracle/arith/addi_simple.mlir
Ask user which scope to test if not specified.
Step 4: Run MLIR Toolchain on Test Cases
For each .mlir test file:
# Validate MLIR file
mlir-opt --verify-diagnostics test/oracle/arith/addi_simple.mlir
# Run through MLIR interpreter (if available)
mlir-cpu-runner test/oracle/arith/addi_simple.mlir
# Or use custom MLIR pass/tool
mlir-opt --some-pass test/oracle/arith/addi_simple.mlir -o expected.mlir
Save MLIR's output as the "expected" result.
Step 5: Run Extracted Interpreter on Test Cases
Run the extracted interpreter:
# Run interpreter on test file
_build/default/driver/mlir_interp.exe test/oracle/arith/addi_simple.mlir
Save the extracted interpreter's output as the "actual" result.
Step 6: Compare Outputs
Compare expected vs actual results:
# Direct comparison
diff expected.txt actual.txt
# Or use test driver
dune test
Comparison strategies:
- Exact match: Outputs should be identical
- Semantic equivalence: Allow different but equivalent representations
- Approximate match: For floating-point values
Document the comparison strategy for each test.
Step 7: Analyze Differences
If outputs differ, investigate:
Is MLIR behavior correct?
- Check MLIR documentation
- Review dialect specifications
- Test with MLIR's own test suite
Is our semantics definition correct?
- Review Coq definitions
- Check ITree handler implementations
- Verify extraction configuration
Is this an acceptable difference?
- Some differences may be intentional (e.g., different error messages)
- Document known differences in test metadata
Step 8: Update Test Expectations (if needed)
If the difference is expected:
- Update expected output: If MLIR behavior changed
- Fix semantics: If our definition was wrong
- Document difference: If difference is acceptable
Add comments to test files explaining expected behavior:
// RUN: mlir-opt %s | FileCheck %s
// EXPECTED: Result should be constant 42
func.func @test() -> i32 {
%c1 = arith.constant 1 : i32
%c41 = arith.constant 41 : i32
%result = arith.addi %c1, %c41 : i32
return %result : i32
}
// CHECK: %[[RES:.*]] = arith.constant 42 : i32
Step 9: Create Regression Tests
For each fixed bug or resolved difference:
- Add a specific test case to
test/oracle/ - Document what the test validates
- Ensure test is run in CI
Test Organization
Organize tests by dialect and complexity:
test/oracle/
├── arith/
│ ├── constant.mlir # Simple constant operations
│ ├── addi_simple.mlir # Basic addition
│ ├── addi_overflow.mlir # Edge case: overflow
│ └── complex_expr.mlir # Complex expressions
├── scf/
│ ├── if_simple.mlir # Basic conditionals
│ ├── for_loop.mlir # Loop constructs
│ └── nested_control_flow.mlir # Nested structures
└── integration/
├── multi_dialect.mlir # Cross-dialect tests
└── optimization.mlir # Pass integration tests
Test Metadata
Add metadata to each test file:
// TEST: arith.addi semantics
// DIALECT: arith
// FEATURES: integer arithmetic, constant folding
// MLIR-VERSION: 1.0
// EXPECTED-BEHAVIOR: Addition should be commutative and associative
// KNOWN-ISSUES: None
Automation with Test Driver
The test driver (test/test_driver.ml) should:
- Discover all
.mlirfiles intest/oracle/ - Run each through both interpreters
- Compare outputs
- Report results (pass/fail)
- Store failing cases for debugging
Example test driver usage:
# Run all golden tests
dune test
# Run specific test
dune exec -- test/test_driver.exe test/oracle/arith/addi_simple.mlir
# Update all expectations
dune test --force
Debugging Differences
When a test fails:
Step 1: Isolate the Issue
# Run single test
_build/default/driver/mlir_interp.exe test/oracle/failing_test.mlir -v
# Enable debug output
MLIR_INTERP_DEBUG=1 _build/default/driver/mlir_interp.exe test/oracle/failing_test.mlir
Step 2: Examine Intermediate States
- Add logging to ITree handlers
- Print intermediate values in transformation
- Compare step-by-step execution
Step 3: Minimize Test Case
Reduce failing test to minimal example:
// Original (complex)
func.func @complex() { ... 20 operations ... }
// Minimized
func.func @minimal() {
%c1 = arith.constant 1 : i32 // This line causes failure
return %c1 : i32
}
Step 4: Check Semantics Definition
Review Coq definitions:
(* Is this the correct semantics for arith.constant? *)
Definition interp_constant (t : type) (v : value) : itree E value :=
Ret v. (* Maybe this should trigger some event? *)
Step 5: Verify Extraction
Check that extracted OCaml code is correct:
(* _build/default/src/Semantics/Arith.ml *)
let interp_constant t v = Ret v (* Is this what we expect? *)
Continuous Integration
Integrate golden tests into CI:
# CI/.github/workflows/test.yml
- name: Run Golden Tests
run: |
dune build @extract
dune test
# Fail if any differences found
if [ $? -ne 0 ]; then
echo "Golden tests failed!"
exit 1
fi
Store test artifacts:
- Failing test outputs
- Diff results
- Debug logs
Updating Tests After MLIR Changes
When MLIR version updates:
- Re-run all tests:
dune test --force - Review failures: Some may be due to MLIR behavior changes
- Update expectations: If changes are correct
- Update semantics: If our definitions need adjustment
- Document changes: Note MLIR version and behavior changes in ADR
Best Practices
- Test Early: Add golden tests as soon as dialect support is added
- Test Often: Run tests on every commit (CI)
- Test Thoroughly: Cover edge cases, not just happy paths
- Document Tests: Explain what each test validates
- Keep Tests Small: One concept per test file
- Version Control Expectations: Commit expected outputs
- Automate Everything: Use test driver, don't manually compare
Common Issues
Extraction Fails
- Cause: Non-computational definitions, missing extraction hints
- Fix: Add extraction configuration, refactor definitions
Output Format Mismatch
- Cause: Different pretty-printing between MLIR and extracted interpreter
- Fix: Normalize outputs before comparison, or use semantic comparison
Test Flakiness
- Cause: Non-deterministic behavior, timing issues
- Fix: Make tests deterministic, use fixed seeds for random testing
Performance Issues
- Cause: Extracted code is slow, large test suite
- Fix: Profile and optimize hot paths, parallelize tests
Integration with Other Workflows
Golden tests complement:
- Unit tests: Unit tests check small pieces, golden tests check integration
- Property tests: Properties check invariants, golden tests check concrete behavior
- Proof verification: Proofs ensure correctness, golden tests validate implementation
Example: Adding a Golden Test
# 1. Create test file
cat > test/oracle/arith/addi_simple.mlir <<EOF
// TEST: Basic integer addition
func.func @test_addi() -> i32 {
%c1 = arith.constant 1 : i32
%c2 = arith.constant 2 : i32
%sum = arith.addi %c1, %c2 : i32
return %sum : i32
}
// EXPECTED: 3
EOF
# 2. Run through MLIR
mlir-opt test/oracle/arith/addi_simple.mlir
# 3. Build interpreter
dune build driver/mlir_interp.exe
# 4. Run test
dune exec -- test/test_driver.exe test/oracle/arith/addi_simple.mlir
# 5. Check result
echo "Test passed!"
After Running Tests
Once tests complete:
- Record results: Pass/fail counts, timing
- Update dashboard: If you have test result tracking
- Fix failures: Don't let failures accumulate
- Celebrate success: When all tests pass!