| name | when-validating-code-works-use-functionality-audit |
| type | testing-quality |
| description | Validates that code actually works through sandbox testing, execution verification, and systematic debugging. Use this skill after code generation or modification to ensure functionality is genuine rather than assumed. The skill creates isolated test environments, executes code with realistic inputs, identifies bugs through systematic analysis, and applies best practices to fix issues without breaking existing functionality. |
| agents | tester, coder, reviewer, production-validator |
| phases | 5 |
| memory_pattern | testing/functionality-audit/phase-{N}/{agent}/{deliverable} |
| scripts | pre-task, session-restore, phase-coordination, memory-sync, post-task |
Functionality Audit - Code Execution Validation
When to Use This Skill
Trigger Conditions:
- After generating new code or modifying existing code
- When code appears complete but actual functionality is uncertain
- Before merging PRs or deploying to production
- When debugging reported issues or unexpected behavior
- As part of quality assurance workflows
- When validating third-party code integrations
Situations Requiring Functionality Audit:
- Code generated by AI that needs execution validation
- Complex logic changes requiring runtime verification
- Integration of new libraries or dependencies
- Refactoring that may have introduced regressions
- Migration to new frameworks or language versions
Overview
This skill systematically validates that code delivers its intended behavior through actual execution rather than static analysis alone. It creates isolated testing environments (sandboxes), executes code with realistic inputs, captures outputs and errors, identifies root causes of failures through systematic debugging, and applies fixes using best practices that preserve existing functionality.
The skill emphasizes genuine functionality over appearance, detecting "theater code" that looks correct but fails during execution. It combines automated testing, manual validation, and debugging expertise to ensure code reliability.
Phase 1: Setup Testing Environment (Sequential)
Agents: tester (lead), coder (support)
Duration: 10-15 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 1: Setup Testing Environment"
npx claude-flow swarm init --topology hierarchical --max-agents 2
# Spawn agents
npx claude-flow agent spawn --type tester --capabilities "sandbox-setup,environment-config,dependency-management"
npx claude-flow agent spawn --type coder --capabilities "tooling-setup,script-generation"
# Memory coordination - store environment config
npx claude-flow memory store --key "testing/functionality-audit/phase-1/tester/sandbox-config" --value '{"isolated":true,"snapshot_enabled":true}'
npx claude-flow memory store --key "testing/functionality-audit/phase-1/coder/dependencies" --value '{"package_manager":"npm","install_command":"npm install"}'
# Execute phase work
# 1. Create isolated sandbox environment
echo "Creating sandbox for code execution..."
mkdir -p /tmp/functionality-audit-sandbox
cd /tmp/functionality-audit-sandbox
# 2. Install dependencies and setup tools
npm init -y 2>/dev/null || true
npm install --save-dev jest ts-jest @types/jest ts-node typescript 2>/dev/null || true
# 3. Configure testing framework
cat > jest.config.js << 'EOF'
module.exports = {
preset: 'ts-jest',
testEnvironment: 'node',
collectCoverage: true,
coverageDirectory: 'coverage',
testMatch: ['**/*.test.ts', '**/*.test.js'],
verbose: true
};
EOF
# Complete phase
npx claude-flow hooks post-task --task-id "phase-1-setup"
npx claude-flow memory store --key "testing/functionality-audit/phase-1/output" --value '{"status":"complete","sandbox_ready":true}'
Memory Pattern:
- Input: testing/functionality-audit/phase-0/user/code-to-validate
- Output: testing/functionality-audit/phase-1/tester/sandbox-ready
- Shared: testing/functionality-audit/shared/environment-config
Success Criteria:
- Isolated sandbox environment created successfully
- All required dependencies installed without errors
- Testing framework configured and operational
- Environment snapshot created for rollback capability (see the snapshot sketch after the deliverables list)
- Sandbox validation completed (basic sanity checks pass)
Deliverables:
- Configured sandbox environment
- Dependency manifest (package.json or requirements.txt)
- Testing framework configuration files
- Environment setup documentation
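The success criteria above call for an environment snapshot, but the setup script does not show that step. A minimal sketch of how one might be captured and restored, assuming `tar` is available on the host and the sandbox lives at the /tmp path used in the script:

// snapshot-sandbox.js - illustrative sketch of snapshot/rollback for the sandbox
const { execSync } = require('child_process');
const fs = require('fs');

const SANDBOX_DIR = '/tmp/functionality-audit-sandbox'; // path created in Phase 1
const SNAPSHOT_PATH = `/tmp/sandbox-snapshot-${Date.now()}.tar.gz`;

function createSnapshot() {
  if (!fs.existsSync(SANDBOX_DIR)) {
    throw new Error(`Sandbox not found at ${SANDBOX_DIR}`);
  }
  // Archive the sandbox so it can be restored if debugging corrupts the environment
  execSync(`tar -czf ${SNAPSHOT_PATH} -C ${SANDBOX_DIR} .`);
  return SNAPSHOT_PATH;
}

function restoreSnapshot(snapshotPath) {
  // Restore the archived state over the current sandbox contents
  execSync(`tar -xzf ${snapshotPath} -C ${SANDBOX_DIR}`);
}

console.log(`Snapshot created at ${createSnapshot()}`);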
Phase 2: Execute Code with Realistic Inputs (Parallel)
Agents: tester (lead), production-validator (support)
Duration: 15-20 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 2: Execute Code with Realistic Inputs"
npx claude-flow swarm scale --target-agents 2
# Retrieve sandbox config from Phase 1
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-1/output"
# Memory coordination - define test scenarios
npx claude-flow memory store --key "testing/functionality-audit/phase-2/tester/test-scenarios" --value '{"unit_tests":true,"integration_tests":true,"edge_cases":true}'
# Execute phase work
# 1. Copy code to sandbox
echo "Copying code to sandbox environment..."
# (Code paths retrieved from memory)
# 2. Generate test cases with realistic inputs
cat > sandbox-tests.test.js << 'EOF'
// Auto-generated test cases for functionality validation
describe('Functionality Audit Tests', () => {
test('Happy path with valid inputs', async () => {
// Test implementation with realistic data
});
test('Edge cases and boundary conditions', async () => {
// Test edge cases
});
test('Error handling with invalid inputs', async () => {
// Test error scenarios
});
});
EOF
# 3. Execute tests and capture output
npm test -- --coverage --verbose > test-output.log 2>&1
TEST_EXIT_CODE=$?
# 4. Store results in memory
npx claude-flow memory store --key "testing/functionality-audit/phase-2/tester/execution-results" --value "{\"exit_code\":$TEST_EXIT_CODE,\"timestamp\":\"$(date -Iseconds)\"}"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-2-execute"
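The generated template above leaves the test bodies empty. A hedged example of what filled-in bodies might look like, assuming a hypothetical processOrder function exported from the code under audit (the module path, function name, and behavior are placeholders, not part of the original template):

// Illustrative filled-in tests; processOrder and its contract are assumed for the example
const { processOrder } = require('./code-under-audit'); // hypothetical module path

describe('Functionality Audit Tests (filled-in example)', () => {
  test('Happy path with valid inputs', async () => {
    const order = { id: 'A-100', items: [{ sku: 'X1', qty: 2, price: 9.99 }] };
    const result = await processOrder(order);
    expect(result.status).toBe('processed');
    expect(result.total).toBeCloseTo(19.98);
  });

  test('Edge cases and boundary conditions', async () => {
    const emptyOrder = { id: 'A-101', items: [] };
    const result = await processOrder(emptyOrder);
    expect(result.total).toBe(0); // an empty order should not throw
  });

  test('Error handling with invalid inputs', async () => {
    await expect(processOrder(null)).rejects.toThrow(); // invalid input should be rejected
  });
});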
Memory Pattern:
- Input: testing/functionality-audit/phase-1/tester/sandbox-ready
- Output: testing/functionality-audit/phase-2/tester/execution-results
- Shared: testing/functionality-audit/shared/test-logs
Success Criteria:
- All test cases executed without environment errors
- Output captured completely (stdout, stderr, exit codes)
- Code coverage metrics collected successfully
- Performance metrics recorded (execution time, memory usage)
- Results stored in structured format for analysis
Deliverables:
- Test execution logs with full output
- Code coverage reports
- Performance metrics
- Captured errors and stack traces
- Test results summary
Phase 3: Debug Issues (Sequential)
Agents: coder (lead), tester (support), reviewer (validation)
Duration: 20-30 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 3: Debug Issues"
npx claude-flow swarm scale --target-agents 3
# Retrieve execution results from Phase 2
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-2/tester/execution-results"
# Memory coordination - analyze failures
npx claude-flow memory store --key "testing/functionality-audit/phase-3/coder/debugging-strategy" --value '{"method":"systematic-root-cause","tools":["debugger","logs","profiler"]}'
# Execute phase work
# 1. Analyze test failures systematically
echo "Analyzing failures and errors..."
grep -E "(FAIL|ERROR|Exception)" test-output.log > failures.txt || true
# 2. Identify root causes using debugging techniques
# - Stack trace analysis
# - Variable inspection
# - Logic flow validation
# - Dependency conflict detection
# 3. Categorize issues by type
cat > issue-analysis.json << 'EOF'
{
"syntax_errors": [],
"runtime_errors": [],
"logic_errors": [],
"integration_failures": [],
"dependency_issues": []
}
EOF
# 4. Prioritize fixes by impact
# Critical: Code doesn't run at all
# High: Core functionality broken
# Medium: Edge cases failing
# Low: Minor issues, optimizations
# Store debugging results
npx claude-flow memory store --key "testing/functionality-audit/phase-3/coder/root-causes" --value "$(cat issue-analysis.json)"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-3-debug"
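Step 3 above writes an empty issue-analysis.json. A minimal Node.js sketch of how failures captured in test-output.log might be bucketed into those categories; the matching patterns are illustrative heuristics, not a claude-flow API:

// categorize-failures.js - heuristic bucketing of test log lines (illustrative)
const fs = require('fs');

const log = fs.readFileSync('test-output.log', 'utf8');

const issues = {
  syntax_errors: [],
  runtime_errors: [],
  logic_errors: [],
  integration_failures: [],
  dependency_issues: []
};

for (const raw of log.split('\n')) {
  const line = raw.trim();
  if (/SyntaxError/.test(line)) issues.syntax_errors.push(line);
  else if (/Cannot find module|MODULE_NOT_FOUND/.test(line)) issues.dependency_issues.push(line);
  else if (/ECONNREFUSED|timed? out|fetch failed/i.test(line)) issues.integration_failures.push(line);
  else if (/TypeError|ReferenceError|RangeError/.test(line)) issues.runtime_errors.push(line);
  else if (/Expected:|expect\(/.test(line)) issues.logic_errors.push(line);
}

fs.writeFileSync('issue-analysis.json', JSON.stringify(issues, null, 2));
console.log('Categorized issues written to issue-analysis.json');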
Memory Pattern:
- Input: testing/functionality-audit/phase-2/tester/execution-results
- Output: testing/functionality-audit/phase-3/coder/root-causes
- Shared: testing/functionality-audit/shared/issue-tracker
Success Criteria:
- All failures categorized by root cause type
- Stack traces analyzed and understood completely
- Dependencies validated and conflicts identified
- Logic errors traced to specific code locations
- Fix strategy documented with priorities
Deliverables:
- Root cause analysis report
- Issue categorization matrix
- Priority-ranked fix list
- Debugging session logs
- Dependency conflict resolution plan
Phase 4: Validate Functionality (Parallel)
Agents: production-validator (lead), reviewer (quality gate), tester (regression)
Duration: 15-25 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 4: Validate Functionality"
npx claude-flow swarm scale --target-agents 3
# Retrieve fixes from Phase 3
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-3/coder/root-causes"
# Memory coordination - define validation criteria
npx claude-flow memory store --key "testing/functionality-audit/phase-4/validator/criteria" --value '{"functional_correctness":true,"performance_acceptable":true,"no_regressions":true}'
# Execute phase work
# 1. Apply fixes and rerun tests
echo "Applying fixes and validating..."
npm test -- --coverage --verbose > retest-output.log 2>&1
RETEST_EXIT_CODE=$?
# 2. Compare results with baseline
diff test-output.log retest-output.log > validation-diff.txt || true
# 3. Validate no regressions introduced
# - Check previously passing tests still pass
# - Verify no new failures introduced
# - Confirm fixes resolved original issues
# 4. Production readiness assessment
cat > production-readiness.json << 'EOF'
{
"all_tests_passing": false,
"coverage_threshold_met": false,
"performance_acceptable": false,
"security_validated": false,
"ready_for_deployment": false
}
EOF
# Update readiness based on results
if [ $RETEST_EXIT_CODE -eq 0 ]; then
echo "Tests passing, updating readiness..."
fi
# Store validation results
npx claude-flow memory store --key "testing/functionality-audit/phase-4/validator/readiness" --value "$(cat production-readiness.json)"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-4-validate"
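The script above only echoes when the retest succeeds. A sketch of how production-readiness.json might actually be updated from the retest exit code and a coverage summary; the coverage/coverage-summary.json path assumes Jest's json-summary reporter is enabled, and the 80% threshold mirrors the success criteria below:

// update-readiness.js - illustrative update of production-readiness.json
const fs = require('fs');

const retestExitCode = Number(process.argv[2] ?? 1); // pass $RETEST_EXIT_CODE as the first argument
const readiness = JSON.parse(fs.readFileSync('production-readiness.json', 'utf8'));

readiness.all_tests_passing = retestExitCode === 0;

// Jest writes coverage/coverage-summary.json when the json-summary reporter is enabled
if (fs.existsSync('coverage/coverage-summary.json')) {
  const summary = JSON.parse(fs.readFileSync('coverage/coverage-summary.json', 'utf8'));
  readiness.coverage_threshold_met = summary.total.lines.pct >= 80;
}

// Deployment readiness requires every individual gate to hold
readiness.ready_for_deployment =
  readiness.all_tests_passing &&
  readiness.coverage_threshold_met &&
  readiness.performance_acceptable &&
  readiness.security_validated;

fs.writeFileSync('production-readiness.json', JSON.stringify(readiness, null, 2));
console.log(JSON.stringify(readiness, null, 2));

Invoked from the phase script as, for example, node update-readiness.js "$RETEST_EXIT_CODE" after the retest completes.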
Memory Pattern:
- Input: testing/functionality-audit/phase-3/coder/root-causes
- Output: testing/functionality-audit/phase-4/validator/readiness
- Shared: testing/functionality-audit/shared/validation-results
Success Criteria:
- All critical tests passing without failures
- Code coverage meets or exceeds threshold (80%+)
- No regressions detected in previously working code
- Performance metrics within acceptable ranges
- Production readiness criteria satisfied
Deliverables:
- Validation test results
- Regression analysis report
- Production readiness assessment
- Performance comparison metrics
- Quality gate approval status
Phase 5: Report Results (Sequential)
Agents: reviewer (lead), tester (metrics)
Duration: 10-15 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 5: Report Results"
npx claude-flow swarm scale --target-agents 2
# Retrieve all phase outputs
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-4/validator/readiness"
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-3/coder/root-causes"
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-2/tester/execution-results"
# Execute phase work
# 1. Compile comprehensive audit report
cat > functionality-audit-report.md << 'EOF'
# Functionality Audit Report
## Executive Summary
- **Status**: [PASS/FAIL/PARTIAL]
- **Tests Executed**: X
- **Tests Passed**: Y
- **Code Coverage**: Z%
- **Critical Issues**: N
## Phase-by-Phase Results
### Phase 1: Environment Setup
- Status: Complete
- Issues: None
### Phase 2: Execution Testing
- Total Tests: X
- Passed: Y
- Failed: Z
- Coverage: N%
### Phase 3: Debugging
- Issues Identified: X
- Root Causes: Y categories
- Fixes Applied: Z
### Phase 4: Validation
- Regression Status: No regressions
- Production Ready: Yes/No
- Performance: Acceptable
## Recommendations
1. [Action items for addressing remaining issues]
2. [Performance optimization suggestions]
3. [Code quality improvements]
## Artifacts
- Test logs: test-output.log
- Coverage report: coverage/
- Issue analysis: issue-analysis.json
- Validation results: production-readiness.json
EOF
# 2. Generate metrics dashboard
cat > audit-metrics.json << 'EOF'
{
"timestamp": "",
"duration_minutes": 0,
"tests_total": 0,
"tests_passed": 0,
"coverage_percentage": 0,
"issues_found": 0,
"issues_fixed": 0,
"production_ready": false
}
EOF
# 3. Store final results
npx claude-flow memory store --key "testing/functionality-audit/final-report" --value "$(cat functionality-audit-report.md)"
# Complete phase and export metrics
npx claude-flow hooks post-task --task-id "phase-5-report" --export-metrics true
npx claude-flow hooks session-end --export-metrics true
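The audit-metrics.json template above is written with zeroed values. A sketch of how it might be populated from the retest log and the issue analysis; the regexes assume Jest's "Tests: ... passed, ... total" summary and text coverage table, so adjust them for other frameworks:

// compile-metrics.js - illustrative population of audit-metrics.json
const fs = require('fs');

const log = fs.existsSync('retest-output.log') ? fs.readFileSync('retest-output.log', 'utf8') : '';
const issues = fs.existsSync('issue-analysis.json')
  ? JSON.parse(fs.readFileSync('issue-analysis.json', 'utf8'))
  : {};

// Jest prints a summary such as "Tests: 7 failed, 38 passed, 45 total"
const passed = Number((log.match(/Tests:.*?(\d+) passed/) || [])[1] || 0);
const total = Number((log.match(/Tests:.*?(\d+) total/) || [])[1] || 0);
const coverage = Number((log.match(/All files[^|]*\|\s*([\d.]+)/) || [])[1] || 0);

const metrics = {
  timestamp: new Date().toISOString(),
  duration_minutes: 0, // fill from session timing if tracked
  tests_total: total,
  tests_passed: passed,
  coverage_percentage: coverage,
  issues_found: Object.values(issues)
    .reduce((n, list) => n + (Array.isArray(list) ? list.length : 0), 0),
  issues_fixed: 0, // fill from the Phase 4 fix log if available
  production_ready: total > 0 && passed === total,
};

fs.writeFileSync('audit-metrics.json', JSON.stringify(metrics, null, 2));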
Memory Pattern:
- Input: testing/functionality-audit/phase-{1-4}/*/output
- Output: testing/functionality-audit/final-report
- Shared: testing/functionality-audit/shared/complete-audit
Success Criteria:
- Comprehensive audit report generated with all findings
- Metrics compiled and formatted for stakeholder consumption
- Actionable recommendations documented clearly
- All artifacts linked and accessible
- Results stored in memory for future reference
Deliverables:
- Comprehensive functionality audit report (Markdown)
- Metrics dashboard (JSON)
- Recommendations document
- Complete test artifacts package
- Memory-persisted results for trend analysis
Memory Coordination
Namespace Convention
testing/functionality-audit/phase-{N}/{agent-type}/{data-type}
Examples:
- testing/functionality-audit/phase-1/tester/sandbox-config
- testing/functionality-audit/phase-2/tester/execution-results
- testing/functionality-audit/phase-3/coder/root-causes
- testing/functionality-audit/phase-4/validator/readiness
- testing/functionality-audit/shared/environment-config
Cross-Agent Sharing
// Store outputs for downstream agents
mcp__claude_flow__memory_store({
key: "testing/functionality-audit/phase-2/tester/output",
value: JSON.stringify({
status: "complete",
results: {
tests_run: 45,
tests_passed: 38,
tests_failed: 7,
coverage: 82.5,
execution_time_ms: 3420
},
timestamp: Date.now(),
artifacts: {
logs: "/tmp/functionality-audit-sandbox/test-output.log",
coverage: "/tmp/functionality-audit-sandbox/coverage/"
}
})
})
// Retrieve inputs from upstream agents
mcp__claude_flow__memory_retrieve({
key: "testing/functionality-audit/phase-1/tester/sandbox-ready"
}).then(data => {
const sandboxConfig = JSON.parse(data);
console.log(`Using sandbox: ${sandboxConfig.path}`);
})
Automation Scripts
Hook Integration
# Pre-task hook - Initialize audit session
npx claude-flow hooks pre-task \
--description "Functionality audit for module: auth-service" \
--session-id "functionality-audit-$(date +%s)"
# Post-edit hook - Track code changes during debugging
npx claude-flow hooks post-edit \
--file "src/auth/login.ts" \
--memory-key "testing/functionality-audit/coder/edits"
# Post-task hook - Finalize and export metrics
npx claude-flow hooks post-task \
--task-id "functionality-audit" \
--export-metrics true
# Session end - Generate summary and persist state
npx claude-flow hooks session-end \
--export-metrics true \
--summary "Functionality audit completed: 38/45 tests passing, 7 issues debugged and fixed"
Evidence-Based Validation
Self-Consistency Checking
Before finalizing each phase, validate:
Does this approach align with successful past work?
- Compare testing strategy with historical successful audits
- Validate debugging methods match proven techniques
- Confirm fix patterns follow established best practices
Do the outputs support the stated objectives?
- Verify test results directly address functionality concerns
- Ensure debugging identified actual root causes, not symptoms
- Confirm fixes resolve issues without introducing regressions
Is the chosen method appropriate for the context?
- Match testing depth to code criticality (production vs. prototype)
- Scale debugging effort based on issue severity
- Apply fix complexity proportional to problem scope
Are there any internal contradictions?
- Check that passing tests align with manual validation
- Verify coverage metrics match actual test execution
- Confirm reported fixes actually appear in code changes
Program-of-Thought Decomposition
Define objective precisely
- Objective: Validate that code executes correctly with realistic inputs
- Success metric: 100% of critical functionality tests pass
- Constraint: No regressions introduced during debugging
Decompose into sub-goals
- Sub-goal 1: Create isolated, reproducible test environment
- Sub-goal 2: Execute code with comprehensive test coverage
- Sub-goal 3: Identify and categorize all failures systematically
- Sub-goal 4: Apply fixes using best practices
- Sub-goal 5: Validate fixes and assess production readiness
Identify dependencies
- Environment setup must complete before execution
- Execution must complete before debugging
- Debugging must complete before fix validation
- Validation must complete before reporting
Evaluate options
- Testing frameworks: Jest vs. Mocha vs. Vitest (choose based on project stack)
- Debugging approaches: Interactive debugger vs. log analysis vs. profiling
- Fix strategies: Minimal changes vs. comprehensive refactoring
Synthesize solution
- Integrate testing framework with existing CI/CD
- Combine automated testing with manual validation
- Apply fixes incrementally with continuous validation
Plan-and-Solve Framework
Planning Phase:
- Analyze code scope and criticality level
- Define test coverage requirements (unit, integration, e2e)
- Identify realistic input scenarios and edge cases
- Establish success criteria and quality gates
- Allocate time and resources per phase
Validation Gate 1: Review strategy against objectives
- Does test plan cover all critical functionality?
- Are edge cases and error scenarios included?
- Is debugging approach systematic and thorough?
- Are fix criteria clear and measurable?
Implementation Phase:
- Execute Phases 1-4 with continuous monitoring
- Track progress against success criteria
- Adapt approach based on findings
- Document issues and resolutions in real-time
Validation Gate 2: Verify outputs and performance
- All critical tests passing (100% of P0, 90%+ of P1)
- Code coverage exceeds threshold (80%+ line, 70%+ branch)
- No regressions introduced (100% of previously passing tests still pass)
- Performance within acceptable ranges (no degradation >10%)
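A minimal sketch of how the Gate 2 thresholds listed above might be checked programmatically; the metrics object shape is assumed for the example, while the threshold values mirror the list:

// gate2-check.js - illustrative quality gate evaluation for Validation Gate 2
function checkGate2(metrics) {
  const checks = {
    p0_passing: metrics.p0Passed === metrics.p0Total,                                    // 100% of P0
    p1_passing: metrics.p1Total === 0 || metrics.p1Passed / metrics.p1Total >= 0.9,      // 90%+ of P1
    line_coverage: metrics.lineCoverage >= 80,                                           // 80%+ line
    branch_coverage: metrics.branchCoverage >= 70,                                       // 70%+ branch
    no_regressions: metrics.previouslyPassingStillPassing === true,                      // 100% still pass
    performance_ok: metrics.perfDegradationPct <= 10,                                    // no degradation >10%
  };
  return { passed: Object.values(checks).every(Boolean), checks };
}

// Example usage with made-up numbers
console.log(checkGate2({
  p0Passed: 12, p0Total: 12,
  p1Passed: 28, p1Total: 30,
  lineCoverage: 84.2, branchCoverage: 71.5,
  previouslyPassingStillPassing: true,
  perfDegradationPct: 3,
}));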
Optimization Phase:
- Refine tests for better coverage
- Optimize debugging workflow for efficiency
- Improve fix quality and code maintainability
- Enhance reporting clarity and actionability
Validation Gate 3: Confirm targets met before concluding
- Production readiness assessment complete
- All stakeholders informed of results
- Artifacts stored and accessible
- Lessons learned documented for future audits
Integration with Other Skills
Related Skills:
- when-detecting-fake-code-use-theater-detection - Pre-audit detection of non-functional code
- when-reviewing-code-comprehensively-use-code-review-assistant - Post-audit quality review
- when-verifying-quality-use-verification-quality - Comprehensive quality validation
- when-auditing-code-style-use-style-audit - Code style and conventions validation
Coordination Points:
- Theater detection can run before functionality audit to identify obviously broken code
- Code review can follow functionality audit to assess code quality beyond functionality
- Quality verification provides broader validation context for audit results
- Style audit ensures code meets conventions after functional fixes
Memory Sharing:
# Share audit results with code review skill
npx claude-flow memory store \
--key "code-review/input/functionality-audit-results" \
--value "$(cat functionality-audit-report.md)"
# Retrieve theater detection findings
npx claude-flow memory retrieve \
--key "theater-detection/output/suspicious-code"
Common Issues and Solutions
Issue 1: Sandbox Environment Fails to Initialize
Symptoms: Dependency installation errors, configuration failures
Root Cause: Missing system dependencies, incompatible versions
Solution:
# Validate system dependencies
node --version
npm --version
# Clean install
rm -rf node_modules package-lock.json
npm install --force
# Use Docker for isolation if local setup problematic
docker run -it --rm -v $(pwd):/workspace node:18 bash
Issue 2: Tests Fail Due to Missing Environment Variables
Symptoms: Runtime errors referencing undefined config
Root Cause: Environment variables not set in sandbox
Solution:
# Create .env file with required variables
cat > .env << 'EOF'
NODE_ENV=test
API_KEY=test-key-12345
DATABASE_URL=sqlite::memory:
EOF
# Load environment in test setup (requires the dotenv package: npm install --save-dev dotenv)
require('dotenv').config();
Issue 3: Code Coverage Lower Than Expected
Symptoms: Coverage report shows gaps in tested code
Root Cause: Missing test cases for edge cases, error paths
Solution:
- Review coverage report to identify untested lines
- Add test cases for error handling paths
- Test edge cases and boundary conditions
- Mock external dependencies to isolate code under test
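As an illustration of the last two points, a hedged example of covering an error path by mocking an external dependency; the user-service module, fetchUser, and getUserName are hypothetical names, not part of the code under audit:

// Hypothetical module under test: getUserName calls an external fetchUser service
jest.mock('./user-service'); // auto-mock the external dependency
const { fetchUser } = require('./user-service');
const { getUserName } = require('./get-user-name');

describe('getUserName error path', () => {
  test('returns a fallback when the user service fails', async () => {
    fetchUser.mockRejectedValueOnce(new Error('service unavailable'));
    const name = await getUserName('user-42');
    expect(name).toBe('unknown'); // error path now exercised and counted in coverage
  });
});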
Issue 4: Debugging Reveals Complex Interdependencies
Symptoms: Fixing one issue breaks other functionality
Root Cause: Tight coupling, shared state, hidden dependencies
Solution:
- Use dependency injection to decouple components (see the sketch after this list)
- Refactor shared state into explicit parameters
- Add integration tests to catch cascading failures
- Document dependencies clearly in code comments
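A minimal sketch of the dependency-injection point above: the collaborator is passed in explicitly so tests can substitute it and a fix to one component stays isolated. The service and repository names are illustrative.

// Before: the repository was constructed inside the function, hiding the dependency.
// After: it is injected, so tests can pass a stub and shared state is avoided.
function createOrderService(orderRepository) {
  return {
    async placeOrder(order) {
      // Everything the service needs arrives as a parameter
      const saved = await orderRepository.save(order);
      return { id: saved.id, status: 'placed' };
    },
  };
}

// In tests, inject an in-memory stub instead of the real database-backed repository
const stubRepo = { save: async (order) => ({ ...order, id: 'test-1' }) };
const service = createOrderService(stubRepo);
service.placeOrder({ items: [] }).then(result => console.log(result.status)); // "placed"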
Issue 5: Performance Degradation After Fixes
Symptoms: Tests pass but execution time significantly increased
Root Cause: Inefficient fix implementation, excessive validation
Solution:
# Profile execution to identify bottlenecks
node --prof app.js
node --prof-process isolate-*.log > profile.txt
# Optimize hot paths
# - Cache repeated computations
# - Reduce unnecessary iterations
# - Use appropriate data structures
Examples
Example 1: Validating API Endpoint Functionality
# Initialize audit for API endpoint
npx claude-flow hooks pre-task --description "Audit /api/users endpoint"
# Phase 1: Setup
mkdir -p /tmp/api-audit
cd /tmp/api-audit
npm init -y
npm install --save-dev jest supertest
# Phase 2: Execute tests
cat > users.test.js << 'EOF'
const request = require('supertest');
const app = require('../src/app');
describe('GET /api/users', () => {
test('returns 200 with user list', async () => {
const response = await request(app).get('/api/users');
expect(response.statusCode).toBe(200);
expect(Array.isArray(response.body)).toBe(true);
});
test('handles authentication', async () => {
const response = await request(app)
.get('/api/users')
.set('Authorization', 'Bearer invalid-token');
expect(response.statusCode).toBe(401);
});
});
EOF
npx jest
# Phase 3: Debug failures (if any)
# Analyze error messages and stack traces
# Fix authentication logic or data access issues
# Phase 4: Validate fixes
npx jest --coverage
# Phase 5: Report
echo "API audit complete: All endpoints functional"
Example 2: Validating Data Processing Function
// Phase 2: Create test with realistic data
const { processUserData } = require('./data-processor');
describe('processUserData', () => {
test('processes valid user data correctly', () => {
const input = {
name: 'John Doe',
email: 'john@example.com',
age: 30,
roles: ['user', 'admin']
};
const result = processUserData(input);
expect(result).toHaveProperty('id');
expect(result.email).toBe('john@example.com');
expect(result.roles).toContain('admin');
});
test('handles missing fields gracefully', () => {
const input = { name: 'Jane' };
const result = processUserData(input);
expect(result).toHaveProperty('email');
expect(result.email).toBe(''); // Default value
});
test('validates email format', () => {
const input = { email: 'invalid-email' };
expect(() => processUserData(input)).toThrow('Invalid email format');
});
});
// Phase 3: Debug - Found that email validation regex was incorrect
// Fix: Update regex to properly match email patterns
// Phase 4: Validate - All tests pass, coverage 95%
Example 3: Validating Frontend Component
// Phase 2: Create component test with user interactions
import { render, screen, fireEvent } from '@testing-library/react';
import LoginForm from './LoginForm';
describe('LoginForm', () => {
test('submits form with valid credentials', async () => {
const handleSubmit = jest.fn();
render(<LoginForm onSubmit={handleSubmit} />);
fireEvent.change(screen.getByLabelText(/email/i), {
target: { value: 'test@example.com' }
});
fireEvent.change(screen.getByLabelText(/password/i), {
target: { value: 'password123' }
});
fireEvent.click(screen.getByRole('button', { name: /login/i }));
await screen.findByText(/logging in/i);
expect(handleSubmit).toHaveBeenCalledWith({
email: 'test@example.com',
password: 'password123'
});
});
test('displays validation errors', () => {
render(<LoginForm onSubmit={jest.fn()} />);
fireEvent.click(screen.getByRole('button', { name: /login/i }));
expect(screen.getByText(/email is required/i)).toBeInTheDocument();
expect(screen.getByText(/password is required/i)).toBeInTheDocument();
});
});
// Phase 3: Debug - Found that form validation wasn't triggering
// Fix: Add form validation logic with proper error state management
// Phase 4: Validate - Component renders and validates correctly