| name | when-testing-code-use-testing-framework |
| description | Implement comprehensive testing infrastructure with test strategy design, unit, integration, and E2E tests, coverage analysis, and CI/CD integration |
| version | 1.0.0 |
| category | quality |
| tags | quality, testing, validation |
| author | ruv |
When to Use This Skill
Use this skill when:
- Code quality issues are detected (violations, smells, anti-patterns)
- Audit requirements mandate systematic review (compliance, release gates)
- Review needs arise (pre-merge, production hardening, refactoring preparation)
- Quality metrics indicate degradation (test coverage drop, complexity increase)
- Theater detection is needed (mock data, stubs, incomplete implementations)
When NOT to Use This Skill
Do NOT use this skill for:
- Simple formatting fixes (use linter/prettier directly)
- Non-code files (documentation, configuration without logic)
- Trivial changes (typo fixes, comment updates)
- Generated code (build artifacts, vendor dependencies)
- Third-party libraries (focus on application code)
Success Criteria
This skill succeeds when:
- Violations Detected: All quality issues found with ZERO false negatives
- False Positive Rate: <5% (95%+ findings are genuine issues)
- Actionable Feedback: Every finding includes file path, line number, and fix guidance
- Root Cause Identified: Issues traced to underlying causes, not just symptoms
- Fix Verification: Proposed fixes validated against codebase constraints
Edge Cases and Limitations
Handle these edge cases carefully:
- Empty Files: May trigger false positives - verify intent (stub vs intentional)
- Generated Code: Skip or flag as low priority (auto-generated files)
- Third-Party Libraries: Exclude from analysis (vendor/, node_modules/)
- Domain-Specific Patterns: What looks like violation may be intentional (DSLs)
- Legacy Code: Balance ideal standards with pragmatic technical debt management
Quality Analysis Guardrails
CRITICAL RULES - ALWAYS FOLLOW:
- NEVER approve code without evidence: Require actual execution, not assumptions
- ALWAYS provide line numbers: Every finding MUST include file:line reference
- VALIDATE findings against multiple perspectives: Cross-check with complementary tools
- DISTINGUISH symptoms from root causes: Report underlying issues, not just manifestations
- AVOID false confidence: Flag uncertain findings as "needs manual review"
- PRESERVE context: Show surrounding code (5 lines before/after minimum)
- TRACK false positives: Learn from mistakes to improve detection accuracy
Evidence-Based Validation
Use multiple validation perspectives:
- Static Analysis: Code structure, patterns, metrics (connascence, complexity)
- Dynamic Analysis: Execution behavior, test results, runtime characteristics
- Historical Analysis: Git history, past bug patterns, change frequency
- Peer Review: Cross-validation with other quality skills (functionality-audit, theater-detection)
- Domain Expertise: Leverage .claude/expertise/{domain}.yaml if available
Validation Threshold: Findings require 2+ confirming signals before flagging as violations.
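For illustration only, the sketch below expresses this threshold as a hypothetical helper (not part of any existing tooling): a finding becomes a violation only when at least two independent perspectives confirm it; otherwise it is routed to manual review.
// Hypothetical sketch of the 2+ confirming signals rule
function classifyFinding(finding) {
// each signal represents one perspective: static, dynamic, historical, peer, domain
const confirming = finding.signals.filter((signal) => signal.confirms);
if (confirming.length >= 2) {
return { ...finding, status: 'violation' };
}
return { ...finding, status: 'needs manual review' };
}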
Integration with Quality Pipeline
This skill integrates with:
- Pre-Phase: Load domain expertise (.claude/expertise/{domain}.yaml)
- Parallel Skills: functionality-audit, theater-detection-audit, style-audit
- Post-Phase: Store findings in Memory MCP with WHO/WHEN/PROJECT/WHY tags
- Feedback Loop: Learnings feed dogfooding-system for continuous improvement
Testing Framework Skill - Complete SOP
Metadata
- Skill ID: when-testing-code-use-testing-framework
- Version: 1.0.0
- Category: Testing
- Complexity: HIGH
- Agents Required: tester, coder, code-analyzer
- Estimated Time: 2-4 hours
Purpose
Implement comprehensive testing infrastructure following industry best practices with proper test strategy, unit tests, integration tests, E2E tests, coverage analysis, and CI/CD integration.
Trigger Conditions
Activate this skill when:
- User explicitly requests "test this code" or "add tests"
- New feature implementation requires test coverage
- Code changes need test validation
- Setting up testing infrastructure for a project
- Improving test coverage metrics
- Implementing TDD workflow
- Preparing code for production deployment
5-Phase Testing Framework SOP
Phase 1: Test Strategy Design (20 minutes)
Objective: Analyze codebase and design comprehensive test strategy
Agent: code-analyzer
Actions:
- Analyze codebase structure and identify testable units
- Determine appropriate testing frameworks (Jest, Vitest, Playwright)
- Define coverage goals (minimum 80% for critical paths)
- Identify test categories needed:
- Unit tests (isolated functions/methods)
- Integration tests (component interactions)
- E2E tests (user workflows)
- Performance tests (if applicable)
- Create test plan document with priorities
Hooks:
npx claude-flow@alpha hooks pre-task --description "Test strategy design"
npx claude-flow@alpha hooks session-restore --session-id "testing-framework"
Outputs:
- Test strategy document (docs/TEST_STRATEGY.md)
- Coverage goals and metrics
- Testing framework selections
- Test file structure plan
Memory Storage:
npx claude-flow@alpha memory store \
--key "testing/strategy" \
--value "$(cat docs/TEST_STRATEGY.md)" \
--tags "testing,strategy,phase-1"
Phase 2: Unit Test Implementation (60 minutes)
Objective: Create isolated unit tests for all functions and methods
Agent: tester
Actions:
- Set up testing framework configuration
- Create test utilities and helpers
- Write unit tests for each module:
- Test happy paths
- Test edge cases
- Test error handling
- Mock external dependencies
- Organize tests to mirror source structure
- Ensure tests are atomic and independent
Test Patterns:
Jest/Vitest Unit Test Example:
// tests/unit/utils/calculator.test.js
import { describe, it, expect, beforeEach } from 'vitest';
import { Calculator } from '@/utils/calculator';
describe('Calculator', () => {
let calculator;
beforeEach(() => {
calculator = new Calculator();
});
describe('add', () => {
it('should add two positive numbers correctly', () => {
expect(calculator.add(2, 3)).toBe(5);
});
it('should handle negative numbers', () => {
expect(calculator.add(-2, 3)).toBe(1);
expect(calculator.add(-2, -3)).toBe(-5);
});
it('should handle zero values', () => {
expect(calculator.add(0, 5)).toBe(5);
expect(calculator.add(5, 0)).toBe(5);
});
it('should throw error for non-numeric inputs', () => {
expect(() => calculator.add('a', 2)).toThrow('Invalid input');
});
});
describe('divide', () => {
it('should divide numbers correctly', () => {
expect(calculator.divide(10, 2)).toBe(5);
});
it('should handle decimal results', () => {
expect(calculator.divide(5, 2)).toBe(2.5);
});
it('should throw error when dividing by zero', () => {
expect(() => calculator.divide(10, 0)).toThrow('Division by zero');
});
});
});
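The actions above also call for mocking external dependencies. Below is a minimal sketch using Vitest's vi.mock; the getUserProfile and apiClient modules are hypothetical stand-ins for whatever your code actually wraps, so adapt the paths and assertions to your project.
// tests/unit/services/userService.test.js (illustrative sketch)
import { describe, it, expect, vi } from 'vitest';
import { getUserProfile } from '@/services/userService';
import { apiClient } from '@/lib/apiClient';
// Replace the real HTTP client module with a mock factory
vi.mock('@/lib/apiClient', () => ({
apiClient: { get: vi.fn() }
}));
describe('getUserProfile', () => {
it('returns the profile fetched from the API', async () => {
apiClient.get.mockResolvedValue({ id: '42', name: 'Test User' });
const profile = await getUserProfile('42');
expect(apiClient.get).toHaveBeenCalledWith('/users/42');
expect(profile).toMatchObject({ id: '42', name: 'Test User' });
});
});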
Hooks:
npx claude-flow@alpha hooks post-edit \
--file "tests/unit/calculator.test.js" \
--memory-key "testing/unit-tests"
Outputs:
- Unit test files in tests/unit/
- Test configuration (vitest.config.js or jest.config.js)
- Test utilities (tests/helpers/)
- Mock files (tests/mocks/)
File Structure:
tests/
├── unit/
│ ├── utils/
│ │ ├── calculator.test.js
│ │ └── validator.test.js
│ ├── services/
│ │ └── userService.test.js
│ └── models/
│ └── User.test.js
├── helpers/
│ ├── testUtils.js
│ └── mockData.js
└── mocks/
├── api.js
└── database.js
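The integration examples in the next phase import createTestUser, createTestProduct, and clearDatabase from tests/helpers/testUtils.js. A minimal sketch of that helper is shown below; it assumes ORM-style model classes and a database handle with a truncate-style reset, so adapt it to your actual data layer.
// tests/helpers/testUtils.js (illustrative sketch)
import { database } from '@/config/database';
import { User } from '@/models/User';
import { Product } from '@/models/Product';
// Remove all rows so each test starts from a clean slate
export async function clearDatabase() {
await database.truncateAll();
}
export async function createTestUser(overrides = {}) {
return User.create({
email: `user-${Date.now()}@example.com`,
name: 'Test User',
password: 'SecurePass123!',
...overrides
});
}
export async function createTestProduct(overrides = {}) {
return Product.create({
name: 'Test Product',
price: 10.00,
inventory: 100,
...overrides
});
}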
Phase 3: Integration Test Creation (60 minutes)
Objective: Test component interactions and data flow
Agent: tester
Actions:
- Identify integration points (API calls, database operations, services)
- Set up test databases or mock services
- Create integration tests for:
- API endpoints
- Service layer interactions
- Database operations
- External service integrations
- Use real dependencies where feasible
- Implement proper setup/teardown
Integration Test Example:
// tests/integration/api/users.test.js
import { describe, it, expect, beforeAll, afterAll, beforeEach } from 'vitest';
import request from 'supertest';
import { app } from '@/app';
import { database } from '@/config/database';
import { createTestUser, clearDatabase } from '../../helpers/testUtils';
describe('User API Integration Tests', () => {
beforeAll(async () => {
await database.connect();
});
afterAll(async () => {
await database.disconnect();
});
beforeEach(async () => {
await clearDatabase();
});
describe('POST /api/users', () => {
it('should create a new user successfully', async () => {
const userData = {
email: 'test@example.com',
name: 'Test User',
password: 'SecurePass123!'
};
const response = await request(app)
.post('/api/users')
.send(userData)
.expect(201);
expect(response.body).toMatchObject({
id: expect.any(String),
email: userData.email,
name: userData.name
});
expect(response.body).not.toHaveProperty('password');
});
it('should return 400 for invalid email format', async () => {
const userData = {
email: 'invalid-email',
name: 'Test User',
password: 'SecurePass123!'
};
const response = await request(app)
.post('/api/users')
.send(userData)
.expect(400);
expect(response.body.error).toMatch(/invalid email/i);
});
it('should return 409 for duplicate email', async () => {
const userData = {
email: 'test@example.com',
name: 'Test User',
password: 'SecurePass123!'
};
await createTestUser(userData);
const response = await request(app)
.post('/api/users')
.send(userData)
.expect(409);
expect(response.body.error).toMatch(/email already exists/i);
});
});
describe('GET /api/users/:id', () => {
it('should retrieve user by id', async () => {
const user = await createTestUser({
email: 'test@example.com',
name: 'Test User'
});
const response = await request(app)
.get(`/api/users/${user.id}`)
.expect(200);
expect(response.body).toMatchObject({
id: user.id,
email: user.email,
name: user.name
});
});
it('should return 404 for non-existent user', async () => {
const response = await request(app)
.get('/api/users/non-existent-id')
.expect(404);
expect(response.body.error).toMatch(/user not found/i);
});
});
});
Service Integration Test Example:
// tests/integration/services/orderService.test.js
import { describe, it, expect, beforeEach } from 'vitest';
import { OrderService } from '@/services/OrderService';
import { UserService } from '@/services/UserService';
import { ProductService } from '@/services/ProductService';
import { database } from '@/config/database';
import { clearDatabase, createTestUser, createTestProduct } from '../../helpers/testUtils';
describe('OrderService Integration', () => {
let orderService;
let userService;
let productService;
beforeEach(async () => {
await clearDatabase();
orderService = new OrderService();
userService = new UserService();
productService = new ProductService();
});
it('should create order with valid user and products', async () => {
const user = await createTestUser();
const product1 = await createTestProduct({ price: 10.00 });
const product2 = await createTestProduct({ price: 20.00 });
const order = await orderService.createOrder({
userId: user.id,
items: [
{ productId: product1.id, quantity: 2 },
{ productId: product2.id, quantity: 1 }
]
});
expect(order).toMatchObject({
id: expect.any(String),
userId: user.id,
total: 40.00,
status: 'pending'
});
expect(order.items).toHaveLength(2);
});
it('should update inventory when order is created', async () => {
const user = await createTestUser();
const product = await createTestProduct({
price: 10.00,
inventory: 100
});
await orderService.createOrder({
userId: user.id,
items: [{ productId: product.id, quantity: 5 }]
});
const updatedProduct = await productService.getById(product.id);
expect(updatedProduct.inventory).toBe(95);
});
it('should fail when product inventory is insufficient', async () => {
const user = await createTestUser();
const product = await createTestProduct({
price: 10.00,
inventory: 2
});
await expect(
orderService.createOrder({
userId: user.id,
items: [{ productId: product.id, quantity: 5 }]
})
).rejects.toThrow(/insufficient inventory/i);
});
});
Outputs:
- Integration test files in tests/integration/
- Test database configurations
- API test utilities
- Service mocks and stubs
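The package.json scripts in Phase 5 reference a separate vitest.integration.config.js, which is not shown elsewhere in this document. A minimal sketch is below, assuming it extends the base vitest.config.js, narrows the test glob, and allows extra time for database work.
// vitest.integration.config.js (illustrative sketch)
import { defineConfig, mergeConfig } from 'vitest/config';
import baseConfig from './vitest.config.js';
export default mergeConfig(
baseConfig,
defineConfig({
test: {
include: ['tests/integration/**/*.test.{js,ts}'],
// Database setup and teardown can be slow, so allow extra time
testTimeout: 30000,
// Run test files serially so suites sharing one database do not collide
fileParallelism: false
}
})
);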
Phase 4: E2E Test Setup (60 minutes)
Objective: Create end-to-end tests simulating real user workflows
Agent: tester
Actions:
- Set up E2E testing framework (Playwright, Cypress)
- Configure test environments
- Create page object models
- Write E2E tests for critical user journeys:
- Authentication flows
- Core feature workflows
- Error scenarios
- Implement visual regression tests (if applicable)
Playwright E2E Test Example:
// tests/e2e/user-registration.spec.js
import { test, expect } from '@playwright/test';
test.describe('User Registration Flow', () => {
test.beforeEach(async ({ page }) => {
await page.goto('/register');
});
test('should register new user successfully', async ({ page }) => {
// Fill registration form
await page.fill('input[name="email"]', 'newuser@example.com');
await page.fill('input[name="name"]', 'New User');
await page.fill('input[name="password"]', 'SecurePass123!');
await page.fill('input[name="confirmPassword"]', 'SecurePass123!');
// Submit form
await page.click('button[type="submit"]');
// Verify success
await expect(page.locator('.success-message')).toContainText(
'Registration successful'
);
await expect(page).toHaveURL('/dashboard');
// Verify user is logged in
await expect(page.locator('.user-menu')).toContainText('New User');
});
test('should show validation errors for invalid input', async ({ page }) => {
await page.fill('input[name="email"]', 'invalid-email');
await page.fill('input[name="password"]', '123');
await page.click('button[type="submit"]');
await expect(page.locator('.error-message')).toContainText(
'Invalid email format'
);
await expect(page.locator('.error-message')).toContainText(
'Password must be at least 8 characters'
);
});
test('should show error for duplicate email', async ({ page }) => {
// First registration
await page.fill('input[name="email"]', 'existing@example.com');
await page.fill('input[name="name"]', 'Existing User');
await page.fill('input[name="password"]', 'SecurePass123!');
await page.fill('input[name="confirmPassword"]', 'SecurePass123!');
await page.click('button[type="submit"]');
// Logout and try again with same email
await page.click('.logout-button');
await page.goto('/register');
await page.fill('input[name="email"]', 'existing@example.com');
await page.fill('input[name="name"]', 'Another User');
await page.fill('input[name="password"]', 'SecurePass123!');
await page.fill('input[name="confirmPassword"]', 'SecurePass123!');
await page.click('button[type="submit"]');
await expect(page.locator('.error-message')).toContainText(
'Email already registered'
);
});
});
test.describe('User Login Flow', () => {
test('should login existing user successfully', async ({ page }) => {
await page.goto('/login');
await page.fill('input[name="email"]', 'testuser@example.com');
await page.fill('input[name="password"]', 'TestPass123!');
await page.click('button[type="submit"]');
await expect(page).toHaveURL('/dashboard');
await expect(page.locator('.welcome-message')).toContainText('Welcome back');
});
test('should show error for invalid credentials', async ({ page }) => {
await page.goto('/login');
await page.fill('input[name="email"]', 'wrong@example.com');
await page.fill('input[name="password"]', 'WrongPassword');
await page.click('button[type="submit"]');
await expect(page.locator('.error-message')).toContainText(
'Invalid email or password'
);
await expect(page).toHaveURL('/login');
});
});
Page Object Model Example:
// tests/e2e/pages/RegistrationPage.js
import { expect } from '@playwright/test';
export class RegistrationPage {
constructor(page) {
this.page = page;
this.emailInput = page.locator('input[name="email"]');
this.nameInput = page.locator('input[name="name"]');
this.passwordInput = page.locator('input[name="password"]');
this.confirmPasswordInput = page.locator('input[name="confirmPassword"]');
this.submitButton = page.locator('button[type="submit"]');
this.successMessage = page.locator('.success-message');
this.errorMessage = page.locator('.error-message');
}
async goto() {
await this.page.goto('/register');
}
async register(email, name, password, confirmPassword = password) {
await this.emailInput.fill(email);
await this.nameInput.fill(name);
await this.passwordInput.fill(password);
await this.confirmPasswordInput.fill(confirmPassword);
await this.submitButton.click();
}
async expectSuccess() {
await expect(this.successMessage).toBeVisible();
await expect(this.page).toHaveURL('/dashboard');
}
async expectError(message) {
await expect(this.errorMessage).toContainText(message);
}
}
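With the page object in place, specs can be written against it instead of raw selectors. A brief usage sketch (the spec file name is illustrative):
// tests/e2e/user-registration-pom.spec.js (illustrative)
import { test } from '@playwright/test';
import { RegistrationPage } from './pages/RegistrationPage';
test('registers a new user via the page object', async ({ page }) => {
const registrationPage = new RegistrationPage(page);
await registrationPage.goto();
await registrationPage.register('newuser@example.com', 'New User', 'SecurePass123!');
await registrationPage.expectSuccess();
});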
Outputs:
- E2E test files in tests/e2e/
- Page object models in tests/e2e/pages/
- Playwright/Cypress configuration
- Test fixtures and utilities
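The Playwright configuration listed above is not shown elsewhere in this document. A minimal sketch follows; the dev-server command and port 3000 are assumptions, so substitute whatever serves your app during E2E runs.
// playwright.config.js (illustrative sketch)
import { defineConfig, devices } from '@playwright/test';
export default defineConfig({
testDir: './tests/e2e',
retries: process.env.CI ? 2 : 0,
use: {
baseURL: 'http://localhost:3000',
trace: 'on-first-retry'
},
projects: [
{ name: 'chromium', use: { ...devices['Desktop Chrome'] } }
],
webServer: {
command: 'npm run dev',
url: 'http://localhost:3000',
reuseExistingServer: !process.env.CI
}
});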
Phase 5: Coverage Analysis & CI/CD Integration (40 minutes)
Objective: Analyze test coverage, generate reports, and integrate with CI/CD
Agent: coder
Actions:
- Configure coverage collection
- Generate coverage reports (HTML, JSON, LCOV)
- Identify untested code paths
- Set up CI/CD pipeline for automated testing
- Configure coverage thresholds
- Create pre-commit hooks for testing
Coverage Configuration:
// vitest.config.js
import { defineConfig } from 'vitest/config';
import path from 'path';
export default defineConfig({
test: {
globals: true,
environment: 'node',
coverage: {
provider: 'v8',
reporter: ['text', 'html', 'json', 'lcov'],
all: true,
include: ['src/**/*.{js,ts,jsx,tsx}'],
exclude: [
'src/**/*.test.{js,ts,jsx,tsx}',
'src/**/*.spec.{js,ts,jsx,tsx}',
'src/**/index.{js,ts}',
'src/**/*.d.ts',
'src/tests/**'
],
thresholds: {
lines: 80,
functions: 80,
branches: 75,
statements: 80
}
},
setupFiles: ['./tests/setup.js'],
testTimeout: 10000
},
resolve: {
alias: {
'@': path.resolve(__dirname, './src')
}
}
});
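The setupFiles entry above points at tests/setup.js, which is not shown elsewhere in this document. A minimal sketch of what it might contain:
// tests/setup.js (illustrative sketch)
import { afterEach, vi } from 'vitest';
// Make sure tests never run against a real environment by accident
process.env.NODE_ENV = 'test';
// Reset mock and timer state between tests so suites stay independent
afterEach(() => {
vi.clearAllMocks();
vi.useRealTimers();
});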
GitHub Actions CI Configuration:
# .github/workflows/test.yml
name: Test Suite
on:
push:
branches: [main, develop]
pull_request:
branches: [main, develop]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [18.x, 20.x]
steps:
- uses: actions/checkout@v4
- name: Setup Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Run unit tests
run: npm run test:unit
- name: Run integration tests
run: npm run test:integration
env:
DATABASE_URL: postgresql://test:test@localhost:5432/testdb
- name: Install Playwright browsers
run: npx playwright install --with-deps
- name: Run E2E tests
run: npm run test:e2e
- name: Generate coverage report
run: npm run test:coverage
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
with:
files: ./coverage/lcov.info
flags: unittests
name: codecov-umbrella
- name: Check coverage thresholds
run: npm run test:coverage:check
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20.x'
- run: npm ci
- run: npm run lint
- run: npm run typecheck
Package.json Scripts:
{
"scripts": {
"test": "vitest",
"test:unit": "vitest run --config vitest.unit.config.js",
"test:integration": "vitest run --config vitest.integration.config.js",
"test:e2e": "playwright test",
"test:coverage": "vitest run --coverage",
"test:coverage:check": "vitest run --coverage --run",
"test:watch": "vitest watch",
"test:ui": "vitest --ui",
"test:e2e:ui": "playwright test --ui",
"test:e2e:debug": "playwright test --debug"
}
}
Pre-commit Hook:
#!/bin/bash
# .husky/pre-commit
echo "Running tests before commit..."
# Run unit tests
npm run test:unit
if [ $? -ne 0 ]; then
echo "Unit tests failed. Commit aborted."
exit 1
fi
# Run linting
npm run lint
if [ $? -ne 0 ]; then
echo "Linting failed. Commit aborted."
exit 1
fi
echo "All checks passed. Proceeding with commit."
Hooks:
npx claude-flow@alpha hooks post-task --task-id "testing-framework"
npx claude-flow@alpha hooks session-end --export-metrics true
Outputs:
- Coverage reports in coverage/
- CI/CD configuration files
- Pre-commit hooks
- Test documentation
Memory Storage:
npx claude-flow@alpha memory store \
--key "testing/coverage" \
--value "$(cat coverage/coverage-summary.json)" \
--tags "testing,coverage,metrics"
Success Criteria
- All phases completed successfully
- Unit test coverage ≥ 80% for critical paths
- Integration tests cover all API endpoints
- E2E tests cover critical user journeys
- All tests pass in CI/CD pipeline
- Coverage reports generated and accessible
- Pre-commit hooks configured
- Test documentation complete
Quality Gates
- Phase 1: Test strategy document approved
- Phase 2: Unit tests pass with >80% coverage
- Phase 3: Integration tests pass with real dependencies
- Phase 4: E2E tests pass in headless mode
- Phase 5: CI/CD pipeline executes successfully
Rollback Strategy
If any phase fails:
- Document failure reason in memory
- Rollback to previous working state
- Notify orchestrator for replanning
- Store lessons learned for future attempts
Performance Metrics
- Test execution time: <5 minutes for full suite
- Coverage collection overhead: <10% additional time
- CI/CD pipeline duration: <10 minutes
- Test flakiness rate: <1%
Documentation Requirements
- Test strategy document
- Test coverage report
- Testing best practices guide
- Troubleshooting guide
- CI/CD setup instructions
Core Principles
Test Pyramid Balance - Maintain proper test distribution: many fast unit tests, fewer integration tests, minimal E2E tests. This ensures comprehensive coverage while keeping test suites fast and maintainable.
Test Independence - Each test must run in isolation without dependencies on other tests or shared state. Tests should pass consistently regardless of execution order or parallelization.
Fail Fast with Clear Feedback - Tests should fail quickly when issues arise and provide actionable error messages. Every failure should pinpoint the exact problem and suggest next steps for resolution.
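A short illustration of the independence principle: state is rebuilt in beforeEach rather than shared across tests, so the suite passes in any order. The ShoppingCart model is hypothetical.
// Illustrative only: fresh state per test instead of shared mutable state
import { describe, it, expect, beforeEach } from 'vitest';
import { ShoppingCart } from '@/models/ShoppingCart';
describe('ShoppingCart', () => {
let cart;
beforeEach(() => {
// New cart for every test; no test depends on what a previous one did
cart = new ShoppingCart();
});
it('starts empty', () => {
expect(cart.items).toHaveLength(0);
});
it('adds an item without relying on earlier tests', () => {
cart.add({ sku: 'ABC', quantity: 1 });
expect(cart.items).toHaveLength(1);
});
});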
Anti-Patterns
| Anti-Pattern | Why It Fails | Better Approach |
|---|---|---|
| Flaky Tests - Tests that pass/fail non-deterministically | Erodes trust in test suite, wastes developer time investigating false failures, masks real bugs | Fix race conditions, eliminate time dependencies, use proper test isolation, mock external services consistently |
| Testing Implementation Details - Tests coupled to internal code structure rather than behavior | Brittle tests break on refactoring, high maintenance cost, discourages code improvement | Test public APIs and observable behavior, focus on contract not implementation, use integration tests over unit tests when appropriate |
| Missing Assertions - Tests that execute code but don't verify outcomes | False confidence in code correctness, bugs slip through, tests provide no value | Every test needs explicit assertions (expect/toBe), verify both success cases and error conditions, ensure minimum 2 assertions per test |
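To make the last anti-pattern concrete, here is a brief before/after sketch: the first test executes code but verifies nothing, the second asserts both the result and the failure mode. The loadConfig function and its import path are hypothetical.
// Illustrative contrast; loadConfig is a hypothetical function under test
import { it, expect } from 'vitest';
import { loadConfig } from '@/config/loadConfig';
// Anti-pattern: exercises the code but never checks the result
it('parses the config (no assertions)', async () => {
await loadConfig('./config.json'); // passes even when the result is wrong
});
// Better: assert the outcome and the failure mode explicitly
it('parses the config and rejects missing files', async () => {
const config = await loadConfig('./config.json');
expect(config).toMatchObject({ env: expect.any(String) });
await expect(loadConfig('./missing.json')).rejects.toThrow(/not found/i);
});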
Conclusion
The testing framework skill provides a comprehensive 5-phase workflow for building production-ready test infrastructure that ensures code quality and reliability. By following the systematic approach from test strategy design through CI/CD integration, teams can achieve robust coverage across unit, integration, and E2E testing layers. The skill emphasizes proper test architecture, framework selection, and automation while maintaining the test pyramid balance that keeps suites fast and maintainable.
Successful testing implementation requires commitment to test independence, clear failure feedback, and continuous coverage improvement. By avoiding anti-patterns like flaky tests, implementation-detail coupling, and missing assertions, teams build test suites that provide genuine confidence in code correctness. The integration of coverage analysis, pre-commit hooks, and CI/CD pipelines ensures testing becomes an automated quality gate rather than a manual burden. This systematic approach transforms testing from a checkbox exercise into a powerful mechanism for catching bugs early, enabling confident refactoring, and shipping reliable software to production.