---
name: technology-benchmarking
description: Expert at comparing libraries and frameworks through parallel implementation, performance benchmarking, code characteristic analysis, and evidence-based technology selection. Activate when evaluating technologies, comparing alternatives, or making technology adoption decisions.
---
# Technology Benchmarking Skill
Expert knowledge in systematic technology comparison through parallel implementation, rigorous performance benchmarking, code characteristic analysis, and evidence-based decision-making for technology adoption.
## Documentation Output Locations

This skill generates outputs in the following `.orchestr8/docs/` locations:

- Technology comparison reports: `.orchestr8/docs/research/comparisons/`
- Performance benchmarks: `.orchestr8/docs/performance/benchmarks/`
- Architecture recommendations: `.orchestr8/docs/architecture/design/`
### Output Naming Convention

All outputs follow the pattern: `[type]-[name]-YYYY-MM-DD.md`
Example outputs:

- `.orchestr8/docs/research/comparisons/comparison-state-management-2025-01-15.md`
- `.orchestr8/docs/performance/benchmarks/benchmark-state-management-2025-01-15.md`
- `.orchestr8/docs/architecture/design/recommendation-state-management-2025-01-15.md`
## When to Use This Skill

Use `technology-benchmarking` for:
- ✅ Comparing multiple libraries or frameworks for a specific use case
- ✅ Evaluating technology choices before major adoption
- ✅ Performance benchmarking of alternative solutions
- ✅ Analyzing code characteristics (bundle size, API ergonomics, learning curve)
- ✅ Making evidence-based technology decisions
- ✅ Validating technology migration decisions
- ✅ Creating technology comparison reports for stakeholders
- ✅ Building proof-of-concept implementations for comparison
Less critical for:
- ❌ Already-decided technology choices
- ❌ Simple library updates within same ecosystem
- ❌ Single-option scenarios (no alternatives)
- ❌ Technology choices dictated by organizational standards
## Core Benchmarking Methodology

### Phase 1: Define Comparison Criteria

**Objective**: Establish clear, measurable criteria for technology comparison.

Criteria Categories:

```typescript
interface BenchmarkCriteria {
  performance: {
    throughput: boolean;  // Requests/operations per second
    latency: boolean;     // Response time (p50, p95, p99)
    memoryUsage: boolean; // RAM consumption
    cpuUsage: boolean;    // CPU utilization
    bundleSize: boolean;  // Frontend: bundle impact
    coldStart: boolean;   // Serverless: cold start time
  };
  developer_experience: {
    learningCurve: boolean; // Time to productivity
    apiErgonomics: boolean; // API design quality
    documentation: boolean; // Docs quality and completeness
    typeScript: boolean;    // TypeScript support quality
    debugging: boolean;     // Debugging experience
    tooling: boolean;       // IDE support, linters, etc.
  };
  ecosystem: {
    maturity: boolean;      // Library age and stability
    communitySize: boolean; // GitHub stars, npm downloads
    maintenance: boolean;   // Update frequency, responsiveness
    plugins: boolean;       // Ecosystem of extensions
    migration: boolean;     // Migration path complexity
  };
  code_characteristics: {
    loc: boolean;         // Lines of code for same feature
    complexity: boolean;  // Cyclomatic complexity
    testability: boolean; // Ease of testing
    readability: boolean; // Code readability score
    boilerplate: boolean; // Amount of boilerplate required
  };
  compatibility: {
    browsers: boolean;     // Browser support matrix
    nodeVersions: boolean; // Node.js version support
    frameworks: boolean;   // Framework compatibility
    platforms: boolean;    // Platform support (web, mobile, etc.)
  };
  security: {
    vulnerabilities: boolean; // Known CVEs
    dependencies: boolean;    // Dependency count and health
    updates: boolean;         // Security patch responsiveness
    audit: boolean;           // npm audit / security scanning
  };
}
```
Example Criteria Set:

```markdown
# Comparing React State Management Libraries
## Benchmark: Redux vs Zustand vs Jotai vs Recoil
### Selected Criteria:
**Performance (40%)**
- ⚡ Re-render count (critical)
- ⚡ Update latency (important)
- ⚡ Bundle size (critical for mobile)
**Developer Experience (35%)**
- 📚 Learning curve (important)
- 🎨 API ergonomics (critical)
- 📝 TypeScript support (critical)
- 🐛 Debugging tools (important)
**Code Characteristics (15%)**
- 📏 Lines of code (important)
- 🧩 Boilerplate required (critical)
- ✅ Testability (important)
**Ecosystem (10%)**
- 👥 Community size (important)
- 🔧 Maintenance (critical)
- 🔌 Plugin ecosystem (nice-to-have)
```
Expected Outputs:
- Weighted criteria matrix
- Stakeholder-approved criteria
- Measurement methodology
- Success thresholds
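A lightweight way to encode the selected criteria is a weight map that is validated up front, so a typo in the weights fails fast rather than skewing the final scores. A minimal sketch mirroring the example weights above (the file name is illustrative):

```typescript
// benchmark/criteria.ts — illustrative weights matching the example criteria set above
const criteriaWeights = {
  performance: 0.4,
  developerExperience: 0.35,
  codeCharacteristics: 0.15,
  ecosystem: 0.1,
} as const;

// Sanity-check the weights before any benchmarking starts.
const total = Object.values(criteriaWeights).reduce((sum, w) => sum + w, 0);
if (Math.abs(total - 1) > 1e-9) {
  throw new Error(`Criteria weights must sum to 1, got ${total}`);
}
```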
### Phase 2: Parallel Implementation

**Objective**: Implement identical features using each technology candidate.

Implementation Strategy:

```bash
# Create isolated implementation directories
mkdir -p benchmark/
cd benchmark/
# Technology A
mkdir redux-implementation/
cd redux-implementation/
npm init -y
npm install redux react-redux @reduxjs/toolkit
# Implement feature...
# Technology B
cd ..
mkdir zustand-implementation/
cd zustand-implementation/
npm init -y
npm install zustand
# Implement same feature...
# Technology C
cd ..
mkdir jotai-implementation/
cd jotai-implementation/
npm init -y
npm install jotai
# Implement same feature...
# Technology D
cd ..
mkdir recoil-implementation/
cd recoil-implementation/
npm init -y
npm install recoil
# Implement same feature...
```
Feature Parity Checklist:

```markdown
# Implementation Checklist
Ensure all implementations have:
- ✅ Same feature set (exact functionality)
- ✅ Same user interactions (identical UX)
- ✅ Same edge cases handled
- ✅ Same error handling
- ✅ Same optimization level (all basic, or all optimized)
- ✅ Same test coverage
- ✅ Same production readiness
```
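One way to enforce this parity mechanically is to run a single test suite against every implementation through a thin adapter. A minimal sketch, assuming Jest and a hypothetical `TodoStore` adapter interface that each implementation wraps:

```typescript
// benchmark/parity.test.ts — a sketch; TodoStore is a hypothetical adapter interface
import { describe, it, expect } from '@jest/globals';

interface Todo { id: number; text: string; completed: boolean }

interface TodoStore {
  getTodos(): Todo[];
  addTodo(text: string): void;
  toggleTodo(id: number): void;
  removeTodo(id: number): void;
}

// Each implementation exports a factory that wraps its own store in this
// interface, so one suite exercises every candidate identically.
export function runParitySuite(name: string, createStore: () => TodoStore) {
  describe(`${name}: todo feature parity`, () => {
    it('adds, toggles, and removes todos', () => {
      const store = createStore();
      store.addTodo('write benchmarks');
      const [todo] = store.getTodos();
      store.toggleTodo(todo.id);
      expect(store.getTodos()[0].completed).toBe(true);
      store.removeTodo(todo.id);
      expect(store.getTodos()).toHaveLength(0);
    });
    // ...one test per checklist item above
  });
}
```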
Example: Todo App State Management

```tsx
// Redux Implementation
// store.ts
import { configureStore, createSlice } from '@reduxjs/toolkit';

const todosSlice = createSlice({
  name: 'todos',
  initialState: [],
  reducers: {
    addTodo: (state, action) => {
      state.push({ id: Date.now(), text: action.payload, completed: false });
    },
    toggleTodo: (state, action) => {
      const todo = state.find(t => t.id === action.payload);
      if (todo) todo.completed = !todo.completed;
    },
    removeTodo: (state, action) => {
      return state.filter(t => t.id !== action.payload);
    }
  }
});

export const { addTodo, toggleTodo, removeTodo } = todosSlice.actions;
export const store = configureStore({ reducer: { todos: todosSlice.reducer } });

// TodoList.tsx (43 lines)
import { useSelector, useDispatch } from 'react-redux';
import { addTodo, toggleTodo, removeTodo } from './store';

export function TodoList() {
  const todos = useSelector((state) => state.todos);
  const dispatch = useDispatch();

  return (
    <div>
      <input onKeyPress={(e) => {
        if (e.key === 'Enter') {
          dispatch(addTodo(e.target.value));
          e.target.value = '';
        }
      }} />
      <ul>
        {todos.map(todo => (
          <li key={todo.id}>
            <input
              type="checkbox"
              checked={todo.completed}
              onChange={() => dispatch(toggleTodo(todo.id))}
            />
            <span>{todo.text}</span>
            <button onClick={() => dispatch(removeTodo(todo.id))}>Delete</button>
          </li>
        ))}
      </ul>
    </div>
  );
}
```
```tsx
// Zustand Implementation
// store.ts
import { create } from 'zustand';

export const useTodoStore = create((set) => ({
  todos: [],
  addTodo: (text) => set((state) => ({
    todos: [...state.todos, { id: Date.now(), text, completed: false }]
  })),
  toggleTodo: (id) => set((state) => ({
    todos: state.todos.map(t => t.id === id ? { ...t, completed: !t.completed } : t)
  })),
  removeTodo: (id) => set((state) => ({
    todos: state.todos.filter(t => t.id !== id)
  }))
}));

// TodoList.tsx (37 lines)
import { useTodoStore } from './store';

export function TodoList() {
  const { todos, addTodo, toggleTodo, removeTodo } = useTodoStore();

  return (
    <div>
      <input onKeyPress={(e) => {
        if (e.key === 'Enter') {
          addTodo(e.target.value);
          e.target.value = '';
        }
      }} />
      <ul>
        {todos.map(todo => (
          <li key={todo.id}>
            <input
              type="checkbox"
              checked={todo.completed}
              onChange={() => toggleTodo(todo.id)}
            />
            <span>{todo.text}</span>
            <button onClick={() => removeTodo(todo.id)}>Delete</button>
          </li>
        ))}
      </ul>
    </div>
  );
}
```
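For reference, a minimal sketch of the same feature in Jotai; the JSX is identical to the Redux and Zustand versions above, so it is elided here:

```tsx
// Jotai Implementation (sketch)
// store.ts
import { atom } from 'jotai';

export interface Todo { id: number; text: string; completed: boolean }

export const todosAtom = atom<Todo[]>([]);

// TodoList.tsx
import { useAtom } from 'jotai';
import { todosAtom } from './store';

export function TodoList() {
  const [todos, setTodos] = useAtom(todosAtom);

  const addTodo = (text: string) =>
    setTodos([...todos, { id: Date.now(), text, completed: false }]);
  const toggleTodo = (id: number) =>
    setTodos(todos.map(t => (t.id === id ? { ...t, completed: !t.completed } : t)));
  const removeTodo = (id: number) =>
    setTodos(todos.filter(t => t.id !== id));

  return null; // render the same JSX as the Redux and Zustand versions above
}
```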
Expected Outputs:
- Functionally identical implementations
- Side-by-side code comparison
- Lines of code metrics
- Boilerplate analysis
### Phase 3: Performance Benchmarking

**Objective**: Measure and compare performance characteristics objectively.

Benchmark Suite Setup:
```typescript
// benchmark/performance-suite.ts
import Benchmark from 'benchmark';

interface BenchmarkResult {
  name: string;
  opsPerSecond: number;
  mean: number;
  standardDeviation: number;
  marginOfError: number;
  relativeMarginOfError: number;
  sampleSize: number;
}

class TechnologyBenchmark {
  private suite = new Benchmark.Suite();
  private results: BenchmarkResult[] = [];

  add(name: string, fn: () => void): this {
    this.suite.add(name, fn);
    return this;
  }

  async run(): Promise<BenchmarkResult[]> {
    return new Promise((resolve) => {
      this.suite
        .on('cycle', (event: any) => {
          const bench = event.target;
          this.results.push({
            name: bench.name,
            opsPerSecond: bench.hz,
            mean: bench.stats.mean,
            standardDeviation: bench.stats.deviation,
            marginOfError: bench.stats.moe,
            relativeMarginOfError: bench.stats.rme,
            sampleSize: bench.stats.sample.length
          });
        })
        .on('complete', () => {
          resolve(this.results);
        })
        .run({ async: true });
    });
  }
}

// Usage
const benchmark = new TechnologyBenchmark();

benchmark
  .add('Redux: Add 1000 todos', () => {
    for (let i = 0; i < 1000; i++) {
      store.dispatch(addTodo(`Todo ${i}`));
    }
  })
  .add('Zustand: Add 1000 todos', () => {
    for (let i = 0; i < 1000; i++) {
      useTodoStore.getState().addTodo(`Todo ${i}`);
    }
  });

const results = await benchmark.run();
console.table(results);
```
Performance Metrics to Collect:

```typescript
interface PerformanceMetrics {
  // Throughput
  operationsPerSecond: number;

  // Latency
  latency: {
    p50: number; // Median
    p95: number; // 95th percentile
    p99: number; // 99th percentile
    max: number; // Worst case
  };

  // Resource Usage
  memory: {
    initial: number; // Baseline memory (MB)
    peak: number;    // Peak memory (MB)
    average: number; // Average during test (MB)
    leaks: boolean;  // Memory leak detected?
  };
  cpu: {
    average: number; // Average CPU % during test
    peak: number;    // Peak CPU %
  };

  // Frontend Specific
  frontend?: {
    bundleSize: {
      minified: number;   // Minified size (KB)
      gzipped: number;    // Gzipped size (KB)
      treeshaken: number; // After tree-shaking (KB)
    };
    renderCount: number; // Number of re-renders
    loadTime: number;    // Initial load time (ms)
  };

  // Backend Specific
  backend?: {
    coldStart: number;      // Cold start time (ms)
    warmStart: number;      // Warm start time (ms)
    connectionPool: number; // Connection pool efficiency
  };
}
```
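The latency percentiles above can be derived from raw samples without any external tooling. A minimal sketch using Node's `perf_hooks` (the 10,000-iteration default is an arbitrary assumption; tune it to your workload):

```typescript
// benchmark/latency.ts — a sketch for collecting latency percentiles
import { performance } from 'node:perf_hooks';

function percentile(sorted: number[], p: number): number {
  // Nearest-rank percentile over an ascending-sorted sample.
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
}

export function measureLatency(fn: () => void, iterations = 10_000) {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    max: samples[samples.length - 1],
  };
}
```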
Benchmarking Tools:

```bash
# Frontend Benchmarking

# Bundle size analysis
npx webpack-bundle-analyzer build/stats.json

# Lighthouse performance
npx lighthouse http://localhost:3000 --output html --output-path ./report.html

# React render profiling: use React's built-in <Profiler> component
# (see "React Component Re-render Tracking" below)

# Backend Benchmarking

# HTTP load testing
npm install -g autocannon
autocannon -c 100 -d 30 http://localhost:3000/api/todos

# Detailed CPU profiling
node --prof app.js
node --prof-process isolate-*.log > profile.txt

# Memory leak detection
node --inspect app.js
# Use Chrome DevTools Memory Profiler

# Node.js performance diagnostics
npm install -g clinic
clinic doctor -- node app.js
```
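Minified and gzipped bundle sizes can also be measured directly with Node's built-in `zlib`, which keeps the numbers comparable across implementations. A sketch, assuming each implementation emits `dist/bundle.js` (an illustrative path):

```typescript
// benchmark/bundle-size.ts — a sketch using only Node built-ins
import { readFileSync } from 'node:fs';
import { gzipSync } from 'node:zlib';

function bundleSizes(path: string) {
  const raw = readFileSync(path);
  return {
    minifiedKB: +(raw.byteLength / 1024).toFixed(1),
    gzippedKB: +(gzipSync(raw).byteLength / 1024).toFixed(1),
  };
}

for (const lib of ['redux', 'zustand', 'jotai', 'recoil']) {
  // Assumes each implementation builds to dist/bundle.js (illustrative path).
  console.log(lib, bundleSizes(`${lib}-implementation/dist/bundle.js`));
}
```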
React Component Re-render Tracking:

```tsx
// benchmark/render-counter.tsx
import { useEffect, useRef } from 'react';

export function useRenderCount(componentName: string) {
  const renderCount = useRef(0);

  useEffect(() => {
    renderCount.current += 1;
    console.log(`${componentName} rendered ${renderCount.current} times`);
  });

  return renderCount.current;
}

// Usage in TodoList
export function TodoList() {
  const renderCount = useRenderCount('TodoList');
  // ... rest of component
}
```
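React's built-in `<Profiler>` component complements the hook above by reporting how long each commit actually took. A minimal sketch wrapping the `TodoList` under test:

```tsx
// benchmark/profiled-todo-list.tsx — render timing via React's built-in <Profiler>
import { Profiler, type ProfilerOnRenderCallback } from 'react';
import { TodoList } from './TodoList';

// Logs the actual duration of every mount/update commit for the wrapped tree.
const onRender: ProfilerOnRenderCallback = (id, phase, actualDuration) => {
  console.log(`${id} ${phase}: ${actualDuration.toFixed(2)}ms`);
};

export function ProfiledTodoList() {
  return (
    <Profiler id="TodoList" onRender={onRender}>
      <TodoList />
    </Profiler>
  );
}
```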
Expected Outputs:
- Performance comparison table
- Latency distribution graphs
- Resource usage charts
- Bundle size comparison
- Re-render analysis
### Phase 4: Code Characteristic Analysis

**Objective**: Analyze the qualitative aspects of the code produced with each technology.

Analysis Dimensions:
```typescript
interface CodeCharacteristics {
  // Quantity
  linesOfCode: {
    application: number; // Application code
    tests: number;       // Test code
    config: number;      // Configuration
    total: number;       // Total
  };

  // Complexity
  complexity: {
    cyclomatic: number; // Cyclomatic complexity
    cognitive: number;  // Cognitive complexity
    nesting: number;    // Max nesting depth
  };

  // Maintainability
  maintainability: {
    duplication: number; // Code duplication %
    comments: number;    // Comment ratio %
    readability: number; // Readability score (0-100)
  };

  // Type Safety
  typeSafety: {
    typeScript: boolean;  // Native TS support?
    typeCoverage: number; // Type coverage %
    typeErrors: number;   // Type errors count
    anyUsage: number;     // Usage of 'any' type
  };

  // Testing
  testability: {
    testable: boolean;        // Easy to test?
    mockingRequired: boolean; // Mocking needed?
    testLOC: number;          // Lines in tests
    coveragePossible: number; // Max achievable coverage %
  };

  // API Design
  api: {
    verbosity: number;       // API verbosity (1-10)
    consistency: number;     // API consistency (1-10)
    discoverability: number; // How easy to discover features (1-10)
    idiomaticness: number;   // How idiomatic for the language (1-10)
  };
}
```
Code Metrics Collection:

```bash
# Lines of Code
cloc redux-implementation/ zustand-implementation/ jotai-implementation/

# Complexity Metrics
npx complexity-report --format json redux-implementation/src/ > redux-complexity.json
npx complexity-report --format json zustand-implementation/src/ > zustand-complexity.json

# Type Coverage
npx type-coverage --detail redux-implementation/src/
npx type-coverage --detail zustand-implementation/src/

# Maintainability Index
npm install -g plato
plato -r -d plato-report redux-implementation/src/

# Code Duplication
jscpd redux-implementation/src/ zustand-implementation/src/
```
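To turn the raw `cloc` output into a comparison table, its `--json` mode can be aggregated with a short script. A sketch (directory names match the benchmark layout from Phase 2):

```typescript
// benchmark/loc-report.ts — aggregates cloc's JSON output per implementation
import { execSync } from 'node:child_process';

const implementations = [
  'redux-implementation',
  'zustand-implementation',
  'jotai-implementation',
  'recoil-implementation',
];

const rows = implementations.map((dir) => {
  const report = JSON.parse(execSync(`cloc --json ${dir}/src`).toString());
  // cloc emits a SUM entry with totals across all languages.
  return { implementation: dir, linesOfCode: report.SUM?.code ?? 0 };
});

console.table(rows);
```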
Developer Experience Survey:

```markdown
# Developer Experience Survey
After implementing same feature with each technology, rate 1-10:
## Redux
- **Learning Curve**: How long to become productive? (1=months, 10=minutes)
- **API Ergonomics**: How pleasant is the API? (1=frustrating, 10=delightful)
- **Boilerplate**: How much boilerplate required? (1=excessive, 10=minimal)
- **Debugging**: How easy to debug? (1=nightmare, 10=trivial)
- **Type Safety**: How good is TypeScript integration? (1=fighting types, 10=types help)
- **Documentation**: How good are docs? (1=nonexistent, 10=comprehensive)
- **Confidence**: How confident in production readiness? (1=scared, 10=rock solid)
## Zustand
[Same questions...]
## Jotai
[Same questions...]
## Recoil
[Same questions...]
```
Expected Outputs:
- Code metrics comparison table
- Developer experience ratings
- API design comparison
- Type safety analysis
- Testability assessment
### Phase 5: Ecosystem Analysis

**Objective**: Evaluate the broader ecosystem and long-term viability.

Ecosystem Metrics:
```typescript
interface EcosystemMetrics {
  // Popularity
  popularity: {
    githubStars: number;
    npmDownloads: number; // Weekly downloads
    stackOverflowQuestions: number;
    trendDirection: 'rising' | 'stable' | 'declining';
  };

  // Maturity
  maturity: {
    firstRelease: Date;
    latestRelease: Date;
    majorVersions: number;
    breakingChanges: number; // In last 2 years
    stability: 'experimental' | 'beta' | 'stable' | 'mature';
  };

  // Maintenance
  maintenance: {
    commitFrequency: number;   // Commits per month
    issueResponseTime: number; // Hours to first response
    prMergeTime: number;       // Hours to merge
    openIssues: number;
    closedIssues: number;
    maintainers: number;
    lastCommit: Date;
  };

  // Dependencies
  dependencies: {
    directDeps: number;
    totalDeps: number; // Including transitive
    vulnerabilities: number;
    outdatedDeps: number;
    depHealth: number; // 0-100 score
  };

  // Community
  community: {
    contributors: number;
    companyBacked: boolean;
    corporations: string[]; // Companies using
    tutorials: number;      // Quality tutorials available
    books: number;          // Books published
    courses: number;        // Online courses
  };

  // Ecosystem
  plugins: {
    official: number;  // Official plugins
    community: number; // Community plugins
    quality: number;   // Avg quality score (0-100)
  };
}
```
Data Collection Script:

```typescript
// benchmark/ecosystem-analysis.ts
import { Octokit } from '@octokit/rest';
import axios from 'axios';

async function analyzeEcosystem(repo: string, npmPackage: string) {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  const [owner, name] = repo.split('/');

  // GitHub metrics
  const { data: repoData } = await octokit.repos.get({ owner, repo: name });
  const { data: commits } = await octokit.repos.listCommits({
    owner,
    repo: name,
    per_page: 100
  });

  // npm metrics
  const npmData = await axios.get(`https://api.npmjs.org/downloads/point/last-week/${npmPackage}`);
  const npmInfo = await axios.get(`https://registry.npmjs.org/${npmPackage}`);

  // Stack Overflow metrics (filter=total returns only the question count)
  const soData = await axios.get(
    `https://api.stackexchange.com/2.3/search?order=desc&sort=activity&tagged=${npmPackage}&site=stackoverflow&filter=total`
  );

  return {
    popularity: {
      githubStars: repoData.stargazers_count,
      npmDownloads: npmData.data.downloads,
      stackOverflowQuestions: soData.data.total,
      trendDirection: calculateTrend(commits)
    },
    maturity: {
      firstRelease: new Date(repoData.created_at),
      latestRelease: new Date(npmInfo.data.time.modified),
      majorVersions: countMajorVersions(npmInfo.data.versions)
    },
    maintenance: {
      commitFrequency: commits.length / 3, // Last ~3 months
      openIssues: repoData.open_issues_count,
      lastCommit: new Date(commits[0].commit.author.date)
    }
  };
}

// Compare all technologies
const technologies = [
  { repo: 'reduxjs/redux', npm: 'redux' },
  { repo: 'pmndrs/zustand', npm: 'zustand' },
  { repo: 'pmndrs/jotai', npm: 'jotai' },
  { repo: 'facebookexperimental/Recoil', npm: 'recoil' }
];

const ecosystemComparison = await Promise.all(
  technologies.map(t => analyzeEcosystem(t.repo, t.npm))
);

console.table(ecosystemComparison);
```
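The script above assumes `calculateTrend` and `countMajorVersions` helpers. Possible minimal implementations (the 1.2 ratio is an arbitrary threshold, not a standard heuristic):

```typescript
// Possible minimal implementations of the helpers used above.
type CommitList = { commit: { author: { date: string } } }[];

function calculateTrend(commits: CommitList): 'rising' | 'stable' | 'declining' {
  // Compare commit volume in the newer half of the time window to the older half.
  const times = commits
    .map((c) => new Date(c.commit.author.date).getTime())
    .sort((a, b) => a - b);
  const midpoint = (times[0] + times[times.length - 1]) / 2;
  const recent = times.filter((t) => t >= midpoint).length;
  const older = times.length - recent;
  if (recent > older * 1.2) return 'rising';
  if (older > recent * 1.2) return 'declining';
  return 'stable';
}

function countMajorVersions(versions: Record<string, unknown>): number {
  // Distinct major version numbers among published releases.
  return new Set(Object.keys(versions).map((v) => v.split('.')[0])).size;
}
```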
Expected Outputs:
- Ecosystem comparison matrix
- Trend analysis charts
- Maintenance health report
- Community size comparison
- Long-term viability assessment
### Phase 6: Evidence-Based Decision

**Objective**: Synthesize all data into an actionable recommendation.

Decision Matrix:
```typescript
interface DecisionMatrix {
  technology: string;
  scores: {
    performance: number;         // 0-100
    developerExperience: number; // 0-100
    codeQuality: number;         // 0-100
    ecosystem: number;           // 0-100
    security: number;            // 0-100
  };
  weights: {
    performance: number;         // e.g., 0.3 (30%)
    developerExperience: number; // e.g., 0.25 (25%)
    codeQuality: number;         // e.g., 0.2 (20%)
    ecosystem: number;           // e.g., 0.15 (15%)
    security: number;            // e.g., 0.1 (10%)
  };
  weightedScore: number;  // Final score 0-100
  rank: number;           // 1st, 2nd, 3rd, etc.
  recommendation: string; // Recommend | Consider | Avoid
}

function calculateWeightedScore(
  scores: DecisionMatrix['scores'],
  weights: DecisionMatrix['weights']
): number {
  return (
    scores.performance * weights.performance +
    scores.developerExperience * weights.developerExperience +
    scores.codeQuality * weights.codeQuality +
    scores.ecosystem * weights.ecosystem +
    scores.security * weights.security
  );
}
```
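Hypothetical usage, with illustrative scores (not the numbers from the report below) and the example weights:

```typescript
// Hypothetical usage with illustrative scores.
const weights: DecisionMatrix['weights'] = {
  performance: 0.3,
  developerExperience: 0.25,
  codeQuality: 0.2,
  ecosystem: 0.15,
  security: 0.1,
};

const scores: DecisionMatrix['scores'] = {
  performance: 80,
  developerExperience: 70,
  codeQuality: 90,
  ecosystem: 60,
  security: 100,
};

// 80*0.3 + 70*0.25 + 90*0.2 + 60*0.15 + 100*0.1 = 78.5
console.log(calculateWeightedScore(scores, weights)); // 78.5
```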
Recommendation Report Template:

```markdown
# Technology Benchmark Report
## State Management Library Comparison
**Date**: 2025-01-15
**Evaluator**: Engineering Team
**Use Case**: E-commerce product catalog state management
---
## Executive Summary
After comprehensive benchmarking of 4 state management libraries (Redux, Zustand, Jotai, Recoil), we recommend **Zustand** for our use case.
**Key Findings:**
- ✅ Zustand: Best overall balance of performance, DX, and simplicity
- ⚠️ Redux: Excellent ecosystem but significant boilerplate overhead
- ⚠️ Jotai: Great DX but smaller ecosystem and newer/less proven
- ❌ Recoil: Experimental status and Facebook-specific concerns
**Weighted Score:**
1. Zustand: 87/100
2. Jotai: 79/100
3. Redux: 76/100
4. Recoil: 68/100
---
## Detailed Comparison
### Performance (Weight: 30%)
| Metric | Redux | Zustand | Jotai | Recoil |
|--------|-------|---------|-------|--------|
| **Ops/sec** | 8,450 | 12,300 | 11,800 | 9,200 |
| **Re-renders** | 47 | 12 | 15 | 18 |
| **Bundle Size** | 42 KB | 3.2 KB | 4.1 KB | 18 KB |
| **Memory Usage** | 18 MB | 8 MB | 9 MB | 12 MB |
| **Score** | 72 | 95 | 91 | 78 |
**Winner**: Zustand (95/100)
- 45% faster than Redux
- 92% smaller bundle size
- 74% fewer re-renders
### Developer Experience (Weight: 25%)
| Aspect | Redux | Zustand | Jotai | Recoil |
|--------|-------|---------|-------|--------|
| **Learning Curve** | 4/10 | 9/10 | 8/10 | 7/10 |
| **API Ergonomics** | 6/10 | 10/10 | 9/10 | 8/10 |
| **Boilerplate** | 3/10 | 10/10 | 9/10 | 8/10 |
| **TypeScript** | 8/10 | 9/10 | 10/10 | 7/10 |
| **Debugging** | 9/10 | 7/10 | 7/10 | 8/10 |
| **Score** | 60 | 90 | 86 | 76 |
**Winner**: Zustand (90/100)
- Minimal boilerplate (6 lines vs 43 lines in Redux)
- Intuitive hook-based API
- Excellent TypeScript inference
### Code Quality (Weight: 20%)
| Metric | Redux | Zustand | Jotai | Recoil |
|--------|-------|---------|-------|--------|
| **Lines of Code** | 127 | 68 | 74 | 81 |
| **Complexity** | 12 | 4 | 5 | 6 |
| **Testability** | 9/10 | 10/10 | 9/10 | 8/10 |
| **Readability** | 7/10 | 10/10 | 9/10 | 8/10 |
| **Score** | 75 | 95 | 88 | 82 |
**Winner**: Zustand (95/100)
- 46% less code than Redux
- 67% lower complexity
- Most readable implementation
### Ecosystem (Weight: 15%)
| Metric | Redux | Zustand | Jotai | Recoil |
|--------|-------|---------|-------|--------|
| **GitHub Stars** | 60.5k | 38.2k | 16.8k | 19.2k |
| **Weekly Downloads** | 8.2M | 2.1M | 580k | 310k |
| **Maturity** | 10/10 | 8/10 | 6/10 | 5/10 |
| **Maintenance** | 9/10 | 10/10 | 9/10 | 6/10 |
| **Plugins** | 10/10 | 7/10 | 6/10 | 5/10 |
| **Score** | 95 | 82 | 68 | 58 |
**Winner**: Redux (95/100)
- Most mature and battle-tested
- Largest ecosystem
- Most corporate adoption
### Security (Weight: 10%)
| Metric | Redux | Zustand | Jotai | Recoil |
|--------|-------|---------|-------|--------|
| **Vulnerabilities** | 0 | 0 | 0 | 0 |
| **Dependencies** | 3 | 0 | 0 | 8 |
| **Audit Score** | 95 | 100 | 100 | 85 |
| **Score** | 95 | 100 | 100 | 85 |
**Winner**: Zustand & Jotai (100/100)
- Zero dependencies
- No known vulnerabilities
---
## Final Recommendation
### Primary Recommendation: Zustand
**Rationale:**
- Best overall balance across all criteria
- Exceptional developer experience (90/100)
- Top performance (95/100)
- Excellent code quality (95/100)
- Growing, healthy ecosystem (82/100)
- Perfect security score (100/100)
**Trade-offs:**
- Smaller ecosystem than Redux (but sufficient for our needs)
- Fewer learning resources (but simple enough to not be a blocker)
**Adoption Plan:**
1. Pilot Zustand in 2 new features (Sprint 1-2)
2. Gather team feedback
3. Migration guide for existing Redux stores (Sprint 3)
4. Gradual migration over 6 months
### Alternative: Jotai
If atomic state management is required:
- Similar performance to Zustand
- More granular control
- Great TypeScript support
- Trade-off: Smaller ecosystem, newer library
### Not Recommended
**Redux**: Excessive boilerplate for our use case
**Recoil**: Experimental status, Facebook-specific concerns
---
## Appendices
### Appendix A: Full Performance Data
[Detailed charts and raw numbers]
### Appendix B: Code Samples
[Side-by-side implementations]
### Appendix C: Team Survey Results
[Developer feedback]
### Appendix D: Migration Estimate
[Effort and timeline for migration]
```
Expected Outputs:
- Executive summary with recommendation
- Detailed comparison across all dimensions
- Decision matrix with scores
- Risk assessment
- Adoption roadmap
## Benchmarking Workflows

### Workflow 1: Quick Comparison (2-4 hours)

**Goal**: Rapid assessment for a low-stakes decision.

**Steps:**

1. **Define Criteria** (30 min)
   - Select 3-5 key criteria
   - Weight by importance
2. **Parallel Implementation** (90 min)
   - Build a minimal viable example
   - Same feature, 2-3 technologies
   - Focus on speed, not perfection
3. **Quick Benchmarks** (45 min; see the script sketch after this workflow)
   - Bundle size
   - Lines of code
   - Basic performance test
   - Developer impressions
4. **Decision** (15 min)
   - Calculate scores
   - Make recommendation
   - Document rationale

**Total Time**: ~3 hours
**Use For**: Low-risk technology choices
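The quick-benchmark step can be consolidated into one script. A possible sketch reusing tools introduced earlier; the `gzip-size-cli` package and the per-implementation `bench.js` micro-benchmark script are assumptions:

```typescript
// benchmark/quick-pass.ts — a possible consolidation of the quick checks
import { execSync } from 'node:child_process';

for (const impl of ['redux-implementation', 'zustand-implementation']) {
  console.log(`== ${impl} ==`);
  execSync(`cloc --quiet ${impl}/src`, { stdio: 'inherit' });                  // lines of code
  execSync(`npx gzip-size-cli ${impl}/dist/bundle.js`, { stdio: 'inherit' });  // gzipped bundle size
  execSync(`node ${impl}/bench.js`, { stdio: 'inherit' });                     // hypothetical micro-benchmark
}
```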
### Workflow 2: Comprehensive Benchmark (1-2 weeks)

**Goal**: Deep analysis for a high-stakes decision.

**Steps:**

1. **Criteria & Planning** (4 hours)
   - Stakeholder interviews
   - Comprehensive criteria
   - Weighted decision matrix
   - Success metrics
2. **Parallel Implementation** (40 hours)
   - Production-quality examples
   - Full feature parity
   - Edge cases handled
   - Tests included
3. **Performance Benchmarking** (16 hours)
   - Comprehensive test suite
   - Load testing
   - Resource profiling
   - Multiple scenarios
4. **Code Analysis** (8 hours)
   - Metrics collection
   - Complexity analysis
   - Maintainability assessment
   - Developer survey
5. **Ecosystem Research** (8 hours)
   - Popularity metrics
   - Community health
   - Long-term viability
   - Migration costs
6. **Report & Decision** (8 hours)
   - Synthesize findings
   - Create presentation
   - Stakeholder review
   - Final decision

**Total Time**: ~80 hours (2 weeks with 2 engineers)
**Use For**: Major technology adoption, framework migrations
### Workflow 3: Continuous Benchmarking

**Goal**: Ongoing monitoring of technology choices.

Setup:
```yaml
# .github/workflows/benchmark.yml
name: Technology Benchmarks

on:
  schedule:
    - cron: '0 0 * * 0' # Weekly
  workflow_dispatch: # Manual trigger

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run Performance Benchmarks
        run: npm run benchmark

      - name: Collect Metrics
        run: |
          npm run benchmark:bundle-size
          npm run benchmark:performance
          npm run benchmark:ecosystem

      - name: Compare to Baseline
        id: compare # referenced by the regression check below
        run: node scripts/compare-benchmarks.js

      - name: Generate Report
        run: node scripts/generate-report.js

      - name: Upload Results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: benchmark-results/

      - name: Alert on Regression
        if: steps.compare.outputs.regression == 'true'
        run: |
          # Send alert to Slack/email
          curl -X POST $SLACK_WEBHOOK -d '{"text":"Performance regression detected!"}'
```
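The `compare-benchmarks.js` script referenced above is not shown; a minimal sketch of what it might do, assuming results are stored as name-to-ops/sec JSON maps (where higher is better) and that a baseline file survives between runs:

```typescript
// scripts/compare-benchmarks.ts — a sketch of the comparison step
import { readFileSync, appendFileSync } from 'node:fs';

const THRESHOLD = 0.1; // flag anything more than 10% slower than baseline

const baseline: Record<string, number> = JSON.parse(
  readFileSync('benchmark-results/baseline.json', 'utf8')
);
const current: Record<string, number> = JSON.parse(
  readFileSync('benchmark-results/current.json', 'utf8')
);

const regression = Object.keys(baseline).some(
  (name) => current[name] < baseline[name] * (1 - THRESHOLD)
);

// Expose the result to later workflow steps via GITHUB_OUTPUT.
if (process.env.GITHUB_OUTPUT) {
  appendFileSync(process.env.GITHUB_OUTPUT, `regression=${regression}\n`);
}
```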
Benefits:
- Detect performance regressions
- Track ecosystem health
- Validate technology choices over time
- Early warning for library issues
## Best Practices

### DO ✅

**Benchmarking:**
- Use identical features across all implementations
- Run benchmarks multiple times for statistical validity
- Test on production-like hardware and data
- Warm up before measuring (JIT compilation)
- Measure what matters to your use case
- Include both synthetic and real-world scenarios
- Document testing methodology clearly
- Account for network conditions (if applicable)
**Implementation:**
- Same optimization level across all implementations
- Production-ready code quality
- Include error handling
- Write tests for each implementation
- Use latest stable versions
- Follow official best practices for each tech
- Consult documentation and experts
- Pair program if learning new tech
**Decision Making:**
- Weight criteria based on actual business priorities
- Include stakeholders in criteria selection
- Consider total cost of ownership (not just dev time)
- Factor in team expertise and learning curve
- Assess long-term viability and exit costs
- Test migration path before committing
- Plan for gradual adoption
- Document decision rationale
### DON'T ❌

**Benchmarking:**
- Don't use toy examples for production decisions
- Don't benchmark on developer machines (inconsistent)
- Don't cherry-pick favorable results
- Don't ignore outliers without investigation
- Don't test pre-release or alpha versions
- Don't compare apples to oranges (different features)
- Don't skip warming up (cold start bias)
- Don't forget to test failure scenarios
**Implementation:**
- Don't optimize one implementation more than others
- Don't use outdated patterns or anti-patterns
- Don't skip tests to save time
- Don't ignore official recommendations
- Don't mix stable and experimental versions
- Don't make unfair comparisons (expert vs novice code)
- Don't forget edge cases
- Don't skip error handling
**Decision Making:**
- Don't let personal preferences override data
- Don't ignore ecosystem health for shiny new tech
- Don't underestimate migration costs
- Don't forget about hiring and onboarding
- Don't chase benchmarks at expense of DX
- Don't commit to a technology without escape plan
- Don't skip stakeholder buy-in
- Don't make decisions without considering team skill
## Remember

- **Fair Comparison**: Identical features, same optimization, production quality
- **Measure What Matters**: Focus on criteria relevant to your use case
- **Multiple Dimensions**: Performance is just one aspect; consider DX, ecosystem, etc.
- **Statistical Validity**: Run benchmarks multiple times, account for variance
- **Real-World Scenarios**: Synthetic benchmarks miss important details
- **Total Cost**: Include learning curve, migration, and maintenance in the decision
- **Team Input**: Technology choice affects team productivity and satisfaction
- **Evidence Over Opinion**: Let data guide the decision, not hype or bias
Technology benchmarking transforms subjective technology debates into objective, data-driven decisions, ensuring your team adopts technologies that truly deliver value for your specific context and constraints.