Claude Code Plugins

Community-maintained marketplace


parallel-execution-optimizer

@marcusgoll/Spec-Flow



SKILL.md

name: parallel-execution-optimizer
description: Identify and execute independent operations in parallel for a 3-5x speedup. Auto-analyzes task dependencies, groups them into batches, and launches parallel Task() calls. Applies to /optimize (5 checks), /ship pre-flight (5 checks), /design-variations (N screens), and /implement (task batching). Auto-triggers when detecting multiple independent operations in a phase.

The parallel-execution-optimizer skill transforms sequential workflows into concurrent execution patterns, dramatically reducing wall-clock time for phases with multiple independent operations.

Traditional sequential execution wastes time:

  • /optimize runs 5 quality checks sequentially (10-15 minutes)
  • /ship runs 5 pre-flight checks sequentially (8-12 minutes)
  • /implement processes tasks one-by-one despite no dependencies
  • Design variations generated sequentially when all could run in parallel

This skill analyzes operation dependencies, groups independent work into batches, and orchestrates parallel execution using multiple Task() agent calls in a single message. The result: 3-5x faster phase completion with zero compromise on quality or correctness.

When you detect multiple independent operations, send **a single message** with multiple tool calls:

Sequential (slow):

  • Send message with Task call for security-sentry
  • Wait for response
  • Send message with Task call for performance-profiler
  • Wait for response
  • Send message with Task call for accessibility-auditor
  • Total: 15 minutes

Parallel (fast):

  • Send ONE message with 3 Task calls (security-sentry, performance-profiler, accessibility-auditor)
  • All three run concurrently
  • Total: 5 minutes

**Where it applies**:

  1. /optimize phase: Run 5 quality checks in parallel (security, performance, accessibility, code-review, type-safety)
  2. /ship pre-flight: Run 5 deployment checks in parallel (env-vars, build, docker, CI-config, dependency-audit)
  3. /implement: Process independent task batches in parallel layers
  4. Design variations: Generate multiple mockup variations concurrently
  5. Research phase: Fetch multiple documentation sources concurrently

**Identify independent operations**

Scan the current phase for operations that:

  • Read different files/data sources
  • Don't modify shared state
  • Have no sequential dependencies
  • Can produce results independently

Examples:

  • Quality checks (security scan + performance test + accessibility audit)
  • File reads (spec.md + plan.md + tasks.md)
  • API documentation fetches (Stripe docs + Twilio docs + SendGrid docs)
  • Test suite runs (unit tests + integration tests + E2E tests)

**Analyze dependencies**

Build a dependency graph:

  • Layer 0: Operations with no dependencies (can run immediately)
  • Layer 1: Operations depending only on Layer 0 outputs
  • Layer 2: Operations depending on Layer 1 outputs
  • etc.

Example (/optimize):

Layer 0 (parallel):
  - security-sentry (reads codebase)
  - performance-profiler (reads codebase + runs benchmarks)
  - accessibility-auditor (reads UI components)
  - type-enforcer (reads TypeScript files)
  - dependency-curator (reads package.json)

Layer 1 (after Layer 0):
  - Generate optimization-report.md (combines all Layer 0 results)
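This layer-by-layer grouping is a topological sort that emits whole layers at a time. A minimal sketch (the operation names mirror the /optimize example above):

```python
def build_layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group operations into layers: each layer depends only on earlier layers."""
    remaining = {op: set(d) for op, d in deps.items()}
    layers: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # Operations whose dependencies are all satisfied form the next layer.
        layer = sorted(op for op, d in remaining.items() if d <= done)
        if not layer:
            raise ValueError("Cyclic dependency detected")
        layers.append(layer)
        done.update(layer)
        for op in layer:
            del remaining[op]
    return layers

# /optimize: five independent checks (Layer 0), then one aggregation step (Layer 1).
deps = {
    "security-sentry": set(),
    "performance-profiler": set(),
    "accessibility-auditor": set(),
    "type-enforcer": set(),
    "dependency-curator": set(),
    "optimization-report": {
        "security-sentry", "performance-profiler", "accessibility-auditor",
        "type-enforcer", "dependency-curator",
    },
}
layers = build_layers(deps)
assert len(layers) == 2 and layers[1] == ["optimization-report"]
```

Python's standard library offers `graphlib.TopologicalSorter` for the same job; the hand-rolled version above just makes the layering explicit.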

**Group into batches**

Create batches for each layer:

  • All Layer 0 operations in single message (parallel execution)
  • Wait for Layer 0 completion
  • All Layer 1 operations in single message
  • Continue through layers

Batch size considerations:

  • Optimal: 3-5 operations per batch (balanced parallelism)
  • Maximum: 8 operations (avoid overwhelming system)
  • Minimum: 2 operations (below 2, parallelism has no benefit)
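Applied mechanically, the size limits above amount to chunking each layer, with the cap of 8 taken from the guidance here:

```python
def chunk_layer(layer: list[str], max_batch: int = 8) -> list[list[str]]:
    """Split one dependency layer into batches no larger than max_batch."""
    return [layer[i:i + max_batch] for i in range(0, len(layer), max_batch)]

# An 11-operation layer becomes two batches rather than one oversized message.
batches = chunk_layer([f"op-{n}" for n in range(11)])
assert [len(b) for b in batches] == [8, 3]
```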

**Execute parallel batches**

Send a single message with multiple tool calls for each batch.

Critical requirements:

  • Must be a single message with multiple tool use blocks
  • Each tool call must be complete and independent
  • Do not use placeholders or forward references
  • Each agent must have all required context in its prompt

See references/execution-patterns.md for detailed examples.

**Aggregate results**

After each batch completes:

  • Collect results from all parallel operations
  • Check for failures or blocking issues
  • Decide whether to proceed to next layer
  • Aggregate findings into unified report

Failure handling:

  • If any operation blocks (critical security issue), halt pipeline
  • If operations have warnings (minor performance issue), continue but log
  • If operations fail (agent error), retry individually or escalate
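One way to encode this failure policy, assuming each parallel result carries a severity field (the `OpResult` shape is hypothetical, not a real API):

```python
from dataclasses import dataclass

@dataclass
class OpResult:
    name: str
    severity: str  # one of: "ok", "warning", "blocker", "error"

def triage(results: list[OpResult]) -> tuple[bool, list[str], list[str]]:
    """Decide whether to proceed to the next layer after a parallel batch."""
    blockers = [r.name for r in results if r.severity == "blocker"]
    retries = [r.name for r in results if r.severity == "error"]
    for r in results:
        if r.severity == "warning":
            print(f"warning logged for {r.name}")  # continue, but log
    proceed = not blockers  # any critical issue halts the pipeline
    return proceed, blockers, retries

proceed, blockers, retries = triage([
    OpResult("security-sentry", "ok"),
    OpResult("performance-profiler", "warning"),
    OpResult("type-enforcer", "error"),
])
assert proceed and blockers == [] and retries == ["type-enforcer"]
```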

**Operation (/optimize)**: Run 5 quality gates in parallel

Dependency graph:

Layer 0 (parallel - 5 operations):
  1. security-sentry → Scan for vulnerabilities, secrets, auth issues
  2. performance-profiler → Benchmark API endpoints, detect N+1 queries
  3. accessibility-auditor → WCAG 2.1 AA compliance (if UI feature)
  4. type-enforcer → TypeScript strict mode compliance
  5. dependency-curator → npm audit, outdated packages

Layer 1 (sequential - 1 operation):
  6. Generate optimization-report.md (aggregates Layer 0 findings)

Time savings:

  • Sequential: ~15 minutes (3 min per check)
  • Parallel: ~5 minutes (longest check + aggregation)
  • Speedup: 3x

See references/optimize-phase-parallelization.md for implementation details.

**Operation (/ship pre-flight)**: Run 5 pre-flight checks in parallel

Dependency graph:

Layer 0 (parallel - 5 operations):
  1. Check environment variables (read .env.example vs .env)
  2. Validate production build (npm run build)
  3. Check Docker configuration (docker-compose.yml, Dockerfile)
  4. Validate CI configuration (.github/workflows/*.yml)
  5. Run dependency audit (npm audit --production)

Layer 1 (sequential - 1 operation):
  6. Update state.yaml with pre-flight results

Time savings:

  • Sequential: ~12 minutes
  • Parallel: ~4 minutes (build is longest operation)
  • Speedup: 3x

See references/ship-preflight-parallelization.md.

**Operation (/implement)**: Execute independent task batches in parallel

Dependency analysis:

  1. Read tasks.md
  2. Build dependency graph from task relationships
  3. Identify tasks with no dependencies (Layer 0)
  4. Group tasks by layer

Example (15 tasks):

Layer 0 (4 tasks - parallel):
  T001: Create User model
  T002: Create Product model
  T005: Setup test framework
  T008: Create API client utility

Layer 1 (3 tasks - parallel, depend on Layer 0):
  T003: User CRUD endpoints (needs T001)
  T004: Product CRUD endpoints (needs T002)
  T009: Write User model tests (needs T001, T005)

Layer 2 (2 tasks - parallel):
  T006: User-Product relationship (needs T001, T002)
  T010: Write Product model tests (needs T002, T005)

Layer 3 (sequential):
  T007: Integration tests (needs all above)

Execution:

  • Batch 1: Launch 4 agents for Layer 0 tasks (parallel)
  • Batch 2: Launch 3 agents for Layer 1 tasks (parallel)
  • Batch 3: Launch 2 agents for Layer 2 tasks (parallel)
  • Batch 4: Single agent for Layer 3

Time savings:

  • Sequential: 15 tasks × 20 min = 300 minutes (5 hours)
  • Parallel: 4 batches × 30 min = 120 minutes (2 hours)
  • Speedup: 2.5x

See references/implement-phase-parallelization.md.

**Operation (design variations)**: Generate multiple design mockups in parallel

Use case: User wants to see 3 different homepage layouts

Sequential approach (slow):

  1. Generate variation A
  2. Generate variation B
  3. Generate variation C

Total: 15 minutes

Parallel approach (fast):

  1. Launch 3 design agents in a single message (A, B, C variants)

Total: 5 minutes (all generate concurrently)

Speedup: 3x

See references/design-variations-parallelization.md.

Two operations are independent if:
  1. Read-only access to shared resources: Both only read the same files (safe to parallelize)
  2. Disjoint file access: They read/write completely different files
  3. No temporal dependencies: Neither requires the other's output
  4. Idempotent operations: Running them in any order produces same result

Two operations are dependent if:

  1. Write-after-read: Operation B reads file that Operation A writes
  2. Write-after-write: Both write to same file (race condition)
  3. Data dependency: Operation B needs Operation A's output as input
  4. Order-dependent side effects: Operations modify shared state
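Both rule sets reduce to a comparison of read/write sets: two operations conflict only when one writes something the other touches. A sketch (the file paths are illustrative):

```python
def independent(reads_a: set, writes_a: set, reads_b: set, writes_b: set) -> bool:
    """Two operations may run in parallel iff neither writes what the other touches."""
    conflict = (
        writes_a & (reads_b | writes_b)  # A writes something B reads or writes
        or writes_b & reads_a            # B writes something A reads
    )
    return not conflict

# Two read-only quality checks over the same codebase: safe to parallelize.
assert independent({"src/"}, set(), {"src/"}, set())
# Codegen followed by tests over the generated file: must sequence.
assert not independent(set(), {"models/user.py"}, {"models/user.py"}, set())
```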

**Independent** (safe to parallelize):
  • Multiple quality checks reading codebase
  • Multiple file reads (spec.md, plan.md, tasks.md)
  • Multiple API documentation fetches
  • Multiple test suite runs (if isolated)
  • Multiple lint checks on different file types

**Dependent** (must sequence):

  • Generate code → Run tests on generated code
  • Fetch API docs → Generate client based on docs
  • Write file → Read file back for validation
  • Create database schema → Run migrations
  • Build project → Deploy built artifacts

**Shared mutable state**: If operations modify the same git branch, database, or filesystem location, they CANNOT run in parallel safely.

**Resource contention**: Even if logically independent, operations competing for the same resource (CPU, memory, network) may not see a speedup. Monitor system resources.

**Cascading failures**: If one parallel operation fails and others depend on it indirectly, you may need to cancel or retry the batch.

Automatically apply parallel execution when you detect:
  1. Multiple quality checks: ≥3 independent checks in /optimize or /ship
  2. Multiple file reads: ≥3 files to read that don't depend on each other
  3. Multiple API calls: ≥2 external API documentation fetches
  4. Batch task processing: ≥5 tasks in /implement with identifiable layers
  5. Multiple test suites: Unit, integration, E2E running independently
  6. Multiple design variations: ≥2 mockup/prototype variants requested
  7. Multiple research queries: ≥3 web searches or documentation lookups

Do NOT parallelize when:
  1. Sequential dependencies exist: Operation B needs Operation A's output
  2. Shared state modification: Operations write to same files/database
  3. Small operation count: <2 independent operations (no benefit)
  4. Complex coordination needed: Results must be merged in specific order
  5. User explicitly requests sequential: "Do X, then Y, then Z"

Scan for these phrases in phase workflows:
  • "Run these checks: A, B, C, D, E" → Parallel candidate
  • "Generate variations: X, Y, Z" → Parallel candidate
  • "Fetch documentation for: Service1, Service2, Service3" → Parallel candidate
  • "Execute tasks: T001, T002, T003 (no dependencies)" → Parallel candidate

When detected, immediately analyze dependencies and propose parallel execution strategy.
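A rough detector for those phrasings might look like this (the patterns are illustrative, not exhaustive):

```python
import re

PARALLEL_HINTS = [
    r"run these checks:\s*(.+)",
    r"generate variations:\s*(.+)",
    r"fetch documentation for:\s*(.+)",
    r"execute tasks:\s*(.+?)\s*\(no dependencies\)",
]

def parallel_candidates(line: str) -> list[str]:
    """Return the listed items if the line matches a parallel-candidate phrase."""
    for pattern in PARALLEL_HINTS:
        m = re.search(pattern, line, re.IGNORECASE)
        if m:
            return [item.strip() for item in m.group(1).split(",")]
    return []

items = parallel_candidates("Run these checks: security, performance, accessibility")
assert items == ["security", "performance", "accessibility"]
```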

**Context**: Running /optimize on a feature with UI components

Sequential execution (15 minutes):

1. Launch security-sentry (3 min)
2. Wait for completion
3. Launch performance-profiler (4 min)
4. Wait for completion
5. Launch accessibility-auditor (3 min)
6. Wait for completion
7. Launch type-enforcer (2 min)
8. Wait for completion
9. Launch dependency-curator (2 min)
10. Wait for completion
11. Aggregate results (1 min)
Total: 15 minutes

Parallel execution (5 minutes):

1. Launch 5 agents in SINGLE message:
   - security-sentry
   - performance-profiler
   - accessibility-auditor
   - type-enforcer
   - dependency-curator
2. All run concurrently (longest is 4 min)
3. Aggregate results (1 min)
Total: 5 minutes

Implementation: See examples/optimize-phase-parallel.md

**Context**: Running pre-flight checks before deployment

Sequential execution (12 minutes):

1. Check env vars (1 min)
2. Run build (5 min)
3. Check Docker config (2 min)
4. Validate CI config (2 min)
5. Dependency audit (2 min)
Total: 12 minutes

Parallel execution (6 minutes):

1. Launch 5 checks in SINGLE message (all concurrent)
2. Longest operation is build (5 min)
3. Update workflow state (1 min)
Total: 6 minutes

Implementation: See examples/ship-preflight-parallel.md

**Context**: 7 tasks with a dependency graph in the /implement phase

Task dependencies:

T001 (User model) → no deps
T002 (Product model) → no deps
T003 (User endpoints) → depends on T001
T004 (Product endpoints) → depends on T002
T005 (User tests) → depends on T001, T003
T006 (Product tests) → depends on T002, T004
T007 (Integration tests) → depends on T003, T004

Parallel execution plan:

Batch 1 (Layer 0): T001, T002 (parallel - 2 tasks)
Batch 2 (Layer 1): T003, T004 (parallel - 2 tasks, wait for Batch 1)
Batch 3 (Layer 2): T005, T006 (parallel - 2 tasks, wait for Batch 2)
Batch 4 (Layer 3): T007 (sequential - 1 task, wait for Batch 3)

Time savings:

  • Sequential: 7 tasks × 20 min = 140 minutes
  • Parallel: 4 batches × 25 min = 100 minutes
  • Speedup: 1.4x

Implementation: See examples/implement-batching-parallel.md
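The arithmetic above can be reproduced from the batch plan. The uniform 20-minute task duration and roughly 5 minutes of per-batch coordination overhead are assumptions inferred from the stated figures:

```python
# Batches from the execution plan above (one list per layer).
BATCHES = [["T001", "T002"], ["T003", "T004"], ["T005", "T006"], ["T007"]]
TASK_MIN = 20      # assumed uniform task duration, in minutes
OVERHEAD_MIN = 5   # assumed per-batch coordination overhead, in minutes

sequential = sum(len(batch) for batch in BATCHES) * TASK_MIN      # 7 × 20 = 140
parallel = len(BATCHES) * (TASK_MIN + OVERHEAD_MIN)               # 4 × 25 = 100
speedup = sequential / parallel
assert (sequential, parallel) == (140, 100)
assert speedup == 1.4
```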

**Problem**: Sending multiple messages rapidly, thinking they'll run in parallel

Wrong approach:

Send message 1: Launch agent A
Send message 2: Launch agent B
Send message 3: Launch agent C

These execute sequentially because each message waits for the previous to complete.

Correct approach:

Send ONE message with 3 tool calls (A, B, C)

Rule: Multiple tool calls in a SINGLE message = parallel. Multiple messages = sequential.

**Problem**: Parallelizing dependent operations causing race conditions

Wrong approach:

Parallel batch:
- Generate User model code
- Write tests for User model (needs generated code)

Second operation will fail because code doesn't exist yet.

Correct approach:

Batch 1 (sequential): Generate User model code
Batch 2 (sequential): Write tests for User model

Rule: Always build dependency graph first. Never parallelize dependent operations.

**Problem**: Launching 20 agents in parallel, overwhelming system

Wrong approach:

Launch 20 agents in single message (all tasks at once)

System resources exhausted, agents may fail or slow down dramatically.

Correct approach:

Batch 1: Launch 5 agents (Layer 0)
Batch 2: Launch 5 agents (Layer 1)
Batch 3: Launch 5 agents (Layer 2)
Batch 4: Launch 5 agents (Layer 3)

Rule: Keep batches to 3-8 operations. More layers is better than huge batches.

**Problem**: Using parallel execution for operations taking <30 seconds each

Wrong approach:

Parallel batch:
- Read spec.md (5 seconds)
- Read plan.md (5 seconds)

Overhead of parallel coordination exceeds time savings.

Correct approach:

Sequential:
- Read spec.md
- Read plan.md

Rule: Only parallelize operations taking ≥1 minute each. Below that, sequential is fine.
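These thresholds, together with the dependency rule, condense into a small predicate (a sketch mirroring the stated rules):

```python
def should_parallelize(durations_min: list[float], has_dependencies: bool = False) -> bool:
    """Parallelize only for ≥2 independent operations taking ≥1 minute each."""
    return (
        not has_dependencies
        and len(durations_min) >= 2
        and all(d >= 1.0 for d in durations_min)
    )

assert should_parallelize([3, 4, 3, 2, 2])           # five /optimize checks: yes
assert not should_parallelize([5 / 60, 5 / 60])      # two 5-second file reads: no
assert not should_parallelize([3, 4], has_dependencies=True)
```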

After applying parallel execution optimization:
  1. Wall-clock time reduced: Phase completes 2-5x faster than sequential baseline
  2. All operations successful: No failures due to race conditions or dependencies
  3. Results identical: Parallel execution produces same output as sequential
  4. No resource exhaustion: System handles parallel load without failures
  5. Clear dependency graph: Can explain why operations were grouped into specific batches

**Before parallelization**:
  1. Run phase sequentially
  2. Record total time
  3. Record all outputs (files, reports, state changes)

After parallelization:

  1. Run phase with parallel batches
  2. Record total time
  3. Record all outputs

Validate:

  • Time reduced by expected factor (2-5x)
  • Outputs identical (diff files, compare checksums)
  • No errors or warnings introduced
  • Workflow state updated correctly

Rollback if:

  • Parallel version produces different outputs
  • Failures or race conditions occur
  • Time savings <20% (not worth complexity)
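Output equivalence between the sequential and parallel runs can be checked by hashing the artifacts from both (the directory layout and file patterns are illustrative):

```python
import hashlib
from pathlib import Path

def checksum_outputs(root: str, patterns: tuple[str, ...] = ("*.md", "*.yaml")) -> dict[str, str]:
    """Map each output file to its SHA-256 digest, for diffing two runs."""
    sums: dict[str, str] = {}
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            sums[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return sums

def outputs_identical(seq_dir: str, par_dir: str) -> bool:
    """True iff both runs produced byte-identical artifacts under the same names."""
    return checksum_outputs(seq_dir) == checksum_outputs(par_dir)
```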

For deeper topics, see reference files:

Execution patterns: references/execution-patterns.md

  • Correct vs incorrect parallel execution patterns
  • Message structure for parallel tool calls
  • Handling tool call failures

Phase-specific guides:

  • references/optimize-phase-parallelization.md
  • references/ship-preflight-parallelization.md
  • references/implement-phase-parallelization.md
  • references/design-variations-parallelization.md

Dependency analysis: references/dependency-analysis-guide.md

  • Building dependency graphs
  • Detecting hidden dependencies
  • Handling edge cases

Troubleshooting: references/troubleshooting.md

  • Common failures and fixes
  • Performance not improving
  • Race condition debugging

The parallel-execution-optimizer skill is successfully applied when:
  1. Dependency graph created: All operations analyzed for dependencies before execution
  2. Batches identified: Independent operations grouped into parallel execution batches
  3. Single message per batch: Each batch executed via ONE message with multiple tool calls
  4. Time savings achieved: 2-5x speedup compared to sequential execution
  5. Correctness maintained: Parallel execution produces identical results to sequential
  6. No race conditions: No failures due to shared state or missing dependencies
  7. Appropriate scope: Only applied when ≥2 operations taking ≥1 minute each
  8. Clear documentation: Execution plan explained (layers, batches, expected speedup)