| name | parallel-execution-optimizer |
| description | Identify and execute independent operations in parallel for 3-5x speedup. Auto-analyzes task dependencies, groups into batches, launches parallel Task() calls. Applies to /optimize (5 checks), /ship pre-flight (5 checks), /design-variations (N screens), /implement (task batching). Auto-triggers when detecting multiple independent operations in a phase. |
Traditional sequential execution wastes time:
- /optimize runs 5 quality checks sequentially (10-15 minutes)
- /ship runs 5 pre-flight checks sequentially (8-12 minutes)
- /implement processes tasks one-by-one even when they have no interdependencies
- Design variations are generated sequentially when all could run in parallel
This skill analyzes operation dependencies, groups independent work into batches, and orchestrates parallel execution using multiple Task() agent calls in a single message. The result: 3-5x faster phase completion with zero compromise on quality or correctness.
Sequential (slow):
- Send message with Task call for security-sentry
- Wait for response
- Send message with Task call for performance-profiler
- Wait for response
- Send message with Task call for accessibility-auditor
- Total: 15 minutes
Parallel (fast):
- Send ONE message with 3 Task calls (security-sentry, performance-profiler, accessibility-auditor)
- All three run concurrently
- Total: 5 minutes
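Concretely, a parallel batch is one assistant turn whose content array contains several tool_use blocks. A minimal TypeScript sketch of the shape; the field names and prompt text are illustrative assumptions, not a confirmed schema:

```typescript
// One assistant message carrying three Task tool calls; all three agents
// start concurrently. Field names and prompts are illustrative only.
const parallelBatch = {
  role: "assistant",
  content: [
    { type: "tool_use", name: "Task", input: { prompt: "Scan the codebase for vulnerabilities (security-sentry)" } },
    { type: "tool_use", name: "Task", input: { prompt: "Benchmark API endpoints (performance-profiler)" } },
    { type: "tool_use", name: "Task", input: { prompt: "Audit UI components for WCAG 2.1 AA (accessibility-auditor)" } },
  ],
};
```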
Applies to:
- /optimize phase: Run 5 quality checks in parallel (security, performance, accessibility, code-review, type-safety)
- /ship pre-flight: Run 5 deployment checks in parallel (env-vars, build, docker, CI-config, dependency-audit)
- /implement: Process independent task batches in parallel layers
- Design variations: Generate multiple mockup variations concurrently
- Research phase: Fetch multiple documentation sources concurrently
Scan the current phase for operations that:
- Read different files/data sources
- Don't modify shared state
- Have no sequential dependencies
- Can produce results independently
Examples:
- Quality checks (security scan + performance test + accessibility audit)
- File reads (spec.md + plan.md + tasks.md)
- API documentation fetches (Stripe docs + Twilio docs + SendGrid docs)
- Test suite runs (unit tests + integration tests + E2E tests)
Build a dependency graph:
- Layer 0: Operations with no dependencies (can run immediately)
- Layer 1: Operations depending only on Layer 0 outputs
- Layer 2: Operations depending on Layer 1 outputs
- etc.
Example (/optimize):
Layer 0 (parallel):
- security-sentry (reads codebase)
- performance-profiler (reads codebase + runs benchmarks)
- accessibility-auditor (reads UI components)
- type-enforcer (reads TypeScript files)
- dependency-curator (reads package.json)
Layer 1 (after Layer 0):
- Generate optimization-report.md (combines all Layer 0 results)
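Layer assignment is a plain topological sort into levels. A minimal sketch, assuming each operation declares its dependencies by id (the `Op` shape is hypothetical):

```typescript
interface Op { id: string; deps: string[] }

// Group operations into layers: a layer contains every operation whose
// dependencies are all satisfied by earlier layers.
function buildLayers(ops: Op[]): Op[][] {
  const layers: Op[][] = [];
  const done = new Set<string>();
  let remaining = [...ops];
  while (remaining.length > 0) {
    const layer = remaining.filter(op => op.deps.every(d => done.has(d)));
    if (layer.length === 0) throw new Error("cycle in dependency graph");
    layers.push(layer);
    for (const op of layer) done.add(op.id);
    remaining = remaining.filter(op => !done.has(op.id));
  }
  return layers;
}
```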
Create batches for each layer:
- All Layer 0 operations in single message (parallel execution)
- Wait for Layer 0 completion
- All Layer 1 operations in single message
- Continue through layers
Batch size considerations:
- Optimal: 3-5 operations per batch (balanced parallelism)
- Maximum: 8 operations (avoid overwhelming system)
- Minimum: 2 operations (with only one, there is nothing to parallelize)
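When a layer exceeds the size ceiling, split it into fixed-size chunks before dispatch. A sketch; the default of 5 mirrors the optimal range above:

```typescript
// Split one layer into batches of at most `max` operations.
function toBatches<T>(layer: T[], max = 5): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < layer.length; i += max) {
    batches.push(layer.slice(i, i + max));
  }
  return batches;
}
```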
Send a single message with multiple tool calls for each batch.
Critical requirements:
- Must be a single message with multiple tool use blocks
- Each tool call must be complete and independent
- Do not use placeholders or forward references
- Each agent must have all required context in its prompt
See references/execution-patterns.md for detailed examples.
After each batch completes:
- Collect results from all parallel operations
- Check for failures or blocking issues
- Decide whether to proceed to next layer
- Aggregate findings into unified report
Failure handling:
- If any operation blocks (critical security issue), halt pipeline
- If operations have warnings (minor performance issue), continue but log
- If operations fail (agent error), retry individually or escalate
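These rules reduce to a small triage function. A sketch, assuming each agent reports a severity per operation (the `Result` shape is hypothetical):

```typescript
type Severity = "ok" | "warning" | "blocking" | "error";
interface Result { op: string; severity: Severity; detail?: string }

// Decide what to do after a batch settles, per the rules above.
function triage(results: Result[]): "proceed" | "halt" | "retry" {
  if (results.some(r => r.severity === "blocking")) return "halt"; // e.g. critical security issue
  if (results.some(r => r.severity === "error")) return "retry";   // agent error: retry individually or escalate
  for (const r of results) {
    if (r.severity === "warning") console.warn(`${r.op}: ${r.detail}`); // log minor issues, continue
  }
  return "proceed";
}
```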
Dependency graph (/optimize):
Layer 0 (parallel - 5 operations):
1. security-sentry → Scan for vulnerabilities, secrets, auth issues
2. performance-profiler → Benchmark API endpoints, detect N+1 queries
3. accessibility-auditor → WCAG 2.1 AA compliance (if UI feature)
4. type-enforcer → TypeScript strict mode compliance
5. dependency-curator → npm audit, outdated packages
Layer 1 (sequential - 1 operation):
6. Generate optimization-report.md (aggregates Layer 0 findings)
Time savings:
- Sequential: ~15 minutes (3 min per check)
- Parallel: ~5 minutes (longest check + aggregation)
- Speedup: 3x
See references/optimize-phase-parallelization.md for implementation details.
Dependency graph (/ship pre-flight):
Layer 0 (parallel - 5 operations):
1. Check environment variables (read .env.example vs .env)
2. Validate production build (npm run build)
3. Check Docker configuration (docker-compose.yml, Dockerfile)
4. Validate CI configuration (.github/workflows/*.yml)
5. Run dependency audit (npm audit --production)
Layer 1 (sequential - 1 operation):
6. Update state.yaml with pre-flight results
Time savings:
- Sequential: ~12 minutes
- Parallel: ~6 minutes (the build is the longest operation, plus state update)
- Speedup: 2x
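For orientation, the same five checks expressed as concurrent child processes in Node. A sketch only: the env-var and CI checks stand in for project-specific scripts, and `docker compose config --quiet` assumes Compose v2:

```typescript
import { exec } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(exec);

// All five checks start immediately; wall-clock time ≈ the slowest one.
// (Top-level await assumes an ES module context.)
const checks: Record<string, Promise<unknown>> = {
  build: run("npm run build"),
  audit: run("npm audit --production"),
  docker: run("docker compose config --quiet"),  // validates compose file
  envVars: run("node scripts/check-env.js"),     // hypothetical project script
  ci: run("node scripts/check-ci.js"),           // hypothetical project script
};

const settled = await Promise.allSettled(Object.values(checks));
Object.keys(checks).forEach((name, i) => {
  console.log(`${name}: ${settled[i].status === "fulfilled" ? "pass" : "fail"}`);
});
```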
Dependency analysis (/implement):
- Read tasks.md
- Build dependency graph from task relationships
- Identify tasks with no dependencies (Layer 0)
- Group tasks by layer
Example (10 tasks):
Layer 0 (4 tasks - parallel):
T001: Create User model
T002: Create Product model
T005: Setup test framework
T008: Create API client utility
Layer 1 (5 tasks - parallel, depend only on Layer 0):
T003: User CRUD endpoints (needs T001)
T004: Product CRUD endpoints (needs T002)
T006: User-Product relationship (needs T001, T002)
T009: Write User model tests (needs T001, T005)
T010: Write Product model tests (needs T002, T005)
Layer 2 (sequential):
T007: Integration tests (needs all above)
Execution:
- Batch 1: Launch 4 agents for Layer 0 tasks (parallel)
- Batch 2: Launch 5 agents for Layer 1 tasks (parallel)
- Batch 3: Single agent for Layer 2
Time savings:
- Sequential: 10 tasks × 20 min = 200 minutes (over 3 hours)
- Parallel: 3 batches × ~30 min = 90 minutes (1.5 hours)
- Speedup: ~2.2x
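The batch-by-batch execution reduces to a loop that awaits each wave before starting the next. A sketch, where `runTask` stands in for whatever launches a single agent:

```typescript
// Run layers in order; tasks within a layer run concurrently.
async function executeLayers(
  layers: string[][],
  runTask: (taskId: string) => Promise<void>,
): Promise<void> {
  for (const layer of layers) {
    await Promise.all(layer.map(runTask)); // one batch = one parallel wave
  }
}
```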
Use case: User wants to see 3 different homepage layouts
Sequential approach (slow):
- Generate variation A
- Generate variation B
- Generate variation C
Total: 15 minutes
Parallel approach (fast):
- Launch 3 design agents in a single message (A, B, C variants)
Total: 5 minutes (all generate concurrently)
Speedup: 3x
Two operations are independent if:
- Read-only access to shared resources: both only read the same files (safe to parallelize)
- Disjoint file access: They read/write completely different files
- No temporal dependencies: Neither requires the other's output
- Idempotent operations: Running them in any order produces same result
Two operations are dependent if:
- Write-after-read: Operation B reads file that Operation A writes
- Write-after-write: Both write to same file (race condition)
- Data dependency: Operation B needs Operation A's output as input
- Order-dependent side effects: Operations modify shared state
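A conservative way to mechanize these checks is to compare declared read/write sets. A sketch (the `FileOp` shape is hypothetical):

```typescript
interface FileOp { id: string; reads: Set<string>; writes: Set<string> }

const overlaps = (a: Set<string>, b: Set<string>) => [...a].some(f => b.has(f));

// Independent only if neither writes anything the other reads or writes.
function independent(a: FileOp, b: FileOp): boolean {
  return !overlaps(a.writes, b.reads) &&
         !overlaps(b.writes, a.reads) &&
         !overlaps(a.writes, b.writes);
}
```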
Safe to parallelize:
- Multiple quality checks reading the codebase
- Multiple file reads (spec.md, plan.md, tasks.md)
- Multiple API documentation fetches
- Multiple test suite runs (if isolated)
- Multiple lint checks on different file types
Dependent (must sequence):
- Generate code → Run tests on generated code
- Fetch API docs → Generate client based on docs
- Write file → Read file back for validation
- Create database schema → Run migrations
- Build project → Deploy built artifacts
Resource contention: Even if logically independent, operations competing for same resource (CPU, memory, network) may not see speedup. Monitor system resources.
Cascading failures: If one parallel operation fails and others depend on it indirectly, you may need to cancel or retry the batch.
Trigger conditions:
- Multiple quality checks: ≥3 independent checks in /optimize or /ship
- Multiple file reads: ≥3 files to read that don't depend on each other
- Multiple API calls: ≥2 external API documentation fetches
- Batch task processing: ≥5 tasks in /implement with identifiable layers
- Multiple test suites: Unit, integration, E2E running independently
- Multiple design variations: ≥2 mockup/prototype variants requested
- Multiple research queries: ≥3 web searches or documentation lookups
Do not parallelize when:
- Sequential dependencies exist: Operation B needs Operation A's output
- Shared state modification: Operations write to same files/database
- Small operation count: <2 independent operations (no benefit)
- Complex coordination needed: Results must be merged in specific order
- User explicitly requests sequential: "Do X, then Y, then Z"
- "Run these checks: A, B, C, D, E" → Parallel candidate
- "Generate variations: X, Y, Z" → Parallel candidate
- "Fetch documentation for: Service1, Service2, Service3" → Parallel candidate
- "Execute tasks: T001, T002, T003 (no dependencies)" → Parallel candidate
When detected, immediately analyze dependencies and propose parallel execution strategy.
Example (/optimize quality checks):
Sequential execution (15 minutes):
1. Launch security-sentry (3 min)
2. Wait for completion
3. Launch performance-profiler (4 min)
4. Wait for completion
5. Launch accessibility-auditor (3 min)
6. Wait for completion
7. Launch type-enforcer (2 min)
8. Wait for completion
9. Launch dependency-curator (2 min)
10. Wait for completion
11. Aggregate results (1 min)
Total: 15 minutes
Parallel execution (5 minutes):
1. Launch 5 agents in SINGLE message:
- security-sentry
- performance-profiler
- accessibility-auditor
- type-enforcer
- dependency-curator
2. All run concurrently (longest is 4 min)
3. Aggregate results (1 min)
Total: 5 minutes
Implementation: See examples/optimize-phase-parallel.md
Example (/ship pre-flight):
Sequential execution (12 minutes):
1. Check env vars (1 min)
2. Run build (5 min)
3. Check Docker config (2 min)
4. Validate CI config (2 min)
5. Dependency audit (2 min)
Total: 12 minutes
Parallel execution (6 minutes):
1. Launch 5 checks in SINGLE message (all concurrent)
2. Longest operation is build (5 min)
3. Update workflow state (1 min)
Total: 6 minutes
Implementation: See examples/ship-preflight-parallel.md
Example (/implement task batching):
Task dependencies:
T001 (User model) → no deps
T002 (Product model) → no deps
T003 (User endpoints) → depends on T001
T004 (Product endpoints) → depends on T002
T005 (User tests) → depends on T001, T003
T006 (Product tests) → depends on T002, T004
T007 (Integration tests) → depends on T003, T004
Parallel execution plan:
Batch 1 (Layer 0): T001, T002 (parallel - 2 tasks)
Batch 2 (Layer 1): T003, T004 (parallel - 2 tasks, wait for Batch 1)
Batch 3 (Layer 2): T005, T006, T007 (parallel - 3 tasks, wait for Batch 2; T007 depends only on T003 and T004)
Time savings:
- Sequential: 7 tasks × 20 min = 140 minutes
- Parallel: 3 batches × 25 min = 75 minutes
- Speedup: ~1.9x
Implementation: See examples/implement-batching-parallel.md
Mistake 1: Launching agents in separate messages
Wrong approach:
Send message 1: Launch agent A
Send message 2: Launch agent B
Send message 3: Launch agent C
These execute sequentially because each message waits for the previous to complete.
Correct approach:
Send ONE message with 3 tool calls (A, B, C)
Rule: Multiple tool calls in a SINGLE message = parallel. Multiple messages = sequential.
Mistake 2: Parallelizing dependent operations
Wrong approach:
Parallel batch:
- Generate User model code
- Write tests for User model (needs generated code)
The second operation will fail because the code doesn't exist yet.
Correct approach:
Batch 1 (sequential): Generate User model code
Batch 2 (sequential): Write tests for User model
Rule: Always build dependency graph first. Never parallelize dependent operations.
Mistake 3: Oversized batches
Wrong approach:
Launch 20 agents in a single message (all tasks at once)
System resources are exhausted; agents may fail or slow down dramatically.
Correct approach:
Batch 1: Launch 5 agents
Batch 2: Launch 5 agents (after Batch 1 completes)
Batch 3: Launch 5 agents
Batch 4: Launch 5 agents
Rule: Keep batches to 3-8 operations. Prefer more, smaller batches over one oversized batch.
Mistake 4: Parallelizing trivial operations
Wrong approach:
Parallel batch:
- Read spec.md (5 seconds)
- Read plan.md (5 seconds)
Overhead of parallel coordination exceeds time savings.
Correct approach:
Sequential:
- Read spec.md
- Read plan.md
Rule: Only parallelize operations taking ≥1 minute each. Below that, sequential is fine.
Success indicators:
- Wall-clock time reduced: Phase completes 2-5x faster than sequential baseline
- All operations successful: No failures due to race conditions or dependencies
- Results identical: Parallel execution produces same output as sequential
- No resource exhaustion: System handles parallel load without failures
- Clear dependency graph: Can explain why operations were grouped into specific batches
Before parallelization:
- Run phase sequentially
- Record total time
- Record all outputs (files, reports, state changes)
After parallelization:
- Run phase with parallel batches
- Record total time
- Record all outputs
Validate:
- Time reduced by expected factor (2-5x)
- Outputs identical (diff files, compare checksums)
- No errors or warnings introduced
- Workflow state updated correctly
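Output comparison can be as simple as hashing each artifact from both runs. A sketch with placeholder paths:

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

const sha256 = (path: string) =>
  createHash("sha256").update(readFileSync(path)).digest("hex");

// Placeholder paths: artifacts saved from the sequential and parallel runs.
const identical =
  sha256("baseline/optimization-report.md") ===
  sha256("parallel/optimization-report.md");
console.log(identical ? "outputs identical" : "outputs differ - investigate");
```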
Rollback if:
- Parallel version produces different outputs
- Failures or race conditions occur
- Time savings <20% (not worth complexity)
Execution patterns: references/execution-patterns.md
- Correct vs incorrect parallel execution patterns
- Message structure for parallel tool calls
- Handling tool call failures
Phase-specific guides:
- references/optimize-phase-parallelization.md
- references/ship-preflight-parallelization.md
- references/implement-phase-parallelization.md
- references/design-variations-parallelization.md
Dependency analysis: references/dependency-analysis-guide.md
- Building dependency graphs
- Detecting hidden dependencies
- Handling edge cases
Troubleshooting: references/troubleshooting.md
- Common failures and fixes
- Performance not improving
- Race condition debugging
Checklist:
- Dependency graph created: All operations analyzed for dependencies before execution
- Batches identified: Independent operations grouped into parallel execution batches
- Single message per batch: Each batch executed via ONE message with multiple tool calls
- Time savings achieved: 2-5x speedup compared to sequential execution
- Correctness maintained: Parallel execution produces identical results to sequential
- No race conditions: No failures due to shared state or missing dependencies
- Appropriate scope: Only applied when ≥2 operations taking ≥1 minute each
- Clear documentation: Execution plan explained (layers, batches, expected speedup)