| name | parallel-execution-optimizer |
| description | Identify and execute independent operations in parallel for 3-5x speedup. Auto-analyzes task dependencies, groups into batches, launches parallel Task() calls. Applies to /optimize (5 checks), /ship pre-flight (5 checks), /design-variations (N screens), /implement (task batching). Auto-triggers when detecting multiple independent operations in a phase. |
Traditional sequential execution wastes time:
- /optimize runs 5 quality checks sequentially (10-15 minutes)
- /ship runs 5 pre-flight checks sequentially (8-12 minutes)
- /implement processes tasks one-by-one even when they have no interdependencies
- Design variations are generated sequentially when all could run in parallel
This skill analyzes operation dependencies, groups independent work into batches, and orchestrates parallel execution using multiple Task() agent calls in a single message. The result: 3-5x faster phase completion with zero compromise on quality or correctness.
Sequential (slow):
- Send message with Task call for security-sentry
- Wait for response
- Send message with Task call for performance-profiler
- Wait for response
- Send message with Task call for accessibility-auditor
- Total: 15 minutes
Parallel (fast):
- Send ONE message with 3 Task calls (security-sentry, performance-profiler, accessibility-auditor)
- All three run concurrently
- Total: 5 minutes
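Concretely, a parallel batch is one assistant turn whose content array contains several tool_use blocks. A minimal TypeScript sketch of the shape; the field names and prompt text are illustrative assumptions, not a confirmed schema:

```typescript
// One assistant message carrying three Task tool calls; all three agents
// start concurrently. Field names and prompts are illustrative only.
const parallelBatch = {
  role: "assistant",
  content: [
    { type: "tool_use", name: "Task", input: { prompt: "Scan the codebase for vulnerabilities (security-sentry)" } },
    { type: "tool_use", name: "Task", input: { prompt: "Benchmark API endpoints (performance-profiler)" } },
    { type: "tool_use", name: "Task", input: { prompt: "Audit UI components for WCAG 2.1 AA (accessibility-auditor)" } },
  ],
};
```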
Applies to:
- /optimize phase: Run 5 quality checks in parallel (security, performance, accessibility, code-review, type-safety)
- /ship pre-flight: Run 5 deployment checks in parallel (env-vars, build, docker, CI-config, dependency-audit)
- /implement: Process independent task batches in parallel layers
- Design variations: Generate multiple mockup variations concurrently
- Research phase: Fetch multiple documentation sources concurrently
Scan the current phase for operations that:
- Read different files/data sources
- Don't modify shared state
- Have no sequential dependencies
- Can produce results independently
Examples:
- Quality checks (security scan + performance test + accessibility audit)
- File reads (spec.md + plan.md + tasks.md)
- API documentation fetches (Stripe docs + Twilio docs + SendGrid docs)
- Test suite runs (unit tests + integration tests + E2E tests)
Build a dependency graph:
- Layer 0: Operations with no dependencies (can run immediately)
- Layer 1: Operations depending only on Layer 0 outputs
- Layer 2: Operations depending on Layer 1 outputs
- etc.
Example (/optimize):
Layer 0 (parallel):
- security-sentry (reads codebase)
- performance-profiler (reads codebase + runs benchmarks)
- accessibility-auditor (reads UI components)
- type-enforcer (reads TypeScript files)
- dependency-curator (reads package.json)
Layer 1 (after Layer 0):
- Generate optimization-report.md (combines all Layer 0 results)
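Layer assignment is a plain topological sort into levels. A minimal sketch, assuming each operation declares its dependencies by id (the `Op` shape is hypothetical):

```typescript
interface Op { id: string; deps: string[] }

// Group operations into layers: a layer contains every operation whose
// dependencies are all satisfied by earlier layers.
function buildLayers(ops: Op[]): Op[][] {
  const layers: Op[][] = [];
  const done = new Set<string>();
  let remaining = [...ops];
  while (remaining.length > 0) {
    const layer = remaining.filter(op => op.deps.every(d => done.has(d)));
    if (layer.length === 0) throw new Error("cycle in dependency graph");
    layers.push(layer);
    for (const op of layer) done.add(op.id);
    remaining = remaining.filter(op => !done.has(op.id));
  }
  return layers;
}
```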
Create batches for each layer:
- All Layer 0 operations in single message (parallel execution)
- Wait for Layer 0 completion
- All Layer 1 operations in single message
- Continue through layers
Batch size considerations:
- Optimal: 3-5 operations per batch (balanced parallelism)
- Maximum: 8 operations (avoid overwhelming system)
- Minimum: 2 operations (with only one, there is nothing to parallelize)
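When a layer exceeds the size ceiling, split it into fixed-size chunks before dispatch. A sketch; the default of 5 mirrors the optimal range above:

```typescript
// Split one layer into batches of at most `max` operations.
function toBatches<T>(layer: T[], max = 5): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < layer.length; i += max) {
    batches.push(layer.slice(i, i + max));
  }
  return batches;
}
```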
Send a single message with multiple tool calls for each batch.
Critical requirements:
- Must be a single message with multiple tool use blocks
- Each tool call must be complete and independent
- Do not use placeholders or forward references
- Each agent must have all required context in its prompt
See references/execution-patterns.md for detailed examples.
After each batch completes:
- Collect results from all parallel operations
- Check for failures or blocking issues
- Decide whether to proceed to next layer
- Aggregate findings into unified report
Failure handling:
- If any operation blocks (critical security issue), halt pipeline
- If operations have warnings (minor performance issue), continue but log
- If operations fail (agent error), retry individually or escalate
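These rules reduce to a small triage function. A sketch, assuming each agent reports a severity per operation (the `Result` shape is hypothetical):

```typescript
type Severity = "ok" | "warning" | "blocking" | "error";
interface Result { op: string; severity: Severity; detail?: string }

// Decide what to do after a batch settles, per the rules above.
function triage(results: Result[]): "proceed" | "halt" | "retry" {
  if (results.some(r => r.severity === "blocking")) return "halt"; // e.g. critical security issue
  if (results.some(r => r.severity === "error")) return "retry";   // agent error: retry individually or escalate
  for (const r of results) {
    if (r.severity === "warning") console.warn(`${r.op}: ${r.detail}`); // log minor issues, continue
  }
  return "proceed";
}
```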
Dependency graph (/optimize):
Layer 0 (parallel - 5 operations):
1. security-sentry → Scan for vulnerabilities, secrets, auth issues
2. performance-profiler → Benchmark API endpoints, detect N+1 queries
3. accessibility-auditor → WCAG 2.1 AA compliance (if UI feature)
4. type-enforcer → TypeScript strict mode compliance
5. dependency-curator → npm audit, outdated packages
Layer 1 (sequential - 1 operation):
6. Generate optimization-report.md (aggregates Layer 0 findings)
Time savings:
- Sequential: ~15 minutes (3 min per check)
- Parallel: ~5 minutes (longest check + aggregation)
- Speedup: 3x
See references/optimize-phase-parallelization.md for implementation details.
Dependency graph (/ship pre-flight):
Layer 0 (parallel - 5 operations):
1. Check environment variables (read .env.example vs .env)
2. Validate production build (npm run build)
3. Check Docker configuration (docker-compose.yml, Dockerfile)
4. Validate CI configuration (.github/workflows/*.yml)
5. Run dependency audit (npm audit --production)
Layer 1 (sequential - 1 operation):
6. Update state.yaml with pre-flight results
Time savings:
- Sequential: ~12 minutes
- Parallel: ~6 minutes (the build is the longest operation, plus state update)
- Speedup: 2x
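For orientation, the same five checks expressed as concurrent child processes in Node. A sketch only: the env-var and CI checks stand in for project-specific scripts, and `docker compose config --quiet` assumes Compose v2:

```typescript
import { exec } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(exec);

// All five checks start immediately; wall-clock time ≈ the slowest one.
// (Top-level await assumes an ES module context.)
const checks: Record<string, Promise<unknown>> = {
  build: run("npm run build"),
  audit: run("npm audit --production"),
  docker: run("docker compose config --quiet"),  // validates compose file
  envVars: run("node scripts/check-env.js"),     // hypothetical project script
  ci: run("node scripts/check-ci.js"),           // hypothetical project script
};

const settled = await Promise.allSettled(Object.values(checks));
Object.keys(checks).forEach((name, i) => {
  console.log(`${name}: ${settled[i].status === "fulfilled" ? "pass" : "fail"}`);
});
```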
Dependency analysis (/implement):
- Read tasks.md
- Build dependency graph from task relationships
- Identify tasks with no dependencies (Layer 0)
- Group tasks by layer
Example (10 tasks):
Layer 0 (4 tasks - parallel):
T001: Create User model
T002: Create Product model
T005: Setup test framework
T008: Create API client utility
Layer 1 (5 tasks - parallel, depend only on Layer 0):
T003: User CRUD endpoints (needs T001)
T004: Product CRUD endpoints (needs T002)
T006: User-Product relationship (needs T001, T002)
T009: Write User model tests (needs T001, T005)
T010: Write Product model tests (needs T002, T005)
Layer 2 (sequential):
T007: Integration tests (needs all above)
Execution:
- Batch 1: Launch 4 agents for Layer 0 tasks (parallel)
- Batch 2: Launch 5 agents for Layer 1 tasks (parallel)
- Batch 3: Single agent for Layer 2
Time savings:
- Sequential: 10 tasks × 20 min = 200 minutes (over 3 hours)
- Parallel: 3 batches × ~30 min = 90 minutes (1.5 hours)
- Speedup: ~2.2x
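The batch-by-batch execution reduces to a loop that awaits each wave before starting the next. A sketch, where `runTask` stands in for whatever launches a single agent:

```typescript
// Run layers in order; tasks within a layer run concurrently.
async function executeLayers(
  layers: string[][],
  runTask: (taskId: string) => Promise<void>,
): Promise<void> {
  for (const layer of layers) {
    await Promise.all(layer.map(runTask)); // one batch = one parallel wave
  }
}
```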
Use case: User wants to see 3 different homepage layouts
Sequential approach (slow):
- Generate variation A
- Generate variation B
- Generate variation C
Total: 15 minutes
Parallel approach (fast):
- Launch 3 design agents in a single message (A, B, C variants)
Total: 5 minutes (all generate concurrently)
Speedup: 3x
Two operations are independent if:
- Read-only access to shared resources: both only read the same files (safe to parallelize)
- Disjoint file access: They read/write completely different files
- No temporal dependencies: Neither requires the other's output
- Idempotent operations: Running them in any order produces same result
Two operations are dependent if:
- Write-after-read: Operation B reads file that Operation A writes
- Write-after-write: Both write to same file (race condition)
- Data dependency: Operation B needs Operation A's output as input
- Order-dependent side effects: Operations modify shared state
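A conservative way to mechanize these checks is to compare declared read/write sets. A sketch (the `FileOp` shape is hypothetical):

```typescript
interface FileOp { id: string; reads: Set<string>; writes: Set<string> }

const overlaps = (a: Set<string>, b: Set<string>) => [...a].some(f => b.has(f));

// Independent only if neither writes anything the other reads or writes.
function independent(a: FileOp, b: FileOp): boolean {
  return !overlaps(a.writes, b.reads) &&
         !overlaps(b.writes, a.reads) &&
         !overlaps(a.writes, b.writes);
}
```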
Safe to parallelize:
- Multiple quality checks reading the codebase
- Multiple file reads (spec.md, plan.md, tasks.md)
- Multiple API documentation fetches
- Multiple test suite runs (if isolated)
- Multiple lint checks on different file types
Dependent (must sequence):
- Generate code → Run tests on generated code
- Fetch API docs → Generate client based on docs
- Write file → Read file back for validation
- Create database schema → Run migrations
- Build project → Deploy built artifacts
Resource contention: Even if logically independent, operations competing for same resource (CPU, memory, network) may not see speedup. Monitor system resources.
Cascading failures: If one parallel operation fails and others depend on it indirectly, you may need to cancel or retry the batch.
Trigger conditions:
- Multiple quality checks: ≥3 independent checks in /optimize or /ship
- Multiple file reads: ≥3 files to read that don't depend on each other
- Multiple API calls: ≥2 external API documentation fetches
- Batch task processing: ≥5 tasks in /implement with identifiable layers
- Multiple test suites: Unit, integration, E2E running independently
- Multiple design variations: ≥2 mockup/prototype variants requested
- Multiple research queries: ≥3 web searches or documentation lookups
Do not parallelize when:
- Sequential dependencies exist: Operation B needs Operation A's output
- Shared state modification: Operations write to same files/database
- Small operation count: <2 independent operations (no benefit)
- Complex coordination needed: Results must be merged in specific order
- User explicitly requests sequential: "Do X, then Y, then Z"
- "Run these checks: A, B, C, D, E" → Parallel candidate
- "Generate variations: X, Y, Z" → Parallel candidate
- "Fetch documentation for: Service1, Service2, Service3" → Parallel candidate
- "Execute tasks: T001, T002, T003 (no dependencies)" → Parallel candidate
When detected, immediately analyze dependencies and propose parallel execution strategy.
Example (/optimize quality checks):
Sequential execution (15 minutes):
1. Launch security-sentry (3 min)
2. Wait for completion
3. Launch performance-profiler (4 min)
4. Wait for completion
5. Launch accessibility-auditor (3 min)
6. Wait for completion
7. Launch type-enforcer (2 min)
8. Wait for completion
9. Launch dependency-curator (2 min)
10. Wait for completion
11. Aggregate results (1 min)
Total: 15 minutes
Parallel execution (5 minutes):
1. Launch 5 agents in SINGLE message:
- security-sentry
- performance-profiler
- accessibility-auditor
- type-enforcer
- dependency-curator
2. All run concurrently (longest is 4 min)
3. Aggregate results (1 min)
Total: 5 minutes
Implementation: See examples/optimize-phase-parallel.md
Example (/ship pre-flight):
Sequential execution (12 minutes):
1. Check env vars (1 min)
2. Run build (5 min)
3. Check Docker config (2 min)
4. Validate CI config (2 min)
5. Dependency audit (2 min)
Total: 12 minutes
Parallel execution (6 minutes):
1. Launch 5 checks in SINGLE message (all concurrent)
2. Longest operation is build (5 min)
3. Update workflow state (1 min)
Total: 6 minutes
Implementation: See examples/ship-preflight-parallel.md
Example (/implement task batching):
Task dependencies:
T001 (User model) → no deps
T002 (Product model) → no deps
T003 (User endpoints) → depends on T001
T004 (Product endpoints) → depends on T002
T005 (User tests) → depends on T001, T003
T006 (Product tests) → depends on T002, T004
T007 (Integration tests) → depends on T003, T004
Parallel execution plan:
Batch 1 (Layer 0): T001, T002 (parallel - 2 tasks)
Batch 2 (Layer 1): T003, T004 (parallel - 2 tasks, wait for Batch 1)
Batch 3 (Layer 2): T005, T006, T007 (parallel - 3 tasks, wait for Batch 2; T007 depends only on T003 and T004)
Time savings:
- Sequential: 7 tasks × 20 min = 140 minutes
- Parallel: 3 batches × 25 min = 75 minutes
- Speedup: ~1.9x
Implementation: See examples/implement-batching-parallel.md
Mistake 1: Launching agents in separate messages
Wrong approach:
Send message 1: Launch agent A
Send message 2: Launch agent B
Send message 3: Launch agent C
These execute sequentially because each message waits for the previous to complete.
Correct approach:
Send ONE message with 3 tool calls (A, B, C)
Rule: Multiple tool calls in a SINGLE message = parallel. Multiple messages = sequential.
Mistake 2: Parallelizing dependent operations
Wrong approach:
Parallel batch:
- Generate User model code
- Write tests for User model (needs generated code)
The second operation will fail because the code doesn't exist yet.
Correct approach:
Batch 1 (sequential): Generate User model code
Batch 2 (sequential): Write tests for User model
Rule: Always build dependency graph first. Never parallelize dependent operations.
Mistake 3: Oversized batches
Wrong approach:
Launch 20 agents in a single message (all tasks at once)
System resources are exhausted; agents may fail or slow down dramatically.
Correct approach:
Batch 1: Launch 5 agents
Batch 2: Launch 5 agents (after Batch 1 completes)
Batch 3: Launch 5 agents
Batch 4: Launch 5 agents
Rule: Keep batches to 3-8 operations. Prefer more, smaller batches over one oversized batch.
Mistake 4: Parallelizing trivial operations
Wrong approach:
Parallel batch:
- Read spec.md (5 seconds)
- Read plan.md (5 seconds)
Overhead of parallel coordination exceeds time savings.
Correct approach:
Sequential:
- Read spec.md
- Read plan.md
Rule: Only parallelize operations taking ≥1 minute each. Below that, sequential is fine.
Success indicators:
- Wall-clock time reduced: Phase completes 2-5x faster than sequential baseline
- All operations successful: No failures due to race conditions or dependencies
- Results identical: Parallel execution produces same output as sequential
- No resource exhaustion: System handles parallel load without failures
- Clear dependency graph: Can explain why operations were grouped into specific batches
Before parallelization:
- Run phase sequentially
- Record total time
- Record all outputs (files, reports, state changes)
After parallelization:
- Run phase with parallel batches
- Record total time
- Record all outputs
Validate:
- Time reduced by expected factor (2-5x)
- Outputs identical (diff files, compare checksums)
- No errors or warnings introduced
- Workflow state updated correctly
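Output comparison can be as simple as hashing each artifact from both runs. A sketch with placeholder paths:

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

const sha256 = (path: string) =>
  createHash("sha256").update(readFileSync(path)).digest("hex");

// Placeholder paths: artifacts saved from the sequential and parallel runs.
const identical =
  sha256("baseline/optimization-report.md") ===
  sha256("parallel/optimization-report.md");
console.log(identical ? "outputs identical" : "outputs differ - investigate");
```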
Rollback if:
- Parallel version produces different outputs
- Failures or race conditions occur
- Time savings <20% (not worth complexity)
Execution patterns: references/execution-patterns.md
- Correct vs incorrect parallel execution patterns
- Message structure for parallel tool calls
- Handling tool call failures
Phase-specific guides:
- references/optimize-phase-parallelization.md
- references/ship-preflight-parallelization.md
- references/implement-phase-parallelization.md
- references/design-variations-parallelization.md
Dependency analysis: references/dependency-analysis-guide.md
- Building dependency graphs
- Detecting hidden dependencies
- Handling edge cases
Troubleshooting: references/troubleshooting.md
- Common failures and fixes
- Performance not improving
- Race condition debugging
Checklist:
- Dependency graph created: All operations analyzed for dependencies before execution
- Batches identified: Independent operations grouped into parallel execution batches
- Single message per batch: Each batch executed via ONE message with multiple tool calls
- Time savings achieved: 2-5x speedup compared to sequential execution
- Correctness maintained: Parallel execution produces identical results to sequential
- No race conditions: No failures due to shared state or missing dependencies
- Appropriate scope: Only applied when ≥2 operations taking ≥1 minute each
- Clear documentation: Execution plan explained (layers, batches, expected speedup)