| name | condition-based-waiting |
| description | Use when tests have race conditions, timing dependencies, or inconsistent pass/fail behavior - replaces arbitrary timeouts with condition polling to wait for actual state changes, eliminating flaky tests with quantitative reliability tracking |
Condition-Based Waiting
Overview
Flaky tests often guess at timing with arbitrary delays. This creates race conditions where tests pass on fast machines but fail under load or in CI.
Core principle: Wait for the actual condition you care about, not a guess about how long it takes.
Shannon enhancement: Track flakiness quantitatively and learn optimal wait patterns.
When to Use
digraph when_to_use {
"Test uses setTimeout/sleep?" [shape=diamond];
"Testing timing behavior?" [shape=diamond];
"Document WHY timeout needed" [shape=box];
"Use condition-based waiting" [shape=box];
"Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
"Testing timing behavior?" -> "Document WHY timeout needed" [label="yes"];
"Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
}
Use when:
- Tests have arbitrary delays (setTimeout, sleep, time.sleep())
- Tests are flaky (pass sometimes, fail under load)
- Tests timeout when run in parallel
- Waiting for async operations to complete
- Flakiness score > 0.1 (Shannon metric)
Don't use when:
- Testing actual timing behavior (debounce, throttle intervals)
- If an arbitrary timeout is unavoidable, always document WHY it is needed
Core Pattern
// ❌ BEFORE: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const result = getResult();
expect(result).toBeDefined();
// ✅ AFTER: Waiting for condition
await waitFor(() => getResult() !== undefined);
const result = getResult();
expect(result).toBeDefined();
Quick Patterns
| Scenario | Pattern |
|---|---|
| Wait for event | waitFor(() => events.find(e => e.type === 'DONE')) |
| Wait for state | waitFor(() => machine.state === 'ready') |
| Wait for count | waitFor(() => items.length >= 5) |
| Wait for file | waitFor(() => fs.existsSync(path)) |
| Complex condition | waitFor(() => obj.ready && obj.value > 10) |
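For example, the "wait for state" pattern reads naturally inside a test. A minimal sketch, assuming the waitFor helper defined below and a hypothetical machine under test:

// Poll until the machine reports ready, then assert on fresh state
await waitFor(() => machine.state === 'ready', 'machine to reach ready state');
expect(machine.output).toBeDefined();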
Implementation
Generic polling function:
async function waitFor<T>(
condition: () => T | undefined | null | false,
  description: string = 'condition',  // used in timeout errors and Shannon tracking
timeoutMs = 5000
): Promise<T> {
const startTime = Date.now();
while (true) {
const result = condition();
if (result) {
// Shannon: Track successful wait
trackWaitSuccess(description, Date.now() - startTime);
return result;
}
if (Date.now() - startTime > timeoutMs) {
// Shannon: Track timeout failure
trackWaitTimeout(description, timeoutMs);
throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
}
await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
}
}
// Shannon tracking helpers (test_name is assumed to come from the surrounding test context)
function trackWaitSuccess(description: string, durationMs: number) {
serena.write_memory(`test_reliability/waits/${test_name}`, {
condition: description,
duration_ms: durationMs,
success: true,
timestamp: new Date().toISOString()
});
}
function trackWaitTimeout(description: string, timeoutMs: number) {
serena.write_memory(`test_reliability/waits/${test_name}`, {
condition: description,
timeout_ms: timeoutMs,
success: false,
timestamp: new Date().toISOString()
});
}
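Usage in a test looks like this (a minimal sketch; the events array and 'DONE' event type are hypothetical):

// waitFor returns the truthy value the condition produced, so the found
// event can be asserted on directly without re-querying the list
const doneEvent = await waitFor(
  () => events.find(e => e.type === 'DONE'),
  'DONE event to be emitted',
  2000
);
expect(doneEvent.payload).toBeDefined();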
See @example.ts for the complete implementation with domain-specific helpers (waitForEvent, waitForEventCount, waitForEventMatch) from an actual debugging session.
Common Mistakes
❌ Polling too fast: setTimeout(check, 1) - wastes CPU
✅ Fix: Poll every 10ms
❌ No timeout: Loop runs forever if the condition is never met
✅ Fix: Always include a timeout with a clear error
❌ Stale data: Caching state before the loop
✅ Fix: Call the getter inside the loop for fresh data
❌ Not tracking flakiness: No visibility into test stability
✅ Fix: Use Shannon tracking to measure reliability
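The stale-data mistake is subtle; a minimal sketch (fetchStatus is a hypothetical getter):

// ❌ Stale: the status is captured once, so the condition can never change
const status = fetchStatus();
await waitFor(() => status === 'ready', 'status to be ready');
// ✅ Fresh: the getter runs on every poll, so state changes are observed
await waitFor(() => fetchStatus() === 'ready', 'status to be ready');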
When Arbitrary Timeout IS Correct
// Tool ticks every 100ms - need 2 ticks to verify partial output
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
await new Promise(r => setTimeout(r, 200)); // Then: wait for timed behavior
// 200ms = 2 ticks at 100ms intervals - documented and justified
Requirements:
- First wait for triggering condition
- Based on known timing (not guessing)
- Comment explaining WHY
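A debounce test follows the same rules; a hedged sketch, assuming a hypothetical debounce helper with a 300ms interval and a Jest-style mock:

// Debounce interval is 300ms by design - the delay IS the behavior under test
const callback = jest.fn();
const debounced = debounce(callback, 300);
debounced();
debounced(); // second call within the interval should be collapsed
await new Promise(r => setTimeout(r, 350)); // 300ms interval + 50ms margin - documented
expect(callback).toHaveBeenCalledTimes(1);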
Shannon Enhancement: Quantitative Flakiness Tracking
Flakiness Score Formula:
# Track test runs over time
test_runs = serena.query_memory(f"test_reliability/tests/{test_name}/*")
total_runs = len(test_runs)
failures = len([r for r in test_runs if not r["success"]])
# Flakiness score: 0.00 (perfect) to 1.00 (always fails)
flakiness_score = failures / total_runs if total_runs > 0 else 0.0
# Classifications:
# 0.00-0.05: STABLE (excellent)
# 0.05-0.10: ACCEPTABLE (monitor)
# 0.10-0.25: FLAKY (needs condition-based-waiting)
# 0.25+: BROKEN (urgent fix required)
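The same classification can live next to the TypeScript tracking helpers (a sketch; the threshold values mirror the table above):

// Map a failure ratio onto the Shannon flakiness classifications
function classifyFlakiness(failures: number, totalRuns: number): string {
  const score = totalRuns > 0 ? failures / totalRuns : 0;
  if (score < 0.05) return 'STABLE';      // excellent
  if (score < 0.10) return 'ACCEPTABLE';  // monitor
  if (score < 0.25) return 'FLAKY';       // needs condition-based waiting
  return 'BROKEN';                        // urgent fix required
}
// e.g. 8 failures over 100 runs -> 0.08 -> ACCEPTABLE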
Track per test:
test_metrics = {
"test_name": test_name,
"total_runs": 100,
"failures": 8,
"flakiness_score": 0.08,
"status": "ACCEPTABLE",
"avg_duration_ms": 245,
"timeout_rate": 0.02,
"last_failure": ISO_timestamp,
"recommendations": [
"Consider condition-based-waiting for async operations",
"Monitor timeout rate"
]
}
serena.write_memory(f"test_reliability/tests/{test_name}/metrics", test_metrics)
Shannon Enhancement: Optimal Wait Pattern Learning
Learn from historical data:
# Query historical wait times for similar conditions
wait_history = serena.query_memory("test_reliability/waits/*:condition~'database ready'")
# Calculate optimal timeout from the observed wait durations
durations = [w["duration_ms"] for w in wait_history]
# Typical wait patterns:
patterns = {
    "p50": percentile(durations, 0.50),  # 50% complete within
    "p95": percentile(durations, 0.95),  # 95% complete within
    "p99": percentile(durations, 0.99),  # 99% complete within
    "max": max(durations)
}
# Recommend timeout based on p99 + buffer
recommended_timeout = patterns["p99"] * 1.5
Example output:
Database ready condition:
P50: 120ms (50% of waits complete)
P95: 380ms (95% of waits complete)
P99: 520ms (99% of waits complete)
Recommended timeout: 780ms (p99 × 1.5 buffer)
Current timeout: 5000ms (too long, wastes time on failures)
SUGGESTION: Set timeout to 800ms for faster failure detection
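One way to feed the learned value back into a test (a sketch; the recommendations memory key, db.isReady(), and the read-back call are hypothetical, mirroring the serena usage shown above):

// Read the learned recommendation and fall back to the old default if absent
const rec = serena.query_memory('test_reliability/recommendations/database_ready');
const timeoutMs = rec?.recommended_timeout_ms ?? 5000;
await waitFor(() => db.isReady(), 'database ready', timeoutMs);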
Shannon Enhancement: MCP Integration
For web testing with Puppeteer:
// Use Puppeteer's built-in waitFor capabilities
import { Page } from 'puppeteer';
async function waitForSelector(page: Page, selector: string) {
// Shannon: Track Puppeteer wait metrics
const startTime = Date.now();
try {
const element = await page.waitForSelector(selector, { timeout: 5000 });
// Track success
trackWaitSuccess(`selector: ${selector}`, Date.now() - startTime);
return element;
} catch (error) {
// Track timeout
trackWaitTimeout(`selector: ${selector}`, 5000);
throw error;
}
}
For complex async scenarios:
// Use Sequential MCP for deep analysis of why test is flaky
if (flakiness_score > 0.10) {
const analysis = await sequential.analyze({
prompt: `Analyze why test "${test_name}" has ${flakiness_score} flakiness.
Review recent failures and suggest condition-based-waiting improvements.`,
context: test_runs.slice(-10) // Last 10 runs
});
console.log("Sequential Analysis:", analysis.recommendations);
}
Shannon Enhancement: Automated Flakiness Detection
Pre-commit hook integration:
#!/bin/bash
# hooks/pre-commit-test-check.sh
# Run tests with tracking
npm test
# Query flaky tests
FLAKY_TESTS=$(serena_cli query "test_reliability/tests/*:flakiness_score>0.10" --format json)
if [ -n "$FLAKY_TESTS" ]; then
echo "⚠️ FLAKY TESTS DETECTED:"
echo "$FLAKY_TESTS" | jq -r '.[] | " - \(.test_name): \(.flakiness_score) flakiness"'
echo ""
echo "RECOMMENDATION: Apply condition-based-waiting skill"
echo "See: /shannon:skill condition-based-waiting"
exit 1
fi
Real-World Impact
From debugging session (2025-10-03):
- Fixed 15 flaky tests across 3 files
- Pass rate: 60% → 100%
- Execution time: 40% faster
- No more race conditions
Shannon tracking proves this:
# Query before/after metrics
before = serena.query_memory("test_reliability/2025-10-02/*")
after = serena.query_memory("test_reliability/2025-10-04/*")
improvement = {
"avg_flakiness_before": 0.42,
"avg_flakiness_after": 0.00,
"tests_fixed": 15,
"avg_duration_before": 2450, # ms
"avg_duration_after": 1470, # ms (40% faster)
"speedup_percent": 40
}
Integration with Other Skills
This skill works with:
- test-driven-development - Write flakiness-free tests from the start
- testing-anti-patterns - Arbitrary timeouts are an anti-pattern
- systematic-debugging - When a test is flaky, apply this skill
Shannon integration:
- Serena MCP - Track all test reliability metrics
- Puppeteer MCP - For web UI condition waits
- Sequential MCP - Deep analysis of flakiness patterns
The Bottom Line
Arbitrary timeouts = guessing. Condition polling = knowing.
Shannon's quantitative tracking turns test reliability from hope into science.
Measure flakiness. Learn patterns. Wait for conditions, not guesses.