Claude Code Plugins

Community-maintained marketplace


SKILL.md

name: learning-systems
description: Implicit feedback scoring, confidence decay, and anti-pattern detection. Use when understanding how the swarm plugin learns from outcomes, implementing learning loops, or debugging why patterns are being promoted or deprecated. Unique to opencode-swarm-plugin.

Learning Systems

The swarm plugin learns from task outcomes to improve decomposition quality over time. Three interconnected systems track pattern effectiveness: implicit feedback scoring, confidence decay, and pattern maturity progression.

Implicit Feedback Scoring

Convert task outcomes into learning signals without explicit user feedback.

What Gets Scored

Duration signals:

  • Fast (<5 min) = helpful (1.0)
  • Medium (5-30 min) = neutral (0.6)
  • Slow (>30 min) = harmful (0.2)

Error signals:

  • 0 errors = helpful (1.0)
  • 1-2 errors = neutral (0.6)
  • 3+ errors = harmful (0.2)

Retry signals:

  • 0 retries = helpful (1.0)
  • 1 retry = neutral (0.7)
  • 2+ retries = harmful (0.3)

Success signal:

  • Success = 1.0 (40% weight)
  • Failure = 0.0

Weighted Score Calculation

rawScore = success * 0.4 + duration * 0.2 + errors * 0.2 + retries * 0.2;

Thresholds:

  • rawScore >= 0.7 → helpful
  • rawScore <= 0.4 → harmful
  • 0.4 < rawScore < 0.7 → neutral
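The signal tables and thresholds above can be condensed into a single scoring function. This is an illustrative sketch (names and boundary handling at exactly 5 and 30 minutes are assumptions), not the plugin's internal implementation:

```typescript
// Sketch of implicit feedback scoring. Each signal maps to [0, 1],
// then signals combine with the 40/20/20/20 weighting described above.
type FeedbackType = "helpful" | "neutral" | "harmful";

function scoreOutcome(o: {
  success: boolean;
  duration_ms: number;
  error_count: number;
  retry_count: number;
}): { rawScore: number; type: FeedbackType } {
  // Duration: <5 min helpful, 5-30 min neutral, >30 min harmful
  const duration =
    o.duration_ms < 300_000 ? 1.0 : o.duration_ms <= 1_800_000 ? 0.6 : 0.2;
  // Errors: 0 helpful, 1-2 neutral, 3+ harmful
  const errors = o.error_count === 0 ? 1.0 : o.error_count <= 2 ? 0.6 : 0.2;
  // Retries: 0 helpful, 1 neutral, 2+ harmful
  const retries = o.retry_count === 0 ? 1.0 : o.retry_count === 1 ? 0.7 : 0.3;
  const success = o.success ? 1.0 : 0.0;

  const rawScore =
    success * 0.4 + duration * 0.2 + errors * 0.2 + retries * 0.2;
  const type: FeedbackType =
    rawScore >= 0.7 ? "helpful" : rawScore <= 0.4 ? "harmful" : "neutral";
  return { rawScore, type };
}
```

A fast, clean success scores 1.0 (helpful); a slow failure with repeated errors and retries lands well below the 0.4 harmful threshold.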

Recording Outcomes

Call swarm_record_outcome after subtask completion:

swarm_record_outcome({
  bead_id: "bd-123.1",
  duration_ms: 180000, // 3 minutes
  error_count: 0,
  retry_count: 0,
  success: true,
  files_touched: ["src/auth.ts"],
  strategy: "file-based",
});

Fields tracked:

  • bead_id - subtask identifier
  • duration_ms - time from start to completion
  • error_count - errors encountered (from ErrorAccumulator)
  • retry_count - number of retry attempts
  • success - whether subtask completed successfully
  • files_touched - modified file paths
  • strategy - decomposition strategy used (optional)
  • failure_mode - classification if success=false (optional)
  • failure_details - error context (optional)

Confidence Decay

Evaluation criterion weights fade unless revalidated, which prevents stale patterns from dominating future decompositions.

Half-Life Formula

decayed_value = raw_value * 0.5^(age_days / 90)

Decay timeline:

  • Day 0: 100% weight
  • Day 90: 50% weight
  • Day 180: 25% weight
  • Day 270: 12.5% weight
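The half-life curve above reduces to a one-line function; this sketch defaults to the plugin's 90-day half-life:

```typescript
// Half-life decay: the weight halves every `halfLifeDays` days.
function decay(ageDays: number, halfLifeDays = 90): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

decay(0);   // day 0: 1.0
decay(90);  // day 90: 0.5
decay(270); // day 270: 0.125
```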

Criterion Weight Calculation

Aggregate decayed feedback events:

helpfulSum = sum(helpful_events.map((e) => e.raw_value * decay(e.timestamp)));
harmfulSum = sum(harmful_events.map((e) => e.raw_value * decay(e.timestamp)));
weight = max(0.1, helpfulSum / (helpfulSum + harmfulSum));

Weight floor: minimum 0.1 prevents complete zeroing
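A runnable version of the aggregation above might look like the following. The FeedbackEvent shape here is an assumption for illustration; only a raw value, a helpful/harmful label, and an age are needed:

```typescript
// Sketch of criterion weight calculation with decayed feedback sums.
interface FeedbackEvent {
  raw_value: number;
  type: "helpful" | "harmful";
  ageDays: number; // days since the event was recorded
}

function calculateCriterionWeight(
  events: FeedbackEvent[],
  halfLifeDays = 90,
): number {
  const decay = (age: number) => Math.pow(0.5, age / halfLifeDays);
  const sum = (t: "helpful" | "harmful") =>
    events
      .filter((e) => e.type === t)
      .reduce((acc, e) => acc + e.raw_value * decay(e.ageDays), 0);
  const helpfulSum = sum("helpful");
  const harmfulSum = sum("harmful");
  if (helpfulSum + harmfulSum === 0) return 0.1; // no signal: weight floor
  return Math.max(0.1, helpfulSum / (helpfulSum + harmfulSum));
}
```

Equal helpful and harmful signal yields a weight of 0.5; purely harmful signal bottoms out at the 0.1 floor.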

Revalidation

Recording new feedback resets decay timer for that criterion:

{
  criterion: "type_safe",
  weight: 0.85,
  helpful_count: 12,
  harmful_count: 3,
  last_validated: "2024-12-12T00:00:00Z",  // Reset on new feedback
  half_life_days: 90,
}

When Criteria Get Deprecated

total = helpful_count + harmful_count;
harmfulRatio = harmful_count / total;

if (total >= 3 && harmfulRatio > 0.3) {
  // Deprecate criterion - reduce impact to 0
}

Pattern Maturity States

Patterns progress through lifecycle based on feedback accumulation:

candidate → established → proven (or deprecated)

State Transitions

candidate (initial state):

  • Total feedback < 3 events
  • Not enough data to judge
  • Multiplier: 0.5x

established:

  • Total feedback >= 3 events
  • Has track record but not proven
  • Multiplier: 1.0x

proven:

  • Decayed helpful >= 5 AND
  • Harmful ratio < 15%
  • Multiplier: 1.5x

deprecated:

  • Harmful ratio > 30% AND
  • Total feedback >= 3 events
  • Multiplier: 0x (excluded)
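The transition rules above condense into a single function over decayed counts. This is a sketch; state names and thresholds follow the lists above, but the ordering of checks is an assumption:

```typescript
// Determine maturity state from decayed feedback counts.
type MaturityState = "candidate" | "established" | "proven" | "deprecated";

function determineState(
  decayedHelpful: number,
  decayedHarmful: number,
): MaturityState {
  const total = decayedHelpful + decayedHarmful;
  const harmfulRatio = total > 0 ? decayedHarmful / total : 0;
  if (total >= 3 && harmfulRatio > 0.3) return "deprecated";
  if (decayedHelpful >= 5 && harmfulRatio < 0.15) return "proven";
  if (total >= 3) return "established";
  return "candidate";
}
```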

Decay Applied to State Calculation

State determination uses decayed counts, not raw counts:

const { decayedHelpful, decayedHarmful } =
  calculateDecayedCounts(feedbackEvents);
const total = decayedHelpful + decayedHarmful;
const harmfulRatio = decayedHarmful / total;

// State logic applies to decayed values

Old feedback matters less; a pattern must maintain recent positive signal to stay proven.
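A minimal version of calculateDecayedCounts consistent with the 90-day half-life might look like this (the MaturityFeedback fields shown are assumptions for illustration):

```typescript
// Sum feedback events with half-life decay applied to each.
interface MaturityFeedback {
  type: "helpful" | "harmful";
  ageDays: number; // days since the feedback was recorded
}

function calculateDecayedCounts(
  events: MaturityFeedback[],
  halfLifeDays = 90,
): { decayedHelpful: number; decayedHarmful: number } {
  let decayedHelpful = 0;
  let decayedHarmful = 0;
  for (const e of events) {
    const w = Math.pow(0.5, e.ageDays / halfLifeDays);
    if (e.type === "helpful") decayedHelpful += w;
    else decayedHarmful += w;
  }
  return { decayedHelpful, decayedHarmful };
}
```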

Manual State Changes

Promote to proven:

promotePattern(maturity); // External validation confirms effectiveness

Deprecate:

deprecatePattern(maturity, "Causes file conflicts in 80% of cases");

Deprecated patterns cannot be promoted; they must be reset first.

Multipliers in Decomposition

Apply maturity multiplier to pattern scores:

const multipliers = {
  candidate: 0.5,
  established: 1.0,
  proven: 1.5,
  deprecated: 0,
};

pattern_score = base_score * multipliers[maturity.state];

Proven patterns get a 50% boost; deprecated patterns are excluded entirely.

Anti-Pattern Inversion

Failed patterns auto-convert to anti-patterns at a failure rate of 60% or higher.

Inversion Threshold

const total = pattern.success_count + pattern.failure_count;

if (total >= 3 && pattern.failure_count / total >= 0.6) {
  invertToAntiPattern(pattern, reason);
}

Minimum observations: 3 total (prevents hasty inversion)
Failure ratio: 60% (e.g., 3+ failures in 5 attempts)

Inversion Process

Original pattern:

{
  id: "pattern-123",
  content: "Split by file type",
  kind: "pattern",
  is_negative: false,
  success_count: 2,
  failure_count: 5,
}

Inverted anti-pattern:

{
  id: "anti-pattern-123",
  content: "AVOID: Split by file type. Failed 5/7 times (71% failure rate)",
  kind: "anti_pattern",
  is_negative: true,
  success_count: 2,
  failure_count: 5,
  reason: "Failed 5/7 times (71% failure rate)",
}
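The transformation between the two records above can be sketched as follows. This is hypothetical; the plugin's actual field handling and id scheme may differ:

```typescript
// Invert a failing pattern into an anti-pattern with a generated reason.
interface DecompositionPattern {
  id: string;
  content: string;
  kind: "pattern" | "anti_pattern";
  is_negative: boolean;
  success_count: number;
  failure_count: number;
  reason?: string;
}

function invertToAntiPattern(p: DecompositionPattern): DecompositionPattern {
  const total = p.success_count + p.failure_count;
  const pct = Math.round((p.failure_count / total) * 100);
  const reason = `Failed ${p.failure_count}/${total} times (${pct}% failure rate)`;
  return {
    ...p,
    id: `anti-${p.id}`,
    content: `AVOID: ${p.content}. ${reason}`,
    kind: "anti_pattern",
    is_negative: true,
    reason,
  };
}
```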

Recording Observations

Track pattern outcomes to accumulate success/failure counts:

recordPatternObservation(
  pattern,
  true, // success (or false)
  "bd-123.1", // beadId
);

// Returns:
{
  pattern: updatedPattern,
  inversion?: {
    original: pattern,
    inverted: antiPattern,
    reason: "Failed 5/7 times (71% failure rate)",
  }
}

Pattern Extraction

Auto-detect strategies from decomposition descriptions:

extractPatternsFromDescription(
  "We'll split by file type, one file per subtask",
);

// Returns: ["Split by file type", "One file per subtask"]

Detected strategies:

  • Split by file type
  • Split by component
  • Split by layer (UI/logic/data)
  • Split by feature
  • One file per subtask
  • Handle shared types first
  • Separate API routes
  • Tests alongside implementation
  • Tests in separate subtask
  • Maximize parallelization
  • Sequential execution order
  • Respect dependency chain

Using Anti-Patterns in Prompts

Format for decomposition prompt inclusion:

formatAntiPatternsForPrompt(patterns);

Output:

## Anti-Patterns to Avoid

Based on past failures, avoid these decomposition strategies:

- AVOID: Split by file type. Failed 12/15 times (80% failure rate)
- AVOID: One file per subtask. Failed 8/10 times (80% failure rate)
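A formatter producing that section could be as simple as the following sketch. It assumes anti-pattern content already carries the "AVOID: " prefix, as shown in the inversion example:

```typescript
// Render negative patterns as a markdown section for prompt inclusion.
function formatAntiPatternsForPrompt(
  patterns: Array<{ content: string; is_negative: boolean }>,
): string {
  const lines = patterns
    .filter((p) => p.is_negative)
    .map((p) => `- ${p.content}`);
  return [
    "## Anti-Patterns to Avoid",
    "",
    "Based on past failures, avoid these decomposition strategies:",
    "",
    ...lines,
  ].join("\n");
}
```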

Error Accumulator

Track errors during subtask execution for retry prompts and outcome scoring.

Error Types

type ErrorType =
  | "validation" // Schema/type errors
  | "timeout" // Task exceeded time limit
  | "conflict" // File reservation conflicts
  | "tool_failure" // Tool invocation failed
  | "unknown"; // Unclassified

Recording Errors

errorAccumulator.recordError(
  "bd-123.1", // beadId
  "validation", // errorType
  "Type error in src/auth.ts", // message
  {
    stack_trace: "...",
    tool_name: "typecheck",
    context: "After adding OAuth types",
  },
);

Generating Error Context

Format accumulated errors for retry prompts:

const context = await errorAccumulator.getErrorContext(
  "bd-123.1", // beadId
  false, // includeResolved
);

Output:

## Previous Errors

The following errors were encountered during execution:

### validation (2 errors)

- **Type error in src/auth.ts**
  - Context: After adding OAuth types
  - Tool: typecheck
  - Time: 12/12/2024, 10:30 AM

- **Missing import in src/session.ts**
  - Tool: typecheck
  - Time: 12/12/2024, 10:35 AM

**Action Required**: Address these errors before proceeding. Consider:

- What caused each error?
- How can you prevent similar errors?
- Are there patterns across error types?

Resolving Errors

Mark errors resolved after fixing:

await errorAccumulator.resolveError(errorId);

Resolved errors excluded from retry context by default.

Error Statistics

Get error counts for outcome tracking:

const stats = await errorAccumulator.getErrorStats("bd-123.1")

// Returns:
{
  total: 5,
  unresolved: 2,
  by_type: {
    validation: 3,
    timeout: 1,
    tool_failure: 1,
  }
}

Use total for error_count in outcome signals.

Using the Learning System

Integration Points

1. During decomposition (swarm_plan_prompt):

  • Query CASS for similar tasks
  • Load pattern maturity records
  • Include proven patterns in prompt
  • Exclude deprecated patterns

2. During execution:

  • ErrorAccumulator tracks errors
  • Record retry attempts
  • Track duration from start to completion

3. After completion (swarm_complete):

  • Record outcome signals
  • Score implicit feedback
  • Update pattern observations
  • Check for anti-pattern inversions
  • Update maturity states

Full Workflow Example

// 1. Decomposition phase
const cass_results = cass_search({ query: "user authentication", limit: 5 });
const patterns = loadPatterns(); // Get maturity records
const prompt = swarm_plan_prompt({
  task: "Add OAuth",
  context: formatPatternsWithMaturityForPrompt(patterns),
  query_cass: true,
});

// 2. Execution phase
const errorAccumulator = new ErrorAccumulator();
const startTime = Date.now();
let retryCount = 0;

try {
  // Work happens...
  await implement_subtask();
} catch (error) {
  await errorAccumulator.recordError(
    bead_id,
    classifyError(error),
    error.message,
  );
  retryCount++;
}

// 3. Completion phase
const duration = Date.now() - startTime;
const errorStats = await errorAccumulator.getErrorStats(bead_id);

swarm_record_outcome({
  bead_id,
  duration_ms: duration,
  error_count: errorStats.total,
  retry_count: retryCount,
  success: true,
  files_touched: modifiedFiles,
  strategy: "file-based",
});

// 4. Learning updates
const scored = scoreImplicitFeedback({
  bead_id,
  duration_ms: duration,
  error_count: errorStats.total,
  retry_count: retryCount,
  success: true,
  timestamp: new Date().toISOString(),
  strategy: "file-based",
});

// Update patterns
for (const pattern of extractedPatterns) {
  const { pattern: updated, inversion } = recordPatternObservation(
    pattern,
    scored.type === "helpful",
    bead_id,
  );

  if (inversion) {
    console.log(`Pattern inverted: ${inversion.reason}`);
    storeAntiPattern(inversion.inverted);
  }
}

Configuration Tuning

Adjust thresholds based on project characteristics:

const learningConfig = {
  halfLifeDays: 90, // Decay speed
  minFeedbackForAdjustment: 3, // Min observations for weight adjustment
  maxHarmfulRatio: 0.3, // Max harmful % before deprecating criterion
  fastCompletionThresholdMs: 300000, // 5 min = fast
  slowCompletionThresholdMs: 1800000, // 30 min = slow
  maxErrorsForHelpful: 2, // Max errors before marking harmful
};

const antiPatternConfig = {
  minObservations: 3, // Min before inversion
  failureRatioThreshold: 0.6, // 60% failure triggers inversion
  antiPatternPrefix: "AVOID: ",
};

const maturityConfig = {
  minFeedback: 3, // Min for leaving candidate state
  minHelpful: 5, // Decayed helpful threshold for proven
  maxHarmful: 0.15, // Max 15% harmful for proven
  deprecationThreshold: 0.3, // 30% harmful triggers deprecation
  halfLifeDays: 90,
};

Debugging Pattern Issues

Why is pattern not proven?

Check decayed counts:

const feedback = await getFeedback(patternId);
const { decayedHelpful, decayedHarmful } = calculateDecayedCounts(feedback);

console.log({ decayedHelpful, decayedHarmful });
// Need: decayedHelpful >= 5 AND harmfulRatio < 0.15

Why was pattern inverted?

Check observation counts:

const total = pattern.success_count + pattern.failure_count;
const failureRatio = pattern.failure_count / total;

console.log({ total, failureRatio });
// Inverts if: total >= 3 AND failureRatio >= 0.6

Why is criterion weight low?

Check feedback events:

const events = await getFeedbackByCriterion("type_safe");
const weight = calculateCriterionWeight(events);

console.log(weight);
// Shows: helpful vs harmful counts, last_validated date

Storage Interfaces

FeedbackStorage

Persist feedback events for criterion weight calculation:

interface FeedbackStorage {
  store(event: FeedbackEvent): Promise<void>;
  getByCriterion(criterion: string): Promise<FeedbackEvent[]>;
  getByBead(beadId: string): Promise<FeedbackEvent[]>;
  getAll(): Promise<FeedbackEvent[]>;
}
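A minimal in-memory implementation, like the ones provided for testing, might look like this sketch (the FeedbackEvent fields shown are assumptions for illustration):

```typescript
// In-memory FeedbackStorage sketch, suitable only for tests.
interface FeedbackEvent {
  criterion: string;
  bead_id: string;
  raw_value: number;
  type: "helpful" | "neutral" | "harmful";
}

class InMemoryFeedbackStorage {
  private events: FeedbackEvent[] = [];

  async store(event: FeedbackEvent): Promise<void> {
    this.events.push(event);
  }
  async getByCriterion(criterion: string): Promise<FeedbackEvent[]> {
    return this.events.filter((e) => e.criterion === criterion);
  }
  async getByBead(beadId: string): Promise<FeedbackEvent[]> {
    return this.events.filter((e) => e.bead_id === beadId);
  }
  async getAll(): Promise<FeedbackEvent[]> {
    return [...this.events];
  }
}
```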

ErrorStorage

Persist errors for retry prompts:

interface ErrorStorage {
  store(entry: ErrorEntry): Promise<void>;
  getByBead(beadId: string): Promise<ErrorEntry[]>;
  getUnresolvedByBead(beadId: string): Promise<ErrorEntry[]>;
  markResolved(id: string): Promise<void>;
  getAll(): Promise<ErrorEntry[]>;
}

PatternStorage

Persist decomposition patterns:

interface PatternStorage {
  store(pattern: DecompositionPattern): Promise<void>;
  get(id: string): Promise<DecompositionPattern | null>;
  getAll(): Promise<DecompositionPattern[]>;
  getAntiPatterns(): Promise<DecompositionPattern[]>;
  getByTag(tag: string): Promise<DecompositionPattern[]>;
  findByContent(content: string): Promise<DecompositionPattern[]>;
}

MaturityStorage

Persist pattern maturity records:

interface MaturityStorage {
  store(maturity: PatternMaturity): Promise<void>;
  get(patternId: string): Promise<PatternMaturity | null>;
  getAll(): Promise<PatternMaturity[]>;
  getByState(state: MaturityState): Promise<PatternMaturity[]>;
  storeFeedback(feedback: MaturityFeedback): Promise<void>;
  getFeedback(patternId: string): Promise<MaturityFeedback[]>;
}

In-memory implementations are provided for testing. Production deployments should use persistent storage (file-based JSONL or SQLite).