---
name: reflect
description: Extract learnings from session corrections and patterns, update skill files with persistent memory. Implements Loop 1.5 - per-session micro-learning between execution and meta-optimization.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite
model: sonnet
x-version: 1.0.4
x-category: tooling
x-vcl-compliance: v3.1.1
x-cognitive-frames: HON, MOR, COM, CLS, EVD, ASP, SPC
x-loop: 1.5
x-last-reflection: 2026-01-08T15:25:00Z
x-reflection-count: 4
---
## L1 Improvement
- Created as new skill following Skill Forge v3.2 required sections
- Implements Loop 1.5 (Session Reflection) to fill gap between Loop 1 (Execution) and Loop 3 (Meta-Loop)
- Integrates with Memory MCP, Skill Forge, and Ralph Wiggum stop hooks
## TIER 1: CRITICAL SECTIONS
### Overview
The Reflect skill solves a fundamental limitation of LLMs: they don't learn from session to session. Every conversation starts from zero, causing the same mistakes to recur and forcing users to repeat corrections endlessly.
**Philosophy**: Corrections are signals. Approvals are confirmations. Both should be captured, classified, and persisted into skill files, where they become permanent knowledge that survives across sessions.
**Methodology**: A 7-phase extraction-and-update pipeline that:
- Detects learning signals in conversation
- Maps them to invoked skills
- Classifies confidence levels (VERIX-aligned)
- Proposes skill file updates
- Applies changes via Skill Forge patterns
- Stores in Memory MCP for Meta-Loop aggregation
- Commits to Git for version tracking
**Value Proposition**: Correct once, never again. Transform ephemeral session corrections into persistent skill improvements that compound over time.
### Core Principles
The Reflect skill operates on 5 core principles:
#### Principle 1: Signals Over Commands
Corrections are the strongest learning signals. When a user says "No, use X instead", this is more valuable than explicit instructions because it reveals a gap between expectation and delivery.
**In practice**:
- Parse conversation for correction patterns (negation + alternative)
- Weight corrections higher than approvals
- Track correction frequency per skill
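The "negation + alternative" matching can be sketched as a small pattern scan. These regexes are illustrative assumptions, not the skill's actual detection rules:

```python
import re

# Hypothetical correction patterns: negation followed by an alternative,
# direct rejection, or a revised instruction.
CORRECTION_PATTERNS = [
    re.compile(r"\bno,?\s+use\b", re.IGNORECASE),     # "No, use X instead"
    re.compile(r"\bthat'?s wrong\b", re.IGNORECASE),  # direct rejection
    re.compile(r"^actually\b", re.IGNORECASE),        # revised instruction
]

def detect_corrections(messages):
    """Return user messages that look like correction signals."""
    return [m for m in messages if any(p.search(m) for p in CORRECTION_PATTERNS)]

session = [
    "Can you add logging here?",
    "No, use structured logging instead of console.log",
    "Perfect, thanks",
]
hits = detect_corrections(session)
```

In this sketch only the middle message is flagged; the approval ("Perfect, thanks") would be handled by a separate, lower-weighted approval scan.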
#### Principle 2: Evidence-Based Confidence
All learnings must have VERIX-aligned confidence ceilings. Don't overclaim certainty from limited evidence.
**In practice**:
- HIGH (0.90): Explicit "always/never" rules from user
- MEDIUM (0.75): Patterns confirmed across 2+ occurrences
- LOW (0.55): Single observations requiring review
#### Principle 3: Skill File as Memory
Store learnings in SKILL.md, not in embeddings. Skill files are human-readable, version-controlled, and immediately effective.
**In practice**:
- Add LEARNED PATTERNS section to skill files
- Increment x-version on each update
- Track x-last-reflection timestamp
#### Principle 4: Safe by Default
Preview all changes before applying. HIGH confidence changes require explicit approval; automation only for MEDIUM/LOW.
**In practice**:
- Show diff preview before any edit
- Require Y/N for HIGH confidence learnings
- Enable auto-apply only when reflect-on is active
#### Principle 5: Feed the Meta-Loop
Session learnings aggregate into system optimization. Micro-learning feeds macro-optimization.
**In practice**:
- Store all learnings in Memory MCP
- Tag with WHO/WHEN/PROJECT/WHY
- Meta-Loop queries and aggregates every 3 days
### When to Use
**Use Reflect when**:
- You've corrected Claude's output during a session
- A skill produced good results you want to reinforce
- You notice recurring mistakes across sessions
- You want to capture style or preference cues
- Session is ending and you want to preserve learnings
- You see explicit rules emerge ("always X", "never Y")
**Do NOT use Reflect when**:
- Conversation is trivial (< 5 exchanges)
- No skills were invoked in session
- User explicitly says "don't remember this"
- Target is the eval-harness (FORBIDDEN - stays frozen)
- Changes would bypass existing safety gates
### Main Workflow
#### Phase 1: Signal Detection
**Purpose**: Scan conversation for learning signals
**Agent**: intent-parser (from registry)
**Input Contract**:
```yaml
inputs:
  conversation_context: string  # Full session transcript
  invoked_skills: list[string]  # Skills used in session
```
**Process**:
1. Parse conversation for signal patterns
2. Classify each signal by type and strength
3. Extract the learning content
**Signal Types**:
| Type | Pattern | Confidence |
|---|---|---|
| Correction | "No, use X", "That's wrong", "Actually..." | HIGH (0.90) |
| Explicit Rule | "Always do X", "Never do Y" | HIGH (0.90) |
| Approval | "Perfect", "Yes, exactly", "That's right" | MEDIUM (0.75) |
| Rejection | User rejected proposed solution | MEDIUM (0.75) |
| Style Cue | Formatting or naming preferences | LOW (0.55) |
| Observation | Implicit preference detected | LOW (0.55) |
**Output Contract**:
```yaml
outputs:
  signals: list[Signal]

Signal:
  type: correction|explicit_rule|approval|rejection|style_cue|observation
  content: string     # The actual learning
  context: string     # Surrounding context
  confidence: float   # 0.55-0.90
  ground: string      # Evidence source
```
#### Phase 2: Skill Mapping
**Purpose**: Map signals to the skills they apply to
**Agent**: skill-mapper (custom logic)
**Process**:
1. Parse conversation for Skill() invocations
2. Check command history for /skill-name calls
3. Match each signal to its relevant skill
4. Handle multi-skill sessions (separate updates per skill)
**Output Contract**:
```yaml
outputs:
  skill_signals: dict[skill_name, list[Signal]]
```
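The grouping step can be sketched as follows. The Signal dict shape and the substring-matching heuristic are assumptions for illustration, not the skill's actual mapper:

```python
from collections import defaultdict

def map_signals_to_skills(signals, invoked_skills):
    """Attach each signal to the invoked skill(s) its content names;
    fall back to all invoked skills when no skill is named."""
    skill_signals = defaultdict(list)
    for sig in signals:
        named = [s for s in invoked_skills if s in sig["content"].lower()]
        for skill in named or invoked_skills:
            skill_signals[skill].append(sig)
    return dict(skill_signals)

signals = [{"content": "code-review should flag console.log", "confidence": 0.90}]
mapped = map_signals_to_skills(signals, ["code-review", "tester"])
```

Here the signal names code-review explicitly, so it is mapped to that skill only; an unattributed signal would be fanned out to every invoked skill for later confirmation.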
#### Phase 3: Confidence Classification
**Purpose**: Apply VERIX-aligned confidence levels
**Agent**: prompt-architect patterns
**Classification Rules**:
```
HIGH [conf:0.90]   = Explicit "never/always" rules
                     Direct corrections with clear alternative
                     User used emphatic language
MEDIUM [conf:0.75] = Successful patterns (2+ confirmations)
                     Single strong approval
                     Rejection with implicit preference
LOW [conf:0.55]    = Single observations
                     Style cues without explicit statement
                     Inferred preferences
```
**Ceiling Enforcement**:
- Never exceed 0.95 (observation ceiling)
- Report-based learnings max 0.70
- Inference-based learnings max 0.70
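These rules reduce to a base value per signal type clamped by an evidence ceiling. The base values and ceilings below come from this section; the function shape is an assumption:

```python
# Base confidence per signal type (from the Signal Types table).
BASE_CONFIDENCE = {
    "correction": 0.90, "explicit_rule": 0.90,
    "approval": 0.75, "rejection": 0.75,
    "style_cue": 0.55, "observation": 0.55,
}
# Ceilings by evidence source (from Ceiling Enforcement).
CEILINGS = {"observation": 0.95, "report": 0.70, "inference": 0.70}

def classify(signal_type, evidence="observation"):
    """Base confidence for the signal type, clamped by the evidence ceiling."""
    return min(BASE_CONFIDENCE[signal_type], CEILINGS[evidence])
```

So a direct correction classifies at 0.90, but the same correction relayed second-hand (report-based) is capped at 0.70.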
#### Phase 4: Change Proposal
**Purpose**: Generate proposed skill file updates
**Agent**: skill-forge patterns
**Process**:
1. Read current SKILL.md for target skill
2. Check if LEARNED PATTERNS section exists (create if not)
3. Generate diff showing proposed additions
4. Format commit message
**Output Format**:
## Proposed Updates
**Skill: {skill_name}** (v{old} -> v{new})
### Signals Detected
- {count} corrections (HIGH)
- {count} approvals (MEDIUM)
- {count} observations (LOW)
### Diff Preview
```diff
+ ### High Confidence [conf:0.90]
+ - {learning content} [ground:{source}:{date}]
```
**Commit Message**: `reflect({skill}): [{LEVEL}] {description}`
[Y] Accept [N] Reject [E] Edit with natural language
#### Phase 5: Apply Updates
**Purpose**: Safely update skill files
**Agent**: skill-forge
**Process**:
1. If approved (manual) or auto-mode enabled:
2. Read skill file
3. Find or create LEARNED PATTERNS section
4. Append new learnings under appropriate confidence level
5. Increment x-version in frontmatter
6. Set x-last-reflection to current timestamp
7. Increment x-reflection-count
8. Write updated file
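Steps 5-7 amount to a small frontmatter rewrite. A hedged sketch, operating on the frontmatter as a string (the field names are real; the helper itself is an assumption):

```python
import re

def bump_reflection_metadata(text, timestamp):
    """Bump the patch component of x-version and refresh reflection metadata."""
    def bump(m):
        major, minor, patch = m.group(1).split(".")
        return f"x-version: {major}.{minor}.{int(patch) + 1}"
    text = re.sub(r"x-version:\s*(\d+\.\d+\.\d+)", bump, text)
    text = re.sub(r"x-last-reflection:.*", f"x-last-reflection: {timestamp}", text)
    text = re.sub(r"x-reflection-count:\s*(\d+)",
                  lambda m: f"x-reflection-count: {int(m.group(1)) + 1}", text)
    return text

frontmatter = "x-version: 2.1.0\nx-last-reflection: none\nx-reflection-count: 4"
updated = bump_reflection_metadata(frontmatter, "2026-01-05T10:30:00Z")
```

A real implementation would parse the YAML rather than regex-edit it, but the sketch shows the three fields every update must touch.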
**LEARNED PATTERNS Section Format**:
```markdown
## LEARNED PATTERNS
### High Confidence [conf:0.90]
- ALWAYS check for SQL injection vulnerabilities [ground:user-correction:2026-01-05]
- NEVER use inline styles in components [ground:user-correction:2026-01-03]
### Medium Confidence [conf:0.75]
- Prefer async/await over .then() chains [ground:approval-pattern:3-sessions]
- Use descriptive variable names in examples [ground:approval-pattern:2-sessions]
### Low Confidence [conf:0.55]
- User may prefer verbose error messages [ground:observation:1-session]
```
#### Phase 6: Memory MCP Storage
**Purpose**: Persist learnings for Meta-Loop aggregation
**Agent**: memory-mcp integration
**Storage Format**:
```json
{
  "WHO": "reflect-skill:{session_id}",
  "WHEN": "{ISO8601_timestamp}",
  "PROJECT": "{project_name}",
  "WHY": "session-learning",
  "x-skill": "{skill_name}",
  "x-version-before": "{old_version}",
  "x-version-after": "{new_version}",
  "x-signals": {
    "corrections": 2,
    "approvals": 1,
    "observations": 1
  },
  "x-learnings": [
    {
      "content": "ALWAYS check for SQL injection",
      "confidence": 0.90,
      "ground": "user-correction",
      "category": "HIGH"
    }
  ]
}
```
**Storage Path**: `sessions/reflect/{project}/{skill}/{timestamp}`
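The storage path above can be built with a small helper; this function is an illustrative assumption, not part of the skill's API:

```python
from datetime import datetime, timezone

def storage_key(project, skill, when):
    """Build the Memory MCP key in the documented path format."""
    return f"sessions/reflect/{project}/{skill}/{when.strftime('%Y-%m-%dT%H:%M:%SZ')}"

key = storage_key("my-project", "debug",
                  datetime(2026, 1, 5, 10, 30, tzinfo=timezone.utc))
```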
#### Phase 7: Git Commit (Optional)
**Purpose**: Version the skill evolution
**Agent**: bash git commands
**Commit Format**:
```
reflect({skill_name}): [{LEVEL}] {description}

- Added {n} learnings from session
- Confidence levels: HIGH:{n}, MEDIUM:{n}, LOW:{n}
- Evidence: user-correction, approval-pattern, observation

Generated by reflect skill v1.0.0
```
## TIER 2: ESSENTIAL SECTIONS
### Pattern Recognition
Different session types require different reflection approaches:
#### Debugging Session
**Patterns**: "bug", "fix", "error", "not working"
**Common Corrections**: Framework choice, error handling patterns, edge cases
**Key Focus**: What was the root cause? What pattern prevents recurrence?
**Approach**: Extract diagnostic insights and prevention rules
#### Code Review Session
**Patterns**: "review", "check", "looks good", "change this"
**Common Corrections**: Style violations, security concerns, naming
**Key Focus**: What standards emerged? What was consistently flagged?
**Approach**: Extract style rules and security patterns
#### Feature Development Session
**Patterns**: "build", "create", "implement", "add"
**Common Corrections**: Architecture choices, component usage, API patterns
**Key Focus**: What design decisions worked? What was rejected?
**Approach**: Extract architectural preferences and component rules
#### Documentation Session
**Patterns**: "document", "explain", "readme", "describe"
**Common Corrections**: Tone, structure, level of detail
**Key Focus**: What style resonated? What format was preferred?
**Approach**: Extract documentation style guide entries
### Advanced Techniques
#### Multi-Session Pattern Detection
Track signals across sessions to identify recurring patterns:
- Query Memory MCP for similar signals in past 7 days
- If same correction appears 3+ times, escalate to HIGH confidence
- Detect conflicting signals and flag for resolution
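The escalation rule reduces to counting recurrences in the lookback window. A minimal sketch, assuming signals are keyed by their content string:

```python
from collections import Counter

def escalate_recurring(past_corrections, threshold=3):
    """Promote corrections seen threshold+ times to HIGH (0.90) confidence."""
    counts = Counter(past_corrections)
    return {content: 0.90 for content, n in counts.items() if n >= threshold}

history = ["use structured logging"] * 3 + ["prefer early returns"]
promoted = escalate_recurring(history)
```

The recurring correction is promoted; the one-off stays at its original confidence level.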
#### Negative Space Analysis
Learn from what was NOT corrected:
- If user didn't correct a pattern, it's implicitly approved
- Track approval-by-silence for frequently used patterns
- Lower confidence (0.55) but valuable signal
#### Skill Dependency Tracking
When correcting skill A, check impact on skills that depend on it:
- Build dependency graph from skill index
- Warn if update might conflict with downstream skills
- Suggest propagating changes to related skills
#### Conflict Resolution
Handle contradictory signals:
- Newer signals override older (recency bias)
- Higher confidence overrides lower
- If true conflict, ask user to resolve
- Store conflict history for pattern analysis
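The precedence rules above can be expressed directly: newer overrides older, then higher confidence wins, and a true tie is surfaced to the user. The signal dict shape is an assumption for illustration:

```python
def resolve_conflict(a, b):
    """Pick the winning signal, or return None when the user must decide."""
    if a["when"] != b["when"]:
        return max(a, b, key=lambda s: s["when"])      # recency bias
    if a["confidence"] != b["confidence"]:
        return max(a, b, key=lambda s: s["confidence"])
    return None                                        # true conflict: ask user

old = {"content": "use tabs", "confidence": 0.75, "when": "2026-01-03"}
new = {"content": "use spaces", "confidence": 0.75, "when": "2026-01-08"}
winner = resolve_conflict(old, new)
```

ISO-8601 date strings compare correctly as plain strings, which keeps the recency check trivial.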
### Common Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Over-Learning | Capturing every small preference | Only persist signals that appear 2+ times or are explicit rules |
| Under-Confidence | All learnings at LOW confidence | Explicit "always/never" statements are HIGH; don't downgrade |
| Eval-Harness Modification | Attempting to update frozen harness | BLOCK: eval-harness never self-improves |
| Silent Updates | Applying changes without preview | ALWAYS show diff and require confirmation for HIGH |
| Orphan Learnings | Storing in Memory but not SKILL.md | Write to BOTH: skill file for immediate effect, Memory for aggregation |
| Version Skip | Not incrementing x-version | ALWAYS bump version on any skill file change |
### Practical Guidelines
#### Full vs Quick Mode
**Full Mode** (default for manual /reflect):
- Scan entire conversation
- Detect all signal types
- Generate comprehensive diff
- Require approval for each change
- Commit to git with detailed message
**Quick Mode** (/reflect --quick or auto mode):
- Focus on explicit corrections only
- Skip style cues and observations
- Auto-apply MEDIUM/LOW changes
- Batch commit at end of session
#### Decision Points
**When to ask user**:
- Conflicting signals detected
- HIGH confidence change proposed
- Same correction already exists (confirm override)
- Signal maps to multiple skills
**When to auto-apply**:
- reflect-on is enabled
- Confidence is MEDIUM or LOW
- No conflicts detected
- Clear skill mapping
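The auto-apply conditions combine into a single boolean gate. A sketch with illustrative parameter names, taking HIGH as confidence >= 0.90:

```python
def can_auto_apply(reflect_on, confidence, has_conflict, mapped_skill_count):
    """True only when every auto-apply condition holds."""
    return (reflect_on
            and confidence < 0.90          # MEDIUM or LOW only
            and not has_conflict
            and mapped_skill_count == 1)   # clear, single-skill mapping

allowed = can_auto_apply(True, 0.75, False, 1)
blocked = can_auto_apply(True, 0.90, False, 1)  # HIGH requires approval
```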
## TIER 3: INTEGRATION SECTIONS
### Cross-Skill Coordination
#### Upstream Skills (provide input)
| Skill | When Used Before | What It Provides |
|---|---|---|
| intent-analyzer | Before reflection | Parsed user intent for signal context |
| prompt-architect | For constraint classification | HARD/SOFT/INFERRED distinction |
#### Downstream Skills (use output)
| Skill | When Used After | What It Does |
|---|---|---|
| skill-forge | After signal classification | Applies safe SKILL.md updates |
| bootstrap-loop | During Meta-Loop | Aggregates learnings for optimization |
#### Parallel Skills (work together)
| Skill | When Used Together | How They Coordinate |
|---|---|---|
| memory-manager | During storage phase | Stores in Memory MCP |
| github-integration | During commit phase | Handles git operations |
### MCP Requirements
**Required**:
- memory-mcp: Store learnings for cross-session retrieval and Meta-Loop aggregation
  - WHY: Central persistence layer for all learnings
  - Tag: WHO=reflect-skill:{session}, WHY=session-learning
**Optional**:
- sequential-thinking: For complex multi-signal analysis
  - WHY: Helps reason through conflicting signals
- vector-search: For finding similar past learnings
  - WHY: Detect patterns across sessions
### Input/Output Contracts
```yaml
inputs:
  # Required
  trigger: manual | automatic  # How reflect was invoked
  # Optional
  skill_name: string           # Target specific skill (else detect from session)
  mode: full | quick           # Reflection depth
  auto_apply: boolean          # Skip approval for MEDIUM/LOW (requires reflect-on)

outputs:
  # Always returned
  signals_detected: list[Signal]
  skills_updated: list[string]
  learnings_stored: list[MemoryKey]
  # If changes made
  skill_diffs: dict[skill_name, diff_preview]
  version_changes: dict[skill_name, {old: string, new: string}]
  # If git enabled
  commit_hash: string
  commit_message: string
```
### Recursive Improvement
**Role in Meta-Loop**: The reflect skill is ITSELF subject to Meta-Loop optimization:
- Store self-reflections about signal detection accuracy
- Track false positive rate (learnings user rejected)
- Track coverage (signals user added manually)
**Eval Harness Integration**:
- reflect skill is NOT part of frozen eval harness
- CAN be improved via bootstrap-loop
- MUST NOT modify eval-harness when reflecting on it
**Memory Namespace**:
- Store at: `skills/tooling/reflect/{project}/{timestamp}`
- Retrieve via: query for x-skill=reflect, WHY=self-improvement
## TIER 4: CLOSURE SECTIONS
### Examples
#### Example 1: Manual Reflection After Debugging Session
User: /reflect
Claude: Analyzing session for learning signals...
## Session Reflection Report
### Signals Detected
- 2 corrections (HIGH confidence)
- 1 approval (MEDIUM confidence)
- 1 observation (LOW confidence)
### Proposed Updates
**Skill: debug** (v2.1.0 -> v2.1.1)
```diff
+ ## LEARNED PATTERNS
+
+ ### High Confidence [conf:0.90]
+ - ALWAYS check for null pointer exceptions before accessing object properties [ground:user-correction:2026-01-05]
+ - Use structured logging instead of console.log in production code [ground:user-correction:2026-01-05]
+
+ ### Medium Confidence [conf:0.75]
+ - Prefer try-catch blocks around async operations [ground:approval-pattern:2026-01-05]
+
+ ### Low Confidence [conf:0.55]
+ - User may prefer detailed stack traces in error messages [ground:observation:2026-01-05]
```

**Commit Message**: `reflect(debug): [HIGH] Add null check and structured logging rules`

[Y] Accept [N] Reject [E] Edit with natural language
User: Y
Claude: Updates applied successfully.
- Updated: skills/delivery/debug/SKILL.md (v2.1.0 -> v2.1.1)
- Stored: sessions/reflect/my-project/debug/2026-01-05T10:30:00Z
- Committed: reflect(debug): [HIGH] Add null check and structured logging rules
Confidence: 0.85 (ceiling: observation 0.95)
#### Example 2: Automatic Reflection via Stop Hook
[Session ending - reflect-on is enabled]
```
==========================================
SESSION REFLECTION (Automatic)
==========================================

Detected 3 learning signals across 2 skills.

Skill: code-review
- [MEDIUM] Prefer early returns over nested conditionals

Skill: tester
- [LOW] Integration tests should mock external APIs
- [LOW] Use descriptive test names

Auto-applied 3 MEDIUM/LOW learnings. Stored in Memory MCP.
Git commit: reflect(code-review,tester): [MEDIUM] Session learnings

Next session will include these patterns.
```
#### Example 3: Targeting Specific Skill
User: /reflect code-review
Claude: Reflecting on code-review skill specifically...
**Signals for code-review**
Found 4 signals:
- [HIGH] "Never approve PRs with console.log statements" - user-correction
- [MEDIUM] Security review phase appreciated - approval
- [MEDIUM] Diff-only output format works well - approval
- [LOW] May prefer bullet points over paragraphs - observation
Propose adding to skills/quality/code-review/SKILL.md? [Y] Accept [N] Reject [E] Edit
### Troubleshooting
| Problem | Solution |
|---------|----------|
| **No signals detected** | Ensure conversation had corrections or approvals. Trivial sessions may not have learnings. |
| **Skill not found** | Verify skill was invoked via Skill() or /command. Check skill-index.json. |
| **Memory MCP unavailable** | Learnings still applied to skill files. Retry memory storage later. |
| **Git commit failed** | Check git status. Ensure no merge conflicts. Manual commit may be needed. |
| **Conflicting learnings** | User must resolve. Show both versions and ask which to keep. |
| **Permission denied on skill file** | Check file permissions. May need elevated access. |
| **x-version not incrementing** | Ensure YAML frontmatter is valid. Check for parsing errors. |
### Conclusion
The Reflect skill transforms ephemeral session corrections into persistent knowledge by implementing **Loop 1.5** - a per-session micro-learning layer that bridges immediate execution (Loop 1) and long-term optimization (Loop 3).
Key capabilities:
- **Signal Detection**: Automatically identifies corrections, approvals, and patterns
- **Confidence Classification**: VERIX-aligned levels (HIGH/MEDIUM/LOW) prevent overclaiming
- **Safe Updates**: Preview-first approach with approval gates for critical changes
- **Memory Integration**: Feeds Meta-Loop for system-wide optimization
- **Version Control**: Git tracking enables rollback and evolution analysis
By capturing learnings at the session level and persisting them in skill files, the Reflect skill enables a self-improving development experience where corrections compound into expertise over time.
### Completion Verification
- [x] YAML frontmatter with x-version, x-category, x-vcl-compliance
- [x] Overview with philosophy, methodology, value proposition
- [x] Core Principles (5 principles with "In practice" items)
- [x] When to Use with use/don't-use criteria
- [x] Main Workflow with 7 phases, agents, input/output contracts
- [x] Pattern Recognition for different session types
- [x] Advanced Techniques (multi-session, negative space, dependencies, conflicts)
- [x] Common Anti-Patterns table with Problem/Solution
- [x] Practical Guidelines for full/quick modes
- [x] Cross-Skill Coordination (upstream/downstream/parallel)
- [x] MCP Requirements with WHY explanations
- [x] Input/Output Contracts in YAML
- [x] Recursive Improvement integration
- [x] Examples (3 complete scenarios)
- [x] Troubleshooting table
- [x] Conclusion summarizing value
- [x] Completion Verification checklist
Confidence: 0.85 (ceiling: observation 0.95) - New skill created following Skill Forge v3.2 required sections with full Tier 1-4 coverage.
---
## LEARNED PATTERNS
### High Confidence [conf:0.90] - CRITICAL
#### Skill Package Format
- Skills use `.skill` extension (NOT `.skill.zip`). The `.skill` file IS a zip archive with renamed extension.
- Correct location: `skills/packaged/` folder (NOT `skills/dist/`)
- [ground:user-correction:2026-01-08]
#### Multi-File Update Workflow
When updating a packaged skill with learned patterns, update ALL relevant files:
| File | Update Required | Content |
|------|-----------------|---------|
| SKILL.md | Always | Add to LEARNED PATTERNS section |
| CHANGELOG.md | Always | Add version entry with date and description |
| manifest.json | Always | Increment version number |
| quick-reference.md | If operational | Add new tips/workflows |
| readme.md | If scope changes | Update overview |
**Complete Workflow:**
```bash
# 1. Unzip .skill file
unzip skills/packaged/skill-name.skill -d /tmp/skill-update/skill-name
# 2. Update files: SKILL.md, CHANGELOG.md, manifest.json, quick-reference.md
# 3. Rezip with PowerShell (use cygpath for Windows paths)
WIN_PATH=$(cygpath -w /tmp/skill-update/skill-name)
WIN_ZIP=$(cygpath -w /tmp/skill-update/skill-name.zip)
powershell -Command "Compress-Archive -Path '$WIN_PATH\*' -DestinationPath '$WIN_ZIP' -Force"
# 4. Deploy with .skill extension
cp /tmp/skill-update/skill-name.zip skills/packaged/skill-name.skill
```

[ground:user-correction:2026-01-08]
### Medium Confidence [conf:0.75]
- Windows interop: PowerShell commands require Windows paths (`C:\...`). Use `cygpath -w /unix/path` to convert Git Bash paths before invoking PowerShell `Compress-Archive` [ground:error-correction:2026-01-08]
- File tool fallback: When the Edit tool fails repeatedly with "unexpectedly modified" errors, use a Bash heredoc (`cat > file << 'EOF'`) as a reliable alternative for file writes [ground:observation:pattern:2026-01-08]
- MCP integration pattern: When integrating new components, expose them in the `__init__.py` `__all__` list, document config in `.env`, provide a standalone test script, and update the README with usage examples [ground:approval:successful-pattern:2026-01-08]
### Low Confidence [conf:0.55]
- Self-test on the creation session validates the workflow and demonstrates dogfooding [ground:observation:2026-01-05]
- Python standalone scripts: Use `Path(__file__).parent.parent` to get the project root, add it to `sys.path`, and `os.chdir()` to the project root before imports to avoid relative import issues [ground:observation:fix:2026-01-08]