---
name: cm
description: CASS Memory System - procedural memory for AI coding agents. Three-layer cognitive architecture transforms scattered sessions into persistent, cross-agent learnings with confidence decay, anti-pattern learning, and scientific validation.
---
# CM - CASS Memory System

Procedural memory for AI coding agents. Transforms scattered agent sessions into persistent, cross-agent memory so every agent learns from every other agent's experience.
## Critical Concepts for AI Agents

### The Three-Layer Cognitive Architecture

| Layer | Role | Storage | Tool |
|---|---|---|---|
| Episodic Memory | Raw session transcripts | `~/.local/share/cass/` | `cass` |
| Working Memory | Session summaries | Diary entries | `cm reflect` |
| Procedural Memory | Distilled action rules | Playbook | `cm` |

**Flow:** Sessions → Diary summaries → Playbook rules
### Why This Matters
Without cm, each agent session starts from zero. With cm:
- Rules that helped in past sessions get reinforced
- Anti-patterns that caused failures become explicit warnings
- Cross-agent learning means Agent B benefits from Agent A's mistakes
- Confidence decay naturally retires stale guidance
## Quick Reference for AI Agents

### Start of Session

```bash
# Get relevant rules and history for your task
cm context "implementing OAuth authentication" --json

# Output includes:
# - Relevant playbook rules with scores
# - Related diary entries
# - Gap analysis (uncovered areas)
```

### During Work

```bash
# Find rules about a topic
cm similar "error handling" --json

# Check if a pattern is validated
cm validate "Always use prepared statements for SQL"
```

### End of Session

```bash
# Record which rules helped
cm outcome success "RULE-123,RULE-456"

# Record which rules caused problems
cm outcome failure "RULE-789"

# Apply recorded outcomes
cm outcome-apply
```

### Periodic Maintenance

```bash
# Extract new rules from recent sessions
cm reflect

# Find stale rules needing re-validation
cm stale --days 30

# System health
cm doctor
```
## The ACE Pipeline

CM uses a four-stage pipeline to extract and curate rules:

```
Sessions → Generator → Reflector → Validator → Curator → Playbook
               ↓           ↓           ↓           ↓
             Diary     Candidates   Evidence   Final Rules
            Entries      (LLM)        (LLM)     (NO LLM!)
```

### Stage Details
| Stage | Uses LLM | Purpose |
|---|---|---|
| Generator | Yes | Summarize sessions into diary entries |
| Reflector | Yes | Propose candidate rules from patterns |
| Validator | Yes | Check rules against historical evidence |
| Curator | NO | Deterministic merge into playbook |
**CRITICAL:** The Curator is intentionally LLM-free to prevent hallucinated provenance. All rule additions must trace to actual session evidence.
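To make that guarantee concrete, here is a minimal sketch of what a deterministic curation step could look like. The function and merge policy are illustrative assumptions, not cm's actual implementation; the point is that evidence-free candidates are rejected and merging is pure bookkeeping:

```python
# Hypothetical sketch only; cm's real merge policy is not documented here.
# Key property: every accepted rule carries real session evidence, and no
# LLM is involved, so no provenance can be invented.
def curate(candidates: list[dict], playbook: dict[str, dict]) -> dict[str, dict]:
    for rule in candidates:
        if not rule.get("source_sessions"):
            continue  # reject candidates with no traceable evidence
        key = rule["content"].strip().lower()  # dedupe on normalized content
        if key in playbook:
            # Merge evidence into the existing rule instead of duplicating it.
            existing = playbook[key]
            existing["source_sessions"] = sorted(
                set(existing["source_sessions"]) | set(rule["source_sessions"])
            )
        else:
            playbook[key] = rule
    return playbook
```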
## Confidence & Decay System

### The Scoring Algorithm

Every rule has a confidence score that decays over time:

```
score = base_confidence × decay_factor × feedback_modifier
```

Where:

- `base_confidence`: Initial confidence (0.0-1.0)
- `decay_factor`: `0.5^(days_since_feedback / 90)` (90-day half-life)
- `feedback_modifier`: Accumulated helpful/harmful signals
### Decay Visualization

```
Day 0:   ████████████████████ 1.00
Day 45:  ██████████████       0.71
Day 90:  ██████████           0.50  ← Half-life
Day 180: █████                0.25
Day 270: ██                   0.125
```
### Feedback Multipliers
| Feedback | Multiplier | Effect |
|---|---|---|
| Helpful | 1.0x | Standard positive reinforcement |
| Harmful | 4.0x | Aggressive penalty (asymmetric by design) |
**Why asymmetric?** Bad advice is more damaging than good advice is helpful. A harmful rule should decay 4x faster than a helpful one recovers.
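A minimal sketch of the scoring math in Python. The decay term follows the documented formula; the sigmoid used for `feedback_modifier` is an illustrative assumption, since cm does not document its exact curve:

```python
import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 90        # [decay] half_life_days
HARMFUL_MULTIPLIER = 4.0   # [decay] harmful_multiplier

def rule_score(base_confidence: float, last_feedback: datetime,
               helpful_count: int, harmful_count: int) -> float:
    """score = base_confidence x decay_factor x feedback_modifier"""
    days = (datetime.now(timezone.utc) - last_feedback).total_seconds() / 86400
    decay_factor = 0.5 ** (days / HALF_LIFE_DAYS)  # 0.71 at 45d, 0.50 at 90d
    # Harmful feedback counts 4x against a rule (asymmetric by design).
    net_signal = helpful_count - HARMFUL_MULTIPLIER * harmful_count
    # Squash the accumulated signal into a bounded modifier; this sigmoid
    # is an assumed shape, not cm's documented implementation.
    feedback_modifier = 1.0 / (1.0 + math.exp(-0.5 * net_signal))
    return base_confidence * decay_factor * feedback_modifier
```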
## Anti-Pattern Learning

When a rule accumulates too much harmful feedback, it doesn't just disappear—it inverts:

```
Original: "Always use global state for configuration"
    ↓ (harmful feedback accumulates)
Inverted: "⚠️ ANTI-PATTERN: Avoid global state for configuration"
```
The inverted anti-pattern becomes a warning that prevents future agents from making the same mistake.
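A hedged sketch of the inversion logic (the threshold value is an assumption for illustration; cm documents the behavior, not the exact trigger):

```python
# Illustrative only: the inversion threshold is assumed, and the real
# implementation rewrites the rule text rather than just prefixing it.
INVERSION_THRESHOLD = 3  # assumed net harmful signals before a rule inverts

def maybe_invert(rule: dict) -> dict:
    """Turn a repeatedly harmful rule into an explicit anti-pattern warning."""
    if rule["harmful_count"] - rule["helpful_count"] >= INVERSION_THRESHOLD:
        rule["is_anti_pattern"] = True
        rule["content"] = f"⚠️ ANTI-PATTERN: {rule['content']}"
    return rule
```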
## Command Reference

### `cm context` — Get Task-Relevant Memory

The primary command for starting any task.

```bash
# Basic context retrieval
cm context "implementing user authentication"

# JSON output for programmatic use
cm context "database migration" --json

# Deeper historical context
cm context "API refactoring" --depth deep

# Include gap analysis
cm context "payment processing" --gaps
```
Output includes:
- Top relevant playbook rules (ranked by score × relevance)
- Related diary entries from past sessions
- Gap analysis (categories with thin coverage)
- Suggested starter rules for uncovered areas
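With `--json`, an agent can shell out to cm and parse the result. A minimal sketch; the field names (`rules`, `id`, `content`, `score`) are assumptions, so verify them against your installed version's actual output:

```python
import json
import subprocess

# Hypothetical consumer of `cm context --json`; field names are assumed.
result = subprocess.run(
    ["cm", "context", "implementing user authentication", "--json"],
    capture_output=True, text=True, check=True,
)
context = json.loads(result.stdout)
for rule in context.get("rules", []):
    print(f"{rule.get('id')}: {rule.get('content')} (score={rule.get('score')})")
```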
### `cm top` — Highest-Scoring Rules

```bash
# Top 10 rules by confidence score
cm top 10

# JSON output
cm top 20 --json

# Filter by category
cm top 10 --category testing
```
### `cm similar` — Find Related Rules

```bash
# Semantic search over playbook
cm similar "error handling patterns"

# With scores
cm similar "authentication flow" --scores

# JSON output
cm similar "database queries" --json
```
### `cm playbook` — Manage the Playbook

```bash
# List all rules
cm playbook list

# Statistics
cm playbook stats

# Export for documentation
cm playbook export --format md > PLAYBOOK.md
cm playbook export --format json > playbook.json

# Import rules
cm playbook import rules.json
```
### `cm why` — Rule Provenance

```bash
# Show evidence chain for a rule
cm why RULE-123

# Output shows:
# - Original session(s) that generated the rule
# - Diary entries that led to extraction
# - Feedback history
# - Confidence trajectory
```
### `cm mark` — Provide Feedback

```bash
# Mark as helpful (reinforces rule)
cm mark RULE-123 --helpful

# Mark as harmful (penalizes rule, may trigger inversion)
cm mark RULE-123 --harmful

# With context
cm mark RULE-123 --helpful --reason "Prevented auth vulnerability"

# Undo feedback
cm undo RULE-123
```
### `cm reflect` — Extract Rules from Sessions

```bash
# Process recent sessions
cm reflect

# Specific time range
cm reflect --since "7d"
cm reflect --since "2024-01-01"

# Dry run (show what would be extracted)
cm reflect --dry-run

# Force re-processing of already-reflected sessions
cm reflect --force
```
### `cm audit` — Check Sessions Against Rules

```bash
# Audit recent sessions for rule violations
cm audit

# Specific time range
cm audit --since "24h"

# JSON output
cm audit --json
```
### `cm validate` — Test a Proposed Rule

```bash
# Check if a rule has historical support
cm validate "Always use transactions for multi-table updates"

# Output shows:
# - Supporting evidence (sessions where this helped)
# - Contradicting evidence (sessions where this hurt)
# - Recommendation (add/skip/needs-more-data)
```
### `cm outcome` — Record Session Results

```bash
# Record which rules helped
cm outcome success "RULE-123,RULE-456,RULE-789"

# Record which rules hurt
cm outcome failure "RULE-999"

# Apply all pending outcomes
cm outcome-apply

# Clear pending outcomes without applying
cm outcome-clear
```
### `cm stale` — Find Stale Rules

```bash
# Find rules without recent feedback
cm stale

# Custom threshold
cm stale --days 60

# JSON output
cm stale --json

# Include decay projection
cm stale --project
```
### `cm forget` — Deprecate Rules

```bash
# Soft-delete a rule
cm forget RULE-123

# With reason
cm forget RULE-123 --reason "No longer relevant after framework change"

# Force (skip confirmation)
cm forget RULE-123 --force
```
### `cm doctor` — System Health

```bash
# Run diagnostics
cm doctor

# Auto-fix issues
cm doctor --fix

# JSON output
cm doctor --json
```
Checks:
- cass installation and accessibility
- Playbook integrity
- Diary consistency
- Configuration validity
- Session index freshness
### `cm usage` — Usage Statistics

```bash
# Show usage stats
cm usage

# JSON output
cm usage --json
```
### `cm stats` — Playbook Health Metrics

```bash
# Show playbook health
cm stats

# Output includes:
# - Total rules
# - Average confidence
# - Category distribution
# - Stale rule count
# - Anti-pattern count
```
## Agent-Native Onboarding

CM includes a guided onboarding system that requires zero API calls:

```bash
cm onboard
```
The onboarding wizard:
- Explains the three-layer architecture
- Walks through basic commands
- Seeds initial rules from session history
- Sets up appropriate starter playbook
- Configures privacy preferences
No LLM required — onboarding works offline.
## Starter Playbooks

Pre-built playbooks for common tech stacks:

```bash
# List available starters
cm starters

# Initialize with a starter
cm init --starter typescript
cm init --starter python
cm init --starter go
cm init --starter rust
```

Available starters:

- `typescript` - TS/JS patterns, npm, testing
- `python` - Python idioms, pip, pytest
- `go` - Go conventions, modules, testing
- `rust` - Rust patterns, cargo, clippy
- `general` - Language-agnostic best practices
## Gap Analysis

CM tracks which categories have thin coverage:

```bash
cm context "some task" --gaps
```
Gap analysis shows:
- Categories with few rules
- Categories with low-confidence rules
- Suggested areas for rule extraction
This helps agents identify blind spots in the collective memory.
## Batch Rule Addition

For bulk importing rules:

```bash
# From JSON
cm playbook import rules.json

# From markdown
cm playbook import rules.md

# With validation
cm playbook import rules.json --validate
```

JSON format:

```json
[
  {
    "content": "Always validate user input at API boundaries",
    "category": "security",
    "confidence": 0.8,
    "source": "manual"
  }
]
```
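A small sketch for generating an import file in this format (a hypothetical helper script, not part of cm):

```python
import json

# Build a rules.json matching the documented import format above.
rules = [
    {
        "content": "Always validate user input at API boundaries",
        "category": "security",
        "confidence": 0.8,
        "source": "manual",
    },
]
with open("rules.json", "w") as f:
    json.dump(rules, f, indent=2)
# Then import it: cm playbook import rules.json --validate
```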
## Data Models

### PlaybookBullet

```json
{
  "id": "RULE-abc123",
  "content": "Use parameterized queries for all database access",
  "category": "security",
  "confidence": 0.85,
  "created_at": "2024-01-15T10:30:00Z",
  "last_feedback": "2024-03-20T14:22:00Z",
  "helpful_count": 12,
  "harmful_count": 1,
  "source_sessions": ["session-xyz", "session-abc"],
  "is_anti_pattern": false,
  "maturity": "validated"
}
```

### FeedbackEvent

```json
{
  "rule_id": "RULE-abc123",
  "type": "helpful",
  "reason": "Prevented SQL injection in auth flow",
  "session_id": "session-current",
  "timestamp": "2024-03-20T14:22:00Z"
}
```

### DiaryEntry

```json
{
  "id": "diary-xyz789",
  "session_id": "session-abc",
  "summary": "Implemented OAuth2 flow with PKCE",
  "patterns_observed": ["token-refresh", "secure-storage"],
  "issues_encountered": ["redirect-uri-mismatch"],
  "candidate_rules": ["Always use state parameter in OAuth"],
  "created_at": "2024-03-19T16:45:00Z"
}
```

### SessionOutcome

```json
{
  "session_id": "session-current",
  "helpful_rules": ["RULE-123", "RULE-456"],
  "harmful_rules": ["RULE-789"],
  "recorded_at": "2024-03-20T17:00:00Z",
  "applied": false
}
```
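For agent-side tooling that manipulates these records, they map naturally onto typed structures. A sketch of the two most commonly consumed models as Python dataclasses (fields taken from the JSON above):

```python
from dataclasses import dataclass, field

@dataclass
class PlaybookBullet:
    id: str
    content: str
    category: str
    confidence: float
    created_at: str
    last_feedback: str
    helpful_count: int = 0
    harmful_count: int = 0
    source_sessions: list[str] = field(default_factory=list)
    is_anti_pattern: bool = False
    maturity: str = "proposed"  # see maturity states below

@dataclass
class SessionOutcome:
    session_id: str
    helpful_rules: list[str] = field(default_factory=list)
    harmful_rules: list[str] = field(default_factory=list)
    recorded_at: str = ""
    applied: bool = False
```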
## Rule Maturity States

Rules progress through maturity stages:

```
proposed → validated → mature → stale → deprecated
    ↓          ↓          ↓        ↓         ↓
  Needs     Evidence    Proven   Needs     Soft-
 evidence  confirmed   helpful  refresh   deleted
```

| State | Meaning | Action |
|---|---|---|
| `proposed` | Newly extracted, unvalidated | Await evidence |
| `validated` | Has supporting evidence | Monitor feedback |
| `mature` | Consistently helpful over time | Trust highly |
| `stale` | No recent feedback (>90 days) | Seek re-validation |
| `deprecated` | Marked for removal | Will be purged |
## MCP Server

Run cm as an MCP server for direct agent integration:

```bash
# Start server
cm serve

# Custom port
cm serve --port 9000

# With logging
cm serve --verbose
```

### MCP Tools

| Tool | Description |
|---|---|
| `get_context` | Retrieve task-relevant rules and history |
| `search_rules` | Semantic search over playbook |
| `record_feedback` | Mark rules as helpful/harmful |
| `record_outcome` | Record session outcome with rule attribution |
| `get_stats` | Get playbook health metrics |
| `validate_rule` | Check proposed rule against evidence |

### MCP Resources

| Resource | Description |
|---|---|
| `playbook://rules` | Full playbook as JSON |
| `playbook://top/{n}` | Top N rules by score |
| `playbook://stale` | Rules needing re-validation |
| `diary://recent/{n}` | Recent diary entries |
| `stats://health` | Playbook health metrics |
## Configuration

### Directory Structure

```
~/.config/cm/
├── config.toml        # Main configuration
├── playbook.json      # Rule storage
└── diary/             # Session summaries
    └── *.json

.cm/                   # Project-local config
├── config.toml        # Project overrides
└── playbook.json      # Project-specific rules
```

### Config File Reference

```toml
# ~/.config/cm/config.toml

[general]
# LLM model for Generator/Reflector/Validator
model = "claude-sonnet-4-20250514"
# Auto-apply outcomes after session
auto_apply_outcomes = false
# Check for updates
check_updates = true

[decay]
# Half-life in days
half_life_days = 90
# Harmful feedback multiplier
harmful_multiplier = 4.0
# Minimum score before deprecation
min_score = 0.1

[reflection]
# Minimum sessions before reflecting
min_sessions = 3
# Auto-reflect on session end
auto_reflect = false

[privacy]
# Enable cross-agent learning
cross_agent_enrichment = true
# Anonymize session data in rules
anonymize_sources = false

[mcp]
# MCP server port
port = 8080
# Enable MCP server
enabled = false
```
### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `CM_CONFIG_DIR` | Configuration directory | `~/.config/cm` |
| `CM_DATA_DIR` | Data storage directory | `~/.local/share/cm` |
| `CM_MODEL` | LLM model for ACE pipeline | `claude-sonnet-4-20250514` |
| `CM_HALF_LIFE` | Decay half-life in days | `90` |
| `CM_MCP_PORT` | MCP server port | `8080` |
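Because these are ordinary environment variables, a wrapper can apply one-off overrides without editing `config.toml`. A sketch (the override value is an arbitrary example):

```python
import os
import subprocess

# Run a single cm invocation with a temporary decay override,
# leaving the persistent configuration untouched.
env = os.environ.copy()
env["CM_HALF_LIFE"] = "30"  # example value: faster decay for this run only
subprocess.run(["cm", "stats"], env=env, check=True)
```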
## Privacy Controls

```bash
# View current privacy settings
cm privacy

# Disable cross-agent learning
cm privacy --disable-enrichment

# Enable cross-agent learning
cm privacy --enable-enrichment

# Anonymize sources in exported playbooks
cm privacy --anonymize-export
```
### What Cross-Agent Enrichment Means
When enabled:
- Rules extracted from Agent A's sessions can help Agent B
- Diary entries reference sessions across agents
- Collective learning improves all agents
When disabled:
- Each agent's playbook is isolated
- No session data shared between identities
- Rules only come from your own sessions
## Project Integration

Export playbook for project documentation:

```bash
# Generate project patterns doc
cm project --output docs/PATTERNS.md

# Include confidence scores
cm project --output docs/PATTERNS.md --scores

# Only mature rules
cm project --output docs/PATTERNS.md --mature-only
```
This creates a human-readable document of learned patterns for team reference.
## Graceful Degradation
CM degrades gracefully when components are unavailable:
| Scenario | Behavior |
|---|---|
| No cass index | Works from diary only |
| No LLM access | Curator still works, reflection paused |
| Stale sessions | Uses cached diary entries |
| Empty playbook | Returns starter suggestions |
## Performance Characteristics

| Operation | Typical Time | Notes |
|---|---|---|
| `cm context` | 50-200ms | Depends on playbook size |
| `cm similar` | 100-300ms | Semantic search overhead |
| `cm reflect` | 2-10s | LLM calls for Generator/Reflector |
| `cm validate` | 1-3s | LLM call for Validator |
| `cm mark` | <50ms | Pure database operation |
## Integration with CASS

CM builds on top of cass (Coding Agent Session Search):

```bash
# cass provides raw session search
cass search "authentication" --robot

# cm transforms that into procedural memory
cm context "authentication"
```

The typical workflow:

1. Work happens in agent sessions (stored by various tools)
2. `cass` indexes and searches those sessions
3. `cm reflect` extracts patterns into diary/playbook
4. `cm context` retrieves relevant knowledge for new tasks
## Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Configuration error |
| 3 | Validation failed |
| 4 | LLM error (reflection/validation) |
| 5 | No data (empty results) |
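A sketch of exit-code handling in an agent wrapper, using the mapping from the table above:

```python
import subprocess
import sys

# Handle cm exit codes per the documented table.
result = subprocess.run(
    ["cm", "context", "rate limiting", "--json"],
    capture_output=True, text=True,
)
if result.returncode == 0:
    print(result.stdout)                                  # success: use the context
elif result.returncode == 5:
    print("no stored memory yet; proceeding without it")  # empty results, not a failure
elif result.returncode == 4:
    print("LLM unavailable; skipping", file=sys.stderr)   # reflection/validation error
else:
    sys.exit(f"cm failed ({result.returncode}): {result.stderr}")
```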
## Troubleshooting

### Common Issues
| Problem | Solution |
|---|---|
| "No sessions found" | Run cass reindex to rebuild session index |
| "Reflection failed" | Check LLM API key and model availability |
| "Stale playbook" | Run cm reflect to process recent sessions |
| "Low confidence everywhere" | Natural decay; use cm mark --helpful to reinforce |
### Debug Mode

```bash
# Verbose output
cm context "task" --verbose

# Show scoring details
cm top 10 --debug

# Trace decay calculations
cm stats --trace-decay
```
## Ready-to-Paste AGENTS.md Blurb

```markdown
## cm - CASS Memory System

Procedural memory for AI coding agents. Transforms scattered sessions into
persistent cross-agent learnings with confidence decay and anti-pattern detection.

### Quick Start

    cm context "your task" --json      # Get relevant rules
    cm mark RULE-ID --helpful          # Reinforce good rules
    cm outcome success "RULE-1,RULE-2" # Record session results
    cm reflect                         # Extract new rules

### Three Layers

- Episodic: Raw sessions (via cass)
- Working: Diary summaries
- Procedural: Playbook rules (this tool)

### Key Features

- 90-day confidence half-life (stale rules decay)
- 4x penalty for harmful rules (asymmetric by design)
- Anti-pattern auto-inversion (bad rules become warnings)
- Cross-agent learning (everyone benefits)
- LLM-free Curator (no hallucinated provenance)

### Essential Commands

    cm context "task" --json       # Start of session
    cm similar "pattern"           # Find related rules
    cm mark ID --helpful/harmful   # Give feedback
    cm outcome success "IDs"       # End of session
    cm reflect                     # Periodic maintenance

Exit codes: 0=success, 1=error, 2=config, 3=validation, 4=LLM, 5=no-data
```
## Workflow Example: Complete Session

```bash
# 1. Start task - get relevant context
cm context "implementing rate limiting for API" --json
# → Returns rules about rate limiting, caching, API design

# 2. Note which rules you're applying
# (mental note: RULE-123 about token buckets, RULE-456 about Redis)

# 3. During work, if you find a useful pattern
cm validate "Use sliding window for rate limit precision"
# → Shows if this has historical support

# 4. End of session - record what helped
cm outcome success "RULE-123,RULE-456"

# 5. If something hurt (caused bugs/issues)
cm outcome failure "RULE-789"

# 6. Apply the feedback
cm outcome-apply

# 7. Periodically, extract new rules
cm reflect --since "7d"
```
## Philosophy: Why Procedural Memory?
Episodic memory (raw sessions) is too noisy for real-time use. Working memory (summaries) lacks actionability. Procedural memory distills both into executable rules that directly guide behavior.
The key insight: Rules should be testable hypotheses, not static commandments. The confidence decay system treats every rule as a hypothesis that requires ongoing validation. Rules that stop being useful naturally fade; rules that keep helping get reinforced.
This mirrors how human expertise works: you don't remember every project you've done, but you retain the patterns that proved useful across many projects.