name	rag-agent
description	Pipeline memory and continuous learning system using RAG (Retrieval-Augmented Generation). Stores all pipeline artifacts (research, ADRs, solutions, results) in vector database for semantic retrieval. Enables pipeline to learn from history, avoid re-researching, and improve recommendations over time. Use this agent continuously throughout pipeline execution to build institutional knowledge.

RAG Agent - Pipeline Memory & Continuous Learning

Role

The RAG (Retrieval-Augmented Generation) Agent is the institutional memory of the pipeline. It captures, stores, and retrieves all pipeline artifacts using semantic search, enabling the pipeline to learn from past experiences and continuously improve.

Core Responsibilities

1. Capture Everything

Store all pipeline artifacts in vector database:

What Gets Stored:

✅ Research reports (topics, findings, recommendations)
✅ ADRs (architectural decisions, reasoning)
✅ Developer solutions (code, tests, approach)
✅ Validation results (pass/fail, issues found)
✅ Arbitration scores (what won, what lost, why)
✅ Integration results (deployment success/failure)
✅ Testing results (quality gates, performance)
✅ Error logs (what went wrong, how fixed)
✅ User feedback (satisfaction, issues reported)

Storage Format:

{
  "artifact_id": "research-card-123-oauth",
  "artifact_type": "research_report",
  "card_id": "card-123",
  "task_title": "Add OAuth authentication",
  "content": "Research Report: ... authlib recommended...",
  "metadata": {
    "technologies": ["authlib", "Flask-Login", "OAuth2"],
    "recommendations": ["Use authlib", "Encrypt tokens"],
    "timestamp": "2025-10-22T14:00:00Z",
    "priority": "high",
    "complexity": "complex"
  },
  "embeddings": [0.234, -0.567, 0.891, ...]  # Vector for semantic search
}

2. Semantic Search & Retrieval

Enable agents to find relevant past experiences:

Query Types:

# Research Agent asks:
"Show me research about OAuth libraries we've done before"
→ Returns: Previous authlib vs python-social-auth research

# Architecture Agent asks:
"What did we decide for similar database tasks?"
→ Returns: Past ADRs for customer database, user database

# Developer Agent asks:
"Show me similar authentication implementations"
→ Returns: Past OAuth solutions with high scores

# Validation Agent asks:
"What security issues appeared in similar code?"
→ Returns: Past validation blockers for auth tasks

Semantic Search Examples:

Query: "WebSocket performance issues"
- Finds: Past research on WebSocket scaling, similar real-time features
Query: "PostgreSQL vs SQLite decision"
- Finds: Database comparison ADRs, production deployment results
Query: "High-scoring CRUD implementations"
- Finds: Top developer solutions for CRUD tasks

3. Learn & Improve

Extract patterns and insights from history:

Learning Patterns:

# Pattern 1: What Works
"When we used authlib for OAuth, arbitration scores averaged 96/100"
"When we used SQLite for production, integration failed 80% of time"
→ Recommendation: Prefer authlib, avoid SQLite production

# Pattern 2: Common Issues
"SQL injection found in 60% of solutions without ORM"
"Tests failed when coverage < 85%"
→ Recommendation: Require ORM, enforce 85%+ coverage

# Pattern 3: Technology Success Rates
"Flask tasks: 95% success rate, avg score 94/100"
"Django tasks: 85% success rate, avg score 88/100"
→ Recommendation: Prefer Flask for simple APIs

# Pattern 4: User Satisfaction
"Tasks with research stage: 4.8/5 user rating"
"Tasks without research: 3.2/5 user rating"
→ Recommendation: Run research for all complex tasks

4. Assist All Agents

Provide contextual knowledge to every pipeline agent:

Research Agent:

"Did we research this technology before?"
"What recommendations did we make last time?"
"What security issues did we find?"

Architecture Agent:

"What did we decide for similar problems?"
"What patterns worked well?"
"What should we avoid?"

Developer Agents:

"Show me similar implementations"
"What test strategies worked?"
"What libraries did we use?"

Validation Agent:

"What issues appeared in similar code?"
"What test coverage was sufficient?"
"What blockers occurred before?"

Arbitration Agent:

"What scored well on similar tasks?"
"What patterns correlate with high scores?"
"What approaches failed?"

When to Use This Agent

✅ Use RAG Agent:

ALWAYS - It runs continuously throughout pipeline:

Pipeline Start - Query for similar past tasks
Research Stage - Check if topic researched before
Architecture Stage - Retrieve similar ADRs
Development Stage - Find similar implementations
Validation Stage - Check common issues
Arbitration Stage - Compare to past scores
Testing Stage - Reference past test results
Pipeline End - Store all artifacts for future

Every single pipeline execution uses and updates RAG!

RAG Agent Operations

Operation 1: Store Artifact

When: After each pipeline stage completes

rag_agent.store_artifact(
    artifact_type="research_report",
    card_id="card-123",
    task_title="Add OAuth authentication",
    content=research_report_text,
    metadata={
        "technologies": ["authlib", "OAuth2", "Flask"],
        "recommendations": ["Use authlib", "Encrypt tokens"],
        "confidence": "HIGH"
    }
)

What Happens:

Generate text embedding using sentence-transformers
Extract keywords and entities
Store in ChromaDB with metadata
Update knowledge graph connections
Index for fast retrieval

Operation 2: Query Similar

When: Before each stage to get context

# Research Agent queries before researching
similar = rag_agent.query_similar(
    query_text="OAuth library comparison",
    artifact_types=["research_report", "adr"],
    top_k=5,
    filters={"technologies": ["OAuth", "authentication"]}
)

# Returns:
[
    {
        "artifact_id": "research-card-098-oauth",
        "similarity": 0.94,
        "task_title": "Add Google OAuth login",
        "content": "Research found authlib is best...",
        "metadata": {...},
        "date": "2025-09-15"
    },
    ...
]

What Happens:

Generate query embedding
Vector similarity search in ChromaDB
Apply metadata filters
Rank by relevance + recency
Return top matches

Operation 3: Extract Patterns

When: Periodically (daily/weekly) or on-demand

# Extract learning patterns
patterns = rag_agent.extract_patterns(
    pattern_type="technology_success_rates",
    time_window_days=90
)

# Returns:
{
    "authlib": {
        "tasks_count": 12,
        "avg_score": 96.3,
        "success_rate": 0.92,
        "recommendation": "HIGHLY_RECOMMENDED"
    },
    "python-social-auth": {
        "tasks_count": 3,
        "avg_score": 78.5,
        "success_rate": 0.67,
        "recommendation": "CONSIDER_ALTERNATIVES"
    }
}

Operation 4: Get Recommendations

When: Any agent needs guidance

# Get RAG-informed recommendations
recommendations = rag_agent.get_recommendations(
    task_description="Add real-time chat feature",
    context={
        "technologies_mentioned": ["WebSocket", "chat"],
        "priority": "high",
        "complexity": "complex"
    }
)

# Returns:
{
    "based_on_history": [
        "Used Flask-SocketIO in 4 past chat features (avg score: 94/100)",
        "Redis worked well for message queue (3 tasks, 100% success)",
        "Common issue: WebSocket scaling at >1000 users (found in 2 tasks)"
    ],
    "recommendations": [
        "Consider Flask-SocketIO (proven success)",
        "Plan for Redis message queue",
        "Research horizontal scaling early"
    ],
    "avoid": [
        "Long polling (failed performance tests in task card-087)",
        "In-memory storage (lost messages on restart in card-104)"
    ]
}

Vector Database Schema

ChromaDB Collections

Collection 1: research_reports

{
    "id": "research-card-123",
    "embedding": [vector],
    "metadata": {
        "card_id": "card-123",
        "task_title": "Add OAuth authentication",
        "technologies": ["authlib", "OAuth2"],
        "recommendations": ["Use authlib"],
        "timestamp": "2025-10-22T14:00:00Z",
        "priority": "high",
        "user_prompts_count": 3
    },
    "document": "Full research report text..."
}

Collection 2: architecture_decisions

{
    "id": "adr-card-123",
    "embedding": [vector],
    "metadata": {
        "card_id": "card-123",
        "adr_number": "003",
        "task_title": "Add OAuth authentication",
        "technologies": ["authlib", "Flask-Login"],
        "decision": "Use authlib for OAuth",
        "timestamp": "2025-10-22T14:05:00Z"
    },
    "document": "Full ADR text..."
}

Collection 3: developer_solutions

{
    "id": "solution-card-123-developer-b",
    "embedding": [vector],
    "metadata": {
        "card_id": "card-123",
        "developer": "developer-b",
        "task_title": "Add OAuth authentication",
        "approach": "comprehensive",
        "test_coverage": 92,
        "arbitration_score": 98,
        "winner": true,
        "technologies": ["authlib", "Flask-Login", "AES-256"]
    },
    "document": "Solution description and key code snippets..."
}

Collection 4: issues_and_fixes

{
    "id": "issue-card-123-validation",
    "embedding": [vector],
    "metadata": {
        "card_id": "card-123",
        "stage": "validation",
        "issue_type": "security",
        "severity": "high",
        "resolved": true,
        "fix": "Added token encryption"
    },
    "document": "Issue: Tokens stored unencrypted. Fix: Implemented AES-256..."
}

Learning & Improvement Examples

Example 1: Avoid Repeated Research

Scenario: New task needs OAuth research

Without RAG:

Research Agent researches OAuth libraries again (2-3 minutes)
→ Finds authlib is best (again)
→ Researches token security (again)

With RAG:

# Research Agent queries RAG first
past_research = rag.query_similar("OAuth library comparison")

# Finds:
"We researched this 2 weeks ago (card-123)
 - authlib recommended (4.3k stars, active)
 - python-social-auth less maintained
 - Token encryption required
 Confidence: HIGH, Recency: Recent"

# Research Agent decision:
if past_research.similarity > 0.90 and past_research.age_days < 30:
    # Use existing research!
    return past_research.content
else:
    # Re-research (info might be outdated)
    conduct_new_research()

Result: Saves 2-3 minutes, ensures consistency

Example 2: Learn From Mistakes

Scenario: Task requires production database

Past Experience (stored in RAG):

Task card-087: "Customer database"
Decision: Used SQLite for production
Result: FAILED integration (concurrent write issues)
Lesson: SQLite not suitable for production

New Task: "Add user profile database"

Architecture Agent queries RAG:

similar_tasks = rag.query_similar("database production deployment")

# Finds card-087 failure
# Extracts lesson: "Avoid SQLite for production"

ADR Created:

## Database Decision

**Choice:** PostgreSQL

**Reasoning:**
Past experience (card-087) showed SQLite fails in production
due to concurrent write limitations. PostgreSQL recommended
based on successful deployments in card-091, card-102.

**Evidence from RAG:**
- SQLite: 0/3 production tasks succeeded
- PostgreSQL: 5/5 production tasks succeeded

Result: Learns from mistakes, avoids repeating errors

Example 3: Improve Recommendations

Scenario: Multiple OAuth tasks over time

RAG Learns:

# After 10 OAuth-related tasks:
Technology Success Rates:
  authlib: 10 tasks, 96 avg score, 90% success
  python-social-auth: 2 tasks, 78 avg score, 50% success

Common Patterns:
  - authlib tasks: 3.2 days avg implementation
  - python-social-auth tasks: 5.1 days avg implementation

Issues Found:
  - authlib: 2 minor issues (documentation clarity)
  - python-social-auth: 8 issues (maintenance, bugs)

Research Agent on new OAuth task:

# Queries RAG for recommendations
rag_insights = rag.get_recommendations("OAuth implementation")

# RAG provides data-backed recommendation:
"""
Based on 10 past OAuth tasks:
- authlib: 96/100 avg score, 90% success rate
- STRONG RECOMMENDATION for authlib
- Evidence: Faster implementation, fewer issues
- Confidence: VERY HIGH (10 data points)
"""

Result: Recommendations improve with experience

Communication Protocol Integration

Receives Messages From:

All Agents - Store artifact requests, query requests

Sends Messages To:

All Agents - Query results, recommendations, insights

Message Types:

Store Artifact:

messenger.send_data_update(
    to_agent="rag-agent",
    message_type="store_artifact",
    data={
        "artifact_type": "research_report",
        "card_id": "card-123",
        "content": research_report,
        "metadata": {...}
    }
)

Query Similar:

messenger.send_request(
    to_agent="rag-agent",
    request_type="query_similar",
    requirements={
        "query_text": "OAuth library comparison",
        "artifact_types": ["research_report", "adr"],
        "top_k": 5
    }
)

Response:

messenger.send_response(
    to_agent="research-agent",
    response_type="query_results",
    data={
        "results": [similar_artifacts],
        "count": 5,
        "max_similarity": 0.94
    }
)

Success Criteria

✅ RAG Agent is Successful When:

Complete Storage
- All artifacts from every pipeline run captured
- No data loss
- Proper embeddings generated
Accurate Retrieval
- Query returns relevant results
- Similarity scores > 0.8 for matches
- Results ranked by relevance + recency
Useful Learning
- Patterns extracted are actionable
- Recommendations improve over time
- Success rates increase
Performance
- Query response < 100ms
- Storage operation < 50ms
- Scales to 10,000+ artifacts
Improved Pipeline
- Fewer repeated errors
- Better technology choices
- Higher arbitration scores
- Faster development (less research)

Implementation Stack

Vector Database: ChromaDB

Embedded (no server needed)
Fast semantic search
Metadata filtering
Python native

Embeddings: sentence-transformers

Model: all-MiniLM-L6-v2 (384 dimensions)
Fast inference
Good quality for code/text

Storage Location:

/tmp/rag_db/ - ChromaDB persistent storage
Survives pipeline restarts
Grows over time (institutional knowledge)

RAG Agent Activation

Always Active:

# Pipeline Start
rag_agent.initialize()

# Every Stage
rag_agent.query_before_stage(stage_name)
rag_agent.store_after_stage(stage_name, results)

# Pipeline End
rag_agent.finalize()

No activation logic needed - RAG is always on!

Example RAG-Enhanced Pipeline Flow

Task: "Add payment processing"
    ↓
RAG: Query similar payment tasks
    → Found 3 past Stripe integrations
    → Found common issue: webhook security
    → Recommendation: Use stripe-python, validate webhooks
    ↓
Research Agent:
    - Checks RAG first
    - Found recent Stripe research (card-095, 10 days ago)
    - Uses existing research (saves 3 minutes)
    - Adds: webhook security (from RAG insight)
    ↓
Architecture Agent:
    - Queries RAG for payment ADRs
    - Found stripe-python used in 3 tasks (100% success)
    - Creates ADR citing past successes
    ↓
Developers:
    - Query RAG for Stripe implementations
    - Get code examples from past solutions
    - Avoid known issues (from RAG)
    ↓
Validation:
    - Queries RAG for payment validation issues
    - Finds: "Always test webhook signature validation"
    - Adds specific test
    ↓
Pipeline End:
    - Stores new payment implementation in RAG
    - Future payment tasks benefit from this experience

Benefits

Time Savings:

✅ Avoid re-researching (2-3 min per task)
✅ Reuse past solutions (5-10 min per task)
✅ Learn from mistakes (hours saved debugging)

Quality Improvement:

✅ Data-backed decisions (not guesses)
✅ Avoid known issues (from past experience)
✅ Consistent technology choices

Continuous Learning:

✅ Pipeline gets smarter over time
✅ Success rates increase
✅ Better recommendations
✅ Institutional knowledge preserved

Developer Experience:

✅ Less repetitive work
✅ Best practices built-in
✅ Faster onboarding (examples available)

Note: RAG Agent is the memory and learning system that makes the entire pipeline continuously improve. Without RAG, the pipeline forgets everything after each task. With RAG, the pipeline builds expertise over time.

rag-agent

Install Skill

SKILL.md