---
name: research-reproduction
description: Reproduce research papers into working code. Use when user wants to implement ML/AI papers, reproduce experiments, extract algorithms from PDFs, or convert research into executable code. Handles multiple interconnected papers with multi-agent extraction, equation verification, and benchmark validation.
---
# Research Paper Reproduction Skill
Transform research papers into production-ready, verified code with multi-agent orchestration, equation-first verification, and benchmark validation.
## When to Use
- User provides research paper(s) (PDF, arXiv URL, or paper content)
- User wants to reproduce experiments from papers
- User needs to implement algorithms described in academic literature
- User wants verified, tested code matching paper specifications
- Multiple interconnected papers need coordinated implementation
## Quick Start
User: "Reproduce this paper" + [attaches PDF or provides arXiv URL]
You:
1. Clarify intent (validate/implement/extend)
2. Spawn extraction agent(s) per paper
3. Create context documents
4. Implement with equation verification
5. Format, test, document
6. Init git, ready for execution
## Core Workflow

```
RESEARCH REPRODUCTION PIPELINE

Phase 0: INTENT CLARIFICATION
├─ What: Validate understanding / Implement / Extend / Benchmark
├─ Papers: Single / Multiple related / Paper family
├─ Scope: Full paper / Specific algorithm / Core contribution
└─ Resources: Local / Colab / GPU requirements

Phase 1: PARALLEL EXTRACTION (Multi-Agent)
├─ Spawn one extraction agent per paper
├─ Each agent creates context document (NOT full paper)
├─ Extract: equations, algorithms, architecture, hyperparams
└─ Output: .context.md files in project root

Phase 2: CONTEXT SYNTHESIS
├─ Orchestrator reads all .context.md files
├─ Build dependency graph (which paper depends on which)
├─ Create unified implementation plan
└─ Output: IMPLEMENTATION_PLAN.md

Phase 3: EQUATION-FIRST IMPLEMENTATION
├─ For EACH equation: write test FIRST
├─ Implement module to pass test
├─ Verify shapes, gradients, numerical stability
└─ Output: src/ with tests/ alongside

Phase 4: CODE QUALITY
├─ Format with ruff
├─ Type check with ty (Astral)
├─ Verify all tests pass
└─ Output: Clean, typed, formatted code

Phase 5: DOCUMENTATION
├─ Generate README.md with usage instructions
├─ Create ARCHITECTURE.md with diagrams
├─ Document each module's paper reference
└─ Output: docs/ folder + root README

Phase 6: GIT & EXECUTION PREP
├─ git init, .gitignore, initial commit
├─ Prepare for gh repo create
├─ Create run scripts (notebook + CLI)
└─ Output: Ready for git push and execution
```
## Phase 0: Intent Clarification
ALWAYS START HERE. Ask the user:
## Research Reproduction - Let's clarify your goals
**1. What's your intent?**
- [ ] Validate understanding - Quick implementation to verify I understand the paper
- [ ] Full reproduction - Complete implementation matching paper's experiments
- [ ] Specific component - Implement only [specific algorithm/module]
- [ ] Extend/modify - Build on paper with custom changes
- [ ] Benchmark only - Run existing code, verify results
**2. Paper scope:**
- [ ] Single paper
- [ ] Multiple related papers (specify dependencies)
- [ ] Paper family (e.g., TITANS → MIRAS → Hope)
**3. Execution environment:**
- [ ] Local CPU only
- [ ] Local GPU (specify VRAM)
- [ ] Google Colab (Free/Pro - specify units budget)
- [ ] Cloud GPU (specify provider)
**4. Output preferences:**
- [ ] Jupyter notebook for experimentation
- [ ] CLI scripts for training/inference
- [ ] Both notebook + scripts
## Phase 1: Parallel Extraction

For EACH paper, spawn an extraction subagent:

```
# Subagent task (do NOT load full paper into orchestrator context)
Task: Extract from [paper_name]
Output: [paper_name].context.md
```
The extraction agent will:
1. Convert PDF to markdown (markitdown)
2. Extract ALL equations with LaTeX
3. Extract ALL algorithms (pseudocode)
4. Extract architecture diagrams (describe in text)
5. Extract hyperparameters and training details
6. Extract benchmark datasets and metrics
7. Note dependencies on other papers
Extraction Agent Prompt: See prompts/extraction-agent.md
Context Document Format: See templates/context-document.md
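Steps 1 and 2 can be sketched as a small standalone script. This is not the bundled `scripts/extract_paper.py`, just an illustration assuming markitdown's `MarkItDown().convert()` interface and a crude regex for display-math candidates:

```python
# /// script
# dependencies = ["markitdown"]
# ///
# extract_paper.py (sketch) - PDF -> markdown, then list display-equation candidates
import re
import sys
from pathlib import Path

from markitdown import MarkItDown


def main(pdf_path: str) -> None:
    markdown = MarkItDown().convert(pdf_path).text_content
    out = Path(pdf_path).with_suffix(".md")
    out.write_text(markdown)

    # Rough heuristic only: $$...$$ spans and \begin{equation} environments
    equations = re.findall(
        r"\$\$.+?\$\$|\\begin\{equation\}.+?\\end\{equation\}",
        markdown,
        flags=re.DOTALL,
    )
    print(f"Wrote {out}; found {len(equations)} display-equation candidates")


if __name__ == "__main__":
    main(sys.argv[1])
```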
### Multi-Paper Dependency Handling

When papers reference each other:

```
Paper A (Foundation) ──────► Paper B (Extension) ──────► Paper C (Application)
        │                             │                            │
        ▼                             ▼                            ▼
  A.context.md                  B.context.md                 C.context.md
        │                             │                            │
        └─────────────────────────────┴────────────────────────────┘
                                      │
                                      ▼
                          IMPLEMENTATION_PLAN.md
                          (ordered by dependency)
```
## Phase 2: Context Synthesis

After all extraction agents complete:

- Read all `.context.md` files
- Build dependency graph from cross-references
- Create implementation order (topological sort; see the sketch below)
- Generate `IMPLEMENTATION_PLAN.md`
Synthesis Template: See templates/implementation-plan.md
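For the synthesis step itself, a minimal sketch using the standard library's `graphlib` is shown below; it assumes each context document declares a line like `Depends on: paper_a, paper_b` (a hypothetical convention, adjust to whatever the extraction template actually emits):

```python
# synthesize_plan.py (sketch) - order papers by dependency before implementation
import re
from graphlib import TopologicalSorter
from pathlib import Path


def build_implementation_order(root: Path = Path(".")) -> list[str]:
    """Topologically sort *.context.md files by their declared dependencies."""
    graph: dict[str, set[str]] = {}
    for doc in root.glob("*.context.md"):
        name = doc.name.removesuffix(".context.md")
        # Hypothetical convention: a single "Depends on: a, b" line per document
        match = re.search(r"^Depends on:\s*(.+)$", doc.read_text(), flags=re.MULTILINE)
        graph[name] = {d.strip() for d in match.group(1).split(",")} if match else set()
    # static_order() yields dependencies before the papers that build on them
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for i, paper in enumerate(build_implementation_order(), start=1):
        print(f"{i}. {paper}")
```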
## Phase 3: Equation-First Implementation

**CRITICAL:** Write the test BEFORE the implementation for EVERY equation.

### Equation Verification Pattern
```python
# tests/test_equation_3_memory_update.py
"""
Paper: TITANS (arXiv:2501.00663)
Equation 3: M_{t+1} = M_t + η · ∇l(M_t; x_t)
Description: Memory update rule with gradient-based learning
"""
import torch
import pytest


def test_equation_3_shapes():
    """Verify tensor shapes match paper specification."""
    batch_size, memory_size, dim = 4, 64, 256
    M_t = torch.randn(batch_size, memory_size, dim)
    x_t = torch.randn(batch_size, dim)
    eta = 0.01

    # Your implementation
    from src.memory import memory_update

    M_t_plus_1 = memory_update(M_t, x_t, eta)
    assert M_t_plus_1.shape == M_t.shape, f"Expected {M_t.shape}, got {M_t_plus_1.shape}"


def test_equation_3_gradient_flow():
    """Verify gradients flow correctly through memory update."""
    M_t = torch.randn(4, 64, 256, requires_grad=True)
    x_t = torch.randn(4, 256)

    from src.memory import memory_update

    M_t_plus_1 = memory_update(M_t, x_t, eta=0.01)
    loss = M_t_plus_1.sum()
    loss.backward()

    assert M_t.grad is not None, "Gradients should flow to M_t"
    assert not torch.isnan(M_t.grad).any(), "Gradients should not be NaN"


def test_equation_3_numerical_stability():
    """Verify numerical stability with extreme values."""
    M_t = torch.randn(4, 64, 256) * 1000  # Large values
    x_t = torch.randn(4, 256) * 1000

    from src.memory import memory_update

    M_t_plus_1 = memory_update(M_t, x_t, eta=0.01)
    assert not torch.isnan(M_t_plus_1).any(), "Should handle large values"
    assert not torch.isinf(M_t_plus_1).any(), "Should not overflow"
```
### Implementation Pattern

```python
# src/memory.py
"""
Memory module implementing TITANS neural memory.

Paper References:
- Equation 3: memory_update()
- Equation 5: surprise_metric()
- Algorithm 1: MemoryLayer forward pass
"""
import torch
import torch.nn as nn


def memory_update(M_t: torch.Tensor, x_t: torch.Tensor, eta: float) -> torch.Tensor:
    """
    Equation 3: M_{t+1} = M_t + η · ∇l(M_t; x_t)

    Args:
        M_t: Current memory state [batch, memory_size, dim]
        x_t: Input token [batch, dim]
        eta: Learning rate

    Returns:
        M_t_plus_1: Updated memory [batch, memory_size, dim]
    """
    # Implementation here
    ...
```
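As a starting point, here is a minimal sketch of `memory_update` that satisfies the three verification tests above. The inner loss is a placeholder (squared distance between each memory slot and the input token), not the paper's actual associative loss; swap in the loss extracted during Phase 1 before trusting any results.

```python
# src/memory.py (sketch) - placeholder inner loss, NOT the paper's actual l(M; x)
import torch


def memory_update(M_t: torch.Tensor, x_t: torch.Tensor, eta: float) -> torch.Tensor:
    """Equation 3 sketch: M_{t+1} = M_t + η · ∇l(M_t; x_t).

    Placeholder loss l(M; x) = 0.5 * ||M - x||² per memory slot, whose gradient
    w.r.t. M is simply (M - x). Replace with the loss from the paper's context doc.
    """
    grad = M_t - x_t.unsqueeze(1)   # broadcast token against [batch, memory_size, dim]
    grad = grad.clamp(-1e4, 1e4)    # crude guard against overflow on extreme inputs
    return M_t + eta * grad
```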
Full verification patterns: See references/equation-patterns.md
## Phase 4: Code Quality

Run these checks before considering the implementation complete:

```bash
# 1. Format with ruff
uv run ruff format src/ tests/

# 2. Lint with ruff
uv run ruff check src/ tests/ --fix

# 3. Type check with ty (Astral's type checker)
uv run ty check src/

# 4. Run all tests
uv run pytest tests/ -v --tb=short

# 5. Check test coverage
uv run pytest tests/ --cov=src --cov-report=term-missing
```
Quality Script: See scripts/quality_check.py
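If the bundled script is not available, a minimal stand-in with the same intent is sketched below: run each check in order and stop at the first failure. The commands mirror the list above; this is not the actual contents of `scripts/quality_check.py`.

```python
# quality_check.py (sketch) - run formatting, linting, type checks, and tests in order
import subprocess
import sys

CHECKS = [
    ["uv", "run", "ruff", "format", "src/", "tests/"],
    ["uv", "run", "ruff", "check", "src/", "tests/", "--fix"],
    ["uv", "run", "ty", "check", "src/"],
    ["uv", "run", "pytest", "tests/", "-v", "--tb=short"],
]


def main() -> int:
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            return 1  # stop at the first failing check
    return 0


if __name__ == "__main__":
    sys.exit(main())
```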
## Phase 5: Documentation

Generate comprehensive documentation:

### README.md Structure
# [Paper Name] Reproduction
> [One-line paper description]
## Quick Start
\`\`\`bash
# Clone and setup
git clone [repo]
cd [repo]
uv sync
# Run example
uv run python -m src.main --config configs/default.yaml
\`\`\`
## Paper Reference
- **Title:** [Full title]
- **Authors:** [Authors]
- **arXiv:** [Link]
- **Original Code:** [If exists]
## Implementation Status
| Component | Paper Section | Status | Tests |
|-----------|--------------|--------|-------|
| Memory Module | §3.1, Eq 3-5 | ✅ | 12/12 |
| Attention Layer | §3.2, Eq 8-10 | ✅ | 8/8 |
| Training Loop | §4.1 | ✅ | 5/5 |
## Architecture
[ASCII diagram or Mermaid]
## Usage
### Training
\`\`\`bash
uv run python -m src.train --config configs/train.yaml
\`\`\`
### Inference
\`\`\`bash
uv run python -m src.inference --checkpoint checkpoints/best.pt
\`\`\`
## Benchmarks
| Dataset | Paper Result | Our Result | Status |
|---------|-------------|------------|--------|
| WikiText-103 | 17.2 PPL | TBD | ⏳ |
## Project Structure
\`\`\`
├── src/
│ ├── __init__.py
│ ├── model.py # Main model (§3)
│ ├── memory.py # Memory module (§3.1)
│ └── train.py # Training loop (§4)
├── tests/
│ ├── test_memory.py
│ └── test_model.py
├── configs/
│ └── default.yaml
├── notebooks/
│ └── exploration.ipynb
└── docs/
└── ARCHITECTURE.md
\`\`\`
Documentation Template: See templates/readme-template.md
## Phase 6: Git & Execution Prep

### Initialize Repository

```bash
# Initialize git
git init

# Create .gitignore
cat > .gitignore << 'EOF'
# Python
__pycache__/
*.py[cod]
.venv/
*.egg-info/

# UV
.python-version

# IDE
.vscode/
.idea/

# Data & Models
data/
checkpoints/
*.pt
*.pth
wandb/

# Jupyter
.ipynb_checkpoints/
EOF

# Initial commit
git add .
git commit -m "Initial implementation: [Paper Name]

- Core model implementation (§3)
- Equation-verified modules (Eq 3-10)
- Test suite with X% coverage
- Documentation and examples

Paper: [arXiv link]"
```
### Prepare for GitHub

```bash
# Create repo (requires gh CLI authenticated)
gh repo create [repo-name] --public --source=. --remote=origin

# Or prepare for manual push
echo "Ready to push:"
echo "  git remote add origin git@github.com:USER/REPO.git"
echo "  git push -u origin main"
```
### Create Run Scripts

**CLI Script:**

```bash
#!/usr/bin/env bash
# run_training.sh
set -e
uv run python -m src.train "$@"
```
**Notebook Entry:**

```python
# notebooks/quickstart.ipynb

# Cell 1: Setup
!uv sync
from src.model import Model
from src.train import Trainer

# Cell 2: Quick test
model = Model.from_config("configs/default.yaml")
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
```
## Scripts Reference
This skill includes UV single-file scripts for complex operations:
| Script | Purpose | Usage |
|---|---|---|
| `scripts/extract_paper.py` | Convert PDF to markdown, extract equations | `uv run scripts/extract_paper.py paper.pdf` |
| `scripts/quality_check.py` | Run ruff + ty + pytest | `uv run scripts/quality_check.py` |
| `scripts/verify_equations.py` | Run equation-specific tests | `uv run scripts/verify_equations.py` |
| `scripts/benchmark_runner.py` | Run benchmarks, compare to paper | `uv run scripts/benchmark_runner.py` |
| `scripts/generate_docs.py` | Generate README and architecture docs | `uv run scripts/generate_docs.py` |
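For illustration, a stand-in for the benchmark comparison (not the bundled `scripts/benchmark_runner.py`; the JSON file locations and layout are assumptions) could look like:

```python
# benchmark_runner.py (sketch) - compare reproduced metrics to the paper's numbers
import json
import sys


def compare(paper: dict[str, float], ours: dict[str, float], rel_tol: float = 0.05) -> bool:
    """Return True if every paper metric is matched within a relative tolerance."""
    ok = True
    for name, expected in paper.items():
        got = ours.get(name)
        if got is None:
            print(f"MISSING  {name}: paper={expected}")
            ok = False
            continue
        within = abs(got - expected) <= rel_tol * abs(expected)
        print(f"{'OK' if within else 'DIFF':7s}  {name}: paper={expected}  ours={got}")
        ok = ok and within
    return ok


if __name__ == "__main__":
    paper_results = json.load(open("configs/paper_results.json"))  # assumed path
    our_results = json.load(open("results/metrics.json"))          # assumed path
    sys.exit(0 if compare(paper_results, our_results) else 1)
```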
## Tools Reference
- tools/paper-intake.md - PDF/URL ingestion
- tools/equation-extractor.md - LaTeX extraction and mapping
- tools/verification-engine.md - Test generation patterns
- tools/benchmark-validator.md - Results comparison
## Templates Reference
- templates/context-document.md - Extraction output format
- templates/implementation-plan.md - Synthesis output
- templates/readme-template.md - Project README
- templates/test-template.md - Equation test structure
## Prompts Reference
- prompts/extraction-agent.md - Per-paper extraction agent
- prompts/implementation-agent.md - Code implementation agent
- prompts/verification-agent.md - Test writing agent
- prompts/documentation-agent.md - Docs generation agent
## Project Structure Output

```
[paper-name]-reproduction/
├── .gitignore
├── .python-version
├── pyproject.toml
├── uv.lock
├── README.md
├── ARCHITECTURE.md
├── IMPLEMENTATION_PLAN.md
│
├── papers/                      # Original papers (gitignored if large)
│   └── [paper].context.md       # Extracted context (committed)
│
├── src/
│   ├── __init__.py
│   ├── model.py                 # Main model
│   ├── layers/                  # Individual layers/modules
│   │   ├── __init__.py
│   │   ├── memory.py            # Memory module
│   │   └── attention.py         # Attention variants
│   ├── train.py                 # Training loop
│   ├── inference.py             # Inference utilities
│   └── utils/
│       ├── __init__.py
│       └── config.py            # Configuration handling
│
├── tests/
│   ├── __init__.py
│   ├── test_equations/          # Equation-specific tests
│   │   ├── test_eq3_memory.py
│   │   └── test_eq5_surprise.py
│   ├── test_layers/
│   │   └── test_memory.py
│   └── test_integration.py
│
├── configs/
│   ├── default.yaml
│   ├── small.yaml               # Quick testing config
│   └── paper.yaml               # Paper's exact config
│
├── notebooks/
│   ├── exploration.ipynb        # Development notebook
│   └── quickstart.ipynb         # User-facing notebook
│
├── scripts/
│   ├── train.sh
│   └── evaluate.sh
│
└── docs/
    ├── equations.md             # All equations with implementations
    └── benchmarks.md            # Benchmark results
```
## Checklist Before Completion

- All equations have corresponding tests
- All tests pass (`uv run pytest`)
- Code formatted (`uv run ruff format`)
- Types checked (`uv run ty check src/`)
- README complete with quick start
- ARCHITECTURE.md describes design
- Git initialized with meaningful commit
- Can run basic example without errors
- Paper references documented in code
## Integration with LeCoder-cgpu

After code is ready and verified locally:

```bash
# Connect to Colab GPU
lecoder-cgpu connect

# Upload project
lecoder-cgpu upload ./

# Run training
lecoder-cgpu run "cd [project] && uv sync && uv run python -m src.train"

# Download results
lecoder-cgpu download checkpoints/
```
See tools/colab-execution.md for detailed Colab integration.