Design, configure, launch, and analyze ablation sweeps for GRPO training. Use for hypothesis testing, hyperparameter experiments, and systematic comparisons.

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Verify the skill by reviewing its instructions before using it.

SKILL.md

---
name: perform-sweep
description: Design, configure, launch, and analyze ablation sweeps for GRPO training. Use for hypothesis testing, hyperparameter experiments, and systematic comparisons.
---

Perform Sweep

End-to-end workflow for running ablation experiments on the Diplomacy GRPO training pipeline.

Quick Reference

| Phase | Action | Command |
|-----------|------------------|---------|
| Configure | Create `sweep.yaml` | See YAML Reference |
| Validate | Dry run | `python scripts/launch_sweep.py <path> --dry-run` |
| Info | Show config | `python scripts/launch_sweep.py <path> --info` |
| Launch | Start sweep | `python scripts/launch_sweep.py <path>` |
| Status | Check progress | `python scripts/launch_sweep.py <path> --status` |
| List | List all sweeps | `python scripts/launch_sweep.py --list` |
| Analyze | Compare results | Use the experiment-analysis skill |

Workflow

1. Hypothesis Design

  • Review recent experiments in experiments/experiment-tracker.md
  • Identify one variable to test (e.g., horizon length, scoring function)
  • Predict expected outcome
  • Document reasoning in sweep.yaml hypothesis field

2. YAML Configuration

Create `experiments/sweeps/<name>/sweep.yaml`:

```yaml
metadata:
  name: "my-ablation"
  description: "Testing hypothesis X"
  hypothesis: "Longer horizons should improve strategic play"
  experiment_tag_prefix: "my-ablation"

defaults:
  total_steps: 100

runs:
  A:
    name: "control"
    description: "Baseline configuration"
    config:
      experiment_tag: "${metadata.experiment_tag_prefix}-A"
  B:
    name: "treatment"
    description: "With longer horizon"
    config:
      rollout_horizon_years: 8
      experiment_tag: "${metadata.experiment_tag_prefix}-B"
```

See YAML Reference for full schema.
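The `${metadata.experiment_tag_prefix}` placeholders in the example are resolved against the metadata block so each run gets a distinct, filterable tag. A minimal sketch of that kind of interpolation (hypothetical; the actual logic in `launch_sweep.py` may differ):

```python
# Hypothetical sketch of "${metadata.<key>}" placeholder resolution.
# The real launch_sweep.py implementation is not shown here and may differ.
import re

def resolve_placeholders(value: str, metadata: dict) -> str:
    """Replace ${metadata.<key>} tokens with values from the metadata block."""
    def substitute(match):
        return str(metadata[match.group(1)])
    return re.sub(r"\$\{metadata\.(\w+)\}", substitute, value)

metadata = {"experiment_tag_prefix": "my-ablation"}
tag = resolve_placeholders("${metadata.experiment_tag_prefix}-A", metadata)
# tag == "my-ablation-A"
```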

3. Validate Configuration

```bash
# Show sweep info
python scripts/launch_sweep.py experiments/sweeps/<name>/ --info

# Dry run (validates config, shows what would run)
python scripts/launch_sweep.py experiments/sweeps/<name>/ --dry-run
```
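To illustrate the kind of problems a dry run can catch before anything launches, here is a hypothetical validator for the schema used in the example above (the real script's checks are not shown here and may differ):

```python
# Hypothetical validation pass over a parsed sweep.yaml dict.
# Returns a list of human-readable errors; empty means the config looks sane.
def validate_sweep(config: dict) -> list[str]:
    errors = []
    for key in ("name", "experiment_tag_prefix"):
        if key not in config.get("metadata", {}):
            errors.append(f"metadata.{key} is required")
    runs = config.get("runs", {})
    if not runs:
        errors.append("at least one run must be defined")
    for run_id, run in runs.items():
        # Every run needs a unique tag so results can be filtered in WandB.
        if "experiment_tag" not in run.get("config", {}):
            errors.append(f"runs.{run_id}.config.experiment_tag is required")
    return errors
```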

4. Launch and Monitor

```bash
# Launch (fire-and-forget - runs in cloud)
python scripts/launch_sweep.py experiments/sweeps/<name>/

# Check status anytime
python scripts/launch_sweep.py experiments/sweeps/<name>/ --status

# List all sweeps
python scripts/launch_sweep.py --list
```

5. Analysis

After sweep completes, use the experiment-analysis skill:

```bash
# Full analysis for each run
uv run python .claude/skills/experiment-analysis/analyze_elo.py <run-name>

# Compare in WandB:
# filter by experiment_tag_prefix (e.g., "my-ablation")
```

Key Features

  • Fire-and-forget: launch and close your laptop; the sweep runs in the Modal cloud
  • Auto-resume: if a Modal function hits its 24-hour timeout, the sweep automatically respawns
  • Sequential execution: runs one training job at a time (infrastructure constraint)
  • Progress tracking: state is saved after each run for recovery

Example Sweeps

See existing sweeps in experiments/sweeps/:

  • `longer-horizon-inverted-weight-ablation/` - a 2x2 ablation on horizon length and scoring function
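A 2x2 ablation of that shape can be expressed as four runs crossing two horizon values with two scoring variants. The sketch below reuses the fields from the earlier example; `score_weighting` is a hypothetical field name for illustration only:

```yaml
runs:
  A:
    name: "short-horizon-standard"
    config:
      rollout_horizon_years: 4
      experiment_tag: "${metadata.experiment_tag_prefix}-A"
  B:
    name: "short-horizon-inverted"
    config:
      rollout_horizon_years: 4
      score_weighting: "inverted"   # hypothetical field name
      experiment_tag: "${metadata.experiment_tag_prefix}-B"
  C:
    name: "long-horizon-standard"
    config:
      rollout_horizon_years: 8
      experiment_tag: "${metadata.experiment_tag_prefix}-C"
  D:
    name: "long-horizon-inverted"
    config:
      rollout_horizon_years: 8
      score_weighting: "inverted"   # hypothetical field name
      experiment_tag: "${metadata.experiment_tag_prefix}-D"
```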

Integration

  • Use the experiment-analysis skill for post-sweep metrics analysis
  • Results are logged to WandB with `experiment_tag` for filtering
  • Document findings in the sweep directory's `results.md`