---
name: perform-sweep
description: Design, configure, launch, and analyze ablation sweeps for GRPO training. Use for hypothesis testing, hyperparameter experiments, and systematic comparisons.
---
# Perform Sweep

End-to-end workflow for running ablation experiments on the Diplomacy GRPO training pipeline.
## Quick Reference

| Phase | Action | Command |
|---|---|---|
| Configure | Create `sweep.yaml` | See YAML Reference |
| Validate | Dry run | `python scripts/launch_sweep.py <path> --dry-run` |
| Info | Show config | `python scripts/launch_sweep.py <path> --info` |
| Launch | Start sweep | `python scripts/launch_sweep.py <path>` |
| Status | Check progress | `python scripts/launch_sweep.py <path> --status` |
| List | List all sweeps | `python scripts/launch_sweep.py --list` |
| Analyze | Compare results | Use the experiment-analysis skill |
## Workflow

### 1. Hypothesis Design
- Review recent experiments in `experiments/experiment-tracker.md`
- Identify one variable to test (e.g., horizon length, scoring function)
- Predict the expected outcome
- Document the reasoning in the sweep.yaml `hypothesis` field
### 2. YAML Configuration

Create `experiments/sweeps/<name>/sweep.yaml`:
```yaml
metadata:
  name: "my-ablation"
  description: "Testing hypothesis X"
  hypothesis: "Longer horizons should improve strategic play"
  experiment_tag_prefix: "my-ablation"

defaults:
  total_steps: 100

runs:
  A:
    name: "control"
    description: "Baseline configuration"
    config:
      experiment_tag: "${metadata.experiment_tag_prefix}-A"
  B:
    name: "treatment"
    description: "With longer horizon"
    config:
      rollout_horizon_years: 8
      experiment_tag: "${metadata.experiment_tag_prefix}-B"
```
See YAML Reference for full schema.
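The launcher merges `defaults` into each run's `config` and resolves `${metadata.*}` placeholders. A minimal sketch of how that resolution could work, assuming a plain PyYAML load (the actual `launch_sweep.py` logic may differ):

```python
import re
import yaml
from pathlib import Path

def load_sweep(sweep_dir: str) -> dict:
    """Load sweep.yaml, merge defaults into each run, and resolve ${metadata.*} placeholders."""
    spec = yaml.safe_load(Path(sweep_dir, "sweep.yaml").read_text())
    metadata = spec.get("metadata", {})
    defaults = spec.get("defaults", {})

    def resolve(value):
        # Replace ${metadata.key} with the corresponding metadata value; leave unknown keys intact.
        if isinstance(value, str):
            return re.sub(
                r"\$\{metadata\.(\w+)\}",
                lambda m: str(metadata.get(m.group(1), m.group(0))),
                value,
            )
        return value

    runs = {}
    for run_id, run in spec.get("runs", {}).items():
        config = {**defaults, **run.get("config", {})}
        runs[run_id] = {**run, "config": {k: resolve(v) for k, v in config.items()}}
    return {"metadata": metadata, "runs": runs}
```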
### 3. Validate Configuration

```bash
# Show sweep info
python scripts/launch_sweep.py experiments/sweeps/<name>/ --info

# Dry run (validates config, shows what would run)
python scripts/launch_sweep.py experiments/sweeps/<name>/ --dry-run
```
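The dry run is where configuration mistakes should surface, before anything hits the cloud. As a rough illustration of the kind of checks involved (a sketch, not `launch_sweep.py`'s actual validation; field names follow the YAML schema above):

```python
def validate_sweep(spec: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the sweep looks launchable."""
    problems = []
    for field in ("name", "hypothesis", "experiment_tag_prefix"):
        if not spec.get("metadata", {}).get(field):
            problems.append(f"metadata.{field} is missing")
    if not spec.get("runs"):
        problems.append("sweep defines no runs")
    tags = [run.get("config", {}).get("experiment_tag") for run in spec.get("runs", {}).values()]
    if len(tags) != len(set(tags)):
        problems.append("experiment_tag values must be unique across runs")
    return problems
```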
### 4. Launch and Monitor

```bash
# Launch (fire-and-forget - runs in cloud)
python scripts/launch_sweep.py experiments/sweeps/<name>/

# Check status anytime
python scripts/launch_sweep.py experiments/sweeps/<name>/ --status

# List all sweeps
python scripts/launch_sweep.py --list
```
### 5. Analysis

After the sweep completes, use the experiment-analysis skill:

```bash
# Full analysis for each run
uv run python .claude/skills/experiment-analysis/analyze_elo.py <run-name>

# Compare in WandB
# Filter by experiment_tag_prefix (e.g., "my-ablation")
```
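Runs can also be compared programmatically rather than in the WandB UI by pulling every run whose `experiment_tag` starts with the sweep's prefix. A minimal sketch; the `entity/project` path and the `train/elo` summary key are assumptions, substitute your own:

```python
import wandb

def compare_sweep(prefix: str, project: str = "my-entity/diplomacy-grpo"):
    """Print the final summary metric for every run tagged with the sweep prefix."""
    api = wandb.Api()
    # WandB filters use MongoDB-style query operators.
    runs = api.runs(project, filters={"config.experiment_tag": {"$regex": f"^{prefix}"}})
    for run in sorted(runs, key=lambda r: r.config.get("experiment_tag", "")):
        print(run.config.get("experiment_tag"), run.summary.get("train/elo"))
```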
## Key Features

- Fire-and-forget: launch, close your laptop, and the sweep keeps running in Modal cloud
- Auto-resume: if Modal times out (24-hour max), the sweep automatically respawns
- Sequential execution: runs one training job at a time (infrastructure constraint)
- Progress tracking: state is saved after each run for recovery (see the sketch below)
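The recovery behavior above can be pictured as a small state file that the launcher updates after each run and consults on respawn. An illustrative sketch, not the actual implementation; the `sweep_state.json` filename is hypothetical and the training call is supplied by the caller:

```python
import json
from pathlib import Path
from typing import Callable

def run_sweep(sweep_dir: str, runs: dict, run_training: Callable[[dict], None]) -> None:
    """Run each configured training sequentially, skipping runs already marked complete."""
    state_path = Path(sweep_dir) / "sweep_state.json"  # hypothetical state-file name
    state = json.loads(state_path.read_text()) if state_path.exists() else {"completed": []}

    for run_id, run in runs.items():
        if run_id in state["completed"]:
            continue  # already finished before a previous timeout/respawn
        run_training(run["config"])  # caller-supplied function that blocks until one training job finishes
        state["completed"].append(run_id)
        state_path.write_text(json.dumps(state, indent=2))  # checkpoint progress for recovery
```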
## Example Sweeps

See existing sweeps in `experiments/sweeps/`:

- `longer-horizon-inverted-weight-ablation/` - 2x2 ablation on horizon and scoring
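A 2x2 ablation like that one crosses two factors into four runs; the grid can be written out by hand in sweep.yaml or generated. A sketch of the latter, where the `inverted_weight_scoring` parameter name and the horizon values are assumptions (only `rollout_horizon_years: 8` appears in the schema above):

```python
from itertools import product

def make_2x2_runs(prefix: str) -> dict:
    """Build the four run entries for a horizon x scoring ablation grid."""
    runs = {}
    for label, (horizon, inverted) in zip("ABCD", product([4, 8], [False, True])):
        runs[label] = {
            "name": f"horizon{horizon}-{'inverted' if inverted else 'standard'}",
            "config": {
                "rollout_horizon_years": horizon,
                "inverted_weight_scoring": inverted,  # hypothetical parameter name
                "experiment_tag": f"{prefix}-{label}",
            },
        }
    return runs
```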
## Integration

- Use the `experiment-analysis` skill for post-sweep metrics analysis
- Results are logged to WandB with `experiment_tag` for filtering
- Document findings in the sweep directory's `results.md`