---
name: design-experiment
description: Plan LLM fine-tuning and evaluation experiments. Use when the user wants to design a new experiment, plan training runs, or create an experiment_summary.yaml file.
---
# Design Experiment
You help users plan experiments for fine-tuning and evaluating LLMs. Create a plan that specifies the complete workflow from training through evaluation, verifies resources, and documents all steps in a structured YAML configuration.
## Your Task
Guide the user through designing their experiment by asking questions, verifying resources, and creating a comprehensive `experiment_summary.yaml` file that documents the complete plan.
## Workflow
Follow the three-stage process:
### 1. Parameter Selection → `param_selection.md`
Guide the user through 9 interactive steps to gather all experiment parameters:
- Determine experiment purpose and location (workflow test vs research experiment)
- Understand the experiment (scientific question, variables)
- Confirm tool choices (torchtune for training, inspect-ai for evaluation)
- Design training runs (models, datasets, hyperparameters)
- Design evaluation runs (tasks, epochs, evaluation matrix)
- Establish naming (experiment name, run names)
- Verify resources (models, datasets, eval scripts exist)
- Get approval (validate first, then present)
- Create files (proceed to generation stage)
See `param_selection.md` for:
- Complete question flow for each step
- Auto-detection logic for experiment location
- Resource verification commands
- Conversation patterns
### 2. Validation → `validation.md`
Before presenting the plan to the user (step 8), validate completeness:
- ✓ All YAML sections present and properly structured
- ✓ All run names follow convention
- ✓ All parameters documented (variables and controls)
- ✓ Evaluation plan is consistent (0-indexed epochs, base vs fine-tuned)
- ✓ System prompt matches between training and evaluation (critical! - see the fragment below)
- ✓ All resources verified (or noted as prerequisites)
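The system-prompt check is a common failure point: the string recorded for training must be identical to the one inspect-ai receives at evaluation time. A minimal illustration of a consistent pair (the `system_prompt` field name is an assumption here - check the actual template):

```yaml
# Hypothetical fragment; field names are illustrative, not the canonical schema.
training:
  system_prompt: "You are a careful assistant."
evaluation:
  system_prompt: "You are a careful assistant."  # must match the training prompt exactly for inspect-ai
```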
See `validation.md` for:
- Complete validation checklist
- Common issues to check
- How to handle missing prerequisites
### 3. Experiment Generation → `experiment_generation.md`
After the user approves, create the output files:
- `experiment_summary.yaml` - Structured experiment configuration (use `templates/experiment_summary.yaml`)
- `design-experiment.jsonl` - Machine-readable audit trail (see `logging.md`)
Then ask about next steps (run `scaffold-experiment`?).
See `experiment_generation.md` for:
- File creation instructions
- YAML formatting guidance
- Next steps conversation pattern
- Prerequisites handling
## Cross-Cutting Concerns
### Logging → `logging.md`
IMPORTANT: Throughout parameter selection and generation, maintain a detailed log at `{experiment_dir}/design-experiment.jsonl`.
What to log:
- ✓ Resource verification (`ls`, `du`, `df` commands and results)
- ✓ Prior run searches (if performed)
- ✓ Decisions (naming, recipe, configuration)
- ✓ File creation
Format: JSON Lines (`.jsonl`) - one JSON object per line
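For instance, a resource-verification entry might look like the following; the field names are illustrative assumptions, not the canonical schema from `logging.md`:

```json
{"ts": "2025-01-15T10:32:07Z", "action": "resource_verification", "command": "du -sh /data/datasets/alpaca", "result": "1.2G", "status": "ok"}
```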
See `logging.md` for:
- Complete log format specification
- All action types with schemas
- Example entries for each action type
- When to log during workflow
### Templates → `templates/`
Reference materials for output generation:
- `templates/experiment_summary.yaml` - YAML schema and structure for the experiment plan
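For orientation, the overall shape is roughly as follows. The template is authoritative; every section name, field name, and value below is illustrative, not a guaranteed part of the schema:

```yaml
# Illustrative sketch only -- consult templates/experiment_summary.yaml for the real schema.
experiment:
  name: prompt-format-ablation          # hypothetical experiment name
  question: "Does dataset X improve task Y accuracy?"  # scientific question and variables
tools:
  training: torchtune
  evaluation: inspect-ai
training_runs:
  - name: run-a                         # follows the run-naming convention
    model: /path/to/base-model          # paths come from claude.local.md
    dataset: /path/to/dataset
    hyperparameters: {learning_rate: 2.0e-5, epochs: 2}
evaluation_runs:
  - task: my-inspect-task
    epochs: [0, 1]                      # 0-indexed checkpoints from the 2 training epochs
```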
## Important Reminders
- **Use paths from `claude.local.md`** for models, datasets, and scratch directories
- **Always verify resources exist** before finalizing the plan (log all verification)
- **System prompt consistency is critical** - it must match between training and evaluation for inspect-ai
- **Epochs are 0-indexed** - use [0, 1, 2] in the evaluation matrix
- **Base models use `epochs: null`, fine-tuned models use `epochs: [0, 1]`** - see the sketch after this list
- **Document tool choices in YAML** - torchtune for training, inspect-ai for evaluation
- **Handle missing resources gracefully** - note them as prerequisites; don't block the plan
- **If an inspect-ai task doesn't exist** - note that the `create-inspect-task` skill should be run first
- **Generate YAML, not Markdown** - use structured YAML format with proper indentation
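A minimal sketch of the epoch conventions above (the model identifiers are hypothetical):

```yaml
evaluation_matrix:
  - model: base-model              # base model: no fine-tuned checkpoints to sweep
    epochs: null
  - model: my-finetuned-run        # fine-tuned: 0-indexed checkpoint epochs
    epochs: [0, 1]
```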
## Module Organization
This skill uses the param_selection → validation → generation pattern:
| Module | Purpose | Lines |
|---|---|---|
| param_selection.md | 9-step interactive workflow | ~340 |
| validation.md | Completeness checklist | ~140 |
| experiment_generation.md | Create YAML and JSONL files | ~125 |
| logging.md | JSONL audit trail specification | ~400 |
| templates/experiment_summary.yaml | YAML schema and structure | ~150 |
**Pattern:** Three action verbs (selection, validation, generation) matching the scaffold/run skills, plus cross-cutting logging and templates.
See `README.md` for complete pattern documentation and rationale.