Name: method-development
Author: majiayu000

name	method-development
description	Design and iterate on new research methods with structured checkpoints, baselines, and validation.
allowed-tools	Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite
model	sonnet
x-version	3.2.0
x-category	research
x-vcl-compliance	v3.1.1
x-cognitive-frames	HON, MOR, COM, CLS, EVD, ASP, SPC

STANDARD OPERATING PROCEDURE

Purpose

Develop, refine, and validate novel methods anchored to baselines and constraints.
Apply constraint hygiene and explicit ceilings to prevent premature claims.
Keep structure-first artifacts current for handoff and reproducibility.

Trigger Conditions

Positive: creating or adapting algorithms/pipelines; designing ablations; exploring new research ideas.
Negative: pure replication (use baseline-replication) or publication packaging (research-publication).

Guardrails

HARD / SOFT / INFERRED constraint buckets (compute, data, metrics, ethics) with sources.
Two-pass refinement on designs: structure vs. baselines, then epistemic/risks.
Require baseline parity before claiming improvements; document variance sources.
Confidence ceilings enforced per claim.

Inputs

Problem statement and success metrics.
Baselines to beat and constraints (data, compute, deadlines).
Risk tolerances and evaluation protocols.

Workflow

Problem Framing: Capture objectives, constraints, and baselines; confirm INFERRED assumptions.
Design Options: Propose candidates with expected impact; map to constraints.
Experiment Plan: Define ablations, datasets, metrics, and stopping rules.
Run & Observe: Execute experiments, log configs/seeds; compare to baselines.
Validate & Iterate: Analyze results, run adversarial checks, and refine or stop.
Package: Summarize findings, risks, and next steps; store artifacts and update references/examples.

Validation & Quality Gates

Baseline beat or variance explained; claims tied to evidence with ceilings.
Ablations cover key hypotheses; failures documented.
Reproducibility assets stored (configs, logs, seeds).

Response Template

**Objective & Constraints**
- HARD / SOFT / INFERRED.

**Design Candidates**
- Option → rationale → expected impact.

**Experiment Status**
- Runs, metrics vs. baseline, issues.

**Next Steps**
- Iterate, stop, or expand.

Confidence: 0.80 (ceiling: research 0.85) - based on current evidence and validation checks.

Confidence: 0.80 (ceiling: research 0.85) - reflects validated comparisons to baselines and logged experiments.

method-development

Install Skill

SKILL.md

STANDARD OPERATING PROCEDURE

Purpose

Trigger Conditions

Guardrails

Inputs

Workflow

Validation & Quality Gates

Response Template