| name | marketing-experimentation |
| description | Systematic marketing experimentation process - discover concepts, generate hypotheses, coordinate multiple experiments, synthesize results, generate next-iteration ideas through rigorous validation cycles |
Marketing Experimentation
Overview
Use this skill when you need to validate marketing concepts or business ideas through rigorous experimental cycles. This skill orchestrates the complete Build-Measure-Learn cycle from concept to data-driven signal.
When to use this skill:
- You have a marketing concept or business idea that needs validation
- You want to test multiple related hypotheses systematically
- You need to integrate results across multiple experiments
- You're designing the next iteration based on experimental evidence
- You're conducting qualitative market research combined with quantitative testing
What this skill does:
- Validates concepts through market research before experimentation
- Generates multiple testable hypotheses from marketing ideas
- Coordinates multiple experiments: quantitative (hypothesis-testing) and qualitative (qualitative-research)
- Synthesizes results across experiments using interpreting-results and creating-visualizations
- Produces clear signals (positive/negative/null/mixed) for each campaign
- Generates actionable next-iteration ideas based on experimental evidence
What this skill does NOT do:
- Design individual experiments (delegates to hypothesis-testing or qualitative-research)
- Execute statistical analysis directly (uses hypothesis-testing for quantitative rigor)
- Conduct interviews/surveys/observations directly (uses qualitative-research for qualitative rigor)
- Operationalize successful ideas (focuses on validation, not scaling)
- Implement platform-specific tactics (this skill covers tool-agnostic techniques only)
Integration with existing skills:
- Delegates to `hypothesis-testing` for quantitative experiment design and execution (metrics, A/B tests, statistical analysis)
- Delegates to `qualitative-research` for qualitative experiment design and execution (interviews, surveys, focus groups, observations)
- Uses `interpreting-results` to synthesize findings across multiple experiments
- Uses `creating-visualizations` to communicate aggregate results
- Invokes the `market-researcher` agent for concept validation via internet research
Multi-conversation persistence: This skill is designed for campaigns spanning days or weeks. Each phase documents completely enough that new conversations can resume after extended breaks. The experiment tracker (04-experiment-tracker.md) serves as the living coordination hub.
Prerequisites
Required skills:
- `hypothesis-testing` - Quantitative experiment design and execution (invoked for metric-based experiments)
- `qualitative-research` - Qualitative experiment design and execution (invoked for interviews, surveys, focus groups, observations)
- `interpreting-results` - Result synthesis and pattern identification (invoked in Phase 5)
- `creating-visualizations` - Aggregate result visualization (invoked in Phase 5)
Required agents:
- `market-researcher` - Concept validation via internet research (invoked in Phase 1)
Required knowledge:
- Understanding of Lean Startup Build-Measure-Learn cycle
- Familiarity with marketing tactics (landing pages, ads, email, content)
- Basic experimental design principles (control/treatment, signals, metrics)
- Understanding of qualitative vs quantitative research methods
Data requirements:
- None initially (market research is qualitative)
- Data requirements emerge from experiment design in Phase 4
- Quantitative experiments (hypothesis-testing): SQL databases, analytics data, A/B test results
- Qualitative experiments (qualitative-research): Interview transcripts, survey responses, observation notes
Mandatory Process Structure
CRITICAL: This is a 6-phase process skill. You MUST complete all phases in order. Use TodoWrite to track progress through each phase.
TodoWrite template:
When starting a marketing-experimentation session, create these todos:
- [ ] Phase 1: Discovery & Asset Inventory
- [ ] Phase 2: Hypothesis Generation
- [ ] Phase 3: Prioritization
- [ ] Phase 4: Experiment Coordination
- [ ] Phase 5: Cross-Experiment Synthesis
- [ ] Phase 6: Iteration Planning
Workspace structure:
All work for a marketing-experimentation session is saved to:
```
analysis/marketing-experimentation/[campaign-name]/
├── 01-discovery.md
├── 02-hypothesis-generation.md
├── 03-prioritization.md
├── 04-experiment-tracker.md
├── 05-synthesis.md
├── 06-iteration-plan.md
└── experiments/
    ├── [experiment-1]/   # hypothesis-testing session
    ├── [experiment-2]/   # hypothesis-testing session
    └── [experiment-3]/   # hypothesis-testing session
```
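If convenient, the workspace can be scaffolded with a small script. The sketch below is illustrative only (the script name `scaffold_campaign.py` and the placeholder file contents are assumptions, not part of this skill); it simply creates the directories and phase files shown above.

```python
#!/usr/bin/env python3
"""Scaffold the marketing-experimentation workspace for a campaign (illustrative sketch)."""
from pathlib import Path
import sys

PHASE_FILES = [
    "01-discovery.md",
    "02-hypothesis-generation.md",
    "03-prioritization.md",
    "04-experiment-tracker.md",
    "05-synthesis.md",
    "06-iteration-plan.md",
]

def scaffold(campaign_name: str) -> Path:
    root = Path("analysis/marketing-experimentation") / campaign_name
    (root / "experiments").mkdir(parents=True, exist_ok=True)  # experiment sessions live here
    for filename in PHASE_FILES:
        path = root / filename
        if not path.exists():  # never overwrite existing phase documents
            path.write_text(f"# {filename}\n\n_TODO: populate during the corresponding phase._\n")
    return root

if __name__ == "__main__":
    campaign = sys.argv[1] if len(sys.argv) > 1 else "example-campaign"
    print(f"Workspace created at: {scaffold(campaign)}")
```

Run it as `python3 scaffold_campaign.py my-campaign` (hypothetical invocation) before starting Phase 1, or create the files manually.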
Phase progression rules:
- Each phase has a CHECKPOINT with verification requirements
- You MUST satisfy all checkpoint requirements before proceeding to the next phase
- Document every decision with rationale in the numbered markdown files
- Commit markdown files after each phase completes
- The experiment tracker (04-experiment-tracker.md) is a LIVING DOCUMENT - update it throughout Phase 4
Multi-conversation resumption:
- At the start of any conversation, check if an experiment tracker exists for this campaign
- If it exists, read it first to understand current experiment status
- Update the tracker as experiments progress
- All phases should be complete enough to resume after days or weeks
Phase 1: Discovery & Asset Inventory
CHECKPOINT: Before proceeding, you MUST have:
- Gathered business concept description from user
- Invoked market-researcher agent and documented findings
- Completed asset inventory (content, campaigns, audiences, data)
- Defined success criteria and validation signals
- Documented known constraints
- Saved to `01-discovery.md`
Instructions
Gather the business concept
- Ask user to describe the marketing concept or business idea to validate
- What problem does it solve? Who is the target audience?
- What's the desired outcome? (awareness, leads, conversions, etc.)
- What stage is this idea at? (new concept, existing campaign, iteration)
Invoke market-researcher agent for concept validation
Dispatch the `market-researcher` agent with the concept description:
- Agent will research market demand signals
- Agent will identify similar solutions and competitors
- Agent will analyze audience needs and pain points
- Agent will find validation evidence (case studies, reviews, testimonials)
Document agent findings in `01-discovery.md` under "Market Research Findings"
Conduct asset inventory
Work with user to inventory existing assets that could be leveraged:
Content Assets:
- Blog posts, case studies, whitepapers
- Video content, webinars, tutorials
- Social media presence and following
- Email lists and subscriber segments
Campaign Assets:
- Existing ad campaigns and performance data
- Landing pages and conversion rates
- Email campaigns and open/click rates
- SEO performance and keyword rankings
Audience Assets:
- Customer segments and personas
- Audience data (demographics, behaviors, preferences)
- Customer feedback and reviews
- Support tickets and common questions
Data Assets:
- Analytics platforms (Google Analytics, Mixpanel, etc.)
- CRM data (Salesforce, HubSpot, etc.)
- Ad platform data (Google Ads, Facebook Ads, etc.)
- Email platform data (Mailchimp, SendGrid, etc.)
Define success criteria and validation signals
Work with user to define:
- What metrics indicate success? (CTR, conversion rate, CAC, LTV, etc.)
- What magnitude of change is meaningful? (practical significance thresholds)
- What signal types are acceptable?
- Positive: Validates concept, proceed to scale
- Negative: Invalidates concept, pivot or abandon
- Null: Inconclusive, needs refinement or more data
- Mixed: Some aspects work, some don't, iterate strategically
Document known constraints
Capture any constraints that will affect experimentation:
- Budget constraints (ad spend limits, tool costs)
- Time constraints (launch deadlines, seasonal factors)
- Resource constraints (team capacity, content production)
- Technical constraints (platform limitations, integration issues)
- Regulatory constraints (GDPR, CCPA, industry regulations)
Create `01-discovery.md` with: `./templates/01-discovery.md`
STOP and get user confirmation
- Review discovery findings with user
- Confirm asset inventory is complete
- Confirm success criteria are appropriate
- Do NOT proceed to Phase 2 until confirmed
Common Rationalization: "I'll skip discovery and go straight to testing - the concept is obvious" Reality: Discovery surfaces assumptions, constraints, and existing assets that dramatically affect experiment design. Always start with discovery.
Common Rationalization: "I don't need market research - I already know this market" Reality: The market-researcher agent provides current, data-driven validation signals that prevent building experiments around false assumptions. Always validate.
Common Rationalization: "Asset inventory is busywork - I'll figure out what's available as I go" Reality: Existing assets can dramatically reduce experiment cost and time. Inventorying first prevents reinventing wheels and enables building on proven foundations.
Phase 2: Hypothesis Generation
CHECKPOINT: Before proceeding, you MUST have:
- Generated 5-10 testable hypotheses from the concept
- Each hypothesis maps to a specific tactic/channel
- Each hypothesis has expected outcome and rationale
- Hypotheses cover multiple tactics (not all ads or all email)
- Referenced relevant frameworks (Lean Startup, AARRR, ICE/RICE)
- Saved to `02-hypothesis-generation.md`
Instructions
Generate 5-10 testable hypotheses
For each hypothesis, use this format:
Hypothesis [N]: [Brief statement]
- Tactic/Channel: [landing page | ad campaign | email sequence | content marketing | social media | SEO | etc.]
- Expected Outcome: [Specific, measurable result]
- Rationale: [Why we believe this will work based on discovery findings]
- Variables to Test: [What will we manipulate/measure]
Example hypothesis:
Hypothesis 1: Value proposition clarity drives conversion
- Tactic/Channel: Landing page A/B test
- Expected Outcome: 15%+ increase in conversion rate from landing page variant with simplified value proposition
- Rationale: Market research showed audience confusion about product benefits. Discovery found existing landing page has 8 different value propositions competing for attention.
- Variables to Test: Headline clarity, benefit hierarchy, CTA prominence
Ensure tactic coverage
Verify hypotheses cover multiple marketing tactics:
Acquisition Tactics:
- Landing pages (conversion optimization, value prop testing, layout)
- Ad campaigns (targeting, creative, messaging, platforms)
- Content marketing (blog posts, videos, webinars, lead magnets)
- SEO (keyword targeting, content optimization, technical SEO)
Activation Tactics:
- Email sequences (onboarding, nurture, activation)
- Product tours (in-app guidance, feature discovery)
- Social proof (testimonials, case studies, reviews)
Retention Tactics:
- Email campaigns (engagement, re-activation, upsell)
- Content (newsletters, educational content, community)
Don't generate 10 ad hypotheses. Aim for diversity across tactics.
Reference experimentation frameworks
Lean Startup Build-Measure-Learn:
- Build: What's the minimum viable test? (landing page, ad, email, etc.)
- Measure: What metrics indicate success/failure?
- Learn: What will we learn regardless of outcome?
AARRR Pirate Metrics:
- Acquisition: How do users find us?
- Activation: Do they have a great first experience?
- Retention: Do they come back?
- Referral: Do they tell others?
- Revenue: Do they pay?
Map each hypothesis to one or more AARRR stages.
ICE/RICE Prioritization (used in Phase 3):
- Impact: How much will this move the metric?
- Confidence: How sure are we this will work?
- Ease: How easy is this to implement?
- Reach: How many users will this affect? (RICE only)
Create `02-hypothesis-generation.md` with: `./templates/02-hypothesis-generation.md`
STOP and get user confirmation
- Review all hypotheses with user
- Confirm hypotheses are testable and meaningful
- Confirm tactic coverage is appropriate
- Do NOT proceed to Phase 3 until confirmed
Common Rationalization: "I'll generate hypotheses as I build experiments - more efficient" Reality: Generating hypotheses before prioritization enables strategic selection of highest-impact tests. Generating ad-hoc leads to testing whatever's easiest, not what matters most.
Common Rationalization: "I'll focus all hypotheses on one tactic (ads) since that's what we know" Reality: Tactic diversity reveals which channels work for this concept. Single-tactic testing creates blind spots and missed opportunities.
Common Rationalization: "I'll write vague hypotheses and refine them during experiment design" Reality: Vague hypotheses lead to vague experiments that produce vague results. Specific hypotheses with expected outcomes enable clear signal detection.
Common Rationalization: "More hypotheses = better coverage, I'll generate 20+" Reality: Too many hypotheses dilute focus and create analysis paralysis in prioritization. 5-10 high-quality hypotheses enable strategic selection of 2-4 tests.
Phase 3: Prioritization
CHECKPOINT: Before proceeding, you MUST have:
- Scored all hypotheses using ICE or RICE framework with computational method
- Created prioritized backlog (highest to lowest score) using Python script
- Selected 2-4 highest-priority hypotheses for testing
- Documented prioritization rationale
- Identified any dependencies or sequencing requirements
- Saved to `03-prioritization.md`
Instructions
CRITICAL: You MUST use computational methods (Python scripts) to calculate scores. Do NOT estimate or manually calculate scores.
Choose prioritization framework
ICE Framework (simpler, faster):
- Impact: How much will this move the success metric? (1-10 scale)
- Confidence: How confident are we this will work? (1-10 scale)
- Ease: How easy is this to implement? (1-10 scale)
- Score: Impact × Confidence × Ease (higher Ease means easier, so it multiplies the score)
RICE Framework (more comprehensive):
- Reach: How many users will this affect? (absolute number or percentage)
- Impact: How much will this move the metric per user? (1-10 scale: 0.25=minimal, 3=massive)
- Confidence: How confident are we in our estimates? (percentage: 50%, 80%, 100%)
- Effort: Person-weeks to implement (absolute number)
- Score: (Reach × Impact × Confidence) / Effort
Choose ICE for speed, RICE for precision when reach varies significantly.
Score each hypothesis using Python script
For ICE Framework:
Create a Python script to compute and sort ICE scores:
```python
#!/usr/bin/env python3
"""
ICE Score Calculator for Marketing Experimentation
Computes ICE scores: Impact × Confidence × Ease
Sorts hypotheses by score (highest to lowest)
"""

hypotheses = [
    {
        "id": "H1",
        "name": "Value proposition clarity drives conversion",
        "impact": 8,
        "confidence": 7,
        "ease": 9
    },
    {
        "id": "H2",
        "name": "Ad targeting refinement",
        "impact": 7,
        "confidence": 6,
        "ease": 5
    },
    {
        "id": "H3",
        "name": "Email sequence optimization",
        "impact": 6,
        "confidence": 8,
        "ease": 8
    },
    {
        "id": "H4",
        "name": "Content marketing expansion",
        "impact": 5,
        "confidence": 4,
        "ease": 3
    },
]

# Calculate ICE scores (higher Ease = easier, so Ease multiplies the score)
for h in hypotheses:
    h['ice_score'] = h['impact'] * h['confidence'] * h['ease']

# Sort by ICE score (descending)
sorted_hypotheses = sorted(hypotheses, key=lambda x: x['ice_score'], reverse=True)

# Print results table
print("| Hypothesis | Impact | Confidence | Ease | ICE Score | Rank |")
print("|------------|--------|------------|------|-----------|------|")
for rank, h in enumerate(sorted_hypotheses, 1):
    print(f"| {h['id']}: {h['name'][:30]} | {h['impact']} | {h['confidence']} | {h['ease']} | {h['ice_score']:.2f} | {rank} |")
```

Usage: `python3 ice_calculator.py`

For RICE Framework:
Create a Python script to compute and sort RICE scores:
```python
#!/usr/bin/env python3
"""
RICE Score Calculator for Marketing Experimentation
Computes RICE scores: (Reach × Impact × Confidence) / Effort
Sorts hypotheses by score (highest to lowest)
"""

hypotheses = [
    {
        "id": "H1",
        "name": "Value proposition clarity drives conversion",
        "reach": 10000,     # users affected
        "impact": 3,        # 0.25=minimal, 1=low, 2=medium, 3=high, 5=massive
        "confidence": 80,   # percentage (50, 80, 100)
        "effort": 2         # person-weeks
    },
    {
        "id": "H2",
        "name": "Ad targeting refinement",
        "reach": 50000,
        "impact": 1,
        "confidence": 50,
        "effort": 4
    },
    {
        "id": "H3",
        "name": "Email sequence optimization",
        "reach": 5000,
        "impact": 2,
        "confidence": 80,
        "effort": 3
    },
    {
        "id": "H4",
        "name": "Content marketing expansion",
        "reach": 20000,
        "impact": 1,
        "confidence": 50,
        "effort": 8
    },
]

# Calculate RICE scores
for h in hypotheses:
    # Convert confidence percentage to decimal
    confidence_decimal = h['confidence'] / 100
    h['rice_score'] = (h['reach'] * h['impact'] * confidence_decimal) / h['effort']

# Sort by RICE score (descending)
sorted_hypotheses = sorted(hypotheses, key=lambda x: x['rice_score'], reverse=True)

# Print results table
print("| Hypothesis | Reach | Impact | Confidence | Effort | RICE Score | Rank |")
print("|------------|-------|--------|------------|--------|------------|------|")
for rank, h in enumerate(sorted_hypotheses, 1):
    print(f"| {h['id']}: {h['name'][:30]} | {h['reach']} | {h['impact']} | {h['confidence']}% | {h['effort']}w | {h['rice_score']:.2f} | {rank} |")
```

Usage: `python3 rice_calculator.py`

Scoring Guidance:
Impact (1-10 for ICE, 0.25-5 for RICE):
- ICE: 1-3 minimal, 4-6 moderate, 7-8 significant, 9-10 transformative
- RICE: 0.25 minimal, 1 low, 2 medium, 3 high, 5 massive
Confidence (1-10 for ICE, 50-100% for RICE):
- ICE: 1-3 speculative, 4-6 uncertain, 7-8 likely, 9-10 validated
- RICE: 50% low confidence, 80% high confidence, 100% certainty
Ease (1-10 for ICE):
- 1-3: Complex, significant resources
- 4-6: Moderate effort, some obstacles
- 7-8: Straightforward, few dependencies
- 9-10: Trivial, immediate execution
Effort (person-weeks for RICE):
- Estimate total person-weeks required
- Include design, implementation, monitoring time
- Examples: 1w (simple landing page), 4w (complex ad campaign), 8w (content series)
Run scoring script and document results
- Create the Python script (ice_calculator.py or rice_calculator.py)
- Update the `hypotheses` list with actual hypothesis data from Phase 2
- Run the script: `python3 [ice|rice]_calculator.py`
- Copy the output table into `03-prioritization.md`
- Include the script in the markdown file for reproducibility:

````markdown
## Prioritization Calculation

**Method:** ICE Framework

**Calculation Script:**

```python
[paste full script here]
```

**Results:**

[paste output table here]
````
Select 2-4 highest-priority hypotheses
Considerations for selection:
- Don't test everything: Focus on highest-scoring 2-4 hypotheses
- Resource constraints: Match selection to available time/budget/capacity
- Learning value: Sometimes lower-scoring hypothesis with high uncertainty is worth testing
- Dependencies: Test prerequisites before dependent hypotheses
- Sequencing: Consider whether experiments need to run sequentially or can run in parallel
Selection criteria (a short selection sketch in Python follows this list):
- Primary: Top 2-4 by ICE/RICE score from computational results
- Secondary: Balance quick wins vs. high-impact long-term bets
- Tertiary: Ensure tactic diversity (don't test 3 ad variants if other tactics untested)
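As referenced in the selection criteria above, the selection step can also be made explicit in code. This is a sketch of one possible heuristic, not a rule of this skill; it assumes each scored hypothesis carries `score` and `tactic` fields, and the per-tactic cap is an illustrative way to preserve tactic diversity.

```python
def select_hypotheses(scored, max_selected=4, max_per_tactic=2):
    """Greedy selection sketch: highest score first, capped per tactic for diversity."""
    selected, tactic_counts = [], {}
    for h in sorted(scored, key=lambda x: x["score"], reverse=True):
        tactic = h["tactic"]
        if tactic_counts.get(tactic, 0) >= max_per_tactic:
            continue  # skip to preserve tactic diversity
        selected.append(h)
        tactic_counts[tactic] = tactic_counts.get(tactic, 0) + 1
        if len(selected) == max_selected:
            break
    return selected

# Example usage with illustrative ICE scores
scored = [
    {"id": "H1", "tactic": "landing page", "score": 504},
    {"id": "H2", "tactic": "ads", "score": 210},
    {"id": "H3", "tactic": "email", "score": 384},
    {"id": "H4", "tactic": "content", "score": 60},
]
print([h["id"] for h in select_hypotheses(scored)])  # -> ['H1', 'H3', 'H2', 'H4']
```

Treat the output as a starting point; apply the secondary and tertiary considerations above before confirming the final selection with the user.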
Document experiment sequence
Determine execution strategy:
- Parallel: Multiple experiments running simultaneously (faster results, higher resource needs)
- Sequential: One experiment at a time (slower, easier to manage)
- Hybrid: Run independent experiments in parallel, sequence dependent ones
Example sequence plan:
```
Week 1-2: Launch H1 (landing page) and H3 (email) in parallel
Week 3-4: Analyze H1 and H3 results
Week 5-6: Launch H2 (ads) based on H1 learnings
Week 7-8: Analyze H2 results
```

Create `03-prioritization.md` with: `./templates/03-prioritization.md`
STOP and get user confirmation
- Review computed scores and prioritization with user
- Confirm selected hypotheses are appropriate
- Confirm experiment sequence is feasible
- Do NOT proceed to Phase 4 until confirmed
Common Rationalization: "I'll test all hypotheses - don't want to miss opportunities" Reality: Resource constraints make testing everything impossible. Prioritization ensures highest-value experiments get resources. Unfocused testing produces weak signals across too many fronts.
Common Rationalization: "Scoring is subjective and arbitrary - I'll just pick what feels right" Reality: Scoring frameworks force explicit reasoning about trade-offs. "Feels right" selections optimize for recency bias and personal preference, not business value. Computational methods ensure consistency.
Common Rationalization: "I'll skip prioritization and go straight to easiest test" Reality: Easiest test rarely equals highest value. Prioritization prevents optimizing for ease at the expense of impact.
Common Rationalization: "I'll estimate scores mentally instead of running the script" Reality: Manual estimation introduces calculation errors and inconsistency. Python scripts ensure exact, reproducible results that can be audited and verified.
Phase 4: Experiment Coordination
CHECKPOINT: Before proceeding, you MUST have:
- Created experiment tracker with all selected hypotheses
- Determined experiment type for each hypothesis (Quantitative or Qualitative)
- Invoked appropriate skill for each hypothesis (hypothesis-testing OR qualitative-research)
- Updated tracker with experiment status (Planned, In Progress, Complete)
- Documented location of each experiment session
- Saved experiment tracker to `04-experiment-tracker.md`
- Note: This phase may span multiple days/weeks and conversations
Instructions
CRITICAL: This phase is designed for multi-conversation workflows. The experiment tracker is a LIVING DOCUMENT that you will update throughout experimentation. New conversations should ALWAYS read this file first.
Determine experiment type for each hypothesis
CRITICAL: Before creating the tracker, classify each hypothesis as Quantitative or Qualitative.
Quantitative experiments use hypothesis-testing skill:
- Measure numeric metrics: CTR, conversion rate, bounce rate, time on page, revenue, CAC, LTV, etc.
- Rely on existing data sources: Google Analytics, ad platforms, CRM, email platforms, database queries
- Test using A/B tests, multivariate tests, or time-series analysis
- Require statistical significance testing
- Examples:
- Landing page A/B test measuring conversion rate
- Ad campaign comparing CTR across different creatives
- Email sequence measuring open rate and click-through rate
- SEO experiment tracking organic traffic changes
Qualitative experiments use qualitative-research skill:
- Gather non-numeric insights: opinions, experiences, needs, pain points, motivations
- Collect through interviews, surveys, focus groups, or observations
- Analyze using thematic analysis rather than statistical tests
- Focus on understanding why behaviors occur
- Examples:
- Customer discovery interviews to understand pain points
- Open-ended survey asking about product needs
- Focus group discussing ad creative perceptions
- Observational study of how users interact with product
Decision criteria:
Ask: "What do we need to learn?"
- If answer is "Does X increase metric Y by Z%?" → Quantitative (hypothesis-testing)
- If answer is "Why do users do X?" or "What do users think about Y?" → Qualitative (qualitative-research)
Ask: "What data will we collect?"
- If answer is "Metrics from analytics" → Quantitative (hypothesis-testing)
- If answer is "Interview transcripts, survey responses, or observation notes" → Qualitative (qualitative-research)
Mixed methods:
- Some hypotheses may require BOTH quantitative and qualitative experiments
- Example: "Value prop clarity drives conversion" could test:
- Quantitatively: A/B test landing page, measure conversion rate (hypothesis-testing)
- Qualitatively: Interview users about which value prop resonates (qualitative-research)
- Track these as separate experiments with linked hypotheses in the tracker
Create experiment tracker
The tracker is your coordination hub for managing multiple experiments over time.
Create `04-experiment-tracker.md` with: `./templates/04-experiment-tracker.md`
Tracker format:
For each selected hypothesis, create an entry:
```markdown
### Experiment 1: [Hypothesis Brief Name]

**Status:** [Planned | In Progress | Complete]
**Hypothesis:** [Full hypothesis statement from Phase 2]
**Tactic/Channel:** [landing page | ads | email | etc.]
**Priority Score:** [ICE/RICE score from Phase 3]
**Start Date:** [YYYY-MM-DD or "Not started"]
**Completion Date:** [YYYY-MM-DD or "In progress"]
**Location:** `analysis/marketing-experimentation/[campaign-name]/experiments/[experiment-name]/`
**Signal:** [Positive | Negative | Null | Mixed | "Not analyzed"]
**Key Findings:** [Brief summary when complete, "TBD" otherwise]
```

Invoke appropriate skill for each experiment
For each hypothesis marked "Planned" or "In Progress":
Step 1: Read the hypothesis details from `02-hypothesis-generation.md`

Step 2a: If Quantitative experiment, invoke `hypothesis-testing` skill:

```
Use hypothesis-testing skill to test: [Hypothesis statement]

Context for hypothesis-testing:
- Session name: [descriptive-name-for-experiment]
- Save location: analysis/marketing-experimentation/[campaign-name]/experiments/[experiment-name]/
- Success criteria: [From Phase 1 discovery]
- Expected outcome: [From Phase 2 hypothesis]
- Metric to measure: [CTR, conversion rate, etc.]
- Data source: [Google Analytics, database, etc.]
```

Step 2b: If Qualitative experiment, invoke `qualitative-research` skill:

```
Use qualitative-research skill to conduct: [Hypothesis statement]

Context for qualitative-research:
- Session name: [descriptive-name-for-experiment]
- Save location: analysis/marketing-experimentation/[campaign-name]/experiments/[experiment-name]/
- Research question: [From Phase 2 hypothesis]
- Collection method: [Interviews | Surveys | Focus Groups | Observations]
- Success criteria: [What insights validate/invalidate hypothesis?]
```

Step 3: Update experiment tracker:
- Change status from "Planned" to "In Progress"
- Add start date
- Update location with actual path
- Document experiment type (Quantitative or Qualitative)
Step 4a: Let hypothesis-testing skill complete its 5-phase workflow:
- Phase 1: Hypothesis Formulation
- Phase 2: Test Design
- Phase 3: Data Analysis
- Phase 4: Statistical Interpretation
- Phase 5: Conclusion
Step 4b: Let qualitative-research skill complete its 6-phase workflow:
- Phase 1: Research Design
- Phase 2: Data Collection
- Phase 3: Data Familiarization
- Phase 4: Systematic Coding
- Phase 5: Theme Development
- Phase 6: Synthesis & Reporting
Step 5: When skill completes, update tracker:
- Change status to "Complete"
- Add completion date
- Document signal (Positive/Negative/Null/Mixed)
- Summarize key findings
Step 6: Commit tracker updates after each status change
Handle multi-conversation resumption
At the start of EVERY conversation during Phase 4:
- Check if `04-experiment-tracker.md` exists
- If it exists, READ IT FIRST before doing anything else
- Review experiment status:
- Planned: Ready to launch
- In Progress: Check hypothesis-testing session for current phase
- Complete: Ready for synthesis (Phase 5)
- Ask user which experiment to continue or which new experiment to launch
- Update tracker with new status/dates/findings
- Commit tracker updates
Example resumption:
```
I've read the experiment tracker. Current status:
- Experiment 1 (H1: Value prop): Complete, Positive signal
- Experiment 2 (H3: Email sequence): In Progress, currently in hypothesis-testing Phase 3
- Experiment 3 (H2: Ad targeting): Planned, not yet started

What would you like to do?
a) Continue Experiment 2 (in hypothesis-testing Phase 3)
b) Start Experiment 3
c) Move to synthesis (Phase 5) since Experiment 1 is complete
```
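A small parsing sketch can generate this status summary directly from the tracker, which reduces transcription errors during resumption. It assumes tracker entries follow the `### Experiment N:` / `**Status:**` / `**Signal:**` format shown earlier; the campaign path below is a placeholder to adjust per campaign.

```python
#!/usr/bin/env python3
"""Summarize experiment status from 04-experiment-tracker.md (illustrative sketch)."""
import re
from pathlib import Path

# Placeholder path: substitute the actual campaign name
TRACKER = Path("analysis/marketing-experimentation/example-campaign/04-experiment-tracker.md")

def parse_tracker(text: str):
    """Split the tracker into experiment entries and pull out the bolded fields."""
    experiments = []
    for block in re.split(r"^### ", text, flags=re.MULTILINE)[1:]:
        name = block.splitlines()[0].strip()
        fields = dict(re.findall(r"\*\*(.+?):\*\*\s*(.+)", block))
        experiments.append({
            "name": name,
            "status": fields.get("Status", "Unknown"),
            "signal": fields.get("Signal", "Not analyzed"),
        })
    return experiments

if __name__ == "__main__":
    for exp in parse_tracker(TRACKER.read_text()):
        print(f"- {exp['name']}: {exp['status']} (signal: {exp['signal']})")
```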
Coordinate parallel vs. sequential experiments
Parallel execution (multiple experiments simultaneously):
- Launch multiple experiment sessions (hypothesis-testing or qualitative-research)
- Track each separately in experiment tracker
- Update tracker as each progresses independently
- Requires managing multiple analysis directories
Sequential execution (one at a time):
- Complete one experiment fully before starting next
- Simpler tracking, easier to manage
- Can incorporate learnings between experiments
Hybrid execution:
- Run independent experiments in parallel
- Sequence dependent experiments (e.g., H2 depends on H1 insights)
Progress through all experiments
Continue invoking the appropriate skills (hypothesis-testing or qualitative-research) and updating the tracker until:
- All selected experiments have status "Complete"
- All experiments have documented signals
- All findings are summarized in tracker
Only when ALL experiments are complete should you proceed to Phase 5.
Common Rationalization: "I'll keep experiment details in my head - the tracker is just busywork" Reality: Multi-day campaigns lose context between conversations. The tracker is the ONLY source of truth that persists across sessions. Without it, you'll re-ask questions and lose progress.
Common Rationalization: "I'll wait until all experiments finish before updating the tracker" Reality: Batch updates create opportunity for lost data. Update the tracker IMMEDIATELY after status changes. Real-time tracking prevents confusion and missed experiments.
Common Rationalization: "I'll design the experiment myself instead of using hypothesis-testing" Reality: hypothesis-testing skill provides rigorous experimental design, statistical analysis, and signal detection. Skipping it produces weak experiments with ambiguous results.
Common Rationalization: "All experiments are done, I don't need to update the tracker before synthesis" Reality: The tracker is your input to Phase 5. Incomplete tracker means incomplete synthesis. Update ALL fields (status, dates, signals, findings) before proceeding.
Phase 5: Cross-Experiment Synthesis
CHECKPOINT: Before proceeding, you MUST have:
- ALL experiments from Phase 4 marked "Complete" with signals documented
- Created aggregate results table across all experiments
- Invoked presenting-data skill to synthesize findings with visualizations
- Documented what worked, what didn't, and what's unclear
- Classified overall campaign signal (Positive/Negative/Null/Mixed)
- Saved to `05-synthesis.md`
Instructions
CRITICAL: This phase synthesizes results ACROSS multiple experiments. Do NOT proceed until ALL Phase 4 experiments are complete with documented signals.
Verify experiment completion
Read `04-experiment-tracker.md` and verify:
- All experiments have Status = "Complete"
- All experiments have Signal documented (Positive/Negative/Null/Mixed)
- All experiments have Key Findings summarized
If any experiments are incomplete, return to Phase 4 to finish them.
Create aggregate results table
Compile findings from all experiments into a summary table (a small table-generation sketch follows the field lists below):
Example Aggregate Table (Mixed Quantitative & Qualitative):
| Experiment | Type | Hypothesis | Tactic | Signal | Key Finding | Confidence |
|------------|------|------------|--------|--------|-------------|------------|
| E1 | Quant | Value prop clarity | Landing page A/B test | Positive | Conversion rate +18% (p<0.05) | High |
| E2 | Qual | Customer pain points | Discovery interviews | Positive | 8 of 10 cited onboarding complexity | High |
| E3 | Quant | Ad targeting | Ads | Null | CTR +2% (not sig., p=0.12) | Medium |
| E4 | Qual | Ad message resonance | Focus groups | Negative | 6 of 8 found messaging confusing | High |

For Quantitative experiments (hypothesis-testing):
- Signal classification (Positive/Negative/Null/Mixed)
- Key metric measured (CTR, conversion rate, etc.)
- Magnitude of effect with statistical significance (e.g., "+18%, p<0.05")
- Confidence level from statistical analysis
For Qualitative experiments (qualitative-research):
- Signal classification (Positive/Negative/Null/Mixed)
- Key themes identified with prevalence (e.g., "8 of 10 participants mentioned X")
- Representative quotes or patterns
- Confidence assessment (credibility, dependability, transferability)
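As noted above, the aggregate table can also be generated computationally, in the same spirit as the Phase 3 scripts. The sketch below assumes each experiment's findings have already been reduced to the fields in the example table; the dictionary keys and values are illustrative, not prescribed by this skill.

```python
#!/usr/bin/env python3
"""Render the cross-experiment aggregate results table as markdown (illustrative sketch)."""

experiments = [  # illustrative values mirroring the example table
    {"id": "E1", "type": "Quant", "hypothesis": "Value prop clarity", "tactic": "Landing page A/B test",
     "signal": "Positive", "finding": "Conversion rate +18% (p<0.05)", "confidence": "High"},
    {"id": "E3", "type": "Quant", "hypothesis": "Ad targeting", "tactic": "Ads",
     "signal": "Null", "finding": "CTR +2% (not sig., p=0.12)", "confidence": "Medium"},
]

columns = ["id", "type", "hypothesis", "tactic", "signal", "finding", "confidence"]
header = ["Experiment", "Type", "Hypothesis", "Tactic", "Signal", "Key Finding", "Confidence"]

# Print a markdown table ready to paste into 05-synthesis.md or the presenting-data context
print("| " + " | ".join(header) + " |")
print("|" + "|".join("---" for _ in header) + "|")
for e in experiments:
    print("| " + " | ".join(str(e[c]) for c in columns) + " |")
```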
Invoke presenting-data skill for comprehensive synthesis
Use the `presenting-data` skill to create complete synthesis with visualizations and presentation materials:

```
Use presenting-data skill to synthesize marketing experimentation results:

Context:
- Campaign: [campaign name]
- Experiments completed: [count]
- Results table: [paste aggregate table]
- Audience: [stakeholders/decision-makers]
- Format: [markdown report | slides | whitepaper]
- Focus: Pattern identification across experiments (what works, what doesn't, what's unclear)
```

presenting-data skill will handle:
- Pattern identification (using interpreting-results internally)
- Visualization creation (using creating-visualizations internally)
- Synthesis documentation (markdown, slides, or whitepaper format)
- Citation of sources (individual hypothesis-testing sessions)
- Reproducibility (references to experiment locations)
Focus areas for synthesis:
- What worked: Experiments with Positive signals (both quantitative and qualitative)
- What didn't work: Experiments with Negative signals (both quantitative and qualitative)
- What's unclear: Experiments with Null or Mixed signals
- Cross-experiment patterns: Do results cluster by tactic? By audience? By timing? Do quantitative and qualitative findings align or conflict?
- Triangulation: Do qualitative findings explain quantitative results? (e.g., interviews reveal WHY conversion rate increased)
- Confounding factors: Are there external factors affecting multiple experiments?
- Confidence assessment: Which findings are robust? Which are uncertain? How do qualitative and quantitative confidence levels compare?
Document patterns and insights
The presenting-data skill will create `05-synthesis.md` (or slides/whitepaper) with:
What Worked (Positive Signals):
- List experiments with positive results
- Explain WHY these worked (based on analysis)
- Identify commonalities across successful experiments
What Didn't Work (Negative Signals):
- List experiments with negative results
- Explain WHY these failed (based on analysis)
- Identify lessons learned
What's Unclear (Null/Mixed Signals):
- List experiments with inconclusive results
- Explain potential reasons (insufficient power, confounding factors, etc.)
- Identify what additional investigation is needed
Cross-Experiment Patterns:
- Do results cluster by tactic, audience, timing, or other factors?
- Are there confounding variables affecting multiple experiments?
- What overarching insights emerge?
Visualizations (created by presenting-data):
- Signal distribution (bar chart: Positive/Negative/Null/Mixed counts)
- Effect sizes (bar chart: metric changes by experiment)
- Confidence levels (scatter plot: effect size vs. confidence)
- Tactic performance (grouped by channel/tactic)
Classify overall campaign signal
Based on aggregate analysis (from presenting-data output), classify the campaign:
Positive: Campaign validates concept, proceed to scaling
- Multiple experiments show positive signals
- Successful tactics identified for scale-up
- Clear path to ROI improvement
Negative: Campaign invalidates concept, pivot or abandon
- Multiple experiments show negative signals
- No successful tactics identified
- Concept doesn't resonate with audience
Null: Campaign results inconclusive, needs refinement
- Most experiments show null signals
- Insufficient power or confounding factors
- Needs redesigned experiments or longer observation
Mixed: Some aspects work, some don't, iterate strategically
- Mix of positive and negative signals across experiments
- Some tactics work, others don't
- Selective scaling + pivots needed
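For a rough first pass before discussing the classification with the user, the rules above can be approximated in code. The thresholds below are illustrative assumptions, not part of this skill; the final call remains a judgment documented in the synthesis and confirmed with the user.

```python
from collections import Counter

def classify_campaign(signals):
    """Rough first-pass heuristic for the overall campaign signal (thresholds are illustrative)."""
    counts = Counter(signals)
    total = len(signals)
    if counts["Positive"] >= total / 2 and counts["Negative"] == 0:
        return "Positive"
    if counts["Negative"] >= total / 2 and counts["Positive"] == 0:
        return "Negative"
    if counts["Null"] >= total / 2:
        return "Null"
    return "Mixed"

print(classify_campaign(["Positive", "Positive", "Null", "Negative"]))  # -> Mixed
```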
Review presenting-data output and finalize synthesis
After presenting-data skill completes:
- Review generated synthesis document
- Verify all experiments are covered
- Confirm visualizations are appropriate
- Ensure signal classification is documented
- Make any necessary edits for clarity
STOP and get user confirmation
- Review synthesis findings with user
- Confirm pattern interpretations are accurate
- Confirm overall signal classification is appropriate
- Do NOT proceed to Phase 6 until confirmed
Common Rationalization: "I'll synthesize results mentally - no need to document patterns" Reality: Mental synthesis loses details and creates false confidence. Documented synthesis with presenting-data skill ensures intellectual honesty and identifies confounding factors you'd otherwise miss.
Common Rationalization: "I'll skip synthesis for experiments with clear signals" Reality: Individual experiment signals don't reveal cross-experiment patterns. Synthesis identifies why some tactics work while others don't - the strategic insight that guides iteration.
Common Rationalization: "Visualization is optional - the data speaks for itself" Reality: Tabular data obscures patterns. Visualization reveals signal distribution, effect size clusters, and confidence patterns that inform strategic decisions. presenting-data handles this systematically.
Phase 6: Iteration Planning
CHECKPOINT: Before proceeding, you MUST have:
- Generated 3-7 new experiment ideas based on synthesis findings
- Categorized ideas (scale winners, investigate nulls, pivot from failures, explore new)
- Documented campaign-level signal and strategic recommendation
- Explained feed-forward pattern (ideas → new marketing-experimentation sessions)
- Saved to `06-iteration-plan.md`
- Campaign complete - ready for next cycle or conclusion
Instructions
CRITICAL: Phase 6 generates experiment IDEAS, NOT hypotheses. Ideas feed into new marketing-experimentation sessions where Phase 2 formalizes hypotheses. Do NOT skip the discovery and hypothesis generation steps.
Generate 3-7 new experiment ideas
Based on Phase 5 synthesis, generate ideas for next iteration:
Idea Format:
Idea [N]: [Brief descriptive name]
- Rationale: [Why this idea based on current findings]
- Expected Learning: [What we'll learn from testing this]
- Category: [Scale Winners | Investigate Nulls | Pivot from Failures | Explore New]
Example Ideas:
Idea 1: Scale value prop landing page to paid ads
- Rationale: E1 showed +18% conversion from simplified value prop. Apply winning message to ad creative.
- Expected Learning: Does simplified value prop improve ad CTR and cost-per-conversion?
- Category: Scale Winners
Idea 2: Investigate email sequence timing sensitivity
- Rationale: E3 showed a null result for the email sequence, but timing may be a confound (sent during holidays).
- Expected Learning: Is the email sequence inherently weak, or was timing the issue?
- Category: Investigate Nulls
Idea 3: Pivot from broad ad targeting to lookalike audiences
- Rationale: E2 showed null result for ad targeting. Broad targeting may dilute signal. Pivot to lookalike audiences based on E1 converters.
- Expected Learning: Do lookalike audiences outperform broad targeting?
- Category: Pivot from Failures
Categorize ideas by strategy
Scale Winners:
- Double down on successful tactics
- Apply winning patterns to new channels
- Increase budget/effort on validated approaches
- Examples: Winning landing page → ads, winning ad → email, winning message → content
Investigate Nulls:
- Redesign experiments with null/mixed results
- Address confounding factors identified in synthesis
- Increase statistical power (larger sample, longer duration)
- Examples: Retest with better timing, retest with clearer treatment, retest with focused audience
Pivot from Failures:
- Abandon unsuccessful approaches
- Try alternative tactics for same goal
- Apply learnings to avoid similar failures
- Examples: Broad targeting failed → try lookalike, generic message failed → try personalization
Explore New:
- Test entirely new tactics not in original hypothesis set
- Investigate new audience segments
- Explore new channels or platforms
- Examples: Add referral program, test new platform, try new content format
Document campaign-level signal and strategic recommendation
Campaign Summary:
- Overall Signal: [Positive | Negative | Null | Mixed]
- Key Wins: [List successful tactics]
- Key Learnings: [List insights regardless of signal]
- Strategic Recommendation: [What to do next]
Strategic Recommendations by Signal:
Positive Signal:
- Proceed to scaling: Increase budget/effort on winning tactics
- Optimize: Refine winning approaches for incremental gains
- Expand: Apply winning patterns to new channels/audiences
Negative Signal:
- Pivot: Major change in approach, audience, or value proposition
- Pause: Reassess concept before additional investment
- Abandon: Consider alternative concepts if no viable path forward
Null Signal:
- Refine: Redesign experiments with better power/clarity
- Investigate: Address confounding factors before new tests
- Extend: Continue observation period if time-dependent
Mixed Signal:
- Selective scaling: Double down on winners
- Selective pivots: Abandon or redesign losers
- Strategic iteration: Focus next tests on promising areas
Explain feed-forward pattern
CRITICAL: Ideas from Phase 6 are NOT ready for testing. They MUST go through a new marketing-experimentation session:
Feed-Forward Cycle:
```
Phase 6 generates IDEAS
  ↓
Start new marketing-experimentation session with idea
  ↓
Phase 1: Discovery (validate idea with market-researcher)
  ↓
Phase 2: Hypothesis Generation (formalize idea into testable hypotheses)
  ↓
Phase 3-6: Complete full experimental cycle
```

Why this matters:
- Ideas need market validation (Phase 1) before testing
- Ideas need formalization into specific hypotheses (Phase 2)
- Skipping discovery leads to untested assumptions
- Skipping hypothesis generation leads to vague experiments
Example:
- Phase 6 generates "Scale value prop to ads" (idea)
- New session Phase 1: Market research on ad platform best practices
- New session Phase 2: Generate hypotheses like "H1: Simplified value prop in ad headline increases CTR by 10%+" (specific, testable)
- Continue with Phase 3-6 to test the formal hypotheses
Create `06-iteration-plan.md` with: `./templates/06-iteration-plan.md`
STOP and review with user
- Review all iteration ideas with user
- Confirm strategic recommendation is appropriate
- Confirm understanding of feed-forward cycle
- Discuss whether to start new marketing-experimentation session or conclude campaign
Common Rationalization: "I'll turn ideas directly into experiments - skip the new session" Reality: Ideas need discovery and hypothesis generation. Skipping these steps leads to untested assumptions and vague experiments. Always run ideas through a new marketing-experimentation session.
Common Rationalization: "I'll generate hypotheses in Phase 6 for efficiency" Reality: Phase 6 generates IDEAS, Phase 2 (in a new session) generates hypotheses. Conflating these skips critical validation and formalization steps. Ideas → new session → hypotheses.
Common Rationalization: "Campaign signal is obvious from results, no need to document strategic recommendation" Reality: Documented recommendation provides clear guidance for stakeholders and future sessions. Without it, insights are lost and decisions become ad-hoc.
Common Rationalizations
These are rationalizations that lead to failure. When you catch yourself thinking any of these, STOP and follow the skill process instead.
"I'll skip discovery and just start testing - the concept is obvious"
Why this fails: Discovery surfaces assumptions, constraints, and existing assets that dramatically affect experiment design. "Obvious" concepts often hide critical assumptions that need validation.
Reality: Market-researcher agent provides current, data-driven validation signals. Asset inventory reveals resources that reduce experiment cost and time. Success criteria definition prevents ambiguous results. Always start with discovery.
What to do instead: Complete Phase 1 (Discovery & Asset Inventory) before generating hypotheses. Invoke market-researcher agent. Document all findings.
"I'll design the experiment myself instead of using hypothesis-testing or qualitative-research"
Why this fails: The research skills provide rigorous experimental design, analysis, and signal detection. hypothesis-testing ensures statistical rigor for quantitative experiments. qualitative-research ensures systematic rigor for qualitative experiments. Skipping them produces weak experiments with ambiguous results.
Reality: Marketing-experimentation is a meta-orchestrator that coordinates multiple experiments. It does NOT design experiments itself. Delegation to appropriate skills (hypothesis-testing or qualitative-research) ensures methodological rigor.
What to do instead: Determine experiment type (quantitative or qualitative) in Phase 4. Invoke hypothesis-testing skill for quantitative experiments. Invoke qualitative-research skill for qualitative experiments. Let the appropriate skill handle all design, execution, and analysis.
"One experiment is enough to draw conclusions"
Why this fails: Single experiments miss cross-experiment patterns. Some tactics work, others don't. Single-experiment campaigns can't identify which channels/tactics are most effective.
Reality: Marketing-experimentation tests 2-4 hypotheses to reveal strategic insights. Synthesis (Phase 5) identifies patterns across experiments - which tactics work, which don't, and why.
What to do instead: Follow Phase 3 prioritization to select 2-4 hypotheses. Complete all experiments before synthesis. Use Phase 5 to identify patterns.
"I'll wait until all experiments complete before updating the tracker"
Why this fails: Batch updates create opportunity for lost data. Multi-day campaigns lose context between conversations. Incomplete tracker leads to missed experiments and confusion.
Reality: The experiment tracker (04-experiment-tracker.md) is the ONLY source of truth that persists across sessions. Update it IMMEDIATELY after status changes.
What to do instead: Update tracker after every status change (Planned → In Progress, In Progress → Complete). Commit tracker updates to git. Read tracker FIRST in every new conversation.
"Results are obvious, I don't need to document synthesis"
Why this fails: Individual experiment signals don't reveal cross-experiment patterns. "Obvious" interpretations miss confounding factors and alternative explanations.
Reality: Documented synthesis with presenting-data skill ensures intellectual honesty. Visualization reveals patterns. Statistical assessment identifies robust vs uncertain findings.
What to do instead: Always complete Phase 5 (Cross-Experiment Synthesis). Invoke presenting-data skill. Document patterns, visualizations, and signal classification. Get user confirmation.
"I'll form hypotheses in Phase 6 for efficiency"
Why this fails: Phase 6 generates IDEAS, not hypotheses. Ideas need discovery (Phase 1) and hypothesis generation (Phase 2) in new sessions. Skipping these steps leads to untested assumptions and vague experiments.
Reality: Feed-forward cycle: Phase 6 ideas → new marketing-experimentation session → Phase 1 discovery → Phase 2 hypothesis generation → Phase 3-6 complete cycle.
What to do instead: Generate IDEAS in Phase 6. Start NEW marketing-experimentation session with selected idea. Complete Phase 1 and Phase 2 to formalize idea into testable hypotheses.
"I'll estimate ICE/RICE scores mentally instead of running the script"
Why this fails: Manual estimation introduces calculation errors and inconsistency. Mental math is unreliable for multiplication and division.
Reality: Python scripts ensure exact, reproducible results that can be audited and verified. Computational methods eliminate human error.
What to do instead: Use Python scripts (ICE or RICE calculator) from Phase 3 instructions. Update hypothesis data in script. Run script and document exact scores. Copy output table to prioritization document.
"I'll synthesize results mentally - no need to use presenting-data"
Why this fails: Mental synthesis loses details and creates false confidence. Cross-experiment patterns require systematic analysis.
Reality: presenting-data skill handles pattern identification (via interpreting-results), visualization creation (via creating-visualizations), and synthesis documentation. It ensures intellectual honesty and reproducibility.
What to do instead: Always invoke presenting-data skill in Phase 5. Provide aggregate results table. Request pattern analysis and visualizations. Document all findings from presenting-data output.
Summary
The marketing-experimentation skill ensures rigorous, evidence-based validation of marketing concepts through structured experimental cycles. This skill orchestrates the complete Build-Measure-Learn loop from concept to data-driven signal.
What this skill ensures:
Validated concepts through market research - market-researcher agent provides current demand signals, competitive landscape analysis, and audience insights before experimentation begins.
Strategic hypothesis generation - 5-10 testable hypotheses spanning multiple tactics (landing pages, ads, email, content) grounded in discovery findings and mapped to experimentation frameworks (Lean Startup, AARRR).
Data-driven prioritization - Computational methods (ICE/RICE Python scripts) ensure exact, reproducible scoring. Selection of 2-4 highest-value hypotheses optimizes resource allocation.
Multi-experiment coordination - Experiment tracker (living document) enables multi-conversation workflows spanning days or weeks. Status tracking (Planned, In Progress, Complete) maintains visibility across all experiments. Supports both quantitative and qualitative experiment types.
Methodological rigor through delegation - hypothesis-testing skill handles quantitative experiment design (statistical analysis, A/B tests, metrics). qualitative-research skill handles qualitative experiment design (interviews, surveys, focus groups, observations, thematic analysis). Marketing-experimentation coordinates multiple tests without duplicating methodology.
Cross-experiment synthesis - presenting-data skill identifies patterns across experiments (what works, what doesn't, what's unclear). Aggregate analysis reveals strategic insights invisible in single experiments.
Clear signal generation - Campaign-level classification (Positive/Negative/Null/Mixed) with strategic recommendations (Scale/Pivot/Refine/Pause) provides actionable guidance for stakeholders.
Systematic iteration - Phase 6 generates experiment IDEAS (not hypotheses) that feed into new marketing-experimentation sessions. Feed-forward cycle maintains rigor through repeated discovery and hypothesis generation.
Multi-conversation persistence - Complete documentation at every phase enables resumption after days or weeks. Experiment tracker serves as coordination hub. All artifacts are git-committable.
Tool-agnostic approach - Focuses on techniques (value proposition testing, targeting strategies, sequence optimization) rather than specific platforms. Applicable across marketing tools and channels.
Key principles:
- Discovery before experimentation (Phase 1 always first)
- Hypothesis generation separate from idea generation (Phase 2 vs Phase 6)
- Multiple experiments for pattern identification (2-4 minimum)
- Computational scoring for objectivity (Python scripts)
- Delegation for methodological rigor (hypothesis-testing for quantitative, qualitative-research for qualitative)
- Mixed-methods integration (quantitative metrics + qualitative insights for complete picture)
- Synthesis for strategic insight (presenting-data skill handles both quantitative and qualitative results)
- Documentation for reproducibility (numbered markdown files, git commits)
- Iteration through validated cycles (ideas → new sessions → discovery → hypotheses)