---
name: experiment-analyzer
description: Analyze completed growth experiment results, validate hypotheses, generate insights, and suggest follow-up experiments. Use when experiments are completed, when the user asks about results or learnings, or when discussing what to do next based on experiment outcomes.
allowed-tools: Read, Write, Grep, Glob
---
# Experiment Analyzer Skill
Analyze completed growth experiments, extract insights, and drive continuous learning.
## When to Activate
This skill should activate when:
- User marks experiment as "completed"
- User asks "what did we learn?"
- User mentions "results", "outcomes", or "analysis"
- User asks "what should we do next?"
- User wants to compare multiple experiments
- User asks about experiment success rates
## Analysis Framework

### 1. Result Classification

**Win (Positive + Significant)**
- Result is better than baseline
- Statistical significance ≥ 95%
- Change is meaningful (usually ≥5%)
**Loss (Negative + Significant)**
- Result is worse than baseline
- Statistical significance ≥ 95%
- Change is meaningful
**Inconclusive**
- Statistical significance < 95%
- Not enough data to make a decision
- Sample size may be insufficient
**Neutral**
- Minimal change (< ±2%)
- No meaningful impact either way
- May indicate the hypothesis was off
### 2. Hypothesis Validation

Compare the original hypothesis to the results:

**Hypothesis Components:**
- Proposed change → Was it implemented as planned?
- Target audience → Did we reach the right users?
- Expected outcome → Did we hit the target?
- Rationale → Was our reasoning correct?
**Validation Questions:**
- Did we achieve the expected outcome? (Yes/No/Partially)
- Was the underlying assumption correct?
- What surprised us?
- What would we do differently?
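A minimal sketch of how these answers could be captured as a structured record, assuming Python; all field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass
class HypothesisValidation:
    """Structured answers to the validation questions (illustrative fields)."""
    expected_outcome: str                 # e.g. "+10% signup conversion"
    actual_outcome: str                   # what the data actually showed
    validated: str                        # "yes" / "no" / "partially"
    assumption_held: bool                 # was the underlying assumption correct?
    surprises: list[str] = field(default_factory=list)       # unexpected findings
    do_differently: list[str] = field(default_factory=list)  # changes for next run
```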
### 3. ICE Score Retrospective

Compare predicted vs. actual:

**Impact Score Validation:**
- Predicted Impact: [original score]
- Actual Impact: [calculate based on results]
- Delta: [difference]
- Learning: Was our impact prediction accurate?
**Confidence Score Validation:**
- Predicted Confidence: [original score]
- Outcome: [win/loss/inconclusive]
- Learning: Was our confidence justified?
**Ease Score Validation:**
- Predicted Ease: [original score]
- Actual Time: [if tracked]
- Learning: Was implementation as easy as expected?
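A sketch of the Impact comparison, assuming a 1-10 impact scale and a recorded target change; backing an "actual" score out of results this way is a heuristic assumption, not a fixed method:

```python
def impact_retrospective(predicted: float, change_pct: float, target_pct: float) -> dict:
    """Compare the predicted impact score with one backed out of results.

    Scales the observed change against the targeted change on an assumed
    1-10 scale; delta > 0 means we underestimated impact.
    """
    actual = 0.0
    if target_pct:
        actual = max(0.0, min(10.0, predicted * change_pct / target_pct))
    delta = round(actual - predicted, 1)
    verdict = ("accurate" if abs(delta) <= 1
               else "underestimated" if delta > 0 else "overestimated")
    return {"predicted": predicted, "actual": round(actual, 1),
            "delta": delta, "verdict": verdict}
```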
### 4. Insight Generation

**Key Questions:**
- What worked? Specific elements that drove success
- What didn't work? Elements that failed or harmed metrics
- What was surprising? Unexpected findings
- What patterns emerge? Connections to other experiments
- What new questions arise? Areas to investigate further
**Secondary Metrics:**
- Review all secondary metrics tracked
- Look for unintended positive effects
- Watch for negative side effects
- Consider holistic impact
### 5. Follow-up Experiment Suggestions

Based on the outcome, suggest 2-3 follow-up experiments (a lookup-table sketch follows these lists):

**For Wins:**
- Scale: Roll out to 100% of users
- Amplify: Make the winning element more prominent
- Extend: Apply pattern to related areas
- Optimize: Test variations to improve further
**For Losses:**
- Pivot: Try alternative approach to same problem
- Investigate: Run research to understand why
- Revert: Document and move on
- Learn: Apply learnings to future experiments
**For Inconclusive:**
- Re-run: Increase sample size or duration
- Simplify: Test smaller version to isolate variable
- Segment: Test with specific user segments
- Refine: Adjust hypothesis based on early signals
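This outcome-to-strategy mapping reduces to a simple lookup table; a minimal sketch:

```python
# Candidate follow-up strategies keyed by result classification.
FOLLOW_UP_STRATEGIES = {
    "win": ["scale", "amplify", "extend", "optimize"],
    "loss": ["pivot", "investigate", "revert", "learn"],
    "inconclusive": ["re-run", "simplify", "segment", "refine"],
}

def suggest_follow_ups(result_type: str, n: int = 3) -> list[str]:
    """Return up to n candidate strategies for a result type."""
    return FOLLOW_UP_STRATEGIES.get(result_type, [])[:n]
```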
## Analysis Process

### Step 1: Load and Validate

1. Read the experiment JSON from the completed/archived folder
2. Verify results data exists:
- Primary metric
- Baseline value
- Result value
- Statistical significance
- Sample size
- Duration
3. Check if hypothesis is documented
4. Review ICE scores
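A sketch of this step in Python; the JSON layout (a `results` block with the fields below, plus `hypothesis` and `ice` keys) is an illustrative assumption, not a fixed schema:

```python
import json
from pathlib import Path

# Fields the analysis cannot proceed without (illustrative names).
REQUIRED_RESULT_FIELDS = [
    "primary_metric", "baseline", "result",
    "significance", "sample_size", "duration_days",
]

def load_experiment(path: str) -> dict:
    """Read an experiment JSON file and verify its results block is complete."""
    data = json.loads(Path(path).read_text())

    results = data.get("results", {})
    missing = [f for f in REQUIRED_RESULT_FIELDS if f not in results]
    if missing:
        raise ValueError(f"Cannot analyze {path}: missing result fields {missing}")

    if not data.get("hypothesis"):
        print("Warning: no documented hypothesis; validation section will be thin.")
    if not data.get("ice"):
        print("Warning: no ICE scores; skipping retrospective.")
    return data
```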
### Step 2: Calculate Key Metrics

Change Percentage = ((Result - Baseline) / Baseline) × 100

Result Classification (apply the rules in order):
1. IF significance < 95% → Inconclusive
2. IF |change%| < 2% → Neutral
3. IF change% > 0 → Win
4. ELSE → Loss
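The same rules in code, applied in that order so the significance gate and the neutral band are checked before direction:

```python
def percent_change(baseline: float, result: float) -> float:
    """((Result - Baseline) / Baseline) × 100."""
    return (result - baseline) / baseline * 100

def classify(change_pct: float, significance: float) -> str:
    """Classify a result as win / loss / neutral / inconclusive."""
    if significance < 95:
        return "inconclusive"
    if abs(change_pct) < 2:
        return "neutral"
    return "win" if change_pct > 0 else "loss"
```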
### Step 3: Generate Insights
1. Classify result (Win/Loss/Inconclusive/Neutral)
2. Validate hypothesis against results
3. Review ICE score predictions
4. Extract key learnings
5. Identify surprising findings
6. Check secondary metrics
7. Look for patterns across related experiments
### Step 4: Create Follow-up Ideas
1. Based on result type, brainstorm 2-3 follow-ups
2. For each follow-up:
- Draft hypothesis
- Explain rationale (reference current learnings)
- Suggest category
- Provide preliminary ICE estimate
3. Prioritize follow-ups by potential impact
### Step 5: Generate Report
1. Create markdown analysis report
2. Include:
- Summary (result classification, key numbers)
- Hypothesis validation
- ICE score retrospective
- Key insights (bulleted list)
- Secondary metrics review
- Recommendations
- Follow-up experiment ideas
3. Save to `experiments/archive/[id]_analysis.md`
4. Update experiment JSON with learnings
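A sketch of the save step under the folder layout named above; the `learnings` field written back into the JSON, and the assumption that the archived JSON sits next to the report, are both illustrative:

```python
import json
from pathlib import Path

def save_analysis(experiment_id: str, report_md: str, learnings: list[str]) -> None:
    """Write the markdown report and fold learnings back into the experiment JSON."""
    archive = Path("experiments/archive")
    archive.mkdir(parents=True, exist_ok=True)

    (archive / f"{experiment_id}_analysis.md").write_text(report_md)

    json_path = archive / f"{experiment_id}.json"  # assumed location
    data = json.loads(json_path.read_text())
    data["learnings"] = learnings                  # illustrative field name
    json_path.write_text(json.dumps(data, indent=2))
```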
## Analysis Output Template

```markdown
# Experiment Analysis: [Title]
**Date:** [Analysis date]
**Experiment ID:** [id]
**Status:** [Win/Loss/Inconclusive/Neutral] ✓/✗/?/○
## Summary
- **Primary Metric:** [metric name]
- **Baseline:** [baseline value]
- **Result:** [result value]
- **Change:** [+/-X%]
- **Statistical Significance:** [XX%]
- **Sample Size:** [count]
- **Duration:** [days]
## Hypothesis Validation
### Original Hypothesis
[Full hypothesis statement]
### Validation
- **Expected Outcome:** [what we expected]
- **Actual Outcome:** [what happened]
- **Hypothesis Validated:** [Yes/No/Partially]
**Analysis:**
[Explanation of whether and why hypothesis was validated]
## ICE Score Retrospective
| Component | Predicted | Actual/Assessment | Accuracy |
|-----------|-----------|------------------|----------|
| Impact | [score] | [calculate from results] | [good/overestimated/underestimated] |
| Confidence | [score] | [based on outcome] | [justified/overconfident/underconfident] |
| Ease | [score] | [based on actual effort] | [accurate/harder/easier] |
**Learnings for Future Scoring:**
- [What we learned about predicting impact]
- [What we learned about confidence]
- [What we learned about ease]
## Key Insights
1. **[Primary insight]** - [Explanation with data]
2. **[Secondary insight]** - [Explanation]
3. **[Surprising finding]** - [What we didn't expect]
## Secondary Metrics
| Metric | Change | Interpretation |
|--------|--------|----------------|
| [metric 1] | [+/-X%] | [Good/Bad/Neutral] |
| [metric 2] | [+/-X%] | [Good/Bad/Neutral] |
**Side Effects:**
- Positive: [Any unexpected positive impacts]
- Negative: [Any unexpected negative impacts]
## Recommendations
### Immediate Actions
- [ ] [Action item 1]
- [ ] [Action item 2]
### Strategic Implications
[Broader implications for product/growth strategy]
## Follow-up Experiment Ideas
### 1. [Experiment Title]
**Category:** [category]
**Hypothesis:**
[Full hypothesis following template]
**Rationale:**
[Why this follow-up based on current learnings]
**Preliminary ICE:**
- Impact: [score] - [reasoning]
- Confidence: [score] - [reasoning]
- Ease: [score] - [reasoning]
- **Total: [score]**
---
### 2. [Experiment Title]
[Repeat format]
---
### 3. [Experiment Title]
[Repeat format]
## Related Experiments
[List any related experiments and their outcomes for pattern recognition]
## Notes
[Any additional context, edge cases, or considerations]
```
## Cross-Experiment Analysis

When the user asks to analyze multiple experiments:

**Metrics to Calculate:**
- Success Rate: % of wins out of completed experiments
- Category Performance: Which funnel stages have best win rate?
- ICE Score Accuracy: How well do high-ICE experiments perform?
- Average Impact: What's the typical metric improvement?
- Cycle Time: Average days from backlog → completed
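A sketch of the roll-up, assuming each experiment dict carries `status`, `category`, and a `classification` from the single-experiment analysis (illustrative names):

```python
from collections import defaultdict

def portfolio_metrics(experiments: list[dict]) -> dict:
    """Compute overall and per-category win rates for completed experiments."""
    completed = [e for e in experiments if e.get("status") == "completed"]
    wins = sum(e.get("classification") == "win" for e in completed)

    by_category = defaultdict(lambda: {"count": 0, "wins": 0})
    for e in completed:
        bucket = by_category[e.get("category", "unknown")]
        bucket["count"] += 1
        bucket["wins"] += e.get("classification") == "win"

    return {
        "total": len(experiments),
        "completed": len(completed),
        "win_rate": wins / len(completed) if completed else 0.0,
        "by_category": {cat: {**b, "win_rate": b["wins"] / b["count"]}
                        for cat, b in by_category.items()},
    }
```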
**Pattern Recognition:**
- Which types of experiments succeed most?
- Which audience segments respond best?
- Which testing methods are most reliable?
- What confidence levels actually predict success?
**Portfolio View:**

```markdown
# Experiment Portfolio Analysis
## Overview
- Total Experiments: [count]
- Completed: [count]
- Win Rate: [X%]
- Average Change: [+X%]
## By Category
| Category | Experiments | Win Rate | Avg Impact |
|----------|-------------|----------|------------|
| Acquisition | [count] | [X%] | [+X%] |
| Activation | [count] | [X%] | [+X%] |
| Retention | [count] | [X%] | [+X%] |
| Revenue | [count] | [X%] | [+X%] |
| Referral | [count] | [X%] | [+X%] |
## ICE Score Performance
- Experiments with ICE > 500: [X% win rate]
- Experiments with ICE 300-500: [X% win rate]
- Experiments with ICE < 300: [X% win rate]
**Learning:** [Are high ICE scores actually better predictors?]
## Top Performers
1. [Experiment] - [+X%] change
2. [Experiment] - [+X%] change
3. [Experiment] - [+X%] change
## Key Patterns
- [Pattern 1 discovered across experiments]
- [Pattern 2]
- [Pattern 3]
## Recommendations
[Strategic recommendations based on portfolio analysis]
```
## Integration Points

- Automatically trigger when /experiment-update sets an experiment's status to "completed"
- Work with the ICE scorer skill to validate predictions
- Inform the hypothesis generator with learnings
- Feed into the metrics calculator for portfolio analysis
## Continuous Improvement
After each analysis:
- Store learnings in a knowledge base
- Update ICE scoring calibration
- Refine hypothesis templates
- Build pattern library
- Improve follow-up suggestions