---
name: prediction-tracking
description: Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.
---
# Prediction Tracking Skill
Track predictions made by AI researchers and critics, and evaluate their accuracy over time.
## Prediction Recording
When recording a new prediction, capture the following fields (a schema sketch follows the lists):
### Required Fields
- `text`: The prediction as stated
- `author`: Who made it
- `madeAt`: When it was made
- `timeframe`: When they expect it to happen
- `topic`: What area of AI
- `confidence`: How confident they seemed

### Optional Fields
- `sourceUrl`: Where the prediction was made
- `targetDate`: Specific date if mentioned
- `conditions`: Any caveats or conditions
- `metrics`: How to measure success
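As a minimal sketch, these fields could be stored with a record shape like the one below. The `Confidence` values and ISO-string timestamps are assumptions for illustration, not requirements of this skill.

```typescript
// Sketch of a stored prediction; optional fields mirror the list above.
// The Confidence levels are an assumed encoding of "how confident they seemed".
type Confidence = "low" | "medium" | "high";

interface PredictionRecord {
  text: string;         // the prediction as stated
  author: string;       // who made it
  madeAt: string;       // ISO timestamp of when it was made
  timeframe: string;    // e.g. "by end of 2026", "within 5 years"
  topic: string;        // area of AI, e.g. "reasoning", "agents"
  confidence: Confidence;
  sourceUrl?: string;   // where the prediction was made
  targetDate?: string;  // specific ISO date, if one was mentioned
  conditions?: string;  // caveats or conditions
  metrics?: string;     // how to measure success
}
```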
## Evaluation Status
When evaluating predictions, assign one of the following statuses (a type sketch follows the definitions):
### verified
Clearly came true as stated.
- The predicted capability/event occurred
- Within the stated timeframe
- Substantially as described
### falsified
Clearly did not come true.
- Timeframe passed without occurrence
- Contradictory evidence emerged
- Author retracted or modified claim
### partially-verified
Partially accurate.
- Some aspects came true, others didn't
- Capability exists but weaker than claimed
- Timeframe was off but direction correct
### too-early
Not enough time has passed.
- Still within stated timeframe
- No definitive evidence either way
### unfalsifiable
Cannot be objectively assessed.
- Too vague to measure
- No clear success criteria
- Moved goalposts
### ambiguous
The wording supports more than one reading, so there is no single claim to evaluate.
- Multiple interpretations possible
- Success criteria unclear
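If tooling is built around this skill, the statuses can be treated as a closed vocabulary; a minimal sketch:

```typescript
// Closed set of evaluation statuses, matching the definitions above.
type EvaluationStatus =
  | "verified"
  | "falsified"
  | "partially-verified"
  | "too-early"
  | "unfalsifiable"
  | "ambiguous";
```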
## Evaluation Process
For each prediction being evaluated, work through the steps below; a code sketch for steps 2 and 5 follows step 6.
### 1. Restate the prediction
What exactly was claimed?
### 2. Identify timeframe
Has enough time passed to evaluate?
### 3. Gather evidence
What has happened since?
- Relevant releases or announcements
- Benchmark results
- Real-world deployments
- Counter-evidence
### 4. Assess status
Which evaluation status applies?
### 5. Score accuracy
If verifiable, rate 0.0-1.0:
- 1.0: Exactly as predicted
- 0.7-0.9: Substantially correct
- 0.4-0.6: Partially correct
- 0.1-0.3: Mostly wrong
- 0.0: Completely wrong
### 6. Note lessons
What does this tell us about:
- The author's forecasting ability
- The topic's predictability
- Common prediction pitfalls
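Steps 2 and 5 lend themselves to small helpers. The sketch below is illustrative: `hasTimeframePassed` and `scoreBand` are hypothetical names, and it assumes `targetDate` is stored as an ISO date string when present.

```typescript
// Step 2: has the stated target date passed, so the prediction can be judged?
// If no explicit target date was recorded, treat the prediction as still open.
function hasTimeframePassed(targetDate: string | undefined, now = new Date()): boolean {
  if (!targetDate) return false;
  return new Date(targetDate).getTime() < now.getTime();
}

// Step 5: map a 0.0-1.0 accuracy score onto the qualitative bands above.
function scoreBand(score: number): string {
  if (score >= 1.0) return "exactly as predicted";
  if (score >= 0.7) return "substantially correct";
  if (score >= 0.4) return "partially correct";
  if (score > 0.0) return "mostly wrong";
  return "completely wrong";
}
```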
## Output Format
For evaluation:
```json
{
  "evaluations": [
    {
      "predictionId": "id",
      "status": "verified",
      "accuracyScore": 0.85,
      "evidence": "Description of evidence",
      "notes": "Additional context",
      "evaluatedAt": "timestamp"
    }
  ]
}
```
For accuracy statistics:
```json
{
  "author": "Author name",
  "totalPredictions": 15,
  "verified": 5,
  "falsified": 3,
  "partiallyVerified": 2,
  "pending": 4,
  "unfalsifiable": 1,
  "averageAccuracy": 0.62,
  "topicBreakdown": {
    "reasoning": { "predictions": 5, "accuracy": 0.7 },
    "agents": { "predictions": 3, "accuracy": 0.4 }
  },
  "calibration": "Assessment of how well-calibrated they are"
}
```
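The statistics fields can be derived mechanically from stored evaluations. The sketch below assumes each evaluation record carries the prediction's topic; `EvaluationRecord` and `accuracyStats` are hypothetical names.

```typescript
// Hypothetical evaluation shape; status uses the vocabulary defined earlier.
interface EvaluationRecord {
  status: string;         // "verified", "falsified", "partially-verified", ...
  accuracyScore?: number; // 0.0-1.0, only present for verifiable predictions
  topic: string;          // e.g. "reasoning", "agents"
}

// Derive the per-author statistics object shown above from stored evaluations.
function accuracyStats(evaluations: EvaluationRecord[]) {
  const count = (s: string) => evaluations.filter(e => e.status === s).length;
  const scored = evaluations.filter(e => typeof e.accuracyScore === "number");
  const mean = (xs: EvaluationRecord[]) =>
    xs.length ? xs.reduce((sum, e) => sum + (e.accuracyScore ?? 0), 0) / xs.length : 0;

  const topicBreakdown: Record<string, { predictions: number; accuracy: number }> = {};
  for (const e of evaluations) {
    if (!topicBreakdown[e.topic]) {
      const inTopic = evaluations.filter(x => x.topic === e.topic);
      topicBreakdown[e.topic] = {
        predictions: inTopic.length,
        accuracy: mean(inTopic.filter(x => typeof x.accuracyScore === "number")),
      };
    }
  }

  return {
    totalPredictions: evaluations.length,
    verified: count("verified"),
    falsified: count("falsified"),
    partiallyVerified: count("partially-verified"),
    pending: count("too-early"),
    unfalsifiable: count("unfalsifiable"),
    averageAccuracy: mean(scored),
    topicBreakdown,
  };
}
```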
## Calibration Assessment
Evaluate whether predictors are well-calibrated (a bucketing sketch follows these profiles):
### Well-Calibrated
- High-confidence predictions usually come true
- Low-confidence predictions have mixed results
- Acknowledges uncertainty appropriately
### Overconfident
- High-confidence predictions often fail
- Rarely expresses uncertainty
- Doesn't update on evidence
### Underconfident
- Low-confidence predictions often come true
- Hedges even on likely outcomes
- Too conservative
### Inconsistent
- Confidence doesn't correlate with accuracy
- Random relationship between stated and actual accuracy
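One rough way to automate this check is to bucket decided predictions by stated confidence and compare hit rates. The thresholds and the `calibrationLabel` helper below are illustrative assumptions, not part of the skill.

```typescript
// Bucket decided predictions by stated confidence and compare hit rates.
// The 0.5 and 0.7 thresholds are assumptions chosen for illustration only.
interface DecidedPrediction {
  confidence: "low" | "medium" | "high"; // as recorded when the prediction was made
  status: string;                        // evaluation status assigned later
}

function calibrationLabel(predictions: DecidedPrediction[]): string {
  const decided = predictions.filter(
    p => p.status === "verified" || p.status === "falsified"
  );
  const hitRate = (conf: DecidedPrediction["confidence"]) => {
    const group = decided.filter(p => p.confidence === conf);
    return group.length
      ? group.filter(p => p.status === "verified").length / group.length
      : null; // not enough data in this bucket
  };

  const high = hitRate("high");
  const low = hitRate("low");
  if (high === null || low === null) return "insufficient data";
  if (high >= 0.7 && high > low) return "well-calibrated";
  if (high < 0.5) return "overconfident";
  if (low >= 0.7) return "underconfident";
  return "inconsistent";
}
```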
## Tracking Notable Predictors
Keep running assessments of key voices:
| Predictor | Total | Accuracy | Calibration | Notes |
|---|---|---|---|---|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly over | Safety-focused |
## Red Flags
Watch for prediction patterns that suggest bias:
- Always bullish regardless of topic
- Never acknowledges failed predictions
- Moves goalposts when wrong
- Predictions align suspiciously with financial interests
- Vague enough to claim credit for anything