| name | effectiveness-tracking |
| description | Tracks task success metrics to improve future model routing. Use when completing any implementation, refactor, bugfix, or testing task. |
Effectiveness Tracking Protocol
On Task Completion
After tests pass and task is complete, log effectiveness data for smart routing learning.
Required Metrics
Task Pattern: Which pattern did this task follow?
architecture- System design, integration planningmulti-file-refactor- Restructuring 5+ filesbugfix- Fixing broken functionalityfeature- Adding new capability (2-5 files)testing- Writing test coveragedocumentation- Markdown/docs updates
Model Used: Which model completed this task
- Example:
claude-sonnet-4.5,claude-opus-4.5,gpt-5
- Example:
Prompts Used: Total conversation turns to completion
- Count from task start to passing tests
Success: Did tests pass?
trueorfalse
Timestamp: When completed
- ISO 8601 format
Storage Format
Append to dev-data/effectiveness-log.jsonl (JSON Lines format):
{"pattern":"multi-file-refactor","model":"claude-sonnet-4.5","prompts":2,"success":true,"timestamp":"2026-01-03T12:00:00Z","files_changed":7}
{"pattern":"bugfix","model":"claude-haiku","prompts":1,"success":true,"timestamp":"2026-01-03T12:15:00Z","files_changed":1}
{"pattern":"feature","model":"claude-opus-4.5","prompts":3,"success":true,"timestamp":"2026-01-03T12:30:00Z","files_changed":4}
Querying Historical Data
Before starting complex tasks, query effectiveness log:
// Example: What's the best model for multi-file refactor?
const logs = readJsonLines('dev-data/effectiveness-log.jsonl');
const pattern = logs.filter(l => l.pattern === 'multi-file-refactor' && l.success);
const byModel = groupBy(pattern, 'model');
const avgPrompts = {
opus: average(byModel.opus.map(l => l.prompts)),
sonnet: average(byModel.sonnet.map(l => l.prompts)),
haiku: average(byModel.haiku.map(l => l.prompts))
};
// Recommend model with lowest average prompts
Integration with Smart Router
The effectiveness log feeds the smart routing recommendation engine:
- User starts task → pattern detected
- Query log for similar past tasks
- Calculate average prompts by model
- Recommend model with best effectiveness (fewest prompts)
- Show cost as secondary info
Example Recommendation
📊 Smart Router Recommendation
Task Pattern: Multi-file refactor (7 files)
Historical Data: 15 similar tasks
Model Performance:
• Opus: avg 2.3 prompts (90% success) ← RECOMMENDED
• Sonnet: avg 4.8 prompts (75% success)
• Haiku: avg 8.2 prompts (50% success)
Cost Delta: +0.8 credits for Opus vs Sonnet
Prompts Saved: ~2.5 prompts (worth the cost)
Auto-Logging
If possible, log automatically on task completion:
// Backend middleware example
app.post('/api/task/complete', (req, res) => {
const log = {
pattern: classifyTask(req.body.description),
model: req.body.model,
prompts: req.body.conversationLength,
success: req.body.testsPassed,
timestamp: new Date().toISOString(),
files_changed: req.body.filesChanged
};
appendToFile('dev-data/effectiveness-log.jsonl', JSON.stringify(log) + '\n');
});