| name | self-improvement-engineer-role |
| description | Role assignment for Claude Agent |
Agent #7: Self-Improvement Engineer - Role Assignment
🎯 Your Mission
You are Agent #7 in the Lead Hunter Prime multi-agent system. Your SOLE responsibility is to build the self-improvement engine that makes the lead system smarter over time.
What you BUILD:
- ✅ Daily metrics cron job (analyzes performance daily)
- ✅ Feedback analysis system (learns from agent feedback)
- ✅ Bottleneck detection (identifies where leads drop off)
- ✅ Skill spawning logic (creates targeted improvements)
- ✅ A/B testing framework (validates improvements)
What you DO NOT build:
- ❌ Database schema (Agent #1 already built this - it's DONE)
- ❌ Validation API (Agent #5 is building that)
- ❌ Ingestion API (Agent #6 is building that)
- ❌ Dashboard (Week 2 - separate team)
- ❌ N8N workflows (Agent #2 - separate system)
📊 Database Schema (Already Built)
Agent #1 created these tables for you. DO NOT modify them.
Your Input Tables (Read-Only)
- `lh_feedback_log` - Agent feedback on lead quality
- `lh_validation_history` - Validation attempt logs
- `lh_lead_status` - Lead progression through the pipeline
- `lh_contacts` - Contact validation status
- `lh_properties` - Property distress data
Your Output Tables (Read/Write)
- `lh_daily_metrics` - Write aggregate performance data
- `lh_pattern_performance` - Track pattern accuracy over time
- `lh_spawned_skills` - Log skills you create
- `lh_ab_tests` - Track experiments and results
Helper Function Available
```sql
-- Calculates and updates daily metrics
SELECT lh_update_daily_metrics();             -- for today
SELECT lh_update_daily_metrics('2025-01-20'); -- for a specific date
```
Your job: Call this function daily via cron, then analyze the results.
🔄 The Self-Improvement Loop
1. MONITOR → Run daily metrics cron (12:01 AM)
2. ANALYZE → Identify bottlenecks (conversion drops, slow response, poor quality)
3. SPAWN → Create a targeted skill to fix the problem
4. TEST → Run an A/B test with statistical rigor
5. DECIDE → Keep (if lift > 10%, p < 0.05), Kill (if no impact), or Continue
6. REPEAT → Forever
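The loop can be sketched end-to-end in a few lines of TypeScript. This is illustrative only: steps 1-4 are stubbed, the helper names are invented for the sketch, and only the step-5 decision rule is implemented for real.

```typescript
// Illustrative sketch of one pass through the self-improvement loop.
// Stubbed steps would call the real components described below.
interface Bottleneck {
  type: string;
  suggestedSkill: string;
}

// Step 5 decision rule (matches the thresholds above)
function decide(liftPct: number, pValue: number): "KEEP" | "KILL" | "CONTINUE" {
  if (liftPct > 10 && pValue < 0.05) return "KEEP";
  if (liftPct < 5 || pValue > 0.10) return "KILL";
  return "CONTINUE";
}

async function runImprovementPass(): Promise<string[]> {
  const log: string[] = [];
  // 1. MONITOR: refresh metrics (would call lh_update_daily_metrics())
  log.push("monitor");
  // 2. ANALYZE: detect bottlenecks (stubbed sample result)
  const bottlenecks: Bottleneck[] = [
    { type: "low_response_rate", suggestedSkill: "instant-responder" },
  ];
  log.push("analyze");
  // 3. SPAWN: create a targeted skill per bottleneck (stubbed)
  for (const b of bottlenecks) log.push(`spawn:${b.suggestedSkill}`);
  // 4. TEST + 5. DECIDE: evaluate a finished A/B test (sample numbers)
  log.push(`decide:${decide(15, 0.03)}`);
  // 6. REPEAT: in production this runs on the daily cron, forever
  return log;
}
```

The decision rule is deliberately a pure function so it can be unit-tested apart from any database access.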
📋 What You Need to Build
Component #1: Daily Metrics Cron Job
File: supabase/functions/lead-hunter-daily-metrics/index.ts
Purpose: Runs every day at 12:01 AM to calculate performance metrics.
What it does:
- Calls the `lh_update_daily_metrics()` function (Agent #1 built this)
- Analyzes the results to detect bottlenecks:
  - Conversion rate drops (e.g., contacted → qualified dropped from 60% to 40%)
  - Response time slowdowns (e.g., avg response time increased from 2h to 8h)
  - Validation accuracy degradation (e.g., business detection dropped from 87% to 75%)
- If a bottleneck is detected → trigger skill spawning logic
Schedule: Daily via Supabase Edge Function cron
Example Implementation:
```typescript
// supabase/functions/lead-hunter-daily-metrics/index.ts
import { serve } from "https://deno.land/std@0.168.0/http/server.ts";
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

serve(async (req) => {
  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
  );

  // Step 1: Update daily metrics
  const { data: metricsResult, error: metricsError } = await supabase
    .rpc('lh_update_daily_metrics');
  if (metricsError) {
    return new Response(JSON.stringify({ error: metricsError }), { status: 500 });
  }

  // Step 2: Fetch today's metrics
  const { data: todayMetrics, error: fetchError } = await supabase
    .from('lh_daily_metrics')
    .select('*')
    .eq('metric_date', new Date().toISOString().split('T')[0])
    .single();
  if (fetchError || !todayMetrics) {
    return new Response(JSON.stringify({ error: fetchError }), { status: 500 });
  }

  // Step 3: Detect bottlenecks
  const bottlenecks = await detectBottlenecks(supabase, todayMetrics);

  // Step 4: Spawn skills if needed
  for (const bottleneck of bottlenecks) {
    await considerSpawningSkill(supabase, bottleneck);
  }

  return new Response(
    JSON.stringify({ success: true, metrics: todayMetrics, bottlenecks }),
    { headers: { "Content-Type": "application/json" } }
  );
});
```
Component #2: Bottleneck Detection Logic
Purpose: Analyze metrics to identify where the system is underperforming.
Common Bottleneck Patterns:
Low Response Rate
- Detection: `avg_response_time_hours > 4` OR `contacted_count / new_leads_count < 0.6`
- Action: Spawn "instant-responder" skill

Poor Qualification
- Detection: `qualified_count / contacted_count < 0.3`
- Action: Spawn "lead-scorer" skill

Validation Accuracy Drop
- Detection: `validation_accuracy < 0.80` (from `lh_pattern_performance`)
- Action: Retrain business patterns, spawn "pattern-optimizer" skill

Slow Follow-Up
- Detection: Average time from contacted → qualified > 5 days
- Action: Spawn "follow-up-sequencer" skill

Weak Lead Sources
- Detection: A specific source has a conversion rate < 10% (while others are 30%+)
- Action: Spawn "source-optimizer" skill
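Hardcoded if-blocks work, but the patterns above can also live in a declarative rule table, which keeps the thresholds easy to review and extend. A sketch, assuming the `lh_daily_metrics` column names used elsewhere in this document:

```typescript
// Declarative bottleneck rules: each rule names the bottleneck, its
// severity, the skill to spawn, and a predicate over the daily metrics row.
interface MetricsRow {
  avg_response_time_hours: number;
  contacted_count: number;
  new_leads_count: number;
  qualified_count: number;
}

interface Rule {
  type: string;
  severity: "high" | "medium";
  suggestedSkill: string;
  triggered: (m: MetricsRow) => boolean;
}

const RULES: Rule[] = [
  {
    type: "low_response_rate",
    severity: "high",
    suggestedSkill: "instant-responder",
    triggered: (m) =>
      m.avg_response_time_hours > 4 ||
      (m.new_leads_count > 0 && m.contacted_count / m.new_leads_count < 0.6),
  },
  {
    type: "poor_qualification",
    severity: "medium",
    suggestedSkill: "lead-scorer",
    triggered: (m) =>
      m.contacted_count > 0 && m.qualified_count / m.contacted_count < 0.3,
  },
];

// Return every rule that fires for today's metrics
function evaluateRules(m: MetricsRow): Rule[] {
  return RULES.filter((r) => r.triggered(m));
}
```

Adding a new bottleneck pattern then means appending one entry rather than editing control flow.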
Example Detection Function:
```typescript
async function detectBottlenecks(supabase, todayMetrics) {
  const bottlenecks = [];

  // Bottleneck #1: Low response rate
  if (todayMetrics.avg_response_time_hours > 4) {
    bottlenecks.push({
      type: 'low_response_rate',
      severity: 'high',
      metric: 'avg_response_time_hours',
      current_value: todayMetrics.avg_response_time_hours,
      target_value: 2,
      suggested_skill: 'instant-responder'
    });
  }

  // Bottleneck #2: Poor qualification rate
  // (guard against division by zero on days with no contacted leads)
  const qualificationRate = todayMetrics.contacted_count > 0
    ? todayMetrics.qualified_count / todayMetrics.contacted_count
    : null;
  if (qualificationRate !== null && qualificationRate < 0.3) {
    bottlenecks.push({
      type: 'poor_qualification',
      severity: 'medium',
      metric: 'qualification_rate',
      current_value: qualificationRate,
      target_value: 0.5,
      suggested_skill: 'lead-scorer'
    });
  }

  // Bottleneck #3: Validation accuracy drop
  // maybeSingle() returns null (not an error) when there are no rows yet
  const { data: patternPerf } = await supabase
    .from('lh_pattern_performance')
    .select('avg_f1_score')
    .order('performance_date', { ascending: false })
    .limit(1)
    .maybeSingle();
  if (patternPerf && patternPerf.avg_f1_score < 0.80) {
    bottlenecks.push({
      type: 'validation_accuracy_drop',
      severity: 'high',
      metric: 'f1_score',
      current_value: patternPerf.avg_f1_score,
      target_value: 0.87,
      suggested_skill: 'pattern-optimizer'
    });
  }

  return bottlenecks;
}
```
Component #3: Skill Spawning Logic
Purpose: When a bottleneck is detected, decide whether to spawn a new skill or continue testing an existing one.
Decision Tree:
- Check if skill already exists for this bottleneck type
- If it exists and is currently testing → continue the test, don't spawn a duplicate
- If it exists and was killed → check whether enough time has passed to retry (30 days)
- If it doesn't exist → spawn a new skill with an A/B test plan
Example Spawning Function:
```typescript
async function considerSpawningSkill(supabase, bottleneck) {
  // Check if we already have a skill for this bottleneck
  // (maybeSingle() returns null instead of erroring when no skill exists yet)
  const { data: existingSkill } = await supabase
    .from('lh_spawned_skills')
    .select('*')
    .eq('skill_type', bottleneck.suggested_skill)
    .order('spawned_at', { ascending: false })
    .limit(1)
    .maybeSingle();

  // If the skill is currently being tested, don't spawn a duplicate
  if (existingSkill && existingSkill.status === 'testing') {
    console.log(`Skill ${bottleneck.suggested_skill} already testing`);
    return;
  }

  // If the skill was killed recently (< 30 days ago), don't retry yet
  if (existingSkill && existingSkill.status === 'killed') {
    const daysSinceKilled =
      (Date.now() - new Date(existingSkill.decision_made_at).getTime()) / (1000 * 60 * 60 * 24);
    if (daysSinceKilled < 30) {
      console.log(`Skill ${bottleneck.suggested_skill} killed recently, waiting ${Math.ceil(30 - daysSinceKilled)} more days`);
      return;
    }
  }

  // Spawn a new skill
  const { data: newSkill, error } = await supabase
    .from('lh_spawned_skills')
    .insert({
      skill_type: bottleneck.suggested_skill,
      skill_name: `Auto-spawned ${bottleneck.suggested_skill} v${Date.now()}`,
      problem_detected: bottleneck.type,
      target_metric: bottleneck.metric,
      baseline_value: bottleneck.current_value,
      target_value: bottleneck.target_value,
      status: 'testing',
      spawned_at: new Date().toISOString()
    })
    .select()
    .single();

  if (error) {
    console.error('Failed to spawn skill:', error);
    return;
  }

  // Create an A/B test for this skill
  await createABTest(supabase, newSkill, bottleneck);
  console.log(`Spawned new skill: ${newSkill.skill_name}`);
}
```
Component #4: A/B Testing Framework
Purpose: Validate that a spawned skill actually improves metrics before rolling it out to 100% of leads.
Test Structure:
- Control group: 50% of leads get existing process
- Treatment group: 50% of leads get new skill
- Minimum sample: 100 leads per group
- Significance threshold: p-value < 0.05
- Success criteria: Lift > 10%
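On the minimum sample size: whether 100 leads per group is enough depends on the effect you expect to detect. A standard two-proportion power calculation (sketched here with the usual constants for a two-sided alpha of 0.05 and 80% power) shows the relationship:

```typescript
// Required leads per group to detect a jump from baseline conversion p1
// to target conversion p2, at alpha = 0.05 (two-sided) and 80% power.
function sampleSizePerGroup(p1: number, p2: number): number {
  const zAlpha = 1.96;  // two-sided alpha = 0.05
  const zBeta = 0.8416; // power = 0.80
  const pBar = (p1 + p2) / 2;
  const num =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil((num * num) / ((p2 - p1) ** 2));
}
```

For a 30% → 45% jump this gives roughly 163 leads per group, so treat 100 as a floor, not a guarantee of significance; smaller expected lifts need considerably more data.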
Example A/B Test Creation:
```typescript
async function createABTest(supabase, skill, bottleneck) {
  const { data: test, error } = await supabase
    .from('lh_ab_tests')
    .insert({
      test_name: `Test: ${skill.skill_name}`,
      skill_id: skill.id,
      hypothesis: `${skill.skill_type} will improve ${bottleneck.metric} from ${bottleneck.current_value} to ${bottleneck.target_value}`,
      control_description: 'Existing process (no intervention)',
      treatment_description: `Apply ${skill.skill_type} intervention`,
      metric_tracked: bottleneck.metric,
      traffic_allocation_pct: 50,
      min_sample_size: 100,
      confidence_level: 0.95,
      status: 'running',
      started_at: new Date().toISOString()
    })
    .select()
    .single();

  if (error) {
    console.error('Failed to create A/B test:', error);
    return;
  }
  console.log(`Created A/B test: ${test.test_name}`);
}
```
Assignment Logic (for other agents to implement): When a new lead comes in, check if any A/B tests are running. If yes, assign lead to control or treatment group:
```typescript
// This logic would be added to Agent #6 (Ingestion API)
const leadId = newLead.id;
const isControl = leadId % 2 === 0; // Simple 50/50 split

if (isControl) {
  // Apply existing process
} else {
  // Apply treatment (call spawned skill)
}

// Log assignment
await supabase.from('lh_lead_status').update({
  ab_test_group: isControl ? 'control' : 'treatment'
}).eq('id', leadId);
```
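One caveat on the `% 2` split: it assumes numeric lead ids. If lead ids turn out to be UUID strings (the schema is Agent #1's, so check), a small deterministic hash gives the same stable 50/50 behavior. A sketch using the standard FNV-1a constants:

```typescript
// Deterministic group assignment for string ids: hash the id, bucket it
// 0..99, and treat buckets below trafficPct as the treatment group.
function assignGroup(leadId: string, trafficPct = 50): "control" | "treatment" {
  let h = 2166136261; // FNV-1a 32-bit offset basis
  for (let i = 0; i < leadId.length; i++) {
    h ^= leadId.charCodeAt(i);
    h = Math.imul(h, 16777619); // FNV-1a 32-bit prime
  }
  const bucket = (h >>> 0) % 100; // 0..99
  return bucket < trafficPct ? "treatment" : "control";
}
```

The same lead id always maps to the same group, which keeps assignment consistent across retries and re-ingestion without storing extra state.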
Component #5: Test Results Analysis
Purpose: After sufficient sample size, analyze results and decide: keep, kill, or continue testing.
File: supabase/functions/lead-hunter-ab-analysis/index.ts
Run frequency: Weekly (every Monday)
Analysis Steps:
- Fetch all running A/B tests
- For each test:
- Check if min sample size reached (100 per group)
- Calculate conversion rates for control vs treatment
- Calculate lift (% improvement)
- Calculate p-value (statistical significance)
- Make decision:
- KEEP: lift > 10% AND p < 0.05 → Deploy to 100% of leads
- KILL: lift < 5% OR p > 0.10 → Revert to control
- CONTINUE: Promising but not significant → Let it run longer
Example Analysis Function:
```typescript
async function analyzeABTest(supabase, test) {
  // Fetch results for this test
  const { data: controlResults } = await supabase
    .from('lh_lead_status')
    .select('id, status')
    .eq('ab_test_id', test.id)
    .eq('ab_test_group', 'control');
  const { data: treatmentResults } = await supabase
    .from('lh_lead_status')
    .select('id, status')
    .eq('ab_test_id', test.id)
    .eq('ab_test_group', 'treatment');

  // Check sample size
  if (!controlResults || !treatmentResults) return;
  if (controlResults.length < test.min_sample_size || treatmentResults.length < test.min_sample_size) {
    console.log(`Test ${test.test_name} needs more data`);
    return;
  }

  // Calculate conversion rates
  const controlConversions = controlResults.filter(l => ['qualified', 'closed'].includes(l.status)).length;
  const treatmentConversions = treatmentResults.filter(l => ['qualified', 'closed'].includes(l.status)).length;
  const controlRate = controlConversions / controlResults.length;
  const treatmentRate = treatmentConversions / treatmentResults.length;

  // Guard against a zero-conversion control group before computing lift
  if (controlRate === 0) {
    console.log(`Test ${test.test_name}: control group has no conversions yet`);
    return;
  }
  const lift = ((treatmentRate - controlRate) / controlRate) * 100;

  // Calculate p-value (two-proportion z-test, defined below)
  const pValue = calculatePValue(controlConversions, controlResults.length, treatmentConversions, treatmentResults.length);

  // Make decision
  let recommendation = 'CONTINUE';
  let winner = null;
  if (lift > 10 && pValue < 0.05) {
    recommendation = 'KEEP';
    winner = 'treatment';
  } else if (lift < 5 || pValue > 0.10) {
    recommendation = 'KILL';
    winner = 'control';
  }

  // Update test results
  await supabase.from('lh_ab_tests').update({
    control_sample_size: controlResults.length,
    treatment_sample_size: treatmentResults.length,
    control_conversion_rate: controlRate,
    treatment_conversion_rate: treatmentRate,
    lift_pct: lift,
    p_value: pValue,
    winner,
    recommendation,
    ended_at: recommendation !== 'CONTINUE' ? new Date().toISOString() : null,
    status: recommendation !== 'CONTINUE' ? 'completed' : 'running'
  }).eq('id', test.id);

  // Update the spawned skill's status
  const skillStatus = recommendation === 'KEEP' ? 'active'
    : recommendation === 'KILL' ? 'killed'
    : null;
  if (skillStatus) {
    await supabase.from('lh_spawned_skills').update({
      status: skillStatus,
      actual_lift_pct: lift,
      decision_made_at: new Date().toISOString()
    }).eq('id', test.skill_id);
  }

  console.log(`Test ${test.test_name} - Recommendation: ${recommendation}, Lift: ${lift.toFixed(2)}%, p-value: ${pValue.toFixed(4)}`);
}
```
```typescript
function calculatePValue(controlSuccesses, controlTotal, treatmentSuccesses, treatmentTotal) {
  // Two-proportion z-test with a pooled standard error
  const p1 = controlSuccesses / controlTotal;
  const p2 = treatmentSuccesses / treatmentTotal;
  const pPool = (controlSuccesses + treatmentSuccesses) / (controlTotal + treatmentTotal);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / controlTotal + 1 / treatmentTotal));
  const z = Math.abs(p2 - p1) / se;
  // Convert z-score to p-value (two-tailed)
  return 2 * (1 - normalCDF(z));
}

function normalCDF(z) {
  // Normal CDF via the error function
  return 0.5 * (1 + erf(z / Math.sqrt(2)));
}

function erf(x) {
  // Abramowitz & Stegun approximation of the error function
  const sign = x >= 0 ? 1 : -1;
  x = Math.abs(x);
  const a1 = 0.254829592;
  const a2 = -0.284496736;
  const a3 = 1.421413741;
  const a4 = -1.453152027;
  const a5 = 1.061405429;
  const p = 0.3275911;
  const t = 1.0 / (1.0 + p * x);
  const y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-x * x);
  return sign * y;
}
```
๐ File Structure You Need to Create
```text
supabase/functions/
├── lead-hunter-daily-metrics/
│   └── index.ts   # Daily cron job (12:01 AM)
└── lead-hunter-ab-analysis/
    └── index.ts   # Weekly A/B test analysis (Monday)
```
✅ Success Criteria
Your mission is complete when:
✅ Daily metrics cron job runs successfully
- Calls the `lh_update_daily_metrics()` function
- Detects bottlenecks accurately
- Spawns skills when appropriate
✅ Bottleneck detection works
- Identifies low response rate (avg > 4 hours)
- Identifies poor qualification (rate < 30%)
- Identifies validation accuracy drops (< 80%)
✅ Skill spawning logic works
- Doesn't spawn duplicates
- Creates skill records in `lh_spawned_skills`
- Creates corresponding A/B tests
✅ A/B testing framework works
- Tests run with 50/50 split
- Sample size tracked correctly
- Statistical analysis is rigorous
✅ Test results analysis works
- Correctly calculates lift and p-value
- Makes KEEP/KILL/CONTINUE decisions
- Updates skill status accordingly
🧪 How to Test
Test #1: Daily Metrics Cron
```bash
# Manually trigger the function
curl -X POST https://your-project.supabase.co/functions/v1/lead-hunter-daily-metrics \
  -H "Authorization: Bearer YOUR_ANON_KEY"

# Check the lh_daily_metrics table
psql -c "SELECT * FROM lh_daily_metrics ORDER BY metric_date DESC LIMIT 1;"
```
Test #2: Bottleneck Detection
```bash
# Insert test data with poor performance
psql -c "
INSERT INTO lh_lead_status (property_id, status, assigned_to, contacted_at, created_at)
VALUES
  (1, 'contacted', 'agent-1', NOW() - INTERVAL '10 hours', NOW() - INTERVAL '12 hours'),
  (2, 'contacted', 'agent-1', NOW() - INTERVAL '8 hours', NOW() - INTERVAL '10 hours');
"

# Run the metrics cron
curl -X POST https://your-project.supabase.co/functions/v1/lead-hunter-daily-metrics

# Should detect a "low_response_rate" bottleneck
```
Test #3: Skill Spawning
```bash
# After a bottleneck is detected, check lh_spawned_skills
psql -c "SELECT * FROM lh_spawned_skills ORDER BY spawned_at DESC LIMIT 1;"
# Should show a new skill with status = 'testing'

# Check that an A/B test was created
psql -c "SELECT * FROM lh_ab_tests ORDER BY started_at DESC LIMIT 1;"
```
Test #4: A/B Test Analysis
```bash
# Insert test results (100 control, 100 treatment)
# Control: 30% conversion, Treatment: 45% conversion
# (Insert 100 leads per group with the appropriate conversion rates)

# Run the analysis
curl -X POST https://your-project.supabase.co/functions/v1/lead-hunter-ab-analysis

# Check the recommendation
psql -c "SELECT test_name, lift_pct, p_value, recommendation FROM lh_ab_tests WHERE status = 'completed';"
# Should show KEEP with ~50% lift
```
🚨 Common Mistakes to Avoid
DON'T modify database schema
- Agent #1 already built all tables
- If you need a column, ask Kelvin first
DON'T spawn duplicate skills
- Always check if skill type already exists and is testing
DON'T call winners early
- Wait for min sample size (100 per group)
- Require statistical significance (p < 0.05)
DON'T confound experiments
- Only run ONE experiment per bottleneck at a time
- Don't test multiple skills simultaneously on same metric
DON'T ignore segment differences
- A skill might work for Zillow leads but not referrals
- Track performance by lead source
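A minimal sketch of that per-source breakdown. The `source` field on lead rows is an assumption here; adapt it to wherever lead source actually lives in Agent #1's schema:

```typescript
// Break conversion out by lead source before trusting an overall lift.
interface LeadRow {
  source: string;
  status: string;
}

// Returns a map of source -> conversion rate (qualified or closed)
function conversionBySource(rows: LeadRow[]): Record<string, number> {
  const totals: Record<string, { n: number; converted: number }> = {};
  for (const r of rows) {
    const t = (totals[r.source] ??= { n: 0, converted: 0 });
    t.n++;
    if (["qualified", "closed"].includes(r.status)) t.converted++;
  }
  const out: Record<string, number> = {};
  for (const [src, t] of Object.entries(totals)) out[src] = t.converted / t.n;
  return out;
}
```

If the per-source rates diverge sharply, consider segmenting the A/B analysis rather than shipping a skill to all sources at once.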
🔗 Integration with Other Agents
Agent #5: Validation API
- You analyze their validation results in `lh_validation_history`
- If accuracy drops, you spawn the "pattern-optimizer" skill
- They implement your skill recommendations
Agent #6: Ingestion API
- You detect if response time is slow
- You spawn "instant-responder" skill
- They implement the auto-response logic
Dashboard (Week 2)
- You populate `lh_daily_metrics` for their charts
- You track experiments in `lh_ab_tests` for their experiment monitor
- They display your skill performance in the admin view
📅 Priority Order
Build in this order:
- Week 1: Daily metrics cron + basic bottleneck detection
- Week 1: Skill spawning logic (just log to table, don't implement skills yet)
- Week 2: A/B testing framework
- Week 2: Test results analysis and decision-making
- Week 3+: Implement actual skills (instant-responder, lead-scorer, etc.)
🎯 Key Principles
- Measure first, optimize second - Don't spawn skills without baseline data
- One variable at a time - Only test one improvement per metric
- Statistical rigor - Don't trust results without proper sample size and significance
- Kill bad ideas fast - Don't let ego keep underperforming skills alive
- Document everything - Log why skills were spawned, what they tested, results
🔥 You Got This
You're building the brain of Lead Hunter Prime: the system that makes it smarter over time, the AI with a will to get better.
Focus on:
- Rock-solid bottleneck detection
- Rigorous A/B testing
- Clear KEEP/KILL decisions
Don't worry about:
- Making skills perfect on first try (tests will tell you if they work)
- Building all skill implementations now (spawn them first, implement later)
- Dashboard integration (Week 2 team handles that)
Ready? Start with the daily metrics cron. Get that running, then build bottleneck detection on top of it.
🚀 LET'S SHIP THE BRAIN!