---
name: self-improving-ai
description: Understanding and using StickerNest's self-improving AI system. Use when the user asks about AI self-improvement, prompt versioning, reflection loops, AI evaluation, auto-tuning prompts, or the AI judge system. Covers AIReflectionService, stores, and the improvement loop.
---
# Self-Improving AI System for StickerNest

This skill covers StickerNest's self-improving AI system: an AI that evaluates its own generations and automatically improves its prompts over time.
## When to Use This Skill
This skill helps when you need to:
- Understand how the self-improvement loop works
- Configure the reflection system settings
- Add new AI capabilities that should self-improve
- Debug or tune the evaluation rubrics
- Extend the improvement loop to new domains
## Core Concepts

### The Improvement Loop
The self-improving AI follows this cycle:
```
[Generation] → [Track Metrics] → [Evaluate] → [Analyze] → [Improve Prompt] → [Generation]
      ↓               ↓               ↓             ↓               ↓
 Widget/Image    MetricsStore    AIReflection   Suggestions   PromptVersion
                                 Service (Judge)               Store
```
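The steps below cover each stage in detail. As a preview, one full pass through the loop might look like the following sketch; the `generateWidget` helper and `runLoopOnce` wrapper are hypothetical, while `addRecord` and `reflectOnWidgetGeneration` are the real entry points covered in Steps 1 and 3:

```ts
import { useGenerationMetricsStore } from '../state/useGenerationMetricsStore';
import { reflectOnWidgetGeneration } from '../ai/AIReflectionService';

// Hypothetical stand-in for the actual widget-generation call.
declare function generateWidget(
  prompt: string
): Promise<{ code: string; promptVersionId: string }>;

// One pass through the loop: generate, track, then reflect.
async function runLoopOnce(userPrompt: string): Promise<boolean> {
  const { promptVersionId } = await generateWidget(userPrompt);

  // Track the generation so the judge has something to evaluate
  useGenerationMetricsStore.getState().addRecord({
    type: 'widget',
    promptVersionId,
    userPrompt,
    result: 'success',
  });

  // Evaluate, analyze, and (if warranted) propose an improved prompt
  const result = await reflectOnWidgetGeneration({ forceRun: true });
  return result.promptChanged;
}
```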
### Key Components
| Component | Purpose | Location |
|---|---|---|
| `AIReflectionStore` | Stores evaluations, runs, suggestions | `src/state/useAIReflectionStore.ts` |
| `PromptVersionStore` | Version control for AI prompts | `src/state/usePromptVersionStore.ts` |
| `GenerationMetricsStore` | Tracks generation quality | `src/state/useGenerationMetricsStore.ts` |
| `AIReflectionService` | The "judge" AI that evaluates generations | `src/ai/AIReflectionService.ts` |
| `SkillRecommendationService` | Suggests new skills | `src/ai/SkillRecommendationService.ts` |
| `ReflectionDashboard` | Admin UI panel | `src/components/ai-reflection/ReflectionDashboard.tsx` |
### Evaluation Rubrics

The system evaluates generations against rubrics with weighted criteria (a scoring sketch follows the two rubrics below):

**Widget Generation Rubric:**
- Protocol Compliance (25%) - Follows Widget Protocol v3.0
- Code Quality (20%) - Clean, readable code
- Functionality (25%) - Works correctly
- Port Design (15%) - Good input/output definitions
- User Experience (15%) - Visual design and interaction
**Image Generation Rubric:**
- Prompt Accuracy (30%) - Matches user intent
- Visual Quality (25%) - Clear, well-composed
- Style Consistency (20%) - Matches requested style
- Usability (25%) - Suitable for design use
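To make the weighting concrete, here is one way such a score could be rolled up. This is an illustrative sketch, not necessarily how `AIReflectionService` aggregates internally; the `RubricCriteria` shape matches the custom-rubric example later in this skill:

```ts
import type { RubricCriteria } from '../state/useAIReflectionStore';

// Combine per-criterion judge scores into one weighted score (same 1-5 scale).
function weightedScore(
  criteria: RubricCriteria[],
  scores: Record<string, number>, // criterion name -> judge's score
): number {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  const weighted = criteria.reduce(
    (sum, c) => sum + c.weight * (scores[c.name] ?? c.minScore),
    0,
  );
  return weighted / totalWeight; // e.g. compared against scoreThreshold (3.5)
}
```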
## Step-by-Step Guide

### Step 1: Recording a Generation

When the AI generates something, record it in the metrics store:
```ts
import { useGenerationMetricsStore } from '../state/useGenerationMetricsStore';

// After generation completes
const metricsStore = useGenerationMetricsStore.getState();
const recordId = metricsStore.addRecord({
  type: 'widget', // or 'image', 'pipeline', 'skill'
  promptVersionId: currentPromptVersionId,
  userPrompt: userInput,
  result: success ? 'success' : 'failure',
  errorMessage: error?.message,
  qualityScore: validationScore, // 0-100, if available
  metadata: {
    model: 'claude-3-5-sonnet',
    provider: 'anthropic',
    durationMs: elapsed,
  },
});
```
### Step 2: Adding User Feedback
Capture user feedback on generations:
```ts
// Thumbs up/down
metricsStore.addFeedback(recordId, 'thumbs_up');

// Star rating
metricsStore.addFeedback(recordId, 'rating', 4);

// With comment and tags
metricsStore.addFeedback(recordId, 'rating', 2, 'Output was too verbose', ['too_long', 'verbose']);
```
### Step 3: Running a Reflection
Trigger a reflection manually or let it run on schedule:
```ts
import { reflectOnWidgetGeneration } from '../ai/AIReflectionService';

// Manual reflection
const result = await reflectOnWidgetGeneration({ forceRun: true });
console.log('Evaluation passed:', result.evaluation?.passed);
console.log('Prompt changed:', result.promptChanged);
console.log('New suggestions:', result.suggestions.length);
```
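For scheduled runs, a simple timer can drive the same entry point. A sketch, assuming the config object is readable from the store state; the timer wiring itself is hypothetical:

```ts
import { reflectOnWidgetGeneration } from '../ai/AIReflectionService';
import { useAIReflectionStore } from '../state/useAIReflectionStore';

// Hypothetical scheduler: re-run reflection every `intervalMinutes`.
function startReflectionTimer(): () => void {
  const { intervalMinutes } = useAIReflectionStore.getState().config;
  const handle = setInterval(() => {
    // Without forceRun, cooldown and unevaluated-record checks still apply
    void reflectOnWidgetGeneration({ forceRun: false });
  }, intervalMinutes * 60_000);
  return () => clearInterval(handle); // call to stop the timer
}
```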
### Step 4: Managing Prompt Versions
Handle prompt version control:
```ts
import { usePromptVersionStore } from '../state/usePromptVersionStore';

const promptStore = usePromptVersionStore.getState();

// Get the current prompt for a domain
const currentPrompt = promptStore.getActivePrompt('widget_generation');

// Create a new version
const versionId = promptStore.createVersion(
  'widget_generation',
  newPromptContent,
  'Improved based on reflection',
  'ai', // created by AI
  evaluationId
);

// Revert to a previous version
promptStore.revertToVersion(previousVersionId);

// Handle pending proposals
const proposals = promptStore.getPendingProposals('widget_generation');
proposals.forEach(p => {
  // Review, then approve or reject
  promptStore.approveProposal(p.id);
  // or: promptStore.rejectProposal(p.id);
});
```
### Step 5: Configuring the Reflection Loop
Adjust reflection settings:
```ts
import { useAIReflectionStore } from '../state/useAIReflectionStore';

const reflectionStore = useAIReflectionStore.getState();
reflectionStore.updateConfig({
  enabled: true,
  intervalMinutes: 60,     // How often to reflect
  messagesToEvaluate: 20,  // How many records to evaluate
  scoreThreshold: 3.5,     // Pass/fail threshold (1-5)
  cooldownMinutes: 30,     // Pause after prompt update
  autoApplyChanges: false, // Require approval for changes
  evaluateUnevaluatedOnly: true,
});
```
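With `autoApplyChanges: false`, prompt rewrites produced by a reflection should land as pending proposals rather than going live, so the `getPendingProposals` / `approveProposal` flow from Step 4 becomes the gate for every change.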
## Code Examples

### Example: Custom Rubric for New Domain
```ts
import { useAIReflectionStore, type RubricCriteria } from '../state/useAIReflectionStore';

const customRubric: RubricCriteria[] = [
  {
    name: 'Accuracy',
    description: 'Output matches expected format and content',
    weight: 0.4,
    minScore: 1,
    maxScore: 5,
  },
  {
    name: 'Efficiency',
    description: 'Uses optimal approach without waste',
    weight: 0.3,
    minScore: 1,
    maxScore: 5,
  },
  {
    name: 'Maintainability',
    description: 'Easy to understand and modify',
    weight: 0.3,
    minScore: 1,
    maxScore: 5,
  },
];

const reflectionStore = useAIReflectionStore.getState();
reflectionStore.setWidgetRubric(customRubric);
```
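Note that the weights in this example sum to 1.0, mirroring the 100% split of the built-in rubrics; keeping custom weights normalized the same way keeps scores comparable across domains.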
### Example: Tracking Skill Gaps
```ts
import { analyzeSkillGaps, generateSkillFromGap } from '../ai/SkillRecommendationService';

// Analyze patterns for potential new skills
const gaps = analyzeSkillGaps();

// Find high-priority gaps
const criticalGaps = gaps.filter(g => g.priority === 'critical' || g.priority === 'high');

// Generate a skill template for a gap
if (criticalGaps.length > 0) {
  const template = generateSkillFromGap(criticalGaps[0].id);
  console.log('Suggested skill:', template?.name);
  console.log('Content:', template?.content);
}
```
### Example: Using the Reflection Dashboard
```tsx
import { useState } from 'react';
import { ReflectionDashboard } from '../components/ai-reflection';

function MyComponent() {
  const [showDashboard, setShowDashboard] = useState(false);

  return (
    <>
      <button onClick={() => setShowDashboard(true)}>
        Open AI Dashboard
      </button>
      <ReflectionDashboard
        isOpen={showDashboard}
        onClose={() => setShowDashboard(false)}
      />
    </>
  );
}
```
## Common Patterns

### Pattern: Adding Self-Improvement to a New AI Feature

- Add a prompt domain to `PromptVersionStore`
- Track generations in `GenerationMetricsStore`
- Create a rubric for evaluation
- Add a reflection trigger to `AIReflectionService` (see the sketch below)
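Put together, the wiring for a new domain might look like the sketch below. The `'summary_generation'` domain, prompt text, and rubric contents are hypothetical; the store calls follow the APIs shown in the steps above, and the final trigger is codebase-specific:

```ts
import { usePromptVersionStore } from '../state/usePromptVersionStore';
import { useGenerationMetricsStore } from '../state/useGenerationMetricsStore';
import type { RubricCriteria } from '../state/useAIReflectionStore';

// 1. Register the new prompt domain with an initial version
const versionId = usePromptVersionStore.getState().createVersion(
  'summary_generation', // hypothetical domain name
  'You are a concise summarizer. ...',
  'Initial version for new domain',
  'ai', // creator tag, per the signature shown in Step 4
);

// 2. Track each generation against that version (as in Step 1)
useGenerationMetricsStore.getState().addRecord({
  type: 'skill', // closest existing type; extending the union is codebase-specific
  promptVersionId: versionId,
  userPrompt: 'Summarize this article...',
  result: 'success',
});

// 3. Define a rubric for the judge (shape from the custom-rubric example);
//    registering it for a non-widget domain is codebase-specific.
const summaryRubric: RubricCriteria[] = [
  { name: 'Brevity', description: 'Short without losing meaning', weight: 0.5, minScore: 1, maxScore: 5 },
  { name: 'Fidelity', description: 'Preserves the source content', weight: 0.5, minScore: 1, maxScore: 5 },
];

// 4. Add a reflection trigger in AIReflectionService, e.g. an entry point
//    mirroring reflectOnWidgetGeneration for the new domain.
```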
### Pattern: Manual Prompt Improvement
When you want to update a prompt based on observations:
```ts
const promptStore = usePromptVersionStore.getState();

// Create a proposal for review
promptStore.createProposal(
  'widget_generation',
  improvedPromptContent,
  'User requested more concise outputs',
  ['User feedback: too verbose', 'Multiple complaints about length'],
  'manual-review'
);
```
### Pattern: Exporting Data for Analysis
```ts
const metricsStore = useGenerationMetricsStore.getState();
const analysisData = metricsStore.exportForReflection('widget', {
  limit: 100,
  includeFailuresOnly: true,
});

console.log('Failure rate:', 100 - analysisData.metrics.successRate);
console.log('Common issues:', analysisData.metrics.commonIssues);
```
## Reference Files
| Category | File |
|---|---|
| Reflection Store | `src/state/useAIReflectionStore.ts` |
| Prompt Versions | `src/state/usePromptVersionStore.ts` |
| Generation Metrics | `src/state/useGenerationMetricsStore.ts` |
| Reflection Service | `src/ai/AIReflectionService.ts` |
| Skill Recommendations | `src/ai/SkillRecommendationService.ts` |
| Dashboard UI | `src/components/ai-reflection/ReflectionDashboard.tsx` |
## Troubleshooting
### Issue: Reflection loop not running

**Cause:** Cooldown period active, or no unevaluated records.

**Fix:** Check `isInCooldown()` and `getUnevaluatedRecords()`. Use `forceRun: true` to bypass the cooldown.
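A quick diagnostic, assuming `isInCooldown()` lives on the reflection store and `getUnevaluatedRecords()` on the metrics store (adjust if they live elsewhere):

```ts
import { useAIReflectionStore } from '../state/useAIReflectionStore';
import { useGenerationMetricsStore } from '../state/useGenerationMetricsStore';

const cooling = useAIReflectionStore.getState().isInCooldown();
const pending = useGenerationMetricsStore.getState().getUnevaluatedRecords();
console.log({ cooling, pendingCount: pending.length });
// If cooling is true but a run is needed now:
// await reflectOnWidgetGeneration({ forceRun: true });
```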
### Issue: Prompts changing too frequently

**Cause:** Score threshold too high, or auto-apply enabled.

**Fix:** Lower `scoreThreshold`, disable `autoApplyChanges`, or increase `cooldownMinutes`.
### Issue: AI judge too lenient

**Cause:** Rubric weights favor passing, or the judge's system prompt is too forgiving.

**Fix:** Adjust the rubric weights, or update the `reflection_judge` prompt in `PromptVersionStore`.
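One way to tighten the judge is to propose a stricter `reflection_judge` prompt through the proposal flow from the Manual Prompt Improvement pattern above (the prompt text and evidence strings here are illustrative):

```ts
import { usePromptVersionStore } from '../state/usePromptVersionStore';

usePromptVersionStore.getState().createProposal(
  'reflection_judge',
  'You are a strict evaluator. Reserve top scores for flawless output. ...',
  'Judge scoring too leniently',
  ['Evaluations pass despite repeated thumbs-down feedback'],
  'manual-review'
);
```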
### Issue: Missing evaluations

**Cause:** Generations are not being recorded to the metrics store.

**Fix:** Ensure `addRecord()` is called after every generation, with proper metadata.
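A defensive pattern is to wrap every generation call so both successes and failures get recorded (the `generateImage` helper is hypothetical):

```ts
import { useGenerationMetricsStore } from '../state/useGenerationMetricsStore';

// Hypothetical stand-in for the actual image-generation call.
declare function generateImage(prompt: string): Promise<string>;

async function generateAndRecord(userPrompt: string, promptVersionId: string) {
  const metricsStore = useGenerationMetricsStore.getState();
  try {
    const url = await generateImage(userPrompt);
    metricsStore.addRecord({ type: 'image', promptVersionId, userPrompt, result: 'success' });
    return url;
  } catch (error) {
    // Record failures too: the judge needs them to diagnose prompt issues
    metricsStore.addRecord({
      type: 'image',
      promptVersionId,
      userPrompt,
      result: 'failure',
      errorMessage: error instanceof Error ? error.message : String(error),
    });
    throw error;
  }
}
```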