---
name: confidence-check
description: "5-check quantitative validation ensuring 90% confidence before implementation. Prevents wrong-direction work through systematic verification: duplicate check (25%), architecture compliance (25%), official docs (20%), working OSS (15%), root cause (15%). Thresholds: ≥90% proceed, ≥70% clarify, <70% STOP. Proven 25-250x token ROI from SuperClaude."
skill-type: QUANTITATIVE
shannon-version: ">=4.0.0"
mcp-requirements: Serena (save assessments), Tavily (docs lookup), GitHub (OSS research), Context7 (framework docs)
required-sub-skills: spec-analysis
optional-sub-skills: mcp-discovery
allowed-tools: Read, Grep, Glob, Bash, Serena, Tavily, GitHub
---
# Confidence Check

## Overview
Purpose: Shannon's quantitative 5-check validation algorithm prevents wrong-direction work by ensuring ≥90% confidence before implementation. Each check contributes weighted points (total 100%) across duplicate verification, architecture compliance, official documentation, working OSS references, and root cause identification.
Critical Role: This skill prevents the most expensive failure mode in software development - building the right thing wrong, or building the wrong thing right. Proven 25-250x token ROI in SuperClaude production use.
## Inputs

Required:
- `specification` (string): Implementation request or feature description from user
- `context` (object): Optional context from spec-analysis skill (8D complexity scores, phase plan)

Optional:
- `skip_checks` (array): List of checks to skip (e.g., `["oss", "root_cause"]` for simple tasks)
- `confidence_threshold` (float): Override default 0.90 threshold (e.g., 0.85 for fast iterations)

A typed sketch of this input shape follows below.
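As a rough illustration only, the inputs above could be typed as follows. This interface is a hypothetical sketch for clarity, not a published Shannon API:

```typescript
// Hypothetical sketch of the confidence-check input contract described above.
// Field names mirror the documented inputs; the types are assumptions.

interface SpecAnalysisContext {
  complexityScores?: Record<string, number>; // 8D complexity scores (assumed shape)
  phasePlan?: string[];                      // phase plan (assumed shape)
}

type CheckName = "duplicate" | "architecture" | "docs" | "oss" | "root_cause";

interface ConfidenceCheckInput {
  specification: string;           // implementation request from the user
  context?: SpecAnalysisContext;   // optional context from spec-analysis
  skip_checks?: CheckName[];       // e.g., ["oss", "root_cause"] for simple tasks
  confidence_threshold?: number;   // defaults to 0.90
}
```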
## Anti-Rationalization (Critical - Read First)

WARNING: Agents systematically rationalize skipping confidence checks. Below are the 12 most dangerous rationalizations detected in production, with mandatory counters.
### Rationalization 1: "User seems confident, proceed"
Example: User says "I'm 75% sure this is right" → Agent responds "Let's proceed..."
COUNTER:
- ❌ NEVER accept user's confidence level without running 5-check algorithm
- ✅ User intuition is ONE data point (cognitive), not the total score
- ✅ Run all 5 checks objectively: duplicate (25%), architecture (25%), docs (20%), OSS (15%), root cause (15%)
- ✅ If calculated score <90%, STOP regardless of user's stated confidence
- ✅ Example: User "75% confident" + passed all 5 checks = 100% confidence → PROCEED
- ✅ Example: User "95% confident" + failed duplicate check (0/25) = 75% → STOP
Rule: Algorithm score overrides stated confidence. Always calculate objectively.
### Rationalization 2: "Simple task, skip validation"
Example: "Just add a button" → Agent proceeds without checking existing buttons
COUNTER:
- ❌ NEVER skip checks because task "seems simple"
- ✅ "Simple" tasks often duplicate existing code (25% penalty)
- ✅ "Simple" tasks often violate architecture patterns (25% penalty)
- ✅ Even trivial changes get validated (takes 30 seconds, prevents hours of rework)
- ✅ Example: "Add login button" → Check: Is LoginButton.tsx already defined? YES → 0/25 duplicate check → 75% confidence → CLARIFY
Rule: No task too simple to validate. 30-second check prevents 2-hour rework.
### Rationalization 3: "I know the API, skip docs"
Example: Agent uses Redis API from memory without checking current documentation
COUNTER:
- ❌ NEVER skip official docs based on "knowing" the API
- ✅ APIs change (Redis 3.x → 4.x syntax breaking changes)
- ✅ Memory is fallible (confusing similar APIs)
- ✅ Documentation takes 2 minutes to verify (debugging wrong API takes hours)
- ✅ Use Tavily MCP or direct doc access: redis.io, React docs, Express docs
- ✅ Zero points (0/20) if docs not consulted
Rule: Always verify official docs. Knowledge cutoff and API changes require fresh verification.
### Rationalization 4: "I can design this, skip OSS research"
Example: "I'll design a real-time sync protocol" without checking Yjs, Automerge, ShareDB
COUNTER:
- ❌ NEVER design from scratch when proven OSS exists
- ✅ Working OSS = battle-tested, production-proven, community-validated
- ✅ Custom designs often miss edge cases (race conditions, conflicts, network partitions)
- ✅ 15-30 minutes researching OSS saves weeks of debugging custom code
- ✅ Use GitHub MCP: search "real-time collaborative editing", filter by stars, check recent commits
- ✅ Zero points (0/15) if no OSS referenced
Rule: Learn from production code. OSS research is mandatory for complex features.
### Rationalization 5: "Obvious problem, skip root cause"
Example: "API slow → Add caching" without profiling actual bottleneck
COUNTER:
- ❌ NEVER implement solution before diagnosing root cause
- ✅ "Obvious" problems often have non-obvious causes
- ✅ Example: "API slow" → Root cause: N+1 database queries, not lack of caching
- ✅ Example: "Memory leak" → Root cause: Event listener not cleaned up, not memory allocation
- ✅ Diagnose FIRST (logs, profiler, metrics), implement SECOND
- ✅ Zero points (0/15) if no diagnostic evidence provided
Rule: Diagnosis before prescription. No solutions without identified root cause.
### Rationalization 6: "85% is close enough to 90%"
Example: Score 85% → Agent thinks "Close enough, let's proceed"
COUNTER:
- ❌ NEVER round up or "close enough" confidence scores
- ✅ Thresholds are precise: ≥90% proceed, ≥70% clarify, <70% STOP
- ✅ 85% = CLARIFY band → Request missing information to reach 90%
- ✅ Example: 85% = missing OSS reference (0/15) → "Please provide working OSS example before proceeding"
- ✅ 5% gap often reveals critical missing validation
- ✅ Iron Law: Thresholds are exact, not approximate
Rule: 90% means 90.0%. Not 89.9%, not 85%, not "close enough". Exact threshold enforcement.
### Rationalization 7: "Authority override"
Example: Senior engineer says "Trust me, I've done this 100 times, skip the checks"
COUNTER:
- ❌ NEVER accept authority as substitute for validation
- ✅ Algorithm is objective, applies to all experience levels
- ✅ "Trust" is not a substitute for verification
- ✅ Even senior engineers miss duplicates, architecture violations, outdated API knowledge
- ✅ Stated confidence (95%) doesn't override calculated confidence (might be 60%)
- ✅ Run 5-check algorithm regardless of who requests bypass
Rule: No authority exceptions. Algorithm applies universally, from junior to principal.
### Rationalization 8: "Urgent/emergency bypass"
Example: "Production down! No time for confidence checks, implement OAuth2 now!"
COUNTER:
- ❌ NEVER skip root cause check in emergencies
- ✅ Wrong fix in emergency wastes MORE time than 2-minute diagnosis
- ✅ Emergency waivers allowed: docs (20), OSS (15) can skip with assumed PASS
- ✅ Emergency MANDATORY: duplicate (25), architecture (25), root cause (15)
- ✅ Example: "Login broken" → Check: Is auth down or database connection down?
- ✅ 2 minutes root cause diagnosis saves 2 hours implementing wrong fix
Rule: Emergencies require faster checks, not skipped checks. Root cause MANDATORY.
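The waiver rule above is mechanical enough to express as code. A minimal sketch, assuming a simple boolean emergency flag (`checksToRun` is a hypothetical helper, not a Shannon function):

```typescript
// Hypothetical emergency-waiver logic: docs and OSS may be skipped with an
// assumed PASS, but duplicate, architecture, and root cause always run.

type CheckName = "duplicate" | "architecture" | "docs" | "oss" | "root_cause";

const EMERGENCY_WAIVABLE: CheckName[] = ["docs", "oss"];

function checksToRun(isEmergency: boolean): CheckName[] {
  const all: CheckName[] = ["duplicate", "architecture", "docs", "oss", "root_cause"];
  return isEmergency ? all.filter((c) => !EMERGENCY_WAIVABLE.includes(c)) : all;
}

// checksToRun(true) → ["duplicate", "architecture", "root_cause"]
```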
### Rationalization 9: "Within margin of error"
Example: "88% is close to 90%, within margin of error, let's proceed"
COUNTER:
- ❌ NEVER round confidence scores
- ✅ Thresholds are exact: `if (score >= 0.90)`, not `if (score > 0.88)`
- ✅ "Close enough" often reveals critical missing validation
- ✅ 2% gap = incomplete docs (2 points) OR missing OSS (15 points) - not trivial
- ✅ Example: 88% means incomplete OSS research - prevents reinventing wheel
- ✅ Request missing points explicitly, don't proceed "close enough"
Rule: Thresholds are exact. 89.9% = CLARIFY, not PROCEED. No rounding.
### Rationalization 10: "Found some OSS, counts as research"
Example: Found 50-star unmaintained repo, claims 15/15 OSS check passed
COUNTER:
- ❌ NEVER accept low-quality OSS as validation
- ✅ Quality criteria: >1000 stars OR active maintenance (<3 months) OR production-used
- ✅ Partial credit allowed: 8/15 for lower quality (<1000 stars but active)
- ✅ Zero credit: <100 stars AND unmaintained (>1 year no commits)
- ✅ Example: 50-star 2-year-old repo = 0/15 (find production-quality instead)
- ✅ Purpose: Learn from PROVEN implementations, not hobby projects
Rule: OSS quality matters. Production-grade (15/15), active lower-quality (8/15), or fail (0/15).
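Those tiers translate directly into points. A minimal sketch, assuming star count, last-commit age, and production use are known; the exact tier boundaries are interpreted from the rule above, and `scoreOssReference` is a hypothetical helper:

```typescript
// Hypothetical OSS quality scorer implementing the tiers above.

interface OssRepo {
  stars: number;
  monthsSinceLastCommit: number;
  productionUsed: boolean; // e.g., a credible "used by" list
}

function scoreOssReference(repo: OssRepo | null): number {
  if (repo === null) return 0;                                       // no OSS researched
  if (repo.stars > 1000 || repo.productionUsed) return 15;           // production-grade
  if (repo.stars < 100 && repo.monthsSinceLastCommit > 12) return 0; // unmaintained hobby repo
  if (repo.monthsSinceLastCommit <= 3) return 8;                     // lower quality but active
  return 0;
}

// 50-star repo, last commit 24 months ago → 0/15 (find production-quality instead)
```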
### Rationalization 11: "New feature bypasses root cause check"
Example: "'Add caching' is a new feature, so root cause check = N/A → 15/15"
COUNTER:
- ❌ NEVER claim "new feature" to skip root cause on fixes/improvements
- ✅ Keywords requiring root cause: "slow", "fix", "improve", "broken", "error", "leak", "crash", "optimize"
- ✅ "Add caching" to fix slow page = FIX → Need diagnostic evidence (profiler, metrics)
- ✅ "Build authentication system" from scratch = NEW FEATURE → Root cause N/A
- ✅ Example: "Page slow, add caching" = FIX → MANDATORY root cause check
- ✅ Example: "Build user authentication" = NEW → Root cause check N/A
Rule: Root cause MANDATORY for any fix/improvement. Keyword detection enforced.
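The keyword rule lends itself to a simple detector. A sketch using the keyword list above (`rootCauseApplies` is a hypothetical helper):

```typescript
// Hypothetical detector: does the root-cause check apply (fix/improvement)
// or is it N/A (genuinely new feature)?

const FIX_KEYWORDS = ["slow", "fix", "improve", "broken", "error", "leak", "crash", "optimize"];

function rootCauseApplies(specification: string): boolean {
  const lower = specification.toLowerCase();
  return FIX_KEYWORDS.some((kw) => lower.includes(kw));
}

// rootCauseApplies("Page slow, add caching")    → true  (FIX: diagnostic evidence required)
// rootCauseApplies("Build user authentication") → false (NEW: check is N/A)
```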
### Rationalization 12: "User provided docs, skip verification"
Example: User provides syntax snippet, agent accepts without verifying against official docs
COUNTER:
- ❌ NEVER accept user-provided syntax without official doc verification
- ✅ User syntax may be outdated (Redis 3.x vs 4.x breaking changes)
- ✅ User syntax may be from different library (mixing APIs)
- ✅ User syntax may be pseudo-code, not actual API
- ✅ ALWAYS verify against official docs regardless of user input quality
- ✅ Example: User says "redis.connect()" → Verify: Official Redis 4.x is "createClient() + await connect()"
Rule: Official docs verification MANDATORY. User input verified, not trusted blindly.
### Detection Signal
If you're tempted to:
- Accept <90% confidence as "good enough"
- Skip checks for "simple" tasks
- Use memory instead of docs
- Design custom instead of researching OSS
- Implement solution before diagnosing problem
- Round 85% up to 90%
Then you are rationalizing. Stop. Run the 5-check algorithm. Report the score objectively.
## When to Use
Use confidence-check skill when:
- About to begin implementation of ANY feature, fix, or change
- User provides specification, requirements, or work description
- Complexity score ≥0.30 (Moderate or higher) from spec-analysis
- Starting new wave in wave-based execution
- User says "I think...", "Maybe...", "Probably..." (uncertainty signals)
- Before writing code that modifies existing system architecture
DO NOT use when:
- Pure research or investigation tasks (no implementation)
- Documentation-only changes (README, comments)
- Test writing based on existing implementation (tests validate, not implement)
- Responding to clarification questions (dialogue, not implementation)
## Core Competencies

### 1. Quantitative 5-Check Algorithm
Objective scoring across five validation dimensions, each contributing weighted points to total confidence score (0.00-1.00):
- No Duplicate Implementations (25%): Verify no existing code duplicates proposed implementation
- Architecture Compliance (25%): Confirm approach aligns with system architecture patterns
- Official Docs Verified (20%): Validate using current official documentation
- Working OSS Referenced (15%): Learn from production-proven open source implementations
- Root Cause Identified (15%): For fixes, verify diagnostic evidence of actual problem
### 2. Threshold-Based Decision System

- ≥90% confidence: PROCEED to implementation
- 70-89% confidence: CLARIFY - request missing information before proceeding
- <70% confidence: STOP - too many unknowns, requires deeper analysis

The sketch below encodes both the check weights above and these decision bands.
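```typescript
// Illustrative encoding of the documented weights and decision bands.
// CHECK_WEIGHTS and decide() are hypothetical names, not Shannon APIs.

const CHECK_WEIGHTS = {
  duplicate: 25,
  architecture: 25,
  docs: 20,
  oss: 15,
  root_cause: 15,
} as const; // sums to 100

type Decision = "PROCEED" | "CLARIFY" | "STOP";

function decide(confidenceScore: number): Decision {
  // Thresholds are exact: 0.899 falls in the CLARIFY band, not PROCEED.
  if (confidenceScore >= 0.90) return "PROCEED";
  if (confidenceScore >= 0.70) return "CLARIFY";
  return "STOP";
}

// decide(0.85) === "CLARIFY"
```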
### 3. Integration with Shannon 8D Scoring
Confidence check score informs spec-analysis dimensions:
- Low confidence (<70%) increases Uncertainty dimension (10% weight)
- Missing architecture compliance increases Cognitive dimension (15% weight)
- No OSS reference increases Technical dimension (15% weight)
## Workflow

### Step 1: Initialize Confidence Assessment
Input: Implementation request from user
Processing:
- Parse request to identify: feature name, scope, affected components
- Initialize score: `confidence_score = 0.00`
- Initialize checklist:

```javascript
{
  duplicate_check:    { passed: null, points: 0, max: 25 },
  architecture_check: { passed: null, points: 0, max: 25 },
  docs_check:         { passed: null, points: 0, max: 20 },
  oss_check:          { passed: null, points: 0, max: 15 },
  root_cause_check:   { passed: null, points: 0, max: 15 }
}
```
Output: Initialized assessment structure
Duration: Instant
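A runnable version of this initialization might look like the following sketch (`initAssessment` is a hypothetical helper):

```typescript
// Hypothetical initializer for the Step 1 assessment structure.

interface CheckResult {
  passed: boolean | null; // null = not yet evaluated (or N/A)
  points: number;
  max: number;
}

type Checklist = Record<
  "duplicate_check" | "architecture_check" | "docs_check" | "oss_check" | "root_cause_check",
  CheckResult
>;

function initAssessment(): { confidenceScore: number; checks: Checklist } {
  const mk = (max: number): CheckResult => ({ passed: null, points: 0, max });
  return {
    confidenceScore: 0.0,
    checks: {
      duplicate_check: mk(25),
      architecture_check: mk(25),
      docs_check: mk(20),
      oss_check: mk(15),
      root_cause_check: mk(15),
    },
  };
}
```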
### Step 2: Check #1 - No Duplicate Implementations (25%)
Purpose: Prevent reimplementing existing functionality
Processing:
Search codebase for similar implementations:
- Use Grep to search for function names, component names, class names
- Example: User wants "LoginButton" → Search: `grep -r "LoginButton" src/`
- Example: User wants "authenticateUser" → Search: `grep -rE "authenticateUser|authenticate.*user" src/`
Check package.json for existing libraries:
- Example: User wants "JWT auth" → Check: `grep -iE "jsonwebtoken|jwt" package.json`
- If library installed, verify it's not already used
Review existing architecture:
- Use Read to check key files (routes, controllers, services)
- Example: Authentication → Check: `src/middleware/auth.js`, `src/routes/auth.js`
Scoring:
- ✅ PASS (25/25): No duplicate found, or duplicate intentional (different purpose documented)
- ⚠️ PARTIAL (15/25): Similar code exists but in deprecated/unused module
- ❌ FAIL (0/25): Active duplicate found, will create redundant implementation
Output:
- `duplicate_check.passed`: true | false
- `duplicate_check.points`: 0 | 15 | 25
- `duplicate_check.evidence`: File paths, code snippets showing existing implementation (if found)
Duration: 1-3 minutes
Example:

```
User: "Build authentication middleware"
Search: grep -r "auth.*middleware" src/
Found: src/middleware/authenticate.js (active, exports authenticateUser)
Result: FAIL (0/25) - Duplicate implementation exists
Evidence: "src/middleware/authenticate.js already implements JWT authentication"
```
### Step 3: Check #2 - Architecture Compliance (25%)
Purpose: Ensure proposed approach aligns with system architecture patterns
Processing:
Identify architecture patterns:
- Frontend: Component structure (atomic, pages, layouts), state management (Redux, Context, Zustand)
- Backend: MVC vs microservices, layered architecture (routes → controllers → services → models)
- Database: ORM patterns (Prisma, TypeORM), repository pattern, query builders
Locate architecture documentation:
- Check for: `ARCHITECTURE.md`, `CONTRIBUTING.md`, `docs/architecture/`
- If missing, infer from codebase structure:
  - Read: `src/` directory structure
  - Read: Key files (index, app, main) to understand initialization patterns
Verify proposed approach matches patterns:
- Example: Frontend component → Check: Does project use `/components/atoms/` structure?
- Example: API endpoint → Check: Does project use `/routes/ → /controllers/ → /services/` layers?
- Example: Database query → Check: Does project use raw SQL or ORM?
Scoring:
- ✅ PASS (25/25): Approach matches established patterns, or creates documented new pattern
- ⚠️ PARTIAL (15/25): Approach deviates slightly but acceptable (e.g., new pattern for new domain)
- ❌ FAIL (0/25): Approach violates architecture (e.g., business logic in routes, direct DB access in components)
Output:
- `architecture_check.passed`: true | false
- `architecture_check.points`: 0 | 15 | 25
- `architecture_check.rationale`: Explanation of alignment or violation
Duration: 2-4 minutes
Example:

```
User: "Add getUserById() in routes/users.js"
Architecture: Project uses MVC (routes → controllers → services → models)
Proposed: Adding business logic (getUserById) directly in routes
Result: FAIL (0/25) - Violates MVC pattern
Rationale: "getUserById should be in services/userService.js, routes should only handle HTTP"
```
### Step 4: Check #3 - Official Docs Verified (20%)
Purpose: Ensure implementation uses current, official API syntax and patterns
Processing:
Identify required documentation:
- Extract technologies from user request
- Example: "Redis caching" → Docs: redis.io
- Example: "React hooks" → Docs: react.dev/reference/react
- Example: "Express middleware" → Docs: expressjs.com/en/guide/writing-middleware.html
Access official documentation:
- If Tavily MCP available: Use for quick doc lookup: `tavily_search("Redis client.connect() API current syntax")`
- If Context7 MCP available: Use for framework-specific docs: `get_library_docs("/redis/redis", topic: "client connection")`
- Manual fallback: Report doc URLs for user verification
Verify current API syntax:
- Check version-specific changes (e.g., Redis 3.x vs 4.x breaking changes)
- Confirm proposed usage matches official examples
- Identify deprecated patterns (e.g., `componentWillMount` in React)
Scoring:
- ✅ PASS (20/20): Current official docs verified, syntax confirmed correct
- ⚠️ PARTIAL (10/20): Docs consulted but using slightly outdated version (still functional)
- ❌ FAIL (0/20): No docs consulted, using memory/intuition, or wrong syntax
Output:
- `docs_check.passed`: true | false
- `docs_check.points`: 0 | 10 | 20
- `docs_check.source`: URL or doc reference consulted
- `docs_check.verification`: Specific API syntax confirmed
Duration: 2-5 minutes (depending on MCP availability)
Example:

```
User: "Use Redis client.connect()"
Action: Search redis.io documentation
Found: Redis 4.x requires: await client.connect() (async)
       Redis 3.x used: client.connect(callback)
Verification: Project uses Redis 4.x (package.json: "redis": "^4.6.0")
Result: PASS (20/20) - Correct async syntax for Redis 4.x
Source: https://redis.io/docs/latest/develop/connect/clients/nodejs/
```
### Step 5: Check #4 - Working OSS Referenced (15%)
Purpose: Learn from production-proven implementations, avoid reinventing solved problems
Processing:
Identify OSS research need:
- Complex features: Real-time sync, authentication, payment processing
- Novel patterns: CRDT algorithms, optimistic UI updates, distributed systems
- Domain-specific: E-commerce checkout, video streaming, collaborative editing
Search for working implementations:
- If GitHub MCP available: `github_search_repos("real-time collaborative editing", language: "javascript")` (filter: stars > 1000, recently updated)
- If Tavily MCP available: `tavily_search("production WebSocket real-time sync implementation")`
- Manual fallback: Report search terms for user to research
Evaluate OSS quality:
- Check: GitHub stars (>1000 preferred), recent commits (active maintenance)
- Check: Production use (companies using it, "used by" section)
- Check: Code quality (tests, documentation, TypeScript support)
Extract learnings:
- Study: Architecture decisions, error handling patterns, edge case handling
- Identify: Design patterns applicable to current implementation
- Document: Key takeaways from OSS approach
Scoring:
- ✅ PASS (15/15): Production-quality OSS found (>1000 stars, active), learnings documented
- ⚠️ PARTIAL (8/15): OSS found but lower quality (<1000 stars, or inactive)
- ❌ FAIL (0/15): No OSS researched, designing from scratch
- ⚠️ N/A (skip): Trivial implementation where OSS research not applicable (e.g., simple utility function)
Output:
- `oss_check.passed`: true | false | null (N/A)
- `oss_check.points`: 0 | 8 | 15
- `oss_check.examples`: List of OSS repositories with URLs and star counts
- `oss_check.learnings`: Key design patterns extracted from OSS
Duration: 5-10 minutes (research intensive)
Example:

```
User: "Build real-time collaborative editing"
Action: Search GitHub for "collaborative editing CRDT"
Found:
1. Yjs (github.com/yjs/yjs) - 13.2k stars, active, used by Google, Microsoft
2. Automerge (github.com/automerge/automerge) - 3.5k stars, active, research-backed
3. ShareDB (github.com/share/sharedb) - 6.1k stars, active, Operational Transforms
Learnings:
- Yjs uses CRDT (Conflict-free Replicated Data Types) for automatic conflict resolution
- WebSocket for real-time sync, with offline support and eventual consistency
- State vector compression reduces bandwidth (only send deltas)
Result: PASS (15/15) - Production OSS researched, design patterns identified
Examples: ["yjs/yjs (13.2k stars)", "automerge/automerge (3.5k stars)"]
```
### Step 6: Check #5 - Root Cause Identified (15%)
Purpose: For fixes/improvements, verify diagnostic evidence of actual problem before implementing solution
Processing:
Determine if root cause check applies:
- Applies to: Bug fixes, performance improvements, error handling, optimization
- Does NOT apply to: New features, greenfield implementations, documentation
If applicable, gather diagnostic evidence:
- For bugs: Error logs, stack traces, reproduction steps
- For performance: Profiler output, metrics, query times, network traces
- For errors: Log entries showing failure conditions, frequency, patterns
Verify evidence identifies root cause (not symptoms):
- ❌ Symptom: "API is slow" → ✅ Root cause: "Database query takes 2.4s due to missing index on users.email"
- ❌ Symptom: "Memory leak" → ✅ Root cause: "EventEmitter listeners not removed in componentWillUnmount, accumulating 1000+ listeners"
- ❌ Symptom: "App crashes" → ✅ Root cause: "Uncaught promise rejection in async fetchData() when API returns 404"

Validate proposed solution addresses root cause:
- Example: Root cause = missing index → Solution = add index ✅
- Example: Root cause = N+1 queries → Solution = add caching ❌ (should fix query)
Scoring:
- ✅ PASS (15/15): Diagnostic evidence provided, root cause identified, solution matches cause
- ⚠️ PARTIAL (8/15): Evidence provided but root cause unclear (symptom identified, cause assumed)
- ❌ FAIL (0/15): No diagnostic evidence, solution-first approach
- ⚠️ N/A (skip): New feature (not a fix), root cause check doesn't apply
Output:
- `root_cause_check.passed`: true | false | null (N/A)
- `root_cause_check.points`: 0 | 8 | 15
- `root_cause_check.evidence`: Diagnostic data (logs, profiler, metrics)
- `root_cause_check.cause`: Identified root cause
- `root_cause_check.solution_alignment`: Does solution address cause?
Duration: 3-8 minutes (depending on diagnostic complexity)
Example:

```
User: "API is slow, add caching"
Action: Request diagnostic evidence
User provides: "Logs show /api/users taking 3.2s average"
Investigation:
- Check: Database query logs
- Found: SELECT * FROM users WHERE email = ? (no index on email column)
- Profiler: 95% of time spent in database query
Root Cause: Missing database index on users.email column (full table scan on every lookup)
Proposed Solution: "Add caching"
Alignment: MISMATCH - Caching treats symptom, doesn't fix root cause
Better Solution: "Add index on users.email column"
Result: FAIL (0/15) - Solution doesn't address root cause
Evidence: "Database profiler shows 3.1s query time on unindexed email column"
Cause: "Missing index on users.email"
Solution Alignment: "Proposed caching, should add database index instead"
```
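One convention worth making explicit: in the worked examples below (see Example 1), a check that is N/A keeps `passed: null` but is awarded its full weight, so a genuinely new feature is not penalized for lacking a root cause. A sketch of that convention (`scoreRootCause` is hypothetical; the `applies` flag could come from the keyword detector sketched under Rationalization 11):

```typescript
// Hypothetical root-cause scorer showing the N/A-scores-full convention.

interface RootCauseScore {
  passed: boolean | null;
  points: number;
  max: number;
  note?: string;
}

function scoreRootCause(applies: boolean, hasEvidence: boolean, causeIdentified: boolean): RootCauseScore {
  if (!applies) {
    return { passed: null, points: 15, max: 15, note: "N/A - new feature, not a fix" };
  }
  if (hasEvidence && causeIdentified) return { passed: true, points: 15, max: 15 };
  if (hasEvidence) return { passed: false, points: 8, max: 15 }; // symptom found, cause assumed
  return { passed: false, points: 0, max: 15 };                  // solution-first, no evidence
}
```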
### Step 7: Calculate Total Confidence Score
Input: All 5 check results
Processing:
Sum points from all checks:

```javascript
total_points = duplicate_check.points + architecture_check.points +
               docs_check.points + oss_check.points + root_cause_check.points
```

Calculate confidence score (0.00-1.00):

```javascript
confidence_score = total_points / 100.0
```

Determine threshold band:

```javascript
if (confidence_score >= 0.90) {
  decision = "PROCEED"
  action = "Begin implementation"
} else if (confidence_score >= 0.70) {
  decision = "CLARIFY"
  action = "Request missing information before proceeding"
} else {
  decision = "STOP"
  action = "Too many unknowns, requires deeper analysis or spec revision"
}
```

Identify missing checks (if <90%):

```javascript
missing_checks = checks.filter(c => c.points < c.max)
// Example: [{name: "docs", missing: 20}, {name: "oss", missing: 15}]
```
Output:
- `confidence_score`: 0.00-1.00 (e.g., 0.85)
- `decision`: "PROCEED" | "CLARIFY" | "STOP"
- `action`: Recommended next step
- `missing_checks`: List of incomplete checks with missing points
Duration: Instant (calculation)
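Putting the pieces together, Step 7's aggregation might look like this sketch (`assessConfidence` is a hypothetical name, not a Shannon API):

```typescript
// Hypothetical end-to-end aggregation for Step 7.

interface Check {
  name: string;
  points: number;
  max: number;
}

function assessConfidence(checks: Check[]) {
  const totalPoints = checks.reduce((sum, c) => sum + c.points, 0);
  const confidenceScore = totalPoints / 100.0;

  const decision =
    confidenceScore >= 0.90 ? "PROCEED" :
    confidenceScore >= 0.70 ? "CLARIFY" : "STOP";

  // Checks below full credit explain the gap to the 90% threshold.
  const missingChecks = checks
    .filter((c) => c.points < c.max)
    .map((c) => ({ name: c.name, missing: c.max - c.points }));

  return { confidenceScore, decision, missingChecks };
}

// As in the example below: 25 + 25 + 20 + 0 + 15 = 85 → 0.85 → "CLARIFY"
```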
Example:

```
Results:
  duplicate_check: 25/25 ✅
  architecture_check: 25/25 ✅
  docs_check: 20/20 ✅
  oss_check: 0/15 ❌ (no OSS researched)
  root_cause_check: 15/15 ✅ (N/A, new feature)

Total: 85/100
Confidence: 0.85 (85%)
Decision: CLARIFY
Action: "Request OSS examples before proceeding"
Missing: ["OSS reference (0/15)"]
```
### Step 8: Report Confidence Assessment
Input: Complete assessment with decision
Processing:
Format assessment report:

```markdown
# Confidence Check: [Feature Name]

**Total Confidence**: X.XX (XX%)
**Decision**: PROCEED | CLARIFY | STOP

## 5-Check Results

| Check | Points | Status | Evidence |
|-------|--------|--------|----------|
| Duplicate | XX/25 | ✅/❌ | [Details] |
| Architecture | XX/25 | ✅/❌ | [Details] |
| Docs | XX/20 | ✅/❌ | [Details] |
| OSS | XX/15 | ✅/❌ | [Details] |
| Root Cause | XX/15 | ✅/❌ | [Details] |

## Decision: [PROCEED/CLARIFY/STOP]

[Action description]

## Next Steps

[Specific actions based on decision]
```

Save to Serena MCP (if available and complexity >=0.50):

```javascript
serena_write_memory(`confidence_check_${feature_name}_${timestamp}`, {
  feature: feature_name,
  confidence_score: 0.85,
  decision: "CLARIFY",
  checks: [...],
  missing_checks: [...]
})
```

Integrate with spec-analysis:
- If confidence <70%, increase Uncertainty dimension in 8D scoring
- Report confidence score to wave-orchestration for risk assessment
Output: Formatted markdown report with decision and next steps
Duration: 1 minute
### Step 9: Execute Decision
Input: Decision (PROCEED | CLARIFY | STOP)
Processing:
If PROCEED (≥90%):
- Document confidence assessment in implementation PR/commit
- Proceed to implementation
- No additional gating required
If CLARIFY (70-89%):
- Identify specific missing information:
  - Example: "Need OSS reference for real-time sync (missing 15 points)"
  - Example: "Need official Redis docs verification (missing 20 points)"
- Request clarification from user:
  - "Before proceeding, please provide: [specific requests]"
  - "This will increase confidence from 85% to 100%"
- Wait for clarification before implementation
- Re-run confidence check with new information
If STOP (<70%):
- Report critical gaps:
  - Example: "Missing architecture alignment (0/25) AND OSS reference (0/15) = 60% confidence"
- Recommend alternatives:
  - "Suggest running /sh_spec for deeper analysis"
  - "Consider spike/research task before implementation"
  - "Revise specification to address unknowns"
- DO NOT proceed to implementation
- Escalate to user for direction
Output: Executed decision with user feedback or implementation start
Duration: Depends on decision path
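Step 9's branching is a straightforward dispatch on the decision. A sketch (the handler strings stand in for the documented actions; `executeDecision` is not a Shannon function):

```typescript
// Hypothetical dispatch for Step 9.

type Decision = "PROCEED" | "CLARIFY" | "STOP";

function executeDecision(decision: Decision, missing: string[]): string {
  switch (decision) {
    case "PROCEED":
      return "Document assessment in the PR/commit and begin implementation.";
    case "CLARIFY":
      return `Request before implementing: ${missing.join(", ")}. Re-run the check afterwards.`;
    case "STOP":
      return "Report critical gaps, recommend /sh_spec or a spike task. Do NOT implement.";
  }
}
```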
## Integration with Shannon 8D Scoring
Confidence score informs spec-analysis dimensions:
```javascript
// In spec-analysis workflow
const confidence_result = run_confidence_check(feature_request)
// Update Uncertainty dimension (10% weight in 8D)
if (confidence_result.score < 0.70) {
uncertainty_score += 0.30 // Major unknowns
} else if (confidence_result.score < 0.90) {
uncertainty_score += 0.15 // Minor clarifications needed
}
// Update Cognitive dimension (15% weight) if architecture unclear
if (confidence_result.architecture_check.points < 15) {
cognitive_score += 0.20 // Need deeper architectural thinking
}
// Update Technical dimension (15% weight) if no OSS reference
if (confidence_result.oss_check.points === 0) {
technical_score += 0.15 // Increased technical risk without proven patterns
}
// Recalculate total complexity with confidence-adjusted dimensions
total_complexity = calculate_8d_weighted_total()
```
Result: Confidence check directly impacts project complexity assessment and resource planning.
## Outputs

Structured confidence assessment:

```json
{
"feature": "authentication_middleware",
"timestamp": "2025-11-04T10:30:00Z",
"confidence_score": 0.85,
"decision": "CLARIFY",
"checks": [
{
"name": "duplicate",
"points": 25,
"max": 25,
"passed": true,
"evidence": "No existing auth middleware found in src/"
},
{
"name": "architecture",
"points": 25,
"max": 25,
"passed": true,
"rationale": "Follows MVC pattern: middleware/ directory exists"
},
{
"name": "docs",
"points": 20,
"max": 20,
"passed": true,
"source": "https://expressjs.com/en/guide/writing-middleware.html",
"verification": "Confirmed Express 4.x middleware syntax"
},
{
"name": "oss",
"points": 0,
"max": 15,
"passed": false,
"examples": [],
"reason": "No OSS authentication middleware researched"
},
{
"name": "root_cause",
"points": 15,
"max": 15,
"passed": null,
"note": "N/A - new feature, not a fix"
}
],
"missing_checks": [
{
"name": "oss",
"missing_points": 15,
"recommendation": "Research Passport.js or express-jwt OSS implementations"
}
],
"action": "Request OSS examples before proceeding (need 90% confidence)",
"next_steps": [
"User: Provide working OSS reference for authentication middleware",
"User: Consider using Passport.js (13k stars) or express-jwt (6k stars)",
"Agent: Re-run confidence check after OSS research"
]
}
```
## Success Criteria
This skill succeeds if:
- ✅ All 5 checks executed (duplicate, architecture, docs, OSS, root cause)
- ✅ Confidence score calculated correctly (0.00-1.00, sum of weighted checks)
- ✅ Decision matches threshold: ≥90% PROCEED, 70-89% CLARIFY, <70% STOP
- ✅ Missing checks identified when score <90%
- ✅ Evidence documented for each check (file paths, doc URLs, OSS repos)
- ✅ Decision executed (implementation started, clarification requested, or stopped)
- ✅ Assessment saved to Serena MCP (if available and complexity >=0.50)
- ✅ Confidence score integrated with spec-analysis (Uncertainty dimension updated)
Validation:

```python
def validate_confidence_check(result):
    assert 0.0 <= result['confidence_score'] <= 1.0
    assert result['confidence_score'] == sum(c['points'] for c in result['checks']) / 100.0

    # Verify decision thresholds
    if result['confidence_score'] >= 0.90:
        assert result['decision'] == 'PROCEED'
    elif result['confidence_score'] >= 0.70:
        assert result['decision'] == 'CLARIFY'
    else:
        assert result['decision'] == 'STOP'

    # Verify all checks present
    assert len(result['checks']) == 5
    check_names = [c['name'] for c in result['checks']]
    assert set(check_names) == {'duplicate', 'architecture', 'docs', 'oss', 'root_cause'}

    # Verify missing checks identified if <90%
    if result['confidence_score'] < 0.90:
        assert len(result['missing_checks']) > 0
```
## Common Pitfalls

### Pitfall 1: Accepting Stated Confidence Without Calculation
Wrong:

```
User: "I'm 80% sure this is right"
Agent: "Let's proceed with 80% confidence"
```

Right:

```
User: "I'm 80% sure this is right"
Agent: "Running 5-check validation..."
Result:
- duplicate: 25/25
- architecture: 0/25 (violates MVC pattern)
- docs: 20/20
- oss: 15/15
- root_cause: N/A
Total: 60/100 (60% confidence)
Decision: STOP (below 70% threshold)
```
Why: User's stated confidence is subjective. Algorithm score is objective and often reveals hidden gaps.
### Pitfall 2: Skipping Checks for "Simple" Tasks
Wrong:

```
User: "Just add a login button"
Agent: "Simple task, proceeding..."
[Creates LoginButton.tsx, duplicating existing LoginButton.jsx]
```

Right:

```
User: "Just add a login button"
Agent: "Running confidence check..."
Check 1 (Duplicate): Searching for existing login buttons...
Found: src/components/LoginButton.jsx (active)
Result: FAIL (0/25) - Duplicate exists → 75% confidence
Decision: CLARIFY (70-89% band)
Recommendation: "Use existing LoginButton.jsx instead of creating new component"
```
Why: "Simple" tasks often duplicate existing code. 30-second check prevents 2-hour rework.
### Pitfall 3: Using Memory Instead of Official Docs
Wrong:

```
User: "Use Redis caching"
Agent: "I'll use client.connect(callback)..."
[Uses Redis 3.x syntax in Redis 4.x project → Error]
```

Right:

```
User: "Use Redis caching"
Agent: "Checking official Redis documentation..."
Version in package.json: "redis": "^4.6.0"
Redis 4.x docs: client.connect() is async (no callback)
Correct syntax: await client.connect()
Result: docs check PASS (20/20)
```
Why: APIs change (Redis 3.x → 4.x breaking changes). Always verify current docs.
### Pitfall 4: Designing From Scratch Instead of OSS Research
Wrong:

```
User: "Build real-time collaborative editing"
Agent: "I'll design a custom sync protocol..."
[Spends 2 weeks, hits race conditions, conflict resolution issues]
```

Right:

```
User: "Build real-time collaborative editing"
Agent: "Researching production OSS implementations..."
Found: Yjs (13.2k stars, used by Google/Microsoft)
Learnings: CRDT for conflict-free sync, state vector compression, offline support
Decision: Use Yjs patterns, adapt to our use case
Result: oss check PASS (15/15)
[Saves 2 weeks by learning from production-proven code]
```
Why: OSS has solved hard problems (race conditions, edge cases). Learn from battle-tested code.
### Pitfall 5: Solution-First Instead of Diagnosis-First
Wrong:

```
User: "API is slow, add caching"
Agent: "Adding Redis caching..."
[Implements caching, API still slow - root cause was missing database index]
```

Right:

```
User: "API is slow, add caching"
Agent: "Requesting diagnostic evidence for root cause check..."
User: [provides profiler data]
Analysis: 95% time in database query (missing index on users.email)
Root Cause: Missing database index
Proposed Solution: "Add caching"
Alignment: MISMATCH
Correct Solution: "Add database index on users.email"
Result: root_cause check adjusted (proposes correct solution)
```
Why: Diagnosis before prescription. Caching treats symptom; index fixes cause.
### Pitfall 6: "Close Enough" to 90%
Wrong:

```
Confidence: 85%
Agent: "85% is close to 90%, proceeding..."
```

Right:

```
Confidence: 85%
Decision: CLARIFY (70-89% band)
Action: "Missing OSS reference (0/15). Please provide working example to reach 90%."
[Wait for clarification before proceeding]
```
Why: Thresholds are exact, not approximate. 85% = CLARIFY, not PROCEED.
## Examples

### Example 1: PROCEED Scenario (100% Confidence)
Input:
User: "Add error logging to API endpoints using Winston library"
Process:
Duplicate Check (25/25):
- Search: `grep -r "winston" src/`
- Found: package.json has "winston": "^3.8.0" but no middleware configured
- Result: ✅ PASS - Winston installed but not configured for endpoints
Architecture Check (25/25):
- Read: src/middleware/ (exists, contains other middleware)
- Pattern: Express middleware in src/middleware/, registered in src/app.js
- Proposed: Create src/middleware/logger.js
- Result: ✅ PASS - Follows established middleware pattern
Docs Check (20/20):
- Source: https://github.com/winstonjs/winston#usage
- Verified: Current Winston 3.x syntax (createLogger, transports)
- Result: ✅ PASS - Official docs consulted, syntax verified
OSS Check (15/15):
- Found: Express + Winston examples on GitHub (express-winston, 800 stars)
- Learnings: Use express-winston for automatic request/response logging
- Result: ✅ PASS - Production OSS patterns identified
Root Cause Check (15/15):
- Note: N/A (new feature, not a fix)
- Result: ✅ SKIP (not applicable)
Output:

```json
{
"feature": "winston_error_logging",
"confidence_score": 1.00,
"decision": "PROCEED",
"checks": [
{"name": "duplicate", "points": 25, "passed": true},
{"name": "architecture", "points": 25, "passed": true},
{"name": "docs", "points": 20, "passed": true},
{"name": "oss", "points": 15, "passed": true},
{"name": "root_cause", "points": 15, "passed": null}
],
"action": "Proceed to implementation with 100% confidence",
"next_steps": [
"Create src/middleware/logger.js using Winston",
"Register middleware in src/app.js",
"Test error logging on sample endpoint"
]
}
```
Decision: PROCEED ✅ (100% confidence)
### Example 2: CLARIFY Scenario (85% Confidence)
Input:
User: "Implement WebSocket real-time notifications"
Process:
Duplicate Check (25/25):
- Search: `grep -r "websocket\|socket\.io" src/`
- Found: No existing WebSocket implementation
- Result: ✅ PASS
Architecture Check (25/25):
- Read: src/server.js (Express HTTP server)
- Pattern: HTTP server uses Express, can upgrade to WebSocket
- Proposed: Add Socket.io to existing HTTP server
- Result: ✅ PASS - Compatible with architecture
Docs Check (20/20):
- Source: https://socket.io/docs/v4/server-initialization/
- Verified: Socket.io 4.x syntax (server initialization, emit patterns)
- Result: ✅ PASS
OSS Check (0/15):
- Action: Searched GitHub for "socket.io real-time notifications"
- Found: Multiple examples but none reviewed for production patterns
- Result: ❌ FAIL - No OSS researched
Root Cause Check (15/15):
- Note: N/A (new feature)
- Result: ✅ SKIP
Output:

```json
{
"feature": "websocket_notifications",
"confidence_score": 0.85,
"decision": "CLARIFY",
"checks": [
{"name": "duplicate", "points": 25, "passed": true},
{"name": "architecture", "points": 25, "passed": true},
{"name": "docs", "points": 20, "passed": true},
{"name": "oss", "points": 0, "passed": false, "reason": "No OSS researched"},
{"name": "root_cause", "points": 15, "passed": null}
],
"missing_checks": [
{
"name": "oss",
"missing_points": 15,
"recommendation": "Research Socket.io notification patterns from production apps"
}
],
"action": "Request OSS examples before proceeding",
"next_steps": [
"User: Provide working Socket.io notification example from GitHub",
"Suggested repos: socket.io-chat, slack-clone, discord-clone",
"Agent: Review OSS patterns (room management, broadcast strategies, reconnection logic)",
"Agent: Re-run confidence check after OSS research"
]
}
```
Decision: CLARIFY ⚠️ (85% confidence - need OSS research to reach 90%)
### Example 3: STOP Scenario (25% Confidence)
Input:
User: "Fix the memory leak in the dashboard"
Process:
Duplicate Check (25/25):
- Search: No duplicate fix implementations
- Result: ✅ PASS
Architecture Check (0/25):
- Issue: User didn't specify which component has leak
- Dashboard has 15+ components (DashboardPage, Sidebar, Charts, Tables, Modals)
- Cannot determine architecture compliance without knowing affected component
- Result: ❌ FAIL - Insufficient specification
Docs Check (0/20):
- Issue: "Memory leak" is symptom, not specific API issue
- No specific docs to verify without knowing leak source
- Result: ❌ FAIL - Cannot verify docs without root cause
OSS Check (0/15):
- Issue: Cannot research OSS fixes without knowing leak type
- (Event listeners? Component unmount? Timers? Closures?)
- Result: ❌ FAIL - Cannot research without specifics
Root Cause Check (0/15):
- Issue: No diagnostic evidence provided
- Need: Heap snapshot, profiler data, reproduction steps
- Result: ❌ FAIL - No root cause identified
Output:

```json
{
"feature": "memory_leak_fix",
"confidence_score": 0.25,
"decision": "STOP",
"checks": [
{"name": "duplicate", "points": 25, "passed": true},
{"name": "architecture", "points": 0, "passed": false, "reason": "Component not specified"},
{"name": "docs", "points": 0, "passed": false, "reason": "Cannot verify without root cause"},
{"name": "oss", "points": 0, "passed": false, "reason": "Cannot research without specifics"},
{"name": "root_cause", "points": 0, "passed": false, "reason": "No diagnostic evidence"}
],
"missing_checks": [
{"name": "architecture", "missing_points": 25},
{"name": "docs", "missing_points": 20},
{"name": "oss", "missing_points": 15},
{"name": "root_cause", "missing_points": 15}
],
"action": "STOP - Too many unknowns (25% confidence)",
"next_steps": [
"User: Provide diagnostic evidence:",
" 1. Which component has memory leak? (Chrome DevTools Memory profiler)",
" 2. Heap snapshot showing leak growth over time",
" 3. Reproduction steps (actions that trigger leak)",
" 4. Browser console warnings/errors",
"Alternative: Run /sh_spec for deeper analysis phase",
"Agent: Re-run confidence check after diagnostic evidence provided"
]
}
```
Decision: STOP 🛑 (25% confidence - critical gaps, requires investigation)
## Validation
How to verify confidence-check executed correctly:
Check All 5 Checks Executed:
- ✅ duplicate_check: { passed: true|false, points: 0-25 }
- ✅ architecture_check: { passed: true|false, points: 0-25 }
- ✅ docs_check: { passed: true|false, points: 0-20 }
- ✅ oss_check: { passed: true|false|null, points: 0-15 }
- ✅ root_cause_check: { passed: true|false|null, points: 0-15 }
Check Score Calculation:
- ✅ confidence_score = sum(check.points) / 100.0
- ✅ Score in range [0.00, 1.00]
- ✅ Example: 25+25+20+0+15 = 85 → 0.85 ✅
Check Decision Threshold:
- ✅ ≥0.90 → "PROCEED"
- ✅ 0.70-0.89 → "CLARIFY"
- ✅ <0.70 → "STOP"
Check Missing Checks Identified (if <90%):
- ✅ missing_checks array not empty
- ✅ Each entry: { name, missing_points, recommendation }
Check Evidence Documented:
- ✅ duplicate_check: File paths or "no duplicate found"
- ✅ architecture_check: Pattern description or violation
- ✅ docs_check: Doc URL and verification note
- ✅ oss_check: Repository URLs with star counts (if applicable)
- ✅ root_cause_check: Diagnostic evidence (if applicable)
Run Validation Script (if available):

```bash
python3 shannon-plugin/tests/test_confidence_check.py
# Expected: ✅ All validation checks passed
```
## Progressive Disclosure
In SKILL.md (this file): ~1100 lines
- Overview, when to use, anti-rationalization
- 5-check algorithm with detailed steps
- Workflow with processing logic
- Integration with Shannon 8D scoring
- 3 examples (PROCEED, CLARIFY, STOP)
- Success criteria, pitfalls, validation
In references/ (for advanced usage):
- `references/CONFIDENCE_ALGORITHM.md`: Mathematical formulas, edge cases
- `references/OSS_RESEARCH_GUIDE.md`: How to evaluate OSS quality, extract learnings
- `references/ROOT_CAUSE_PATTERNS.md`: Common root cause patterns by domain
Claude loads references/ when:
- Score calculation unclear (consult CONFIDENCE_ALGORITHM.md)
- OSS research guidance needed (consult OSS_RESEARCH_GUIDE.md)
- Root cause identification unclear (consult ROOT_CAUSE_PATTERNS.md)
## References
- Spec Analysis: shannon-plugin/skills/spec-analysis/SKILL.md
- Wave Orchestration: shannon-plugin/skills/wave-orchestration/SKILL.md
- Context Management: shannon-plugin/core/CONTEXT_MANAGEMENT.md
- MCP Discovery: shannon-plugin/skills/mcp-discovery/SKILL.md
- Testing Philosophy (NO MOCKS): shannon-plugin/core/TESTING_PHILOSOPHY.md
## Metadata

- Version: 4.0.0
- Last Updated: 2025-11-04
- Author: Shannon Framework Team (Adapted from SuperClaude)
- License: MIT
- Status: Core (QUANTITATIVE skill, mandatory before implementation)
- Proven ROI: 25-250x token savings in SuperClaude production use