SKILL.md

name: thinkdeep
description: Extended reasoning with systematic investigation, hypothesis tracking, and confidence progression

THINKDEEP

Overview

The THINKDEEP workflow provides multi-step investigation capabilities with hypothesis tracking, evidence collection, and confidence progression. Unlike simple conversations, THINKDEEP maintains investigation state across turns, allowing systematic analysis of complex problems through hypothesis formation, testing, and refinement.

Key Capabilities:

  • Multi-step investigation with state persistence across conversation turns
  • Hypothesis tracking and evolution as evidence accumulates
  • Confidence level progression (exploring → low → medium → high → very_high → almost_certain → certain)
  • Investigation step history with findings recorded
  • File examination tracking across the investigation
  • Optional expert validation for hypothesis verification

Use Cases:

  • Complex debugging scenarios requiring systematic hypothesis testing
  • Architecture decisions needing evidence-based reasoning
  • Security analysis with methodical investigation
  • Root cause analysis with confidence tracking
  • Performance optimization requiring step-by-step exploration

When to Use

Use the THINKDEEP workflow when you need to:

  • Systematic investigation - Complex problems requiring methodical, step-by-step analysis with hypothesis testing
  • Evidence accumulation - Building confidence through progressive evidence gathering across multiple investigation steps
  • Hypothesis evolution - Starting with initial theories and refining them based on findings
  • State persistence - Multi-turn investigations where context and progress must be maintained
  • Confidence tracking - Knowing how certain you are about conclusions as investigation progresses

When NOT to Use

Avoid the THINKDEEP workflow when:

| Situation | Use Instead |
|---|---|
| Simple conversational queries | CHAT - Single-model conversation for straightforward questions |
| Need multiple model perspectives | CONSENSUS - Parallel multi-model consultation |
| Need structured debate | ARGUMENT - Dialectical analysis with creator/skeptic/moderator |
| Creative brainstorming | IDEATE - Structured idea generation |

Detailed Use Cases

Debugging Scenarios

Race Conditions and Timing Issues:

  • Intermittent bugs that don't reproduce consistently
  • Authentication failures with no clear pattern
  • State management issues in async code
  • Example: "Users report random 401 errors, 5% of requests fail but logs show valid tokens"

Memory Leaks and Resource Issues:

  • Gradually increasing memory usage
  • Connection pool exhaustion
  • File descriptor leaks
  • Example: "Service memory grows from 200MB to 2GB over 24 hours, then OOM crashes"

Integration Failures:

  • API calls failing with unclear errors
  • Database connection issues
  • Third-party service timeouts
  • Example: "Payment processing fails 10% of the time with 'transaction timeout' but payment provider shows success"

Configuration Problems:

  • Environment-specific bugs
  • Deployment issues
  • Feature flag interactions
  • Example: "Feature works in staging but fails in production, configuration appears identical"

Investigation Scenarios

Performance Analysis:

  • Response time degradation
  • Query performance issues
  • CPU/memory spikes under load
  • Example: "API latency increased from 100ms to 2s after deployment, need to identify bottleneck"

Security Analysis:

  • Vulnerability assessment
  • Access control verification
  • Input validation checking
  • Example: "Audit authentication flow for potential bypass vulnerabilities before security review"

Architecture Decisions:

  • Technology selection (microservices vs monolith)
  • Database choice (SQL vs NoSQL)
  • Caching strategy evaluation
  • Example: "Team of 5 developers, 10k users expected, evaluate architecture tradeoffs"

Root Cause Analysis:

  • Production incident investigation
  • Cascading failure analysis
  • Error spike investigation
  • Example: "500 error rate spiked from 0.1% to 15% at 2pm, trace back to root cause"

When to Use THINKDEEP for These Scenarios

Characteristics that indicate THINKDEEP is the right choice:

  • ✅ Problem requires multiple investigation steps
  • ✅ Initial hypothesis may be wrong and need refinement
  • ✅ Evidence needs to accumulate across steps
  • ✅ Confidence level matters for decision-making
  • ✅ Investigation may span multiple sessions
  • ✅ Need to track what's been examined

Example decision:

Problem: "Users report intermittent 401 errors"

Simple fix? → NO (intermittent, unclear cause)
Need hypothesis testing? → YES (multiple possible causes)
Multi-step investigation? → YES (need to examine logs, code, flow)
Confidence tracking valuable? → YES (want to know how certain about root cause)

Decision: Use THINKDEEP ✓

Hypothesis Tracking

THINKDEEP maintains hypothesis state across the investigation, allowing theories to evolve as evidence is gathered.

How Hypothesis Tracking Works

Initial Step:

  • First prompt starts with "exploring" confidence
  • Model forms initial hypothesis based on available information
  • Investigation begins with this working theory

Subsequent Steps:

  • Each turn adds evidence supporting or contradicting the hypothesis
  • Hypothesis may be refined, replaced, or strengthened
  • Multiple hypotheses can exist simultaneously
  • Investigation state tracks all hypotheses and their evolution

Hypothesis Evolution Example:

Step 1: "Why is authentication failing?"
→ Hypothesis: "Session token expiration causing failures"
→ Confidence: LOW (initial theory)

Step 2: "Check token validation logic"
→ Evidence: Token expiration logic correct, but timing issue found
→ Updated Hypothesis: "Race condition in async token validation"
→ Confidence: MEDIUM (supporting evidence found)

Step 3: "Examine async/await patterns in auth flow"
→ Evidence: Missing await in validation, non-blocking check
→ Confirmed Hypothesis: "Race condition - token validated before expiry check completes"
→ Confidence: HIGH (root cause identified with evidence)

State Persistence

  • Hypotheses are stored in conversation memory
  • The session ID (session_id) links all investigation steps
  • Each turn builds on previous findings
  • Investigation can be paused and resumed using continuation

Confidence Progression

THINKDEEP tracks confidence levels that progress as evidence accumulates, helping you understand how certain the conclusions are.

Confidence Levels

| Level | Code | Description | When to Use |
|---|---|---|---|
| Exploring | exploring | Just starting, no clear hypothesis | Initial investigation phase |
| Low | low | Early investigation, hypothesis forming | First theories with minimal evidence |
| Medium | medium | Some supporting evidence found | Partial evidence supporting hypothesis |
| High | high | Strong evidence supporting hypothesis | Substantial evidence, likely correct |
| Very High | very_high | Very strong evidence, high confidence | Comprehensive evidence, minimal doubt |
| Almost Certain | almost_certain | Near complete confidence | Overwhelming evidence, virtually certain |
| Certain | certain | 100% confidence, hypothesis validated | Hypothesis proven beyond reasonable doubt |
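
Because the seven codes form a strict ordering, an agent can compare them numerically. The sketch below encodes the table as a Python IntEnum; the class name Confidence and the helper at_least are assumptions made for illustration, and only the code strings come from this document.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Confidence codes in ascending order, as listed in the table above."""
    EXPLORING = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4
    ALMOST_CERTAIN = 5
    CERTAIN = 6

    @classmethod
    def from_code(cls, code: str) -> "Confidence":
        # Codes are the lowercase member names, e.g. "very_high".
        return cls[code.upper()]

    @property
    def code(self) -> str:
        # The string form accepted by --confidence.
        return self.name.lower()


def at_least(code: str, threshold: str) -> bool:
    """True if `code` is at or above `threshold` in the ordering."""
    return Confidence.from_code(code) >= Confidence.from_code(threshold)
```

For example, at_least("very_high", "high") is true, while at_least("medium", "high") is false.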

Confidence Progression Patterns

Typical Investigation Flow:

exploring → low → medium → high → very_high → almost_certain

Quick Resolution:

exploring → low → high (direct evidence found immediately)

Complex Investigation:

exploring → low → medium (blocked) → low (new hypothesis) → medium → high → very_high

Confidence can decrease if:

  • New evidence contradicts current hypothesis
  • Investigation reveals complexity not initially apparent
  • Hypothesis needs significant revision
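
A drop in confidence is worth recording, since it usually marks the point where a hypothesis was revised. As a rough sketch, the following Python function scans a progression and reports each drop; the function name find_regressions and the tuple layout are assumptions, not part of the tool.

```python
LEVELS = ["exploring", "low", "medium", "high", "very_high", "almost_certain", "certain"]


def find_regressions(progression):
    """Return (step_number, before, after) for every drop in confidence.

    step_number is the 1-based position of the step where the drop occurred.
    """
    ranks = [LEVELS.index(code) for code in progression]  # ValueError on unknown codes
    return [
        (i + 1, progression[i - 1], progression[i])
        for i in range(1, len(ranks))
        if ranks[i] < ranks[i - 1]
    ]


# The "Complex Investigation" pattern: one drop, where a new hypothesis
# replaces the blocked one.
complex_flow = ["exploring", "low", "medium", "low", "medium", "high", "very_high"]
drops = find_regressions(complex_flow)
```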

Using Confidence Levels

For AI Agents:

  • exploring/low: Continue investigating, gather more evidence
  • medium: Validate findings, look for confirming/contradicting evidence
  • high: Consider hypothesis likely correct, verify edge cases
  • very_high/almost_certain: Hypothesis well-supported, ready for conclusion
  • certain: Hypothesis validated, investigation complete

When to Stop Investigating:

  • Confidence reaches high or above AND sufficient evidence collected
  • All relevant files examined and findings documented
  • Hypothesis explains all observed behavior
  • No contradicting evidence remains
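
The stopping criteria can be expressed as a single predicate. The sketch below treats all four criteria as jointly required, which is one reading of the list above rather than documented behavior; the function and parameter names are hypothetical.

```python
LEVELS = ["exploring", "low", "medium", "high", "very_high", "almost_certain", "certain"]


def should_stop(confidence, files_pending, unexplained_observations, contradicting_evidence):
    """Decide whether an investigation can be concluded.

    confidence: one of the codes in LEVELS
    files_pending: relevant files not yet examined
    unexplained_observations: behavior the hypothesis does not account for
    contradicting_evidence: findings that conflict with the hypothesis
    """
    confident_enough = LEVELS.index(confidence) >= LEVELS.index("high")
    return (
        confident_enough
        and not files_pending
        and not unexplained_observations
        and not contradicting_evidence
    )
```

A call such as should_stop("high", [], [], []) returns True, while any pending file or open contradiction keeps the investigation going regardless of confidence.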

Expert Validation (Optional)

THINKDEEP supports optional expert validation using a secondary AI model:

How it works:

  • Primary provider conducts investigation
  • Expert provider (optional) reviews hypothesis and evidence
  • Expert provides independent assessment of confidence level
  • Helps verify conclusions and identify blind spots

When to use expert validation:

  • High-stakes decisions requiring verification
  • Complex investigations where confirmation is valuable
  • Security analysis requiring independent review
  • Architecture decisions benefiting from second opinion

Investigation Continuation

THINKDEEP supports multi-turn investigations where you can pause, resume, and build upon previous work using the session_id parameter.

How Continuation Works

session_id Parameter:

  • Unique session identifier linking investigation steps
  • Preserves full conversation history and context
  • Maintains hypothesis state, findings, and confidence levels
  • Enables seamless resumption across sessions

State Preservation:

  • All previous findings and evidence
  • Hypothesis evolution history
  • Files checked and examined
  • Confidence progression
  • Investigation step history

Continuation Patterns

Pattern 1: Multi-Step Investigation

Step 1 - Initial investigation:

model-chorus thinkdeep --step "Investigate authentication failures in production" --step-number 1 --total-steps 3 --next-step-required --findings "Examining auth service logs..." --confidence exploring

Returns: session_id = "auth-inv-abc123"

Step 2 - Continue with the same session:

model-chorus thinkdeep --session-id "auth-inv-abc123" --step "Check token validation logic based on log findings" --step-number 2 --total-steps 3 --next-step-required --findings "Found race condition in async token validation" --confidence medium

Step 3 - Final analysis:

model-chorus thinkdeep --session-id "auth-inv-abc123" --step "Verify race condition hypothesis with code analysis" --step-number 3 --total-steps 3 --findings "Confirmed: missing await in auth middleware" --confidence high

Pattern 2: Investigation Branching

Start a new investigation branch while preserving original:

Original investigation continues:

model-chorus thinkdeep --session-id "original-thread-xyz" --step "Continue original investigation..." --step-number 4 --total-steps 5 --next-step-required --findings "..." --confidence medium

New branch (omit --session-id and its aliases to start fresh):

model-chorus thinkdeep --step "Explore alternative: network latency causing timeouts" --step-number 1 --total-steps 2 --next-step-required --findings "Investigating network layer..." --confidence low

Pattern 3: Cross-Session Resume

Day 1 - Original session:

model-chorus thinkdeep --step "Analyze memory leak in service" --step-number 1 --total-steps 3 --next-step-required --findings "Found increasing heap usage over 24h" --confidence medium

Returns: session_id = "mem-leak-xyz789"

Day 2 - Resume with preserved context:

model-chorus thinkdeep --session-id "mem-leak-xyz789" --step "Trace heap allocations to identify source" --step-number 2 --total-steps 3 --next-step-required --findings "Identified unclosed database connections" --confidence high

State Inspection

What Gets Preserved:

| State Element | Preserved? | Notes |
|---|---|---|
| Hypothesis history | ✅ Yes | All hypotheses and revisions |
| Findings | ✅ Yes | Complete findings log |
| Files checked | ✅ Yes | Full file examination history |
| Confidence levels | ✅ Yes | Progression tracking |
| Step history | ✅ Yes | All investigation steps |
| Model context | ✅ Yes | Full conversation thread |

What Gets Reset:

When starting a new investigation (no session_id):

  • Hypothesis state (starts fresh)
  • Findings log (empty)
  • Confidence level (begins at exploring)
  • Files checked (new list)
  • Step counter (resets to 1)

Best Practices

When to Use Continuation:

  • ✅ Multi-step investigation requiring state persistence
  • ✅ Building on previous findings
  • ✅ Resuming after pause or break
  • ✅ Complex analysis spanning multiple sessions
  • ✅ Hypothesis refinement based on new evidence

When to Start Fresh:

  • ✅ Completely different investigation
  • ✅ Previous hypothesis proven wrong (branch instead)
  • ✅ Investigation scope changed significantly
  • ✅ Context window becoming saturated
  • ✅ Need clean slate for new approach

Continuation Examples

Example 1: Debugging Race Condition

Step 1 - Initial symptoms:

model-chorus thinkdeep --step "Users report intermittent 401 errors" --step-number 1 --total-steps 4 --next-step-required --findings "Error rate: 5% of requests, no pattern found in logs" --confidence exploring --hypothesis "Unknown cause - investigate auth flow"

Returns: session_id = "auth-race-001"

Step 2 - Investigate auth flow:

model-chorus thinkdeep --session-id "auth-race-001" --step "Examine token validation sequence" --step-number 2 --total-steps 4 --next-step-required --findings "Token checked before async validation completes" --confidence medium --hypothesis "Race condition in token validation"

Step 3 - Verify hypothesis:

model-chorus thinkdeep --session-id "auth-race-001" --step "Trace async execution order" --step-number 3 --total-steps 4 --next-step-required --findings "Missing await causes request to proceed before validation" --confidence high --hypothesis "Confirmed: race condition due to missing await"

Step 4 - Verify fix:

model-chorus thinkdeep --session-id "auth-race-001" --step "Verify adding await resolves issue" --step-number 4 --total-steps 4 --findings "With await added, validation completes before auth check" --confidence very_high --hypothesis "Root cause: missing await in middleware"

Example 2: Architecture Decision

Step 1 - Analyze requirements:

model-chorus thinkdeep --step "Should we use microservices or monolith?" --step-number 1 --total-steps 3 --next-step-required --findings "Team size: 5 devs, expected scale: 10k users" --confidence exploring --hypothesis "Need to evaluate tradeoffs"

Returns: session_id = "arch-decision-002"

Step 2 - Continue analysis:

model-chorus thinkdeep --session-id "arch-decision-002" --step "Evaluate team experience and deployment complexity" --step-number 2 --total-steps 3 --next-step-required --findings "Team has limited k8s experience, deployment simplicity important" --confidence medium --hypothesis "Monolith may be better fit for team/scale"

Step 3 - Final recommendation:

model-chorus thinkdeep --session-id "arch-decision-002" --step "Consider future scaling and migration path" --step-number 3 --total-steps 3 --findings "Can start monolith, extract services later if needed" --confidence high --hypothesis "Monolith is optimal: simpler ops, team fit, migration path exists"

Basic Usage

Simple Example

Basic investigation with single step:

model-chorus thinkdeep --step "Investigate why API latency increased from 100ms to 2s" --step-number 1 --total-steps 1 --findings "Need to analyze recent deployment changes" --confidence exploring

Common Options

Required Parameters:

  • --step: Investigation step description
  • --step-number: Current step index (starts at 1)
  • --total-steps: Estimated total investigation steps
  • --findings: What was discovered in this step

Required Boolean Flags:

  • --next-step-required: Include this flag if more investigation steps are needed (omit for final step)

Optional Parameters:

  • --provider: AI provider to use (claude, gemini, codex, cursor-agent; default: claude)
  • --continuation-id / --continue / -c / --session-id: Resume previous investigation (all aliases work identically)
  • --hypothesis: Current working theory
  • --confidence: Confidence level (exploring, low, medium, high, very_high, almost_certain, certain)
  • --files-checked: List of files examined (file contents are read and included in AI prompt; missing files emit warnings and are skipped)
  • --thinking-mode: Reasoning depth (minimal, low, medium, high, max)
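
When an agent assembles these commands programmatically, building an argument list avoids shell-quoting mistakes in long --step and --findings strings. The following Python sketch uses only the flags listed above; the helper name build_thinkdeep_args and its keyword names are assumptions, and it builds the arguments without executing anything.

```python
import shlex
from typing import List, Optional, Sequence


def build_thinkdeep_args(
    step: str,
    step_number: int,
    total_steps: int,
    findings: str,
    *,
    next_step_required: bool,
    confidence: str = "exploring",
    hypothesis: Optional[str] = None,
    session_id: Optional[str] = None,
    files_checked: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for `model-chorus thinkdeep` (hypothetical helper)."""
    args = [
        "model-chorus", "thinkdeep",
        "--step", step,
        "--step-number", str(step_number),
        "--total-steps", str(total_steps),
        "--findings", findings,
        "--confidence", confidence,
    ]
    if next_step_required:
        # A bare flag: present means "continue", absent means "final step".
        args.append("--next-step-required")
    if hypothesis is not None:
        args += ["--hypothesis", hypothesis]
    if session_id is not None:
        args += ["--session-id", session_id]
    if files_checked:
        # The CLI takes a single comma-separated value.
        args += ["--files-checked", ",".join(files_checked)]
    return args


args = build_thinkdeep_args(
    "Check token validation logic based on log findings",
    2, 3,
    "Found race condition in async token validation",
    next_step_required=True,
    confidence="medium",
    session_id="auth-inv-abc123",
)
print(shlex.join(args))  # a copy-pasteable, safely quoted command line
```

The list can be passed directly to a process launcher; shlex.join is used here only to display it.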

Technical Contract

Context Provided to the AI

When model-chorus thinkdeep is invoked via the CLI, the AI model sees only what you explicitly provide through CLI parameters. There is no automatic repository awareness or environment context.

What the AI SEES

The AI receives a structured prompt containing:

  1. Investigation Context (from CLI parameters):

    • Step description (--step)
    • Current findings (--findings)
    • Hypothesis (--hypothesis, if provided)
    • Confidence level (--confidence)
    • Thinking mode (--thinking-mode)
    • Step number and total steps
  2. File Contents (ONLY if explicitly provided):

    • Files listed in --files-checked are READ from disk and their FULL CONTENT is included in the prompt
    • Format: Each file appears as:
      --- File: path/to/file.py ---
      [complete file contents here]
      --- End of path/to/file.py ---
      
    • If files cannot be read, errors are logged and they are skipped
  3. File References (names only):

    • Files listed in --relevant-files appear as a simple list of filenames
    • NO content is included - only the paths are mentioned
    • These serve as references but don't provide the AI with any actual code/content
  4. Investigation History (when continuing):

    • Previous investigation steps (last 3 steps shown)
    • Previous findings (truncated to 150 characters per step)
    • Previous confidence levels
    • Files examined previously (up to 10 filenames listed)
  5. Conversation History (when continuing):

    • Full conversation thread from previous steps via --session-id
    • All previous messages and responses in the investigation
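
To make the file-content format in item 2 concrete, the sketch below renders a list of paths into the documented "--- File: ... ---" blocks and skips anything unreadable. It is an approximation for illustration: the function name render_file_blocks, the blank line between blocks, and the UTF-8 assumption are not taken from the tool's source.

```python
from pathlib import Path


def render_file_blocks(paths):
    """Render readable files in the documented block format; skip unreadable ones.

    Returns (prompt_text, skipped_paths).
    """
    blocks, skipped = [], []
    for path in paths:
        try:
            content = Path(path).read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            # Mirrors the documented behavior: unreadable files are skipped.
            skipped.append(path)
            continue
        blocks.append(f"--- File: {path} ---\n{content}\n--- End of {path} ---")
    return "\n\n".join(blocks), skipped
```

Checking the skipped list before invoking the CLI is a cheap way to notice a mistyped path that would otherwise silently leave the AI without the file.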

What the AI DOES NOT SEE

The AI has NO access to:

  • ❌ Current working directory or file system structure
  • ❌ Git repository status, branches, or commit history
  • ❌ Environment variables or system configuration
  • ❌ Directory listings or file trees
  • ❌ Any files not explicitly provided via --files-checked
  • ❌ Contents of files mentioned in --relevant-files (only filenames)
  • ❌ Implicit project context or repository metadata

Critical Guidance for AI Agents

When invoking model-chorus thinkdeep, you MUST:

  1. Provide ALL necessary context explicitly via CLI parameters

    • Don't assume the AI "knows" about your repository structure
    • Don't assume the AI can "see" files you've examined
    • Everything must be passed as explicit parameters
  2. Use --files-checked to provide file contents:

    • This is the ONLY way to give the AI actual code/content to analyze
    • Files are read from disk and included in full in the prompt
    • Example: --files-checked "src/auth.py,src/models/user.py"
  3. Understand that --relevant-files provides names only:

    • This parameter lists filenames but does NOT include their contents
    • Use it to track which files are relevant to the investigation
    • If you want the AI to analyze a file, use --files-checked instead
  4. File paths must be valid and readable:

    • Paths are resolved relative to the current working directory
    • Invalid paths will cause errors and those files will be skipped
  5. Previous context may be truncated:

    • Only last 3 investigation steps are included
    • Findings from old steps are truncated to 150 characters
    • Re-state important context if it might have been lost
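
The truncation rule in item 5 can be checked before continuing, so that an agent knows which earlier findings to re-state. The Python sketch below assumes each step is a dict with step_number, findings, and confidence keys and that truncation is a plain character cut; both are assumptions, since the document gives only the limits (3 steps, 150 characters).

```python
def summarize_history(steps, max_steps=3, max_chars=150):
    """Keep the last `max_steps` steps and cut each finding to `max_chars` characters."""
    return [
        {
            "step_number": s["step_number"],
            "findings": s["findings"][:max_chars],
            "confidence": s["confidence"],
        }
        for s in steps[-max_steps:]
    ]


def lost_context(steps, max_steps=3, max_chars=150):
    """Return (dropped, truncated): step numbers whose findings are omitted or cut."""
    dropped = [s["step_number"] for s in steps[:-max_steps]]
    truncated = [
        s["step_number"]
        for s in steps[-max_steps:]
        if len(s["findings"]) > max_chars
    ]
    return dropped, truncated
```

If lost_context reports anything, fold the affected findings into the next --findings value.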

Example: What the AI Actually Sees

When you run:

model-chorus thinkdeep \
  --step "Analyze authentication flow" \
  --findings "Found token validation in auth.py" \
  --hypothesis "Token expiration logic may have race condition" \
  --confidence medium \
  --files-checked "src/auth.py" \
  --relevant-files "tests/test_auth.py"

The AI receives:

  • Text: "Analyze authentication flow"
  • Text: "Found token validation in auth.py"
  • Text: "Token expiration logic may have race condition"
  • Text: "medium"
  • Full content of src/auth.py (read from disk)
  • Text: "tests/test_auth.py" (filename only, NO content)

The AI does NOT receive:

  • Your current directory path
  • Git status or branch information
  • Contents of tests/test_auth.py (only the filename is mentioned)
  • Any other files in the repository
  • Directory structure or project layout

Parameters

Required:

  • --step (string): Current investigation step description
    • What you're investigating in this step
    • Should be clear and focused on specific aspect
  • --step-number (integer): Current step index
    • Starts at 1
    • Tracks position in investigation sequence
  • --total-steps (integer): Estimated total investigation steps
    • Can be adjusted as investigation progresses
    • Helps track overall progress
  • --findings (string): Discoveries from this step
    • Evidence, observations, and insights
    • Builds investigation knowledge base

Required Boolean Flags:

  • --next-step-required (flag): Include this flag to indicate more investigation steps are needed
    • Omit this flag when investigation is complete (final step)
    • Flag takes no value; its presence means "continue", absence means "stop"

Optional:

  • --provider (string): AI provider to use for investigation
    • Valid values: claude, gemini, codex, cursor-agent
    • Default: claude
  • --continuation-id / --continue / -c / --session-id (string): Session ID to resume previous investigation
    • All aliases work identically
    • Format: thinkdeep-{uuid}
    • Maintains full investigation history
  • --hypothesis (string): Current working theory about the problem
    • Should evolve as evidence accumulates
    • Can be revised or replaced in subsequent steps
  • --confidence (string): Confidence level in current hypothesis
    • Valid values: exploring, low, medium, high, very_high, almost_certain, certain
    • Default: exploring
    • Should increase as evidence strengthens
  • --files-checked (string): Comma-separated list of files to read and include in AI prompt
    • File contents are read from disk and included in full
    • This is the ONLY way to provide file contents to the AI
    • Format: file1.py,file2.js,file3.go
    • Invalid paths generate warnings and are skipped
  • --relevant-files (string): Comma-separated list of files relevant to findings
    • Only filenames are included, NOT file contents
    • Use this to track related files without including their content
    • To provide content to the AI, use --files-checked instead
    • Format: src/auth.py,tests/test_auth.py
    • Invalid paths cause a CLI error
  • --relevant-context (string): Comma-separated list of methods/functions involved
    • Specific code locations identified
    • Format: authenticate,validate_token,check_permissions
  • --issues-found (string): JSON array of issues with severity levels
    • Format: [{"severity":"high","description":"..."},...]
  • --thinking-mode (string): Reasoning depth for investigation
    • Valid values: minimal, low, medium, high, max
    • Default: medium
    • Higher modes for complex problems
  • --output (string): Path to save JSON output file
    • Creates or overwrites file at specified path
  • --verbose (boolean): Enable detailed execution information
    • Default: false
    • Shows provider details and timing
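
Hand-writing the JSON for --issues-found is error-prone once descriptions contain quotes. A small Python sketch of generating it instead follows; the helper name issues_argument is hypothetical, and the only fields used are the two shown in the format above (severity and description).

```python
import json


def issues_argument(issues):
    """Serialize (severity, description) pairs into a JSON array string."""
    payload = [
        {"severity": severity, "description": description}
        for severity, description in issues
    ]
    # Compact separators keep the value short on the command line.
    return json.dumps(payload, separators=(",", ":"))


value = issues_argument([("high", "N+1 query pattern in user listing endpoint")])
```

The resulting string is passed as the single value of --issues-found; json.dumps takes care of escaping embedded quotes.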

Return Format

The THINKDEEP workflow returns a JSON object with the following structure:

{
  "result": "Investigation analysis and next steps from the AI model...",
  "session_id": "thinkdeep-abc-123-def-456",
  "metadata": {
    "provider": "claude",
    "model": "claude-3-5-sonnet-20241022",
    "step_number": 2,
    "total_steps": 4,
    "next_step_required": true,
    "confidence": "medium",
    "hypothesis": "API latency caused by database connection pooling issue",
    "findings": "Found N+1 query pattern in recent ORM changes",
    "files_checked": ["src/models/user.py", "src/db/connection.py"],
    "relevant_files": ["src/db/connection.py", "config/database.yml"],
    "relevant_context": ["get_connection", "execute_query"],
    "issues_found": [
      {
        "severity": "high",
        "description": "N+1 query pattern in user listing endpoint"
      }
    ],
    "thinking_mode": "high",
    "timestamp": "2025-11-07T10:30:00Z"
  }
}

Field Descriptions:

| Field | Type | Description |
|---|---|---|
| result | string | Investigation analysis, findings, and guidance for next steps |
| session_id | string | Session ID for continuing this investigation (format: thinkdeep-{uuid}) |
| metadata.provider | string | The AI provider used for this investigation step |
| metadata.model | string | Specific model version used by the provider |
| metadata.step_number | integer | Current step number in the investigation sequence |
| metadata.total_steps | integer | Total estimated steps for complete investigation |
| metadata.next_step_required | boolean | Whether the investigation continues (true) or is complete (false) |
| metadata.confidence | string | Confidence level in the current hypothesis (exploring through certain) |
| metadata.hypothesis | string | Current working theory about the problem being investigated |
| metadata.findings | string | Cumulative findings and evidence from this step |
| metadata.files_checked | array[string] | List of files examined during investigation |
| metadata.relevant_files | array[string] | Files identified as relevant to the problem |
| metadata.relevant_files_this_step | array[string] | Relevant files supplied for the current step (validated paths) |
| metadata.relevant_context | array[string] | Methods/functions identified as involved in the issue |
| metadata.issues_found | array[object] | Issues discovered with severity levels and descriptions |
| metadata.thinking_mode | string | Reasoning depth used (minimal, low, medium, high, max) |
| metadata.timestamp | string | ISO 8601 timestamp of when this step was processed |

Usage Notes:

  • Always save the session_id to continue multi-step investigations
  • The investigation accumulates context across steps through the session
  • Adjust total_steps estimate as the investigation reveals complexity
  • Confidence should generally increase as evidence accumulates
  • Omit --next-step-required (so that next_step_required is false) only when the investigation is truly complete
  • The result contains both analysis of findings and guidance for next investigation steps
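
The usage notes above amount to a small piece of bookkeeping after each step: keep the session_id, check whether another step is expected, and work out the next step number. The Python sketch below reads those values from the returned JSON; the function name next_action and the shape of its return value are assumptions, while the field names come from the structure shown above.

```python
import json


def next_action(raw_output):
    """Extract what is needed to continue an investigation from one step's JSON."""
    data = json.loads(raw_output)
    meta = data.get("metadata", {})
    return {
        "session_id": data["session_id"],  # pass back via --session-id
        "continue": bool(meta.get("next_step_required", False)),
        "next_step_number": meta.get("step_number", 0) + 1,
        "confidence": meta.get("confidence", "exploring"),
    }


sample = json.dumps({
    "result": "Investigation analysis...",
    "session_id": "thinkdeep-abc-123-def-456",
    "metadata": {"step_number": 2, "total_steps": 4,
                 "next_step_required": True, "confidence": "medium"},
})
action = next_action(sample)
```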

Advanced Usage

With Provider Selection

Specify which AI provider to use for the investigation:

model-chorus thinkdeep --step "Analyze security vulnerability in auth flow" --step-number 1 --total-steps 3 --next-step-required --findings "Reviewing authentication middleware" --confidence exploring --provider gemini

With File Context

Include specific files relevant to the investigation:

model-chorus thinkdeep --step "Debug race condition in token validation" --step-number 2 --total-steps 3 --next-step-required --findings "Found async timing issue" --confidence medium --files-checked "src/auth/middleware.ts,src/services/token.ts"

Multi-Step Investigation

Conduct systematic multi-step investigation with continuation:

model-chorus thinkdeep --step "Initial analysis of memory leak" --step-number 1 --total-steps 4 --next-step-required --findings "Heap growing 50MB/hour" --confidence exploring

Then continue:

model-chorus thinkdeep --session-id "RETURNED_ID" --step "Trace allocation sources" --step-number 2 --total-steps 4 --next-step-required --findings "Database connection pool not releasing" --confidence medium

Adjusting Reasoning Depth

Control how deeply the model thinks about the problem:

model-chorus thinkdeep --step "Complex architectural decision" --step-number 1 --total-steps 1 --findings "Need thorough analysis" --confidence exploring --thinking-mode max

Options: minimal, low, medium (default), high, max

Expert Validation

Get independent expert review of investigation findings:

model-chorus thinkdeep --step "Final hypothesis verification" --step-number 3 --total-steps 3 --findings "Root cause identified" --confidence high --use-assistant-model

Best Practices

Investigation Planning

Start with clear problem statement:

  • ✅ "Users report intermittent 401 errors, 5% failure rate, no pattern in logs"
  • ❌ "Auth is broken"

Estimate steps realistically:

  • Simple bugs: 1-2 steps
  • Medium complexity: 3-5 steps
  • Complex investigations: 6-10 steps
  • Adjust total-steps as investigation progresses

Use appropriate confidence levels:

  • Start at exploring or low
  • Progress through evidence accumulation
  • Reach high or very_high before concluding
  • Use certain only when hypothesis is proven

Hypothesis Management

Form specific hypotheses:

  • ✅ "Race condition in async token validation due to missing await"
  • ❌ "Something wrong with auth"

Revise based on evidence:

  • Update hypothesis when new evidence contradicts it
  • Track hypothesis evolution through investigation
  • Document why hypothesis changed

Test hypotheses systematically:

  • Identify what evidence would support/refute
  • Gather that specific evidence
  • Evaluate fairly (avoid confirmation bias)

State Management

Use continuation for multi-step work:

  • Always save session_id from first step
  • Pass to subsequent steps to maintain state
  • Enables cross-session resume

Track files examined:

  • List all files checked in --files-checked
  • Prevents re-examining same files
  • Shows investigation coverage

Document findings clearly:

  • Be specific about what was found
  • Include relevant details (error messages, patterns, metrics)
  • Note what was ruled out

When to Stop Investigating

Stop when:

  • Confidence reaches high or above AND hypothesis explains all evidence
  • All relevant areas examined
  • Cost of further investigation exceeds value
  • Need input from domain expert or stakeholder

Don't stop when:

  • Confidence still at low or medium with unanswered questions
  • Hypothesis doesn't explain all observed behavior
  • Contradicting evidence exists
  • Investigation feels incomplete

Examples

Example 1: Quick Bug Investigation

Problem: Users seeing 500 errors on checkout

model-chorus thinkdeep --step "Investigate 500 errors in checkout flow" --step-number 1 --total-steps 1 --findings "Error: 'payment_processor timeout'. Third-party API latency spike to 30s." --confidence high --hypothesis "Payment provider experiencing outage, not our bug"

Single-step investigation, clear finding, high confidence → done.

Example 2: Multi-Step Performance Investigation

Step 1 - Initial analysis:

model-chorus thinkdeep --step "API latency increased from 100ms to 2s after deployment" --step-number 1 --total-steps 3 --next-step-required --findings "Latency affects all endpoints equally, started at 3pm deployment" --confidence low --hypothesis "Deployment introduced performance regression"

Returns: session_id = "perf-inv-001"

Step 2 - Narrow down cause:

model-chorus thinkdeep --session-id "perf-inv-001" --step "Examine deployment changes" --step-number 2 --total-steps 3 --next-step-required --findings "New logging middleware added, logs every request body. Bodies average 50KB." --confidence medium --hypothesis "Excessive logging causing I/O bottleneck"

Step 3 - Verify:

model-chorus thinkdeep --session-id "perf-inv-001" --step "Test hypothesis by disabling verbose logging" --step-number 3 --total-steps 3 --findings "Latency drops to 120ms with logging disabled" --confidence very_high --hypothesis "Confirmed: verbose body logging causing 20x slowdown"

Example 3: Security Audit

Investigation with expert validation:

model-chorus thinkdeep --step "Audit authentication flow for bypass vulnerabilities" --step-number 1 --total-steps 2 --next-step-required --findings "Token validation occurs before permission check. JWT expiry not verified in middleware." --confidence medium --hypothesis "Potential bypass: expired tokens may pass through"

Returns: session_id = "sec-audit-001"

Then verify with expert:

model-chorus thinkdeep --session-id "sec-audit-001" --step "Verify vulnerability hypothesis" --step-number 2 --total-steps 2 --findings "Confirmed: expired tokens accepted if permission check passes. Critical vulnerability." --confidence very_high --use-assistant-model

Troubleshooting

Issue: Investigation feels stuck

Symptoms:

  • Can't increase confidence past low or medium
  • Hypothesis keeps changing without progress
  • Findings don't lead anywhere

Solutions:

  • Branch investigation: start new thread exploring alternative hypothesis
  • Consult expert or stakeholder for domain knowledge
  • Widen search: examine adjacent systems/layers
  • Narrow focus: concentrate on one specific aspect
  • Take break: resume with fresh perspective using session_id

Issue: Confidence too low despite strong evidence

Symptoms:

  • Strong evidence supporting hypothesis
  • All tests pass, behavior explained
  • But still feel uncertain

Solutions:

  • Review all evidence systematically
  • Check for contradicting evidence
  • Verify hypothesis explains ALL observations
  • Consider whether you are seeking perfect certainty, which is rarely attainable
  • Use expert validation for independent assessment

Issue: Multi-step investigation losing context

Symptoms:

  • Later steps don't reference earlier findings
  • Repeating work from previous steps
  • Losing track of what's been examined

Solutions:

  • Always use --continuation-id (or --continue/-c/--session-id) for multi-step investigations
  • Include comprehensive findings in each step
  • Reference earlier findings explicitly: "Building on Step 2's discovery of X..."
  • Use --files-checked to track examination history
  • Review session_id state before continuing

Issue: Investigation taking too long

Symptoms:

  • 10+ steps without resolution
  • The total_steps estimate keeps increasing
  • Confidence not progressing

Solutions:

  • Reassess hypothesis: may be fundamentally wrong
  • Narrow scope: focus on specific sub-problem first
  • Check whether the problem is actually solvable through investigation
  • Consider whether a different approach is needed (experiments, monitoring, etc.)
  • Set confidence threshold for "good enough" conclusion

Progress Reporting

The THINKDEEP workflow automatically displays progress updates to stderr as it executes. You will see messages like:

Starting thinkdeep workflow...
✓ thinkdeep workflow complete

Important: Progress updates are emitted automatically - do NOT use BashOutput to poll for progress. Simply invoke the command and wait for completion. All progress information streams automatically to stderr without interfering with stdout.
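
Because progress goes to stderr, a caller can read the result from stdout without filtering. The Python sketch below shows the pattern with a stand-in command so that it runs without the CLI installed. It assumes stdout carries only the JSON object described under Return Format, which this document does not state explicitly; the helper name run_and_parse is hypothetical.

```python
import json
import subprocess
import sys


def run_and_parse(command):
    """Run a command to completion; return (parsed stdout JSON, stderr text)."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout), completed.stderr


# Stand-in for the real CLI: writes a progress line to stderr and JSON to stdout.
fake_cli = [
    sys.executable, "-c",
    "import sys, json; "
    "print('Starting thinkdeep workflow...', file=sys.stderr); "
    "print(json.dumps({'session_id': 'thinkdeep-demo'}))",
]
result, progress = run_and_parse(fake_cli)
```

subprocess.run blocks until the command exits, which matches the guidance to invoke once and wait rather than poll.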

Related Workflows

For single-turn analysis: Use CHAT - Simple conversational queries don't need multi-step investigation state.

For multiple perspectives: Use CONSENSUS - When you need diverse viewpoints on a problem rather than systematic investigation.

For structured debate: Use ARGUMENT - When pros/cons analysis is more appropriate than hypothesis testing.

For brainstorming: Use IDEATE - When generating ideas rather than investigating an existing problem.

Combining workflows:

  • Start with THINKDEEP to identify root cause
  • Then use CONSENSUS to decide on solution approach
  • Then use CHAT for implementation questions