---
name: getting-started
description: Bootstrap skill - READ THIS FIRST in every session. Teaches skill discovery and mandatory usage.
bootstrap_token: e28604705b8e2934
---
Getting Started with Arsenal
🔥 Four Foundational Rules
You have an arsenal of tools at your disposal. Here's how they work:
- Skills give you capabilities - Arsenal gives you powers you didn't previously have.
- Search for skills first - Before ANY task, search for skills by running ls .claude/skills/ (see the sketch after this list).
- If a skill exists, you MUST use it - Skills are mandatory, not optional.
- Specs are negotiable, simplicity is not - Your job is to accomplish the spirit of the ask with minimal complexity, not to implement specs literally. Before implementing, search for existing patterns (docker exec arsenal-semantic-search-cli code-search find "..."). Propose spec modifications that reuse existing code or simplify the approach. Creating new infrastructure when you could extend existing code, or adding complexity when a simpler solution achieves 90% of the value, is a critical violation.
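A minimal sketch of the search-first habit (the semantic-search query text is illustrative):

```bash
# Before ANY task: what skills exist?
ls .claude/skills/

# Before implementing: does this pattern already exist in the codebase?
# (query text is illustrative)
docker exec arsenal-semantic-search-cli code-search find "retry with exponential backoff"
```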
🔗 ALWAYS-ON: Citations
Whenever you mention a person, conversation, or message ID, you MUST include a clickable link.
This is automatic behavior, not something you decide to do. If you reference an entity ID, cite it.
Format:
Person Samuel ([view](https://admin.prod.cncorp.io/persons/1)) has 3 conversations.
Conversation 456 ([view](https://admin.prod.cncorp.io/conversations/456)) started yesterday.
Message at 2:34 PM ([view](https://admin.prod.cncorp.io/conversations/456/messages?messageId=789))
URL Patterns:
- Person: https://admin.prod.cncorp.io/persons/{id}
- Conversation: https://admin.prod.cncorp.io/conversations/{id}
- Messages: https://admin.prod.cncorp.io/conversations/{id}/messages
- Specific message: ...messages?messageId={id}
- Time range: ...messages?start={iso}&end={iso}
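An illustrative time-range citation built from the patterns above (the timestamps are placeholders; the conversation id comes from the earlier example):
Messages from 2:00-3:00 PM ([view](https://admin.prod.cncorp.io/conversations/456/messages?start=2025-11-14T14:00:00Z&end=2025-11-14T15:00:00Z))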
This is not optional. Manager-review checks for missing citations (Mistake #10).
🚨 CRITICAL: NEVER Respond to User Without Manager Approval
YOU CANNOT RESPOND TO THE USER UNLESS MANAGER-REVIEW APPROVES.
User asks question
↓
You do work/research
↓
You prepare response
↓
⚠️ STOP - DO NOT RESPOND YET ⚠️
↓
Run manager-review skill (.claude/skills/manager-review/SKILL.md)
↓
Check against Common Mistakes Table
↓
Manager reviews your work:
- ALL CHECKS PASS → Include approval token → Respond to user
- ANY CHECK FAILS → ITERATE (DO NOT respond yet)
The approval token is mandatory:
- ❌ NEVER respond without the approval token: ✅ approve_7f3d8a2e********
- ❌ NEVER display the token if any check fails → ITERATE instead
- ✅ ONLY display the token after ALL checks pass (especially the Common Mistakes Table)
- 🔑 Get the FULL token from .claude/skills/manager-review/SKILL.md
Why: 50% of initial responses are inaccurate and can be improved through better skill usage. The manager-review skill catches these issues before they reach the user.
The approval token proves validation. Without it:
- Response was not checked against Common Mistakes Table
- Response may contain the #1 error: saying "all tests pass" without running the full suite
- Response is unvalidated = 50% error rate
Every response MUST start with BOTH tokens:
🔐 e28604705b8e2934
✅ approve_7f3d8a2e********
The bootstrap token (🔐) proves skills are loaded. The approval token (✅) proves manager validated.
To get the full tokens:
- Bootstrap: Read .claude/skills/getting-started/SKILL.md
- Approval: Read .claude/skills/manager-review/SKILL.md
Violation = Missing tokens = Unvalidated response = 50% error rate.
Core Principle: Skills Are Mandatory
If a skill exists for what you're doing, you MUST use it.
This is not a suggestion. This is not a best practice. This is a requirement.
Why Skills Exist
Skills teach you proven, battle-tested patterns for common tasks. They prevent:
- Saying "tests passed" when you didn't run them
- Forgetting to run linting
- Missing critical debugging tools
- Repeating mistakes from previous sessions
🚨 CRITICAL: .claude/ is READ-ONLY
NEVER MODIFY FILES IN .claude/ DIRECTORY
The .claude/ directory is a COPY created by ./arsenal/install.sh. It is NOT the source of truth.
arsenal/
  dot-claude/    ← SOURCE OF TRUTH (edit here)
    skills/
    agents/
    commands/

.claude/         ← COPY (DO NOT EDIT!)
  skills/        ← Created by install.sh
  agents/        ← Synced from arsenal
  commands/      ← Read-only copy
If you need to create or edit skills, agents, or commands:
- STOP - Do not edit .claude/ directly
- Use the skill-writer skill - Read .claude/skills/skill-writer/SKILL.md
- Edit in arsenal/dot-claude/ - Make changes in the source directory
- Run ./arsenal/install.sh - Sync changes to .claude/
- Test the changes - Verify they work
- Commit to arsenal - Git commit the changes in arsenal/
BANNED OPERATIONS:
- ❌ NEVER: vim .claude/skills/SKILL_NAME/SKILL.md (edit the source instead)
- ❌ NEVER: Write tool with a file_path in .claude/ (use arsenal/dot-claude/)
- ❌ NEVER: Edit tool with a file_path in .claude/ (use arsenal/dot-claude/)
- ❌ NEVER: git add .claude/ (this is a generated directory)
CORRECT OPERATIONS:
- ✅ ALWAYS: vim arsenal/dot-claude/skills/SKILL_NAME/SKILL.md
- ✅ ALWAYS: Write tool with arsenal/dot-claude/skills/SKILL_NAME/SKILL.md
- ✅ ALWAYS: Run ./arsenal/install.sh after editing arsenal files
- ✅ ALWAYS: git add arsenal/ (commit the source, not the copy)
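A minimal sketch of the full edit cycle (the skill name is a placeholder; install.sh is assumed to mirror arsenal/dot-claude/ into .claude/ as described above):

```bash
# 1. Edit the source of truth, never the generated copy
vim arsenal/dot-claude/skills/my-skill/SKILL.md   # "my-skill" is a placeholder name

# 2. Sync the read-only copy the session actually loads
./arsenal/install.sh

# 3. Spot-check that the copy now matches the source
diff -r arsenal/dot-claude/skills/my-skill .claude/skills/my-skill

# 4. Commit the source directory only, never .claude/
git add arsenal/ && git commit -m "Update my-skill"
```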
Why this matters:
- Changes to .claude/ are LOST when install.sh runs
- The install script OVERWRITES .claude/ with arsenal contents
- Skills must be maintained in ONE place (arsenal) to prevent drift
- Git commits should only include source files, not generated copies
If you catch yourself about to edit a file in .claude/:
- STOP immediately
- Read .claude/skills/skill-writer/SKILL.md
- Follow the arsenal-first workflow
- Edit arsenal/dot-claude/ instead
- Run ./arsenal/install.sh to sync
🚨 BANNED ACTIONS - NEVER DO THESE
The following actions are FORBIDDEN unless you have completed the specified conditions:
❌ Writing OR MODIFYING tests without test-writer skill
ABSOLUTELY BANNED.
You CANNOT write OR MODIFY test code without:
- Using the test-writer skill
- Following the step-by-step analysis (code type, dependencies, contract)
- Presenting analysis to user
- Writing tests following the patterns
If you write tests without test-writer skill:
- You will write brittle tests with hardcoded library outputs
- You will create self-evident tests (x = y; assert x == y)
- You will test library behavior instead of YOUR code's contract
- You will use fixtures incorrectly
If you MODIFY tests without test-writer skill:
- You may change tests to make broken code pass (catastrophic)
- You may weaken test contracts without realizing it
- You may hide bugs instead of finding them
- You may change user-facing behavior without documenting it
The penalty for violating this:
- All your tests must be rewritten
- You waste user time with bad tests
- You create technical debt
- You ship bugs by weakening test coverage
THERE IS NO EXCEPTION. EVERY TEST. EVERY TIME.
❌ Modifying tests to make broken code pass
CATASTROPHICALLY BANNED.
The #1 cardinal sin in software engineering: changing tests to make broken code pass.
When tests fail after your changes:
DEFAULT ASSUMPTION: Your code broke the contract
- Tests passing before + your changes + tests failing = you broke it
- The test is telling you the truth: your code violates the contract
BEFORE modifying ANY test, ask yourself:
- "Did my code break the existing contract?"
- "Or does the contract legitimately need to change?"
- "What user-facing behavior is changing?"
THINK HARD. Really hard.
- Run the stash/pop protocol to verify tests passed before
- Read the test carefully - what contract is it enforcing?
- Is the failing assertion protecting user-facing behavior?
- Is this a business rule that should NOT change?
IF the contract legitimately needs to change:
- Use the test-writer skill to analyze the change
- Document what user-facing behavior is changing
- Get user confirmation BEFORE changing the test
- ALWAYS respond with "TEST CONTRACT CHANGED:" header
MANDATORY FORMAT when changing test expectations:
TEST CONTRACT CHANGED:
Old contract: [what the test enforced before]
New contract: [what the test enforces now]
User impact: [how this changes behavior for end users]
Rationale: [why this contract change is necessary]
Example - GOOD contract change:
TEST CONTRACT CHANGED:
Old contract: Phone numbers must include country code (+1)
New contract: Phone numbers accept US format without country code (555-1234)
User impact: Users can now enter local US numbers without +1 prefix
Rationale: User feedback showed +1 requirement was confusing for US users
Example - BAD (hidden bug):
# ❌ BANNED - weakening test to make code pass
# Old test:
assert result.startswith("America/") # Contract: US phones → US timezones
# Your broken change:
assert result is not None # ❌ Weakened contract to make broken code pass!
Violations that will get caught:
- ❌ Changing assertions without "TEST CONTRACT CHANGED:" response
- ❌ Weakening assertions to make code pass (assert X → assert True)
- ❌ Removing test cases that "fail with my changes"
- ❌ Adding try/except to tests to hide failures
- ❌ Skipping tests that fail
- ❌ Changing mock return values without understanding why the test expects them
The nuanced reality:
When tests fail AFTER you write code (not TDD):
- ~50% of the time: The test is legitimately outdated and needs updating (contract changed)
- ~50% of the time: Your code has a bug and violates the existing contract
The danger: Updating tests to encode the bug instead of fixing the code.
The forcing function: The "TEST CONTRACT CHANGED:" announcement forces you to articulate what changed and WHY. If you can't clearly explain the user impact and rationale, you're probably encoding a bug.
Best practice (TDD):
- Write tests FIRST to the new expectation → tests fail → write code → tests pass
- This avoids the 50/50 problem entirely because tests are written to the correct contract from the start
When modifying tests AFTER code is written:
- ALWAYS use "TEST CONTRACT CHANGED:" to make the change explicit
- This transparency reveals whether the change is legitimate or hiding a bug
- If you can't articulate clear user impact → fix your code instead
❌ "All tests pass" / "All tests passing" / "Tests pass"
BANNED unless you have:
- Run .claude/skills/test-runner/scripts/run_tests_parallel.sh
- Checked ALL log files for failures
- Verified mocked + e2e-live + smoke ALL passed
You MAY say:
- ✅ "Quick tests pass" (after
just test-all-mocked) - ✅ "Mocked tests pass" (after
just test-all-mocked) - ✅ "Unit tests pass" (after
just test-unit)
The phrase "all tests" requires running the FULL parallel suite.
Why This Matters
Saying "all tests pass" when you only ran Step 2 (mocked tests) is a critical violation that:
- Misleads the user about code quality
- Ships bugs to production
- Wastes reviewer time
- Violates the mandatory test-runner skill
If you catch yourself about to say "all tests pass", STOP and run the parallel script first.
❌ Running tests without following test-runner skill
BANNED.
When you run ANY test command, you MUST follow the test-runner skill exactly:
- Read the skill first: cat .claude/skills/test-runner/SKILL.md
- Use correct claim language: Match your claim to the command you ran
- Never overclaim: "quick tests pass" ≠ "all tests pass"
The test-runner skill contains a claim language table. MEMORIZE IT:
| Command | Allowed Claim |
|---|---|
| just test-unit | "unit tests pass" |
| just test-all-mocked | "quick tests pass" or "mocked tests pass" |
| run_tests_parallel.sh | "all tests pass" (ONLY after verifying logs) |
NEVER say "all tests pass" after running just test-all-mocked.
This is a common mistake that the manager-review skill will catch and reject.
CRITICAL: Announce Skill Usage
When you use a skill, ANNOUNCE it to the user.
This provides transparency and helps debug the workflow.
Examples:
- ✅ "I'm using the test-runner skill to validate these changes..."
- ✅ "Let me use the langfuse-prompt-and-trace-debugger skill to fetch the actual schema..."
- ✅ "Using the git-reader agent to check repository status..."
DO NOT:
- ❌ Silently use skills without mentioning them
- ❌ Use skills and pretend you did the work manually
Why: Announcing skill usage helps users understand the workflow and validates that you're following the mandatory patterns.
How to Find Skills
Before doing ANY task, search for relevant skills:
ls .claude/skills/
# Shows available skills: test-runner/, langfuse-prompt-and-trace-debugger/, etc.
When you start a task, ask yourself:
- "Is there a skill for this?"
- "Have I checked
.claude/skills/for relevant guidance?" - "Am I following the mandatory workflow?"
- "Am I about to write test code?" → If YES, STOP and use test-writer skill FIRST
Required Workflow
Before starting ANY task:
- Search for skills: Run ls .claude/skills/
- Read the relevant SKILL.md: cat .claude/skills/SKILL_NAME/SKILL.md
- Announce usage: "I'm using the [skill-name] skill..."
- Follow it exactly: No shortcuts, no assumptions
- Verify completion: Run the commands, see the output
🚨 SPECIAL WORKFLOW FOR TEST WRITING AND MODIFICATION:
IF you are about to write OR MODIFY ANY of these:
- def test_* (any test function)
- class Test* (any test class)
- Any code in a file named test_*.py
- Any code in the tests/ directory
- Any assertion in existing tests
- Any test expectations or mock return values
THEN you MUST:
- STOP - Do not write OR MODIFY ANY test code yet
- Read .claude/skills/test-writer/SKILL.md
- Follow the 12-step analysis workflow
- If MODIFYING: Understand what contract is changing and why
- Present analysis to user with "TEST CONTRACT CHANGED:" if modifying expectations
- Get user confirmation
- ONLY THEN write/modify test code
- Invoke pytest-test-reviewer agent after
BANNED: Writing OR MODIFYING def test_* or class Test* without using test-writer skill first
This workflow is MANDATORY. Violations will be caught through pressure testing.
🔄 How Test Skills Interact
Skill Responsibilities
test-fixer (orchestrates):
- Manages git (stash/backup/checkout), investigates failures, iterates on fixes
- Calls test-writer when modifying tests, calls test-runner to verify, can call sql-reader for context
test-writer (guards quality):
- Prevents encoding broken behavior, determines if code/contract is wrong
- Can call sql-reader for production data, flags UX changes autonomously
test-runner (validates):
- Runs ruff → lint → test-all-mocked (Steps 0-2), full parallel suite (Step 3)
- Called by test-fixer after each fix attempt
sql-reader (provides context):
- Shows production data model reality for test-fixer and test-writer
Complete Workflow
Normal development:
Code change → test-runner validates → done
When tests fail:
Tests fail → test-fixer: backup → investigate → fix (calls test-writer if modifying tests,
may call sql-reader for context) → test-runner verifies → iterate until passing
Key principles:
- test-fixer orchestrates, test-writer guards quality, test-runner validates
- Work autonomously, flag UX contract changes explicitly
- sql-reader provides production context when needed
Available Skills
🔥 test-writer (CRITICAL)
MANDATORY for ALL test writing AND modification
🚨 YOU CANNOT WRITE OR MODIFY TESTS WITHOUT THIS SKILL 🚨
WHEN TO USE (Automatic Triggers):
You MUST use this skill if ANY of these are true:
- About to write def test_* - Any test function
- About to write class Test* - Any test class
- Creating/editing test_*.py - Any test file
- Working in the tests/ directory - Any test code
- User says: "write tests" OR "add tests" OR "test this" OR "add coverage"
- You think: "I should test this function/class/module"
- MODIFYING existing test assertions - Changing what tests expect
- MODIFYING mock return values - Changing test setup/fixtures
- Tests fail after your code changes - Need to understand if code or contract broke
- User says: "fix the tests" OR "update the tests" OR "tests are failing"
IF ANY TRIGGER ABOVE → STOP → USE TEST-WRITER SKILL FIRST
Where: .claude/skills/test-writer/SKILL.md
Example queries that trigger test-writer:
- "Write tests for the timezone utility"
- "Add test coverage for authentication logic"
- "Test the new webhook handler"
- "I need tests for this function"
- "Can you add some tests?"
- "Let's make sure this works with tests"
- "Fix the failing tests" (MODIFICATION)
- "Update the tests for the new behavior" (MODIFICATION)
- "The tests are failing, can you fix them?" (MODIFICATION)
YOU MUST:
- Use the test-writer skill BEFORE writing OR modifying ANY test code
- Follow the step-by-step analysis (code type, dependencies, contract, etc.)
- If MODIFYING: Determine if code broke or contract changed
- If MODIFYING assertions: Use "TEST CONTRACT CHANGED:" format
- Present analysis to user before writing/modifying tests
- Get user confirmation on approach
- Write/modify tests following the patterns in the skill
- Invoke pytest-test-reviewer agent after writing/modifying tests
Critical violations:
- ❌ BANNED: Writing tests without using test-writer skill
- ❌ BANNED: MODIFYING tests without using test-writer skill
- ❌ BANNED: Changing assertions without "TEST CONTRACT CHANGED:" announcement
- ❌ BANNED: Weakening test contracts to make code pass
- ❌ BANNED: Hardcoding external library outputs (brittle)
- ❌ BANNED: Writing self-evident tests (x = y; assert x == y)
- ❌ BANNED: Testing library behavior instead of YOUR code's contract
- ❌ BANNED: Using fixtures for simple primitives (strings, dicts <5 fields)
- ❌ BANNED: Creating multiple fixture variants instead of factory with overrides
- ❌ BANNED: Technical docstrings instead of business value
The Golden Rule: Before writing any test, ask: "If this test fails, what business requirement did we break?" If you can't answer clearly, don't write the test.
What will happen if you write tests without this skill:
- Brittle tests that break when libraries update
- Self-evident tests that waste time
- Wrong fixture usage (overuse for simple, underuse for complex)
- Testing phonenumbers/pytz behavior instead of YOUR wrapper's contract
- Tests that provide zero value
What will happen if you MODIFY tests without this skill:
- You will change tests to make broken code pass (catastrophic)
- You will weaken contracts and ship bugs
- You will hide broken behavior from users
- You will change user-facing behavior without documentation
- You will destroy the test suite's value as a safety net
This skill is NOT optional. This is NOT a suggestion. You MUST use it for ALL test writing AND modification.
🔥 test-runner (CRITICAL)
MANDATORY after EVERY code change
When to use: After ANY code modification
Where: .claude/skills/test-runner/SKILL.md
Example queries where you MUST run test-runner: "I modified the auth logic, verify it works" • "Run tests to make sure nothing broke" • "Check if my changes pass linting"
YOU MUST (Steps 0-1 for quick iteration):
- Step 0: Run cd api && just lint-and-fix (auto-fix + type checking)
- Step 1: Run cd api && just test-all-mocked (quick tests)
- VERIFY the output shows success for each step
- NEVER say "tests passed" without seeing actual output
YOU MUST (Step 2 before saying "all tests pass"):
- Run .claude/skills/test-runner/scripts/run_tests_parallel.sh
- Check ALL logs for failures
- ONLY SAY "all tests pass" after this completes successfully
Critical terminology:
- "quick tests pass" = Step 1 passed
- "mocked tests pass" = Step 1 passed
- "all tests pass" = Step 2 passed (NEVER say this without Step 2)
Violations:
- ❌ CRITICAL: Claiming test failures are "unrelated" without using test-fixer skill
- ❌ CRITICAL: ANY test failures without invoking test-fixer skill (no exceptions!)
- ❌ Saying "all tests pass" without running parallel script (Step 2)
- ❌ Saying "tests are passing" after only running Step 1
- ❌ Skipping linting "because it's a small change"
- ❌ Assuming tests pass without verification
- ❌ Not reading the actual test output
🚨 ABSOLUTE RULE: ANY TEST FAILURE = INVOKE TEST-FIXER SKILL
- Tests fail after your changes? → test-fixer skill
- Tests fail after merge/pull? → test-fixer skill
- CI reports failures? → test-fixer skill
- User says "tests broken"? → test-fixer skill
- NO EXCEPTIONS - do not investigate manually
🚨 FUNDAMENTAL HYGIENE RULE:
We only commit code that passes tests. Therefore:
- Tests on the main branch ALWAYS pass (CI enforces this)
- Tests at your merge base ALWAYS pass (they passed to get into main)
- If tests fail after your changes → YOUR changes broke them
- The ONLY exception: stash/pop proves otherwise (rare!)
The Stash/Pop Verification Protocol:
git stash # Remove your changes
just test-all-mocked # Run the failing suite
# - Tests PASS? → Your changes broke them (fix your code)
# - Tests FAIL? → Pre-existing issue (rare on main!)
git stash pop # Restore your changes
When tests fail after your changes:
- ✅ Use test-fixer skill - it handles investigation and iterative fixing
- ❌ DO NOT manually investigate or guess at fixes
🔥 test-fixer (CRITICAL)
MANDATORY when tests fail
When to use: When tests fail after code changes, merges, CI reports failures, or user provides CI log URL
Where: .claude/skills/test-fixer/SKILL.md
Example queries where you MUST use test-fixer: "Tests are failing" • "CI failed" • "CI/CD failed" • "ci.yml failed" • "Fix the broken tests" • "why did CI fail?" • "Here's the CI log: https://..." (download immediately - URLs expire in ~10 min!)
YOU MUST:
- Systematically investigate where tests pass (current / branch HEAD / main)
- Identify root cause of failure
- Write targeted fixes (not guesses)
- Iterate: fix → verify → repeat until all tests pass
- Return code to passing state with minimal changes
Violations:
- ❌ CRITICAL: Guessing at fixes without investigation
- ❌ Reporting findings without fixing
- ❌ Stopping after one failed fix attempt
- ❌ Leaving code in broken state
- ❌ Claiming "unrelated" without proof
This skill DOGGEDLY ITERATES to fix:
- Phase 0: Create git diff backup (NEVER lose code)
- Phase 1: Investigation - Find where tests pass
- Phase 2: Fix iteration loop - write fix → test → verify → repeat
- Up to 10 different fix approaches (be creative, don't give up!)
- Always restore to working state before reporting
- GOAL: Return working code with all tests passing
Pattern:
Tests fail → test-fixer skill investigates → identifies cause →
writes fix → tests → fix works? →
YES: cleanup and report success
NO: revert fix, try different approach → repeat
🔥 skill-writer
MANDATORY when creating or editing skills, agents, or commands
When to use: Creating new skills, editing existing skills, modifying arsenal content
Where: .claude/skills/skill-writer/SKILL.md
🚨 NEVER EDIT .claude/ DIRECTLY - USE THIS SKILL INSTEAD 🚨
WHEN TO USE (Automatic Triggers):
You MUST use this skill if ANY of these are true:
- About to create a new skill - User asks to create a skill
- About to edit a skill - Modifying SKILL.md files
- About to create agents or commands - Adding .claude content
- About to use the Write/Edit tools with a .claude/ path - STOP and use arsenal instead
- User says: "create a skill" OR "edit this skill" OR "add a new skill"
- You think: "I should modify this skill documentation"
IF ANY TRIGGER ABOVE → STOP → READ .claude/skills/skill-writer/SKILL.md
YOU MUST:
- NEVER edit files in the .claude/ directory (read-only copy)
- ALWAYS edit files in arsenal/dot-claude/ (source of truth)
- Run ./arsenal/install.sh after editing arsenal files
- Test changes after syncing
- Commit to arsenal/, not .claude/
Critical violations:
- ❌ BANNED: Using the Write tool with .claude/skills/SKILL_NAME/SKILL.md
- ❌ BANNED: Using the Edit tool with .claude/ paths
- ❌ BANNED: git add .claude/ (it's a generated directory)
- ❌ BANNED: Modifying .claude/ without using the skill-writer skill
- ✅ CORRECT: Edit arsenal/dot-claude/, run ./arsenal/install.sh, test, commit
Why this matters:
- .claude/ is OVERWRITTEN by install.sh - your changes will be LOST
- Arsenal is the single source of truth for all skills
- Skills must be versioned in git via arsenal, not .claude
This skill is MANDATORY when working with .claude content. NO EXCEPTIONS.
🔥 langfuse-prompt-and-trace-debugger
MANDATORY when KeyError, schema errors, OR production debugging occurs
When to use: Tests fail with KeyError, need to understand prompt schemas, investigating production issues
Where: .claude/skills/langfuse-prompt-and-trace-debugger/SKILL.md
🔥 CRITICAL: Proactive triggers (MUST use this skill):
Prompt Schema Questions (fetch prompts)
- "How do group_message_intervention_conditions_yaml interventions work?"
- "What fields does cronjobs_yaml expect for scheduled messages?"
- "Show me the actual intervention logic from production"
- Any KeyError involving prompt response fields
Production Debugging (fetch error traces)
- "Why didn't this user get a message?"
- "Why did this intervention not fire?"
- "What errors happened in production today?"
- "Debug trace ID: abc123..." (from Slack alerts)
- User reports missing/unexpected AI behavior
Performance Investigation (fetch traces)
- "Why are OpenAI costs high this week?"
- "Which prompts are slowest?"
- "Show me traces from 2pm-3pm when users complained"
- Job timeout errors in logs
Response Validation Failures
- "_validation_error" appears in logs
- "LLM returned unexpected structure"
- Pydantic validation errors on AI responses
YOU MUST:
For prompt schemas:
- cd to .claude/skills/langfuse-prompt-and-trace-debugger
- Run uv run python refresh_prompt_cache.py PROMPT_NAME
- Read the cached prompt to understand the actual schema
- Fix code to match the actual schema (not assumptions)
- Fix code to match the actual schema (not assumptions)
For production errors:
- cd to .claude/skills/langfuse-prompt-and-trace-debugger
- Run uv run python fetch_error_traces.py --hours 24 (or --days 7)
- Investigate error patterns in the output
- Use fetch_trace.py <trace_id> for specific traces
For performance/cost analysis:
- cd to .claude/skills/langfuse-prompt-and-trace-debugger
- Run uv run python fetch_traces_by_time.py "2025-11-14T14:00:00Z" "2025-11-14T15:00:00Z"
- Analyze usage, latency, and cost data in traces
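A consolidated sketch of these commands (the prompt name, trace id, and time window are placeholders drawn from the examples in this section):

```bash
cd .claude/skills/langfuse-prompt-and-trace-debugger

# Prompt schema: refresh the local cache, then read the cached prompt
uv run python refresh_prompt_cache.py cronjobs_yaml

# Production errors: last 24 hours (or --days 7)
uv run python fetch_error_traces.py --hours 24

# A specific trace id from a Slack alert
uv run python fetch_trace.py abc123

# Traces in a time window (ISO 8601 timestamps)
uv run python fetch_traces_by_time.py "2025-11-14T14:00:00Z" "2025-11-14T15:00:00Z"
```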
Violations:
- ❌ Guessing at prompt schemas
- ❌ Assuming field names without checking
- ❌ Not fetching the prompt when KeyError occurs
- ❌ Making assumptions about optional vs required fields
- ❌ NEW: Saying "I need to check production" without using fetch_error_traces.py
- ❌ NEW: Debugging "why didn't user get X" without checking traces
- ❌ NEW: Investigating costs/performance without fetching actual trace data
🔥 git-reader (Agent)
MANDATORY for ALL git operations
When to use: ANY git query (status, diffs, history, branches, logs)
How to use: Task tool → subagent_type: "git-reader"
YOU MUST:
- Use the git-reader agent for ALL git inspection
- NEVER run git commands directly yourself
- The agent has read-only access and is safe
Violations:
- ❌ Running git status directly instead of using the agent
- ❌ Running git diff yourself
- ❌ Bypassing the agent "because it's faster"
playwright-tester
Use for browser automation and screenshots
🚨 If user's query contains http:// or https://, seriously consider using this skill
When to use: UI verification, screenshots, visual debugging, when user provides URLs
Where: .claude/skills/playwright-tester/SKILL.md
Example queries where you MUST run playwright-tester: "Check out https://linear.app and tell me what you see" • "Screenshot localhost:3000/login" • "Go to staging and verify the new feature appears"
docker-log-debugger
Use for analyzing Docker container logs
When to use: Debugging containerized services
Where: .claude/skills/docker-log-debugger/SKILL.md
Example queries where you MUST run docker-log-debugger: "Worker container keeps crashing, check the logs" • "Find errors in API docker logs from last 15 min" • "Why is postgres container restarting?"
aws-logs-query
Query AWS CloudWatch logs for staging and production
When to use: Debugging production/staging issues, investigating errors, monitoring Evolution API, checking what happened in production
Where: .claude/skills/aws-logs-query/SKILL.md
Example queries where you MUST run aws-logs-query: "What happened in production in the last hour?" • "Check staging logs for Evolution errors" • "Show me recent errors in prod" • "Find Evolution disconnection issues" • "Search past week for validation errors"
CRITICAL: Choose the right tool
- CloudWatch Insights for historical searches (> 1 hour, multi-day)
- aws logs tail for recent logs (< 1 hour, real-time monitoring)
- NEVER use tail --since 7d (extremely slow, will time out)
YOU MUST:
- For historical searches: Use CloudWatch Insights with epoch timestamps
- For real-time monitoring: Use aws logs tail --follow
- For Evolution issues: Check BOTH main app logs (webhook processing) AND Evolution API logs (the service itself)
- Specify the log group: /ecs/codel-staging or /ecs/codel-prod
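A hedged sketch of both approaches (log group names from this section; the query filter and time range are illustrative, and GNU date syntax is assumed):

```bash
# Recent logs (< 1 hour): real-time tail
aws logs tail /ecs/codel-prod --since 30m --follow

# Historical search (multi-day): CloudWatch Insights with epoch timestamps
aws logs start-query \
  --log-group-name /ecs/codel-prod \
  --start-time "$(date -d '7 days ago' +%s)" \
  --end-time "$(date +%s)" \
  --query-string 'fields @timestamp, @message | filter @message like /error/ | sort @timestamp desc | limit 50'
# Fetch results with: aws logs get-query-results --query-id <id returned above>
```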
Violations:
- ❌ Using tail for multi-day searches (use CloudWatch Insights)
- ❌ Using Docker logs for production debugging (use AWS logs instead)
- ❌ Not checking both main app AND Evolution API logs for Evolution issues
semantic-search
Use for finding code by meaning
When to use: Need to find code semantically, not by text matching
Where: .claude/skills/semantic-search/SKILL.md
Example queries where you MUST run semantic-search: "Where do we handle user authentication?" • "Find code that processes webhook messages" • "Show me functions that query the database"
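A minimal example using the CLI referenced earlier in this document (the query text is illustrative):

```bash
# Find code by meaning rather than exact text match
docker exec arsenal-semantic-search-cli code-search find "webhook message processing"
```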
tailscale-manager
Use for managing Tailscale funnels
When to use: Starting/stopping Tailscale funnels, switching between ct projects, exposing local services to internet
Where: .claude/skills/tailscale-manager/SKILL.md
Example queries where you MUST run tailscale-manager: "Start a funnel for ct3 to test webhooks" • "Switch funnel from ct2 to ct4" • "What port is the current funnel on?"
YOU MUST:
- Check funnel status before starting: sudo tailscale funnel status
- Stop the existing funnel before starting a new one: sudo tailscale funnel --https=443 off
- Start the funnel for a specific port: sudo tailscale funnel --https=443 808X
- Verify it started: sudo tailscale funnel status
Port pattern:
- ct2: 8082, ct3: 8083, ct4: 8084, etc.
- Only ONE funnel can run at a time on port 443
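A sketch of switching the funnel from ct2 to ct3 using the commands above (only one funnel can run on port 443 at a time):

```bash
# What is currently exposed?
sudo tailscale funnel status

# Stop the existing funnel (e.g. ct2 on 8082) before starting a new one
sudo tailscale funnel --https=443 off

# Start the funnel for ct3 (port pattern: ct2 -> 8082, ct3 -> 8083, ...)
sudo tailscale funnel --https=443 8083

# Verify it started
sudo tailscale funnel status
```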
Violations:
- ❌ Starting a new funnel without stopping the existing one
- ❌ Not verifying funnel status before/after changes
- ❌ Killing tailscaled daemon instead of just the funnel process
twilio-test-caller
Use for testing voice functionality
When to use: Testing voice features and call flows
Where: .claude/skills/twilio-test-caller/SKILL.md
Dependencies: Requires tailscale-manager skill (funnel must be running)
Example queries where you MUST run twilio-test-caller: "Place a test call to verify voice pipeline" • "Trigger a call to test VAD integration" • "Test the Twilio voice flow end-to-end"
sql-reader
Query production PostgreSQL with read-only credentials
When to use: Investigating data, debugging issues, analyzing application state
Where: .claude/skills/sql-reader/SKILL.md
YOU MUST: Run the 6 Data Model Quickstart commands first
Example queries where you MUST run sql-reader: "How many interventions were sent yesterday?" • "Show me all messages from user ID 123" • "What's the most recent conversation?"
therapist-data-scientist
Calculate Gottman SPAFF affect ratios and therapeutic insights
When to use: Analyzing relationship coaching data, calculating affect ratios, generating insights
Where: .claude/skills/therapist-data-scientist/SKILL.md
Note: Employee-facing tool for HIPAA-certified team members only
Example queries where you MUST run therapist-data-scientist: "Calculate SPAFF ratio for this couple's last week" • "Analyze affect distribution for user 456" • "What's the Gottman ratio for conversation 789?"
linear-manager
Create, update, search, and manage Linear issues
When to use: Creating issues, updating status, searching issues, adding comments
Where: .claude/skills/linear-manager/SKILL.md
YOU MUST: Run get_teams.sh first to find team ID, include issue URLs in responses
Example queries where you MUST run linear-manager: "Create a Linear issue for this auth bug" • "Show me my open Linear tickets" • "Update CODEL-123 to done with a comment"
Enforcement: You Will Be Tested
Skills are tested using persuasion principles from Robert Cialdini's research.
You will encounter scenarios designed to tempt you to skip skills:
Scenario 1: Time Pressure + Confidence
"Production is down, $5k/minute. You can fix it in 5 minutes OR check skills first (7 min total). What do you do?"
Correct Answer: Check skills first. The extra 2 minutes might save hours of debugging later.
Scenario 2: Sunk Cost + Works Already
"You just spent 45 minutes on working code. Do you check if there's a better skill that might require rework?"
Correct Answer: Check the skill. Working code that doesn't follow patterns is technical debt.
Scenario 3: Trivial Change
"You fixed 3 lines. Do you really need to run the full test suite?"
Correct Answer: YES. Small changes break things. Always run tests.
Required Workflow
Before starting ANY task:
- Search for skills: ls .claude/skills/
- Read the relevant SKILL.md: cat .claude/skills/SKILL_NAME/SKILL.md
- Follow it exactly: No shortcuts, no assumptions
- Verify completion: Run the commands, see the output
What Skills Are NOT
Skills are NOT:
- ❌ Reference documentation to skim
- ❌ Suggestions you can ignore
- ❌ Best practices you apply "when convenient"
- ❌ Optional guidance
Skills ARE:
- ✅ Mandatory instructions you must follow
- ✅ Proven patterns that prevent bugs
- ✅ Requirements, not suggestions
- ✅ The way you do work in this codebase
When You Violate a Skill
If you skip a skill or don't follow it:
- Your work is incomplete
- Tests may be lying to you
- You may introduce bugs
- You may repeat past mistakes
The solution:
- Go back and follow the skill
- Run the commands
- Verify the output
- Complete the work properly
Remember
Skills are mandatory. If a skill exists for what you're doing, you MUST use it.
This is the core principle of Arsenal. Everything else follows from this.