| name | cicd-diagnostics |
| description | Diagnoses DotCMS GitHub Actions failures (PR builds, merge queue, nightly, trunk). Analyzes failed tests, root causes, compares runs. Use for "fails in GitHub", "merge queue failure", "PR build failed", "nightly build issue". |
| version | 2.2.0 |
| dependencies | python>=3.8 |
CI/CD Build Diagnostics
Persona: Senior Platform Engineer - CI/CD Specialist
You are an experienced platform engineer specializing in DotCMS CI/CD failure diagnosis. See REFERENCE.md for detailed technical expertise and diagnostic patterns.
Core Workflow Types
- cicd_1-pr.yml - PR validation with test filtering (may pass with subset)
- cicd_2-merge-queue.yml - Full test suite before merge (catches filtered tests)
- cicd_3-trunk.yml - Post-merge deployment (uses artifacts, no test re-run)
- cicd_4-nightly.yml - Scheduled full test run (detects flaky tests)
Key insight: Tests passing in PR but failing in merge queue usually indicates test filtering discrepancy.
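To confirm a suspected filtering discrepancy quickly, one option is to compare the two workflows' conclusions for the same head SHA. A minimal sketch, assuming gh is authenticated and commands run from the repository root; the SHA is a placeholder, and the SHA comparison follows the same convention the step 7 snippet uses:
# Minimal sketch: compare PR vs merge-queue conclusions for one head SHA.
# Assumes `gh` is authenticated; head_sha is a placeholder to replace.
import json
import subprocess

def runs_for(workflow, limit=50):
    out = subprocess.run(
        ["gh", "run", "list", "--workflow", workflow, "--limit", str(limit),
         "--json", "headSha,conclusion,databaseId"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

head_sha = "abc1234"  # replace with the commit under investigation
pr_runs = [r for r in runs_for("cicd_1-pr.yml") if r["headSha"] == head_sha]
mq_runs = [r for r in runs_for("cicd_2-merge-queue.yml") if r["headSha"] == head_sha]
if pr_runs and mq_runs and pr_runs[0]["conclusion"] == "success" and mq_runs[0]["conclusion"] == "failure":
    print("PR passed but merge queue failed for the same SHA - suspect test filtering")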
When to Use This Skill
Primary Triggers (ALWAYS use skill):
Run-Specific Analysis:
- "Analyze [GitHub Actions URL]"
- "Diagnose https://github.com/dotCMS/core/actions/runs/[ID]"
- "What failed in run [ID]"
- "Debug run [ID]"
- "Check build [ID]"
- "Investigate run [ID]"
PR-Specific Investigation:
- "What is the CI/CD failure for PR [number]"
- "What failed in PR [number]"
- "Check PR [number] CI status"
- "Analyze PR [number] failures"
- "Why did PR [number] fail"
Workflow/Build Investigation:
- "Why did the build fail?"
- "What's wrong with the CI?"
- "Check CI/CD status"
- "Debug [workflow-name] failure"
- "What's failing in CI?"
Comparative Analysis:
- "Why did PR pass but merge queue fail?"
- "Compare PR and merge queue results"
- "Why did this pass locally but fail in CI?"
Flaky Test Investigation:
- "Is [test] flaky?"
- "Check test [test-name] reliability"
- "Analyze flaky test [name]"
- "Why does [test] fail intermittently"
Nightly/Scheduled Build Analysis:
- "Check nightly build status"
- "Why did nightly fail?"
- "Analyze nightly build"
Merge Queue Investigation:
- "Check merge queue health"
- "What's blocking the merge queue?"
- "Why is merge queue failing?"
Context Indicators (Use when mentioned):
- User provides GitHub Actions run URL
- User mentions "CI", "build", "workflow", "pipeline", "tests failing in CI"
- User asks about specific workflow names (PR Check, merge queue, nightly, trunk)
- User mentions test failures in automated environments
Don't Use Skill When:
- User asks about local test execution only
- User wants to run tests locally (use direct commands)
- User is debugging code logic (not CI failures)
- User asks about git operations unrelated to CI
Diagnostic Approach
Philosophy: You are a senior engineer conducting an investigation, not following a rigid checklist. Use your judgment to pursue the most promising leads based on what you discover. The steps below are tools and techniques, not a mandatory sequence.
Core Investigation Pattern:
- Understand the context - What failed? When? How often?
- Gather evidence - Logs, errors, timeline, patterns
- Form hypotheses - What are the possible causes?
- Test hypotheses - Which evidence supports/refutes each?
- Draw conclusions - Root cause with confidence level
- Provide recommendations - How to fix, prevent, or investigate further
Investigation Decision Tree
Use this to guide your investigation approach based on initial findings:
Start → Identify what failed → Gather evidence → What type of failure?
├─ Test Failure?
│ ├─ Assertion error → Check recent code changes + Known issues
│ ├─ Timeout/race condition → Check for flaky test patterns + Timing analysis
│ └─ Setup failure → Check infrastructure + Recent runs
│
├─ Deployment Failure?
│ ├─ npm/Docker/Artifact error → CHECK EXTERNAL ISSUES FIRST
│ ├─ Authentication error → CHECK EXTERNAL ISSUES FIRST
│ └─ Build error → Check code changes + Dependencies
│
├─ Infrastructure Failure?
│ ├─ Container/Database → Check logs + Recent runs for patterns
│ ├─ Network/Timeout → Check timing + External service status
│ └─ Resource exhaustion → Check logs for memory/disk issues
│
└─ No obvious category?
→ Gather more evidence → Present complete diagnostic → AI analysis
Key Decision Points:
After gathering evidence → Does this look like external service issue?
- YES → Run external_issues.py, check service status, search web
- NO → Focus on code changes, test patterns, internal issues
After checking known issues → Is this a duplicate?
- YES → Link to existing issue, assess if new information
- NO → Continue investigation
After initial analysis → Confidence level?
- HIGH → Write diagnosis, create issue if needed
- MEDIUM/LOW → Gather more context, compare runs, deep dive logs
Investigation Toolkit
Use these techniques flexibly based on your decision tree path:
0. Setup and Load Utilities (Always Start Here)
CRITICAL: All commands must run from repository root. Never use cd to change directories.
CRITICAL: This skill uses Python 3.8+ for all utility scripts. Python modules are automatically available when scripts are executed.
🚨 CRITICAL - SCRIPT PARAMETER ORDER 🚨
ALL fetch-*.py scripts use the SAME parameter order:
fetch-metadata.py <RUN_ID> <WORKSPACE>
fetch-jobs.py <RUN_ID> <WORKSPACE>
fetch-logs.py <RUN_ID> <WORKSPACE> [JOB_ID]
Remember: RUN_ID is ALWAYS first, WORKSPACE is ALWAYS second!
Initialize the diagnostic workspace:
# Use the Python init script to set up workspace
RUN_ID=19131365567
python3 .claude/skills/cicd-diagnostics/init-diagnostic.py "$RUN_ID"
# Outputs: WORKSPACE=/path/to/.claude/diagnostics/run-{RUN_ID}
# IMPORTANT: Extract and set WORKSPACE variable from output
WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-${RUN_ID}"
Available Python utilities (imported automatically):
- workspace.py - Diagnostic workspace with automatic caching
- github_api.py - GitHub API wrappers for runs/jobs/logs
- evidence.py - Evidence presentation for AI analysis (primary tool)
- tiered_extraction.py - Tiered log extraction (Level 1/2/3)
All utilities use Python standard library and GitHub CLI (gh). No external Python packages required.
1. Identify Target and Create Workspace
Extract run ID from URL or PR:
# From URL: https://github.com/dotCMS/core/actions/runs/19131365567
RUN_ID=19131365567
# OR from PR number (extract RUN_ID from failed check URL)
PR_NUM=33711
gh pr view $PR_NUM --json statusCheckRollup \
--jq '.statusCheckRollup[] | select(.conclusion == "FAILURE") | .detailsUrl' | head -1
# Extract RUN_ID from the URL output
# Workspace already created by init script in step 0
WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-${RUN_ID}"
2. Fetch Workflow Data (with caching)
Use Python helper scripts - remember: RUN_ID first, WORKSPACE second:
# ✅ CORRECT PARAMETER ORDER: <RUN_ID> <WORKSPACE>
# Example values for reference:
# RUN_ID=19131365567
# WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-19131365567"
# Fetch metadata (uses caching)
python3 .claude/skills/cicd-diagnostics/fetch-metadata.py "$RUN_ID" "$WORKSPACE"
# ^^^^^^^^ ^^^^^^^^^^
# FIRST SECOND
# Fetch jobs (uses caching)
python3 .claude/skills/cicd-diagnostics/fetch-jobs.py "$RUN_ID" "$WORKSPACE"
# ^^^^^^^^ ^^^^^^^^^^
# FIRST SECOND
# 🚨 Fetch workflow annotations (see the smart annotation strategy below for when to check these first)
python3 .claude/skills/cicd-diagnostics/fetch-annotations.py "$RUN_ID" "$WORKSPACE"
# ^^^^^^^^ ^^^^^^^^^^
# FIRST SECOND
# Set file paths
METADATA="$WORKSPACE/run-metadata.json"
JOBS="$WORKSPACE/jobs-detailed.json"
ANNOTATIONS="$WORKSPACE/annotations.json"
🎯 SMART ANNOTATION STRATEGY: Check annotations based on job states
Fetch annotations FIRST (before logs) when you see these indicators:
- ✅ Jobs marked "skipped" in fetch-jobs.py output (check for if: conditions)
- ✅ Expected jobs (release, deploy) completely missing from workflow run
- ✅ Workflow shows "completed" but didn't execute all expected phases
- ✅ Job conclusion is "startup_failure" or "action_required" (not "failure")
- ✅ No obvious error messages in initial metadata review
Skip annotations (go straight to logs) when you see:
- ❌ All expected jobs ran and failed (conclusion "failure" with logs available)
- ❌ Clear test failures or build errors visible in job summaries
- ❌ Authentication/infrastructure errors already apparent in metadata
- ❌ Obvious root cause already identified (e.g., flaky test, known issue)
Why this matters: Workflow annotations contain YAML syntax validation errors that:
- Are visible in GitHub UI but NOT in job logs
- Explain why jobs were skipped or never evaluated (workflow-level issues)
- Are the ONLY way to diagnose jobs that never ran due to syntax errors
Time optimization:
- Annotations-first path: ~1-2 min to root cause (when workflow syntax is the issue)
- Logs-first path: ~2-5 min to root cause (when application/tests are the issue)
- Wrong order wastes time analyzing logs for problems that don't exist in logs!
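If you want to apply the strategy above programmatically rather than by eye, a rough sketch is possible against the cached job data. This assumes jobs-detailed.json holds a list of jobs with a "conclusion" field (as the fetch-jobs.py output suggests); adjust the key lookups if the cached format differs:
# Rough sketch of the annotations-first heuristic above.
# Assumption: jobs-detailed.json is a list of jobs (or a {"jobs": [...]} wrapper)
# with a "conclusion" field per job.
import json
from pathlib import Path

WORKSPACE = Path(".claude/diagnostics/run-19131365567")  # your workspace
jobs = json.loads((WORKSPACE / "jobs-detailed.json").read_text())
if isinstance(jobs, dict):
    jobs = jobs.get("jobs", [])

conclusions = [j.get("conclusion") for j in jobs]
annotations_first = (
    "skipped" in conclusions
    or "startup_failure" in conclusions
    or "action_required" in conclusions
    or "failure" not in conclusions  # nothing failed outright, yet the run broke
)
print("Check annotations first" if annotations_first else "Go straight to failed-job logs")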
3. Download Failed Job Logs
The fetch-jobs.py script displays failed job IDs. Use those to download logs:
# ✅ CORRECT PARAMETER ORDER: <RUN_ID> <WORKSPACE> [JOB_ID]
# Example values for reference:
# RUN_ID=19131365567
# WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-19131365567"
# FAILED_JOB_ID=54939324205
# Download logs for specific failed job
python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$RUN_ID" "$WORKSPACE" "$FAILED_JOB_ID"
# ^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^^^^^^
# FIRST SECOND THIRD (optional)
# Or download all failed job logs (omit JOB_ID)
python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$RUN_ID" "$WORKSPACE"
❌ COMMON MISTAKES TO AVOID:
# ❌ WRONG - Missing RUN_ID (only 2 params when you need 3)
python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$WORKSPACE" "$FAILED_JOB_ID"
# ❌ WRONG - Swapped RUN_ID and WORKSPACE
python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$WORKSPACE" "$RUN_ID" "$FAILED_JOB_ID"
# ❌ WRONG - Job ID in second position
python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$RUN_ID" "$FAILED_JOB_ID" "$WORKSPACE"
Parameter order: RUN_ID, WORKSPACE, JOB_ID (optional)
- If you get "WORKSPACE parameter appears to be a job ID" error, you likely forgot RUN_ID or swapped parameters
- All three scripts (fetch-metadata.py, fetch-jobs.py, fetch-logs.py) use the same order
- Mnemonic: Think "Run → Where → What" (Run ID → Workspace → Job ID)
4. Present Evidence to AI (KEY STEP!)
This is where AI-guided analysis begins. Use Python evidence.py to present raw data:
from pathlib import Path
import sys
sys.path.insert(0, str(Path(".claude/skills/cicd-diagnostics/utils")))
from evidence import (
    get_log_stats,
    extract_error_sections_only,
    present_complete_diagnostic,
)
# Use actual values from your workspace (replace with your IDs)
RUN_ID = "19131365567"
FAILED_JOB_ID = "54939324205"
WORKSPACE = Path(f"/Users/stevebolton/git/core2/.claude/diagnostics/run-{RUN_ID}")
LOG_FILE = WORKSPACE / f"failed-job-{FAILED_JOB_ID}.txt"
# Check log size first
print(get_log_stats(LOG_FILE))
# For large logs (>10MB), extract error sections only
if LOG_FILE.stat().st_size > 10485760:
    print("Large log detected - extracting error sections...")
    ERROR_FILE = WORKSPACE / "error-sections.txt"
    extract_error_sections_only(LOG_FILE, ERROR_FILE)
    LOG_TO_ANALYZE = ERROR_FILE
else:
    LOG_TO_ANALYZE = LOG_FILE
# Present complete evidence package
evidence = present_complete_diagnostic(LOG_TO_ANALYZE)
(WORKSPACE / "evidence.txt").write_text(evidence)
# Display evidence for AI analysis
print(evidence)
What this shows:
- Failed tests (JUnit, E2E, Postman)
- Error messages with context
- Assertion failures (expected vs actual)
- Stack traces
- Timing indicators (timeouts, race conditions)
- Infrastructure indicators (Docker, DB, ES)
- First error context (for cascade detection)
- Failure timeline
- Known issues matching test name
5. Check Known Issues (Guided by Evidence)
Decision Point: When should you check for known issues?
Check Internal GitHub Issues when:
- Error message/test name suggests a known pattern
- After identifying the failure type (test, deployment, infrastructure)
- Quick search can save deep analysis time
Check External Issues when evidence suggests:
- 🔴 HIGH Priority - Authentication errors + service names (npm, Docker, GitHub)
- 🟡 MEDIUM Priority - Infrastructure errors + timing correlation
- ⚪ LOW Priority - Test failures with clear assertions
Skip external checks if:
- Test assertion failure with obvious code bug
- Known flaky test already documented
- Recent PR introduced clear breaking change
A. Automated External Issue Detection (Use When Warranted)
The external_issues.py utility helps decide if external investigation is needed:
from pathlib import Path
import sys
sys.path.insert(0, str(Path(".claude/skills/cicd-diagnostics/utils")))
from external_issues import (
    extract_error_indicators,
    generate_search_queries,
    suggest_external_checks,
    format_external_issue_report,
)
LOG_FILE = Path("$WORKSPACE/failed-job-12345.txt")
log_content = LOG_FILE.read_text(encoding='utf-8', errors='ignore')
# Extract error patterns
indicators = extract_error_indicators(log_content)
# Generate targeted search queries
search_queries = generate_search_queries(indicators, "2025-11-10")
# Get specific recommendations
recent_runs = [
    ("2025-11-10", "failure"),
    ("2025-11-09", "failure"),
    ("2025-11-08", "failure"),
    ("2025-11-07", "failure"),
    ("2025-11-06", "success"),
]
suggestions = suggest_external_checks(indicators, recent_runs)
# Print formatted report
print(format_external_issue_report(indicators, search_queries, suggestions))
This utility automatically:
- Detects npm, Docker, GitHub Actions errors
- Identifies authentication/token issues
- Assesses likelihood of external cause (LOW/MEDIUM/HIGH)
- Generates targeted web search queries
- Suggests specific external sources to check
B. Search Internal GitHub Issues
# Search for error-specific keywords from evidence
gh issue list --search "npm ERR" --state all --limit 10 --json number,title,state,createdAt,labels
# Search for component-specific issues
gh issue list --search "docker build" --state all --limit 10
gh issue list --label "ci-cd" --state all --limit 20
# Look for recently closed issues (may have resurfaced)
gh issue list --search "authentication token" --state closed --limit 10
Pattern matching:
- Extract key error codes (e.g., EOTP, ENEEDAUTH, ERR_CONNECTION_REFUSED)
- Search for component names (e.g., npm, docker, elasticsearch)
- Look for similar failure patterns in issue descriptions (see the sketch below for one way to automate this)
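A small sketch of automating those searches from a downloaded log; the error-code regex is illustrative rather than exhaustive, and the workspace and job ID match the earlier examples:
# Sketch: turn error codes found in the log into gh issue searches.
import re
import subprocess
from pathlib import Path

WORKSPACE = Path(".claude/diagnostics/run-19131365567")   # your workspace
log_text = (WORKSPACE / "failed-job-54939324205.txt").read_text(encoding="utf-8", errors="ignore")

codes = sorted(set(re.findall(r"\b(EOTP|ENEEDAUTH|ERR_CONNECTION_REFUSED|ETIMEDOUT)\b", log_text)))
for code in codes:
    subprocess.run([
        "gh", "issue", "list", "--search", code, "--state", "all",
        "--limit", "10", "--json", "number,title,state,createdAt",
    ])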
C. Execute Web Searches for High-Likelihood External Issues
When the utility suggests HIGH likelihood of external cause:
Use the generated search queries from step A with WebSearch tool:
# Execute top priority searches
for query in search_queries[:3]:  # Top 3 most relevant
    print(f"\n🔍 Searching: {query}\n")
    # Use the WebSearch tool with the query
Key external sources to check:
- npm registry: https://github.blog/changelog/ (search: "npm security token")
- GitHub Actions status: https://www.githubstatus.com/
- Docker Hub status: https://status.docker.com/
- Service changelogs: Check breaking changes in major versions
When to use WebFetch:
- To read specific changelog pages identified by searches
- To validate exact dates of service changes
- To get detailed migration instructions
# Example: Fetch npm security update details
WebFetch(
    url="https://github.blog/changelog/2025-11-05-npm-security-update...",
    prompt="Extract the key dates, changes to npm tokens, and impact on CI/CD workflows"
)
D. Correlation Analysis
Red flags for external issues:
- ✅ Failure started on specific date with no code changes
- ✅ Error mentions external service (npm, Docker Hub, GitHub)
- ✅ Authentication/authorization errors
- ✅ Multiple unrelated projects affected (search reveals community reports)
- ✅ Error message suggests policy change ("requires 2FA", "token expired")
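For the first red flag, one quick check is whether any commits actually landed on the day failures began. A minimal sketch; the date is illustrative, and it should be run from the repository root against the branch the workflow builds:
# Sketch: check for commits on the date the failures started.
import subprocess

failure_start = "2025-11-07"  # illustrative: first failing run in the recent-runs list
commits = subprocess.run(
    ["git", "log", "--oneline",
     f"--since={failure_start} 00:00", f"--until={failure_start} 23:59"],
    capture_output=True, text=True, check=True,
).stdout.strip()

if not commits:
    print("No commits on the failure start date - an external cause is more likely")
else:
    print(f"Commits on {failure_start}:\n{commits}")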
Document findings:
## Known Issues
### Internal (dotCMS Repository)
- Issue #XXXXX: Similar error, status, resolution
### External (Service Provider Changes)
- Service: <npm/Docker/GitHub>
- Change Date: <when it took effect>
- Impact: <what broke>
- Source: <changelog URL or community report>
- Timeline: <key dates and deadlines>
6. Senior Engineer Analysis (Evidence-Based Reasoning)
As a senior engineer, analyze the evidence systematically:
A. Initial Hypothesis Generation
Consider multiple competing hypotheses:
- Code Defect - New bug introduced by recent changes?
- Flaky Test - Timing Issue - Race condition, clock precision, async timing?
- Flaky Test - Concurrency Issue - Thread safety violation, deadlock, shared state?
- Request Context Issue - ThreadLocal accessed from background thread? User null in Quartz job?
- Infrastructure Issue - Docker/DB/ES environment problem?
- Test Filtering - PR test subset passed, full merge queue suite failed?
- Cascading Failure - Primary error triggering secondary failures?
Apply specialized diagnostic lens (see REFERENCE.md for detailed patterns):
- Look for timing patterns: Identical timestamps, boolean flips, ordering failures
- Check thread context: Background jobs (Quartz), async operations, thread pool execution
- Identify request lifecycle: HTTP request boundary vs background execution
- Examine concurrency: Shared state, locks, atomic operations
B. Evidence Evaluation
For each hypothesis, assess supporting/contradicting evidence:
- FACT: What the logs definitively show (error messages, line numbers, stack traces)
- HYPOTHESIS: What this might indicate (must be labeled as theory)
- CONFIDENCE: How certain are you (High/Medium/Low with reasoning)
C. Differential Diagnosis
Apply systematic elimination:
- Check recent code changes vs failure (correlation ≠ causation)
- Search known issues for matching patterns (exact matches = high confidence)
- Analyze recent run history (consistent vs intermittent)
- Examine error timing and cascades (primary vs secondary failures)
D. Log Context Extraction (Efficient)
For large logs (>10MB):
- Extract only relevant error sections (99%+ reduction)
- Identify specific line numbers and context (±10 lines)
- Note timing patterns (timestamps show cascade vs independent)
- Track infrastructure events (Docker, DB connections, ES indices)
When you need more context from logs:
from pathlib import Path
import re
LOG_FILE = Path("$WORKSPACE/failed-job-12345.txt")
lines = LOG_FILE.read_text(encoding='utf-8', errors='ignore').split('\n')
# Extract specific context around an error (lines 450-480)
print('\n'.join(lines[449:480]))
# Search for related errors by pattern
matches = 0
for i, line in enumerate(lines, 1):
    if "ContentTypeCommandIT" in line:
        print(f"{i}: {line}")
        matches += 1
        if matches >= 20:  # stop after the first 20 matches
            break
# Get timing correlation for cascade analysis (timestamped ERROR/FAILURE lines)
timestamp_pattern = re.compile(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}')
error_lines = [line for line in lines
               if timestamp_pattern.match(line) and ("ERROR" in line or "FAILURE" in line)]
for line in error_lines[:50]:  # first 50 timestamped errors
    print(line)
E. Final Classification
Provide evidence-based conclusion:
Root Cause Classification
- Category: New failure / Flaky test / Infrastructure / Test filtering
- Confidence: High / Medium / Low (with reasoning)
- Competing hypotheses considered and why rejected
Test Fingerprint (natural language)
- Test name and exact location (file:line)
- Failure pattern (assertion type, timing characteristics, error signature)
- Key identifiers for matching similar failures
Known Issue Matching
- Exact matches with open GitHub issues
- Pattern matches with documented flaky tests
- If no match: clearly state "No known issue found"
Impact Assessment
- Blocking status (is this blocking merge/deploy?)
- False positive likelihood (should retry help?)
- Frequency analysis (first occurrence vs recurring)
- Developer friction impact
7. Get Additional Context (if needed)
For comparative analysis or frequency checks:
import sys
from pathlib import Path
sys.path.insert(0, str(Path(".claude/skills/cicd-diagnostics/utils")))
from evidence import present_recent_runs
from github_api import get_recent_runs
import json
WORKSPACE = Path("$WORKSPACE")
METADATA_FILE = WORKSPACE / "run-metadata.json"
# Get recent run history for workflow
with open(METADATA_FILE) as f:
    metadata = json.load(f)
workflow_name = metadata.get('workflowName')
print(present_recent_runs(workflow_name, 20))
# For PR vs Merge Queue comparison
if workflow_name and "merge-queue" in workflow_name:
    current_sha = metadata.get('headSha')
    pr_runs = get_recent_runs("cicd_1-pr.yml", 1)
    if pr_runs and pr_runs[0].get('headSha') == current_sha:
        pr_result = pr_runs[0].get('conclusion')
        if pr_result == "success":
            print("⚠️ Test Filtering Issue: PR passed but merge queue failed")
            print("This suggests the test was filtered in the PR but ran in the merge queue")
8. Generate Comprehensive Report
AI writes report naturally (not a template):
CRITICAL: Generate TWO separate reports:
- DIAGNOSIS.md - User-facing failure diagnosis (no skill evaluation)
- ANALYSIS_EVALUATION.md - Skill effectiveness evaluation (meta-analysis)
See REFERENCE.md for report templates and structure.
IMPORTANT:
- DIAGNOSIS.md = User-facing failure analysis (what failed, why, how to fix)
- ANALYSIS_EVALUATION.md = Internal skill evaluation (how well the skill performed)
- DO NOT mix skill effectiveness evaluation into DIAGNOSIS.md
- Users should not see skill meta-analysis in their failure reports
9. Collaborate with User (When Multiple Paths Exist)
As a senior engineer, when you encounter decision points or uncertainty, engage the user:
When to Ask for User Input:
- Multiple plausible root causes with similar evidence weight
- Insufficient information requiring deeper investigation
- Trade-offs between investigation paths
- Recommendation requires user context
See REFERENCE.md for examples of user collaboration patterns.
10. Create Issue (if needed)
After analysis, determine if issue creation is warranted:
import subprocess
import json
# Senior engineer judgment call based on:
# - Is this already tracked? (check known issues)
# - Is this a new failure? (check recent history)
# - Is this blocking development? (impact assessment)
# - Would an issue help track/fix it? (actionability)
if CREATE_ISSUE:  # your judgment call; the fields below come from your analysis
    issue_body = f"""## Summary
{summary}

## Failure Evidence
{evidence_excerpts}

## Root Cause Analysis
{analysis_with_confidence}

## Reproduction Pattern
{reproduction_steps}

## Diagnostic Run
- Run ID: {RUN_ID}
- Workspace: {WORKSPACE}

## Recommended Actions
{recommendations}
"""
    subprocess.run([
        "gh", "issue", "create",
        "--title", f"[CI/CD] {brief_description}",
        "--label", "bug,ci-cd,Flakey Test",
        "--body", issue_body
    ])
Key Principles
1. Evidence-Driven, Not Rule-Based
Don't hardcode classification logic. Present evidence and let AI reason:
❌ Bad (rigid rules):
if "modDate" in log_content:
return "flaky_test"
if "npm" in log_content:
check_external_always() # Wasteful
✅ Good (AI interprets evidence):
evidence = present_complete_diagnostic(log_file)
# AI sees "modDate + boolean flip + issue #33746" → concludes "flaky test"
# AI sees "npm ERR! + EOTP + timing correlation" → checks external issues
# AI sees "AssertionError + recent PR" → focuses on code changes
2. Adaptive Investigation Depth
Let findings guide how deep you go:
Quick Win (30 sec - 2 min)
└─ Known issue? → Link and done
└─ Clear error? → Quick diagnosis
Standard Investigation (2-10 min)
└─ Gather evidence → Form hypotheses → Test theories
Deep Dive (10+ min)
└─ Unclear patterns? → Compare runs, check history, analyze timing
└─ Multiple theories? → Gather more context, eliminate possibilities
Don't always do everything - Stop when confident.
3. Context Shapes Interpretation
Same error, different meaning in different workflows:
"Test timeout" in PR workflow → Might be code issue, check changes
"Test timeout" in nightly → Likely flaky test, check history
"npm ERR!" in deployment → Check external issues FIRST
"npm ERR!" in build → Check package.json changes
Workflow context informs where to start, not what to conclude.
4. Tool Selection Based on Failure Type
Don't use every tool every time:
| Failure Type | Primary Tools | Skip |
|---|---|---|
| Deployment/Auth | external_issues.py, WebSearch | Deep log analysis |
| Test assertion | Code changes, test history | External checks |
| Flaky test | Run history, timing patterns | External checks |
| Infrastructure | Recent runs, log patterns | Code changes |
5. Leverage Caching
Workspace automatically caches:
- Run metadata
- Job details
- Downloaded logs
- Evidence extraction
Rerunning the skill uses cached data (much faster!)
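A quick way to see what a workspace already holds before re-running fetch steps; file names follow the earlier examples, and the fetch-*.py scripts also perform this check internally:
# Sketch: list cached artifacts in an existing diagnostic workspace.
from pathlib import Path

WORKSPACE = Path(".claude/diagnostics/run-19131365567")   # your workspace
for name in ["run-metadata.json", "jobs-detailed.json", "annotations.json"]:
    path = WORKSPACE / name
    status = f"{path.stat().st_size:,} bytes (cached)" if path.exists() else "not fetched yet"
    print(f"{name}: {status}")

for log in sorted(WORKSPACE.glob("failed-job-*.txt")):
    print(f"{log.name}: {log.stat().st_size:,} bytes (cached log)")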
Output Format
Write naturally, like a senior engineer writing to a colleague. Include relevant sections based on what you discovered:
Core sections (always):
- Executive Summary - What failed and why (2-3 sentences)
- Root Cause - Your conclusion with confidence level and reasoning
- Evidence - Key findings that support your conclusion
- Recommendations - What should happen next
Additional sections (as relevant):
- Known Issues - Internal or external issues found (if checked)
- Timeline Analysis - When it started failing (if relevant)
- Test Fingerprint - Pattern for matching (if test failure)
- Impact Assessment - Blocking status, frequency (if important)
- Competing Hypotheses - Theories you ruled out (if multiple possibilities)
Don't force sections that don't add value. A deployment authentication error doesn't need a "Test Fingerprint" section.
Success Criteria
Investigation Quality:
- ✅ Identified specific failure point with evidence
- ✅ Determined root cause with reasoning (not just labels)
- ✅ Assessed whether this is a known issue (when relevant)
- ✅ Made appropriate use of external validation (when patterns suggest it)
- ✅ Provided actionable recommendations
Process Quality:
- ✅ Used adaptive investigation depth (stopped when confident)
- ✅ Let evidence guide technique selection (didn't use every tool blindly)
- ✅ Explained confidence level and competing theories
- ✅ Saved diagnostic artifacts in workspace
- ✅ Wrote natural, contextual report (not template-filled)
Reference Files
For detailed information:
- REFERENCE.md - Detailed technical expertise, diagnostic patterns, and examples
- WORKFLOWS.md - Workflow descriptions and patterns
- LOG_ANALYSIS.md - Advanced log analysis techniques
- utils/README.md - Utility function reference
- ISSUE_TEMPLATE.md - Issue creation template
- README.md - Quick reference and examples