debug-nonlinear-editor

@Dreamrealai/nonlinear-editor

SKILL.md

name: debug-nonlinear-editor
description: Systematic debugging workflow for the non-linear video editor project. Use when investigating bugs, errors, crashes, or unexpected behavior. CRITICAL - NEVER ask user to check dashboards/websites - ALWAYS use CLI commands (Vercel, Supabase, Axiom, Stripe, GitHub) and MCP tools (Chrome DevTools, Firecrawl, Axiom) instead. Execute all checks via CLI/MCP tools directly. Focus on the specific issue being debugged rather than scanning everything.

Non-Linear Editor Debugging Skill

A systematic approach to debugging issues in the non-linear video editor project, leveraging production monitoring tools, CLI commands, MCP tools, and thoughtful investigation.

⚡ CLI & MCP First - NEVER Ask User to Check Dashboards

CRITICAL: Always use CLI commands and MCP tools INSTEAD OF asking the user to visit dashboards/websites.

❌ NEVER Do This:

  • ❌ "Check the Vercel dashboard for deployment status"
  • ❌ "Go to Supabase dashboard and view the logs"
  • ❌ "Visit Axiom to see recent errors"
  • ❌ "Open GitHub Actions to see workflow runs"
  • ❌ "Check Stripe dashboard for payment status"
  • ❌ "Look at the browser console for errors"
  • ❌ "Visit this URL and tell me what you see"

✅ ALWAYS Do This Instead:

  • vercel ls - Check deployment status
  • supabase migration list - View database state
  • mcp__axiom__queryApl() - Query production logs
  • gh run list - Check GitHub Actions
  • stripe logs tail - Monitor Stripe events
  • mcp__chrome_devtools__list_console_messages() - Get browser console
  • mcp__firecrawl__firecrawl_scrape(url) - Fetch URL content

Available CLIs

  • Vercel CLI (vercel) - Deployment, logs, env vars, domains
  • Axiom CLI (axiom) - Query logs, manage datasets
  • Supabase CLI (supabase) - Database migrations, storage, functions
  • GitHub CLI (gh) - PRs, issues, actions, releases
  • Git (git) - Version control
  • Docker (docker) - Container management
  • npm/npx - Package management, scripts

CLI Usage Examples

# Vercel - Check deployment status
vercel ls
vercel logs [deployment-url]
vercel env ls
vercel domains ls

# Axiom - Query logs directly
axiom dataset query nonlinear-editor "['nonlinear-editor'] | where ['level'] == 'error'"

# Supabase - Database operations
supabase migration list
supabase db push
supabase storage ls

# GitHub - Check CI/CD
gh run list --limit 10
gh run view [run-id]
gh pr checks

# Jest - Run tests
npm test -- --testPathPattern="api/assets"
npm test -- --listTests
npm run test:coverage

Overview

This skill provides a structured debugging workflow that helps identify root causes quickly by checking the right sources in the right order. It focuses investigation on the specific bug rather than broad scanning.

Key Principle: Execute CLI commands directly. Never ask the user to run commands manually unless absolutely necessary.

When to Use This Skill

  • User reports a bug or error
  • Production issue needs investigation
  • Feature not working as expected
  • API failures or timeout issues
  • Frontend crashes or rendering problems
  • Database query issues
  • Authentication or authorization problems
  • Jest test failures (see ISSUES.md #155, #120)
  • Deployment issues on Vercel
  • GitHub Actions workflow failures

Debugging Workflow

Step 1: Understand the Bug Context

Before diving into logs, gather context:

  1. What is the expected behavior?
  2. What is the actual behavior?
  3. When did this start happening? (recent deploy? specific time?)
  4. What user actions trigger it? (specific workflow, API endpoint, UI interaction)
  5. Is it reproducible? (always, sometimes, specific conditions)
  6. What component/feature is affected? (timeline, export, upload, auth, etc.)

Key principle: Focus your investigation on the specific area affected by the bug, not the entire system.

Step 2: Check Axiom for Related Errors

ALWAYS check Axiom first - it's your primary source of production telemetry.

Default Timeframe: Last 15 minutes - Focus on recent errors for active debugging.

Logging Infrastructure (2025-10-27 Update):

  • ✅ next-axiom integration active with hybrid logging
  • ✅ Correlation IDs for end-to-end request tracing
  • ✅ Web Vitals auto-capture
  • ✅ Flattened data structure for queryability
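
As a quick illustration of this hybrid setup, here is a minimal sketch of emitting correlated log lines with next-axiom. The createRequestLogger helper is an assumption for illustration, not this project's actual API; Logger and flush() are real next-axiom exports.

// Sketch only: createRequestLogger is hypothetical.
import { Logger } from 'next-axiom';
import { randomUUID } from 'crypto';

export function createRequestLogger(correlationId: string = randomUUID()) {
  const log = new Logger();
  return {
    correlationId,
    info: (message: string, data: Record<string, unknown> = {}) =>
      log.info(message, { correlationId, ...data }),
    error: (message: string, data: Record<string, unknown> = {}) =>
      log.error(message, { correlationId, ...data }),
    // Flush before a serverless function returns so buffered events are not dropped.
    flush: () => log.flush(),
  };
}

Every line then carries the same correlationId, which is exactly what the correlation-ID query below filters on.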

Tools to use:

  • mcp__axiom__listDatasets - See available datasets
  • mcp__axiom__getDatasetSchema('nonlinear-editor') - Understand log structure
  • mcp__axiom__queryApl - Query logs with APL

What to look for:

# Get schema first to understand available fields
['nonlinear-editor'] | take 1 | project-keep *

# Search for errors in the last 15 minutes (DEFAULT for active debugging)
['nonlinear-editor']
| where ['_time'] > ago(15m)  # Start with 15m, expand if needed: 1h, 24h, 7d
| where ['level'] == "error"  # Note: field is 'level', not 'severity'
| summarize count() by ['message'], bin_auto(['_time'])
| order by count_ desc

# Filter by source (browser vs server logs) - last 15 minutes
['nonlinear-editor']
| where ['_time'] > ago(15m)
| where ['source'] == "browser"  # or "server"
| where ['level'] in ("error", "warn")
| project ['_time'], ['level'], ['message'], ['url'], ['userId']

# Trace a specific request using correlation IDs - last 15 minutes
['nonlinear-editor']
| where ['_time'] > ago(15m)
| where ['correlationId'] == "your-correlation-id"
| project ['_time'], ['source'], ['level'], ['message'], ['url']
| order by ['_time'] asc

# Check specific component/endpoint (using flattened data fields) - last 15 minutes
['nonlinear-editor']
| where ['_time'] > ago(15m)
| where ['data_operation'] == "saveTimeline"  # Flattened fields have data_ prefix
| where ['level'] in ("error", "warn")
| project ['_time'], ['message'], ['data_projectId'], ['userId']

# Web Vitals performance issues - last 15 minutes
['nonlinear-editor']
| where ['_time'] > ago(15m)
| where ['message'] contains "Web Vital"
| where ['data_rating'] in ("poor", "needs-improvement")
| summarize avg(['data_value']) by ['data_metric'], ['url']

# Check for specific error patterns - expand to 1 hour for pattern analysis
['nonlinear-editor']
| where ['_time'] > ago(1h)  # Use longer window for pattern detection
| search "failed" or "timeout" or "undefined" or "null"
| where ['level'] == "error"
| take 50

Analysis questions:

  • Are there error spikes correlating with the bug report time?
  • What's the error message and stack trace?
  • Which user(s) are affected? (check userId)
  • Which API endpoint or component is failing?
  • Are there patterns (e.g., always fails on specific action)?

Timeframe Strategy:

  • 15 minutes: Active debugging of recent issues (DEFAULT)
  • 1 hour: Pattern detection, frequency analysis
  • 24 hours: Trend analysis, intermittent issues
  • 7 days: Historical context, regression hunting

Step 3: Check Sentry for Errors (Last 15 Minutes)

Check Sentry alongside Axiom - it captures client-side errors and stack traces.

Use Sentry CLI wrapper (fastest):

# Recent errors only (recommended)
npm run sentry:errors

# Recent warnings
npm run sentry:warnings

# Error statistics
npm run sentry:stats

# Raw CLI - recent unresolved issues (no built-in 15-minute filter)
npx sentry-cli issues list --status unresolved --limit 20

What to look for:

  • JavaScript errors in browser
  • Unhandled promise rejections
  • React component errors
  • Network request failures
  • Stack traces for debugging
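
When the stack trace alone is not enough, a hedged sketch of adding searchable context while capturing an error with @sentry/nextjs - the wrapped operation and the tag/extra field names are illustrative, not the project's actual conventions:

import * as Sentry from '@sentry/nextjs';

// Hypothetical wrapper around the failing operation under investigation.
async function captureWithContext(projectId: string, correlationId: string) {
  try {
    throw new Error('FFmpeg process exited with code 1'); // stand-in for the real failure
  } catch (err) {
    Sentry.captureException(err, {
      tags: { component: 'export' },       // searchable in the issue stream and via sentry-cli
      extra: { projectId, correlationId }, // ties the Sentry event back to Axiom traces
    });
    throw err;
  }
}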

Step 4: Investigate API Endpoints (if relevant)

If the bug involves API calls:

  1. Check the API route implementation:

    # Find the relevant API route
    # Use Glob or Grep to locate files
    
  2. Verify in Axiom (last 15 minutes):

    # Check API endpoint logs - last 15 minutes
    ['nonlinear-editor']
    | where ['_time'] > ago(15m)
    | where ['endpoint'] == "/api/specific/endpoint"  # Replace with actual endpoint
    | summarize
        total_requests=count(),
        errors=countif(['status'] >= 400),
        avg_duration=avg(['duration']),
        p95_duration=percentile(['duration'], 95)
      by ['endpoint'], ['status']
    
  3. Common API issues to check (a hedged route sketch follows this list):

    • Authentication middleware (withAuth) properly applied?
    • Rate limiting configuration correct?
    • Input validation working?
    • Error handling returning proper responses?
    • Service layer called correctly?

Step 5: Check Firecrawl API Calls (if web scraping involved)

If the bug involves web scraping, documentation fetching, or Firecrawl:

  1. Check Axiom for Firecrawl-related logs:

    ['nonlinear-editor']
    | where ['_time'] > ago(4h)
    | where ['message'] contains "firecrawl" or ['component'] contains "Firecrawl"
    | project ['_time'], ['level'], ['message'], ['endpoint']
    
  2. Verify Firecrawl API status:

    • Check if API key is valid (env var: FIRECRAWL_API_KEY)
    • Look for rate limiting errors (429 responses)
    • Check for timeout issues
    • Verify URL format being passed to Firecrawl
  3. Test Firecrawl endpoints if needed:

    // Check if Firecrawl call is structured correctly
    // Review: mcp__firecrawl__firecrawl_scrape, firecrawl_search, etc.
    

Step 6: Review Frontend Console Errors

If the bug manifests in the UI:

  1. Use Chrome DevTools MCP tools:

    • mcp__chrome_devtools__list_console_messages - Get console logs
    • mcp__chrome_devtools__get_console_message - Get specific error details
    • mcp__chrome_devtools__list_network_requests - Check failed network calls
    • mcp__chrome_devtools__get_network_request - Inspect specific requests
  2. What to look for:

    • React errors or warnings
    • Network request failures (check status codes)
    • JavaScript runtime errors
    • Zustand state management issues
    • Missing or undefined props

Step 7: Check Database (if data issue)

If the bug involves data persistence or retrieval:

  1. Check Supabase logs and status via the CLI (supabase status, supabase migration list) - never the dashboard

  2. Verify RLS policies aren't blocking legitimate access (see the sketch at the end of this step)

  3. Check migration status:

    supabase migration list
    
  4. Query Axiom for database errors:

    ['nonlinear-editor']
    | where ['_time'] > ago(3h)
    | where ['message'] contains "database" or ['message'] contains "supabase"
    | where ['level'] == "error"
    | project ['_time'], ['message'], ['userId'], ['endpoint']
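
To test the RLS hypothesis from item 2 directly, a hedged sketch using supabase-js with the affected user's JWT - the 'projects' table name is an assumption:

import { createClient } from '@supabase/supabase-js';

// Query as the user, not the service role, so RLS policies actually apply.
async function checkRlsAccess(userJwt: string) {
  const supabase = createClient(
    process.env.NEXT_PUBLIC_SUPABASE_URL!,
    process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!,
    { global: { headers: { Authorization: `Bearer ${userJwt}` } } }
  );
  const { data, error } = await supabase.from('projects').select('id').limit(1);
  // Empty data with no error usually means RLS filtered the rows;
  // an explicit error points at the policy or the JWT itself.
  console.log({ rows: data?.length ?? 0, error: error?.message });
}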
    

Step 8: Debug Jest Test Failures (CLI-First)

If the bug involves test failures (see ISSUES.md #155, #120):

Use CLI commands directly:

# Run specific test file
npm test -- __tests__/api/assets/assetId-update.test.ts

# Run tests matching pattern
npm test -- --testPathPattern="api/assets"

# Run tests with verbose output
npm test -- --verbose __tests__/api/assets

# Check test pass rate
npm run test:coverage
npm run test:health

# List all test files
npm test -- --listTests | grep "assets"

# Run flaky test detection
npm run test:detect-flaky

# Debug specific test with Node inspector
node --inspect-brk node_modules/.bin/jest __tests__/api/assets/assetId-update.test.ts

Common Jest Issues in Codebase:

  1. Issue #155: Test Timeouts - requestDeduplication.test.ts, webhooks.test.ts

    # Check specific timeout test
    npm test -- __tests__/lib/requestDeduplication.test.ts --verbose
    
  2. Issue #120: API Assets Tests - 97.3% passing, 4 tests failing

    # Run failing asset tests
    npm test -- __tests__/api/assets/\[assetId\]/update.test.ts
    
  3. Mock Pollution - Tests pass individually but fail in suite (isolation sketch after this list)

    # Run single test
    npm test -- -t "should update asset metadata"
    
    # Run full suite to check pollution
    npm test -- __tests__/api/assets
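
A minimal isolation sketch for the mock-pollution case - which reset call is right depends on how the mocks are constructed:

// In the affected test file (or jest.setup.ts if it should apply suite-wide):
beforeEach(() => {
  jest.clearAllMocks(); // clear recorded calls and results on every mock
  jest.resetModules();  // reload modules so stateful singletons don't leak between tests
});

afterEach(() => {
  jest.restoreAllMocks(); // undo jest.spyOn replacements
});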
    

Query Axiom for test-related errors:

['nonlinear-editor']
| where ['_time'] > ago(1h)
| where ['message'] contains "jest" or ['message'] contains "test"
| where ['level'] == "error"
| project ['_time'], ['message'], ['test_name']

Step 9: Debug Vercel Deployments (CLI-First)

If the bug involves deployment or is only in production:

Use Vercel CLI directly:

# List recent deployments
vercel ls --limit 10

# Check latest deployment logs
vercel logs

# Check specific deployment
vercel logs [deployment-url]

# Check environment variables
vercel env ls
vercel env pull .env.production.local

# Inspect deployment details
vercel inspect [deployment-url]

# Check build logs
vercel logs [deployment-url] --follow

# Check domains
vercel domains ls

Query Axiom for production errors:

['nonlinear-editor']
| where ['_time'] > ago(2h)
| where ['env'] == "production"
| where ['level'] == "error"
| summarize count() by ['endpoint'], ['message']
| order by count_ desc

Common Vercel Issues:

  1. Build failures - Check Vercel build logs

    vercel logs [deployment-url] | grep "error"
    
  2. Environment variable issues

    # Verify all required env vars exist
    vercel env ls | grep -E "SUPABASE|AXIOM|FAL|ELEVENLABS"
    
  3. Function timeouts - Serverless functions time out

    # Check function logs
    vercel logs [deployment-url] --filter "/api/"
    

Step 10: Debug GitHub Actions (CLI-First)

If CI/CD failing:

Use GitHub CLI directly:

# List recent workflow runs
gh run list --limit 10

# View specific run details
gh run view [run-id]

# View run logs
gh run view [run-id] --log

# Check failed jobs
gh run view [run-id] --log-failed

# Re-run failed jobs
gh run rerun [run-id] --failed

# Watch live run
gh run watch

# List workflows
gh workflow list

# Check workflow status
gh workflow view "Deploy Supabase Migrations"

Common GitHub Actions Issues:

  1. Supabase auto-deployment - Missing SUPABASE_ACCESS_TOKEN

    # Check if secret exists
    gh secret list | grep SUPABASE
    
    # Set secret
    gh secret set SUPABASE_ACCESS_TOKEN
    
  2. Test failures in CI

    # View test job logs
    gh run view [run-id] --log | grep "FAIL"
    
  3. Lint-staged failures

    # Run lint locally first
    npm run lint
    npm run format:check
    npm run lint:backups
    

Step 11: Review Recent Changes (CLI-First)

Check git history for potentially related changes:

# Find recent commits that touched relevant files
git log --since="3 days ago" --oneline -- path/to/affected/component

# Review specific commit
git show <commit-hash>

# Find commits by author
git log --author="name" --since="1 week ago"

# Search commit messages
git log --grep="feature-name" --oneline

# Find when a bug was introduced (git bisect)
git bisect start
git bisect bad  # Current version has bug
git bisect good <commit>  # Known good version

# View file history
git log --follow -- path/to/file

# Compare branches
git diff main..feature-branch -- path/to/file

Step 12: Reproduce Locally (CLI-First)

If you can reproduce locally:

  1. Set up local environment:

    # Ensure .env.local has BYPASS_AUTH=true
    grep BYPASS_AUTH .env.local
    
    # Verify environment variables
    npm run validate:env
    
  2. Run development server:

    npm run dev
    # Visit http://localhost:3000
    
  3. Run specific API route locally:

    # Start dev server and test API
    curl http://localhost:3000/api/assets -H "Content-Type: application/json"
    
  4. Check database state:

    # Connect to Supabase
    supabase db remote
    
    # Check migration status
    supabase migration list
    
  5. Test credentials for local testing:

    • Email: test@example.com
    • Password: test_password_123
    • (When BYPASS_AUTH=true, auth is skipped)

Debugging Checklist

Before declaring a bug "fixed":

  • Root cause identified and understood
  • Fix implemented with proper error handling
  • Tested locally (if reproducible)
  • Verified in Axiom that errors stopped occurring
  • Checked for edge cases
  • Updated ISSUES.md if this was a tracked issue
  • Build passes (npm run build)
  • Tests pass (if applicable)
  • Changes committed and pushed

Common Issue Patterns (From ISSUES.md)

Issue #155: Jest Test Timeouts & Failures

Symptoms: Tests timeout, requestDeduplication.test.ts hangs, 76.7% pass rate

CLI Debug Commands:

# Run specific failing test
npm test -- __tests__/lib/requestDeduplication.test.ts --verbose

# Check overall pass rate
npm run test:coverage

# Run all tests and generate report
npm run test:full-check

# Check test health
npm run test:health

Related Files:

  • __tests__/lib/requestDeduplication.test.ts - AbortController cleanup issues
  • __tests__/api/webhooks.test.ts - Timeout in abort handling
  • __tests__/lib/thumbnailService.test.ts - FFmpeg encoding failures

Fix Strategy:

  • Mock FFmpeg/Sharp in tests
  • Fix AbortController cleanup
  • Use proper test timeout values
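
A hedged sketch of the AbortController cleanup and explicit-timeout fixes - the test body is illustrative:

describe('request deduplication', () => {
  let controller: AbortController;

  beforeEach(() => {
    controller = new AbortController();
  });

  afterEach(() => {
    // Without this, in-flight requests keep the event loop alive until Jest times out.
    controller.abort();
  });

  it('dedupes concurrent requests', async () => {
    // ... exercise the code under test, passing controller.signal ...
    expect(controller.signal.aborted).toBe(false);
  }, 10_000); // explicit per-test timeout instead of relying on the global default
});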

Issue #120: API Assets Test Failures (97.3% Fixed)

Symptoms: 4 tests failing, mock pollution, tests pass individually but fail in suite

CLI Debug Commands:

# Run asset API tests
npm test -- --testPathPattern="api/assets"

# Run specific failing test
npm test -- -t "should update asset metadata"

# Check for mock pollution
npm test -- __tests__/api/assets/\[assetId\]/update.test.ts --verbose

Related Files:

  • __tests__/api/assets/[assetId]/update.test.ts - formData() mock issues
  • __tests__/api/assets/upload.test.ts - Storage mock separation

Fix Strategy:

  • Add formData() mocks to tests
  • Separate storage copy and upload operations
  • Improve mock table implementations

Issue #133: Plain String Types (See ISSUES.md P2)

Symptoms: API routes use string instead of branded types like UserId, ProjectId

CLI Debug Commands:

# Find files with plain string types
grep -r "params.userId as string" app/api/

# Check TypeScript errors
npx tsc --noEmit

Fix Strategy:

  • Replace string with branded types: UserId, ProjectId, AssetId

Issue #140: Missing Input Validation (See ISSUES.md P2)

Symptoms: 15+ API routes lack comprehensive input validation

CLI Debug Commands:

# Find routes without validation
grep -rL "assertString" app/api/ --include="*.ts"   # files that never call assertString

Fix Strategy:

  • Add assertion functions: assertString, assertNumber, assertUserId
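
A combined sketch for #133 and #140 - a branded UserId plus an assertion that narrows plain strings at the API boundary. The brand encoding shown is one common pattern, not necessarily the codebase's:

type UserId = string & { readonly __brand: 'UserId' };

function assertUserId(value: unknown): asserts value is UserId {
  if (typeof value !== 'string' || value.length === 0) {
    throw new Error('Expected a non-empty UserId string');
  }
}

// After the assertion, `raw` is typed as UserId, so service-layer functions
// can require UserId instead of accepting any string.
function handleRequest(raw: unknown): UserId {
  assertUserId(raw);
  return raw;
}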

Authentication Errors

Symptoms: 401/403 responses, "Unauthorized" errors

CLI Debug Commands:

# Check Axiom for auth errors
axiom dataset query nonlinear-editor "['nonlinear-editor'] | where ['status'] in (401, 403) | where ['_time'] > ago(1h)"

# Verify JWT token in local storage (browser)
# Use Chrome DevTools MCP

Axiom Query:

['nonlinear-editor'] | where ['status'] in (401, 403) | where ['_time'] > ago(1h)

Check:

  1. Verify JWT token validity
  2. Check withAuth middleware is applied (see lib/api/withAuth.ts)
  3. Verify RLS policies in Supabase: supabase db remote

Timeout Errors

Symptoms: 504 Gateway Timeout, slow responses

CLI Debug Commands:

# Check Vercel function logs
vercel logs --filter "/api/" | grep "timeout"

# Test API locally
curl -w "@curl-format.txt" http://localhost:3000/api/endpoint

Axiom Query:

['nonlinear-editor']
| where ['duration'] > 10000
| summarize avg(['duration']), p95=percentile(['duration'], 95) by ['endpoint']

Check:

  1. Database query performance (N+1 queries?)
  2. External API calls (Firecrawl, FAL, ElevenLabs) timing out
  3. Rate limiting delays
  4. Vercel function timeout (10s default, 60s max)
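
For item 4, the App Router's route segment config raises the function's time budget - a minimal sketch (plan limits still apply):

// app/api/export/route.ts (path is illustrative)
export const maxDuration = 60; // seconds; Next.js route segment config honored by Vercel

export async function GET(): Promise<Response> {
  // ... long-running work, e.g. waiting on an external encoding API ...
  return Response.json({ ok: true });
}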

Undefined/Null Errors

Symptoms: "Cannot read property of undefined", "null is not an object"

CLI Debug Commands:

# Search for undefined errors in logs
vercel logs | grep "undefined"

# Check TypeScript strict mode
grep "strict" tsconfig.json

Axiom Query:

['nonlinear-editor']
| search "undefined" or "null"
| where ['level'] == "error"
| project ['_time'], ['message'], ['stack_trace'], ['component']

Check:

  1. TypeScript type guards missing (see ISSUES.md #145 and the sketch below)
  2. Optional chaining needed (?.)
  3. Default values missing
  4. Async timing issues (data not loaded yet)
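
A small sketch combining items 1-3: a type guard, optional chaining, and a default value. The Clip shape is illustrative:

interface Clip {
  id: string;
  meta?: { durationMs?: number };
}

function isClip(value: unknown): value is Clip {
  return typeof value === 'object' && value !== null && 'id' in value;
}

function clipDuration(value: unknown): number {
  if (!isClip(value)) return 0;       // type guard against undefined/null input
  return value.meta?.durationMs ?? 0; // optional chaining plus a default value
}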

API Failures (500 Errors)

Symptoms: 500 errors, "Internal Server Error"

CLI Debug Commands:

# Check production API errors
vercel logs | grep "500"

# Run API tests
npm test -- --testPathPattern="api/"

# Check environment variables
vercel env ls | grep -E "SUPABASE|FAL|ELEVENLABS"

Axiom Query:

['nonlinear-editor']
| where ['status'] >= 500
| where ['_time'] > ago(2h)
| summarize count() by ['endpoint'], ['message']
| order by count_ desc

Check:

  1. Error handling in API route
  2. Service layer errors
  3. Database connection issues: supabase status
  4. Environment variables missing: vercel env ls

Deployment/Build Failures

Symptoms: Vercel build fails, TypeScript errors in production

CLI Debug Commands:

# Check latest deployment
vercel ls --limit 5

# View build logs
vercel logs [deployment-url] | grep "error"

# Test build locally
npm run build

# Check TypeScript errors
npm run type-check

Common Causes:

  1. Missing environment variables in Vercel
  2. TypeScript strict mode violations
  3. Missing dependencies
  4. Next.js config issues

Best Practices

  1. CLI-First Always: Execute commands via CLI (Vercel, Axiom, Supabase, Git, Jest) instead of asking the user to do it manually
  2. Start specific, not broad: Query Axiom for the specific timeframe, component, and error type related to the bug
  3. Follow the data flow: Trace from where the error manifests backward to its origin
  4. One hypothesis at a time: Test one potential cause before moving to the next
  5. Document findings: Add notes to ISSUES.md or create reports in /docs/reports/
  6. Think before querying: Formulate what you're looking for before writing APL queries
  7. Verify fixes with CLI:
    • Always check Axiom after deploying a fix: axiom dataset query nonlinear-editor "query"
    • Run tests: npm test
    • Check deployment: vercel ls
    • Verify build: npm run build
  8. Reference ISSUES.md: Tie your debugging to known issues (#155, #120, etc.)
  9. Use real file paths: Reference actual files from the codebase (e.g., __tests__/api/assets/[assetId]/update.test.ts)
  10. Prefer automation: Use GitHub Actions, Vercel auto-deploy, and Supabase auto-migrations when possible

Tool Reference

CLI Tools (Prefer These!)

Vercel CLI:

vercel ls                    # List deployments
vercel logs [url]            # View deployment logs
vercel env ls                # List environment variables
vercel inspect [url]         # Inspect deployment details
vercel domains ls            # List domains

Axiom CLI:

axiom dataset list                                # List datasets
axiom dataset query [dataset] "APL query"        # Query logs directly
axiom auth status                                 # Check authentication

Supabase CLI:

supabase migration list      # List migrations
supabase db push             # Push migrations
supabase status              # Check status
supabase storage ls          # List storage buckets
supabase db remote           # Connect to remote DB

GitHub CLI:

gh run list                  # List workflow runs
gh run view [id]             # View run details
gh run view [id] --log       # View logs
gh secret list               # List secrets
gh pr checks                 # Check PR status

Jest CLI:

npm test -- [pattern]        # Run tests matching pattern
npm test -- --listTests      # List all tests
npm run test:coverage        # Run with coverage
npm run test:health          # Check test health
npm run test:detect-flaky    # Detect flaky tests

Axiom MCP Tools

  • mcp__axiom__listDatasets - List available datasets
  • mcp__axiom__getDatasetSchema - Get schema for a dataset
  • mcp__axiom__queryApl - Query logs with Axiom Processing Language
  • mcp__axiom__getSavedQueries - View saved queries
  • mcp__axiom__getMonitors - List configured monitors
  • mcp__axiom__getMonitorsHistory - Check monitor history

Chrome DevTools MCP Tools

  • mcp__chrome_devtools__list_console_messages - Get browser console logs
  • mcp__chrome_devtools__get_console_message - Get specific console message
  • mcp__chrome_devtools__list_network_requests - List network requests
  • mcp__chrome_devtools__get_network_request - Get specific network request details
  • mcp__chrome_devtools__take_screenshot - Capture visual state
  • mcp__chrome_devtools__take_snapshot - Get accessibility tree snapshot

Firecrawl MCP Tools

  • mcp__firecrawl__firecrawl_scrape - Scrape single URL
  • mcp__firecrawl__firecrawl_search - Search and scrape
  • mcp__firecrawl__firecrawl_map - Map website URLs
  • mcp__firecrawl__firecrawl_crawl - Crawl multiple pages

Examples

Example 1: Debugging Timeline Export Failure

Context: User reports video export fails at 50% progress

Investigation:

  1. Check Axiom for export-related errors:

    ['nonlinear-editor']
    | where ['_time'] > ago(24h)
    | where ['component'] == "Export" or ['endpoint'] contains "export"
    | where ['level'] == "error"
    | project ['_time'], ['message'], ['userId'], ['duration']
    
  2. Found: "FFmpeg process exited with code 1"

  3. Check FFmpeg logs in Axiom:

    ['nonlinear-editor']
    | where ['message'] contains "ffmpeg"
    | where ['_time'] > ago(24h)
    | project ['_time'], ['message']
    
  4. Root cause: Memory limit exceeded during encoding

  5. Solution: Implement chunked encoding for large videos

Example 2: Debugging Authentication Loop

Context: Users keep getting redirected to login after successful authentication

Investigation:

  1. Check Axiom for auth-related patterns:

    ['nonlinear-editor']
    | where ['_time'] > ago(2h)
    | where ['endpoint'] contains "auth" or ['component'] == "Auth"
    | summarize count() by ['endpoint'], ['status'], ['userId']
    
  2. Pattern found: Multiple /api/auth/session calls with 401 responses

  3. Check browser console via Chrome DevTools:

    • Found: JWT token not being stored in localStorage
  4. Root cause: Cookie same-site policy blocking token storage

  5. Solution: Update cookie settings in auth middleware
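
A hedged sketch of the cookie fix - the cookie name and exact options are assumptions about this codebase:

import { NextResponse } from 'next/server';

export function setSessionCookie(res: NextResponse, token: string): NextResponse {
  res.cookies.set('session', token, {
    httpOnly: true,
    secure: true,    // required when sameSite is 'none'; harmless for 'lax'
    sameSite: 'lax', // 'strict' drops the cookie on top-level redirects after OAuth
    path: '/',
    maxAge: 60 * 60 * 24,
  });
  return res;
}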

Example 3: Debugging Firecrawl Timeout

Context: Documentation scraping hangs and never completes

Investigation:

  1. Check Axiom for Firecrawl calls:

    ['nonlinear-editor']
    | where ['_time'] > ago(6h)
    | where ['message'] contains "firecrawl"
    | project ['_time'], ['endpoint'], ['duration'], ['status']
    
  2. Found: Calls timing out after 30 seconds

  3. Check Firecrawl API status in logs

  4. Root cause: Target website blocking scrapers, Firecrawl retrying indefinitely

  5. Solution: Add explicit timeout, implement fallback strategy
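
A generic sketch of the timeout-plus-fallback pattern using fetch and AbortSignal.timeout - not the Firecrawl SDK or MCP call itself:

async function scrapeWithTimeout(url: string, ms = 30_000): Promise<string | null> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(ms) }); // hard cap on the call
    if (!res.ok) return null;
    return await res.text();
  } catch {
    // Timed out or blocked: fall back to a cached copy or an alternate source instead of hanging.
    return null;
  }
}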

Additional Resources

  • Project Documentation: /docs/ARCHITECTURE_OVERVIEW.md
  • API Reference: /docs/api/API_REFERENCE.md
  • Issues Tracker: /ISSUES.md
  • Coding Standards: /docs/CODING_BEST_PRACTICES.md
  • Axiom saved queries & monitors: mcp__axiom__getSavedQueries, mcp__axiom__getMonitors
  • Supabase state: supabase status, supabase migration list

Remember: The goal is to find the root cause efficiently, not to check every possible thing. Stay focused on the specific bug and follow the evidence.