---
name: debug
description: Production-first debugging workflow specific to DreamReal's infrastructure (Axiom, Sentry, Supabase, Vercel, FAL, ElevenLabs). ALWAYS checks Sentry/Axiom before code, adds logging if gaps found, uses Firecrawl to verify API usage, checks common bugs (signed URLs, cleanup, RLS, log.flush), then validates fixes via subagent. Use when user says "debug", mentions "debugging", reports bugs, errors, unexpected behavior, or asks to investigate/troubleshoot issues.
allowed-tools: mcp__axiom__*, mcp__chrome_devtools__*, mcp__firecrawl__*, Bash, Read, Grep, Glob, Edit, Task
---
# DreamReal Production-First Debugging Workflow

**Purpose:** Rapid root-cause identification through production telemetry in DreamReal's specific infrastructure, followed by validated fixes that adhere to repo best practices.

**Infrastructure:** Axiom (next-axiom hybrid logging), Sentry, Supabase (DB + Storage), Vercel, FAL.ai, ElevenLabs, Google Vertex AI, Firecrawl, Chrome DevTools MCP
## When to Use
- User reports a bug, error, or unexpected behavior
- Production issues need investigation
- Feature not working as expected
- API integration issues (FAL, ElevenLabs, Vertex AI)
- Storage/database access problems
## Debugging Checklist

### 1. Production Telemetry FIRST (MANDATORY)
Before touching any code, query production logs:

```bash
# Axiom: Check recent errors (note: the field is 'level', not 'severity')
mcp__axiom__queryApl("['nonlinear-editor'] | where ['_time'] > ago(1h) | where ['level'] == 'error' | summarize count() by ['message'], ['source']")

# Axiom: Get the dataset schema first if unsure of fields
mcp__axiom__getDatasetSchema("nonlinear-editor")

# Sentry: Check unresolved issues
npx sentry-cli issues list --status unresolved --limit 20
```
**What to extract:**
- Error frequency & patterns (group by message, source)
- Stack traces & affected files
- User IDs & request correlation IDs
- Timestamp of first occurrence
- Source: "browser" vs "server" logs
**DreamReal-specific fields** (nested `data` payloads are flattened with a `data_` prefix; a hypothetical event is sketched after this list):
- `correlationId` - End-to-end request tracing
- `userId` - User-scoped debugging
- `source` - "browser" or "server"
- `data_operation` - Business operation (e.g., "saveTimeline", "uploadAsset")
- `data_projectId`, `data_assetId` - Entity references
- `data_metric`, `data_value`, `data_rating` - Web Vitals
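For orientation, a single flattened event might look like this. This is an illustrative sketch only; the exact field set depends on the logger configuration:

```typescript
// Hypothetical shape of one flattened Axiom event (values are made up)
const exampleEvent = {
  _time: '2025-01-15T10:32:01Z',
  level: 'error',
  message: 'Failed to save timeline',
  source: 'server',            // "browser" or "server"
  correlationId: 'req_abc123', // end-to-end request tracing
  userId: 'user_789',          // user-scoped debugging
  data_operation: 'saveTimeline',
  data_projectId: 'proj_456',
};
```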
**Why:** Production logs reveal the ACTUAL problem users face, not assumptions.
### 2. Check for Common DreamReal Bugs (Quick Scan)
Before deep investigation, check these frequent issues:
**Storage Issues:**

- ❌ Accessing `storage_url` directly instead of getting a signed URL:

```typescript
// WRONG: Using storage_url directly
// CORRECT: Get a signed URL first
const { data } = await supabase.storage.from('assets').createSignedUrl(path, 3600)
```

- ❌ Missing cleanup in `finally` blocks for temp files (see the sketch after this list)
- ❌ Non-user-scoped storage paths (missing `{userId}/{projectId}/`)
- ❌ localStorage for persistent data (use Supabase instead)
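A minimal sketch of the `finally` cleanup pattern, assuming a temp file under `os.tmpdir()`; the function and file names are illustrative:

```typescript
import { writeFile, unlink } from 'fs/promises';
import { tmpdir } from 'os';
import { join } from 'path';

async function processWithTempFile(buffer: Buffer): Promise<void> {
  const tempPath = join(tmpdir(), `dreamreal-${Date.now()}.tmp`);
  try {
    await writeFile(tempPath, buffer);
    // ... process the file ...
  } finally {
    // Always remove the temp file, even when processing throws
    await unlink(tempPath).catch(() => {}); // ignore if already deleted
  }
}
```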
**API Route Issues** (a route skeleton tying these together follows the list):
- ❌ Missing `withAuth` middleware (`app/api/*/route.ts`)
- ❌ Missing input validation (no `assertString`, `assertUserId`, etc.)
- ❌ Not flushing logs before return (`await log.flush()` missing)
- ❌ Missing correlation ID (use `addCorrelationHeader()`)
- ❌ Plain string types instead of branded types (`UserId`, `ProjectId`)
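A minimal route skeleton showing the items above in one place. `withAuth`, `assertUserId`, and the logger helpers are the repo's names, but the exact signatures and import paths shown here are assumptions:

```typescript
import { withAuth } from '@/lib/auth';             // import path assumed
import { assertUserId } from '@/lib/validation';   // import path assumed
import { createServerLogger, extractCorrelationId } from '@/lib/logger';

export const POST = withAuth(async (request: Request) => {
  const log = createServerLogger();
  const correlationId = extractCorrelationId(request.headers);

  const body = await request.json();
  const userId = assertUserId(body.userId); // branded UserId, not a plain string

  log.info('API request received', { endpoint: '/api/example', correlationId });
  // ... call the service layer, never the DB directly from the route ...

  await log.flush(); // CRITICAL: flush before returning
  return Response.json({ ok: true });
});
```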
**External API Issues:**
- ❌ Rate limiting (FAL: 5 req/min, ElevenLabs: limits per tier)
- ❌ Wrong API usage (check docs via Firecrawl - see Step 6)
- ❌ Missing error handling for API timeouts
- ❌ Not checking API response structure (API changes)
**Database Issues** (see the insert sketch after this list):
- ❌ RLS policies blocking access (check `user_id` scoping)
- ❌ Missing `user_id` in inserts (fails RLS)
- ❌ Not handling null/undefined from queries
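A sketch of a correctly scoped insert covering the items above; the table and column names are hypothetical, and `supabase` is assumed to be an authenticated client:

```typescript
async function createProject(userId: string, title: string) {
  const { data, error } = await supabase
    .from('projects')                   // hypothetical table
    .insert({ user_id: userId, title }) // user_id present, so RLS can match auth.uid()
    .select()
    .single();

  if (error) throw new Error(`Insert failed (check RLS policy): ${error.message}`);
  if (!data) return null; // guard against null before dereferencing
  return data;
}
```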
**Frontend Issues** (AbortController cleanup sketch after this list):
- ❌ Missing formData() mock in tests
- ❌ AbortController not cleaned up (memory leaks)
- ❌ localStorage NaN values (missing try/catch + NaN checks)
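For the AbortController leak, the standard React cleanup pattern; the hook itself is illustrative, not an existing repo utility:

```typescript
import { useEffect, useState } from 'react';

function useAsset(url: string) {
  const [data, setData] = useState<unknown>(null);

  useEffect(() => {
    const controller = new AbortController();
    fetch(url, { signal: controller.signal })
      .then((res) => res.json())
      .then(setData)
      .catch((err) => {
        if (err.name !== 'AbortError') throw err; // real failures still surface
      });
    // Abort the in-flight request when the component unmounts
    return () => controller.abort();
  }, [url]);

  return data;
}
```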
Check Axiom for these patterns:

```apl
// Storage URL issues
['nonlinear-editor'] | where ['_time'] > ago(2h) | where ['message'] contains "signed" or ['message'] contains "storage_url" | where ['level'] == "error"

// RLS policy failures
['nonlinear-editor'] | where ['message'] contains "RLS" or ['message'] contains "policy" | where ['level'] == "error"

// Missing log flush (API returns before logs are sent)
['nonlinear-editor'] | where ['source'] == "server" | where ['endpoint'] == "/api/specific" | where ['_time'] > ago(1h) | summarize count() by bin(['_time'], 5m)
```
### 3. Understand Context Deeply
Ask these questions explicitly:
- What is the expected vs actual behavior?
- When did this start? (recent deploy? time correlation?)
- What triggers it? (user workflow, API endpoint, UI interaction)
- Is it reproducible? (always, sometimes, conditions)
- Which component is affected? (timeline, export, auth, etc.)
**Trace the data flow:** follow the error from where it manifests back to its upstream cause.
### 4. Think Upstream & Downstream

**Critical:** Don't fix symptoms. Investigate the chain:

**Upstream (Root Causes):**
- What dependencies does this rely on? (Supabase, FAL, ElevenLabs, Vertex AI, FFmpeg)
- What changed recently? (`git log --since="3 days ago" -- path/to/file`)
- Are there config/env issues? (`vercel env ls`, `supabase migration list`)
- Is this a timing/race condition?
- Are external APIs rate-limited or down?
- Are storage signed URLs expired?
**Downstream (Side Effects):**
- What depends on this component?
- Will fixing this break other features?
- Are there related errors in logs?
- What's the blast radius?
- Will this affect other users' data?
**DreamReal-specific dependencies:**
- Timeline → Assets → Supabase Storage (signed URLs!)
- Export → FFmpeg → Vercel serverless limits
- AI Generation → FAL/ElevenLabs → Rate limits
- Scene Detection → GCS → Vertex AI
**Example:** Export fails → Upstream: FFmpeg config, memory limits, Vercel timeout → Downstream: affects timeline, AI generations, user projects
### 5. Query Axiom with Precision
Get the schema first, then query the specific issue:

```apl
// Schema (if unfamiliar with the fields)
['nonlinear-editor'] | take 1 | project-keep *

// Specific error pattern with DreamReal fields
['nonlinear-editor'] | where ['_time'] > ago(2h) | where ['message'] contains "specific-error" | project ['_time'], ['source'], ['userId'], ['correlationId'], ['data_operation'], ['data_projectId']

// Trace a request end-to-end (correlation ID)
['nonlinear-editor'] | where ['correlationId'] == "xyz" | project ['_time'], ['source'], ['level'], ['message'], ['endpoint'] | order by ['_time'] asc

// Check for missing logs (indicates a log.flush() issue)
['nonlinear-editor'] | where ['_time'] > ago(1h) | where ['endpoint'] == "/api/specific/endpoint" | summarize count() by bin(['_time'], 5m)

// External API issues
['nonlinear-editor'] | where ['_time'] > ago(2h) | where ['message'] contains "fal" or ['message'] contains "elevenlabs" or ['message'] contains "vertex" | where ['level'] in ("error", "warn")
```
**Focus:** Query the specific timeframe, component, and error type, not everything.
### 6. Verify External API Usage with Firecrawl (If API-Related)
If the bug involves FAL, ElevenLabs, Vertex AI, or any other external API:
```typescript
// Use Firecrawl to fetch the latest API documentation
mcp__firecrawl__firecrawl_scrape({ url: "https://fal.ai/docs/model-endpoints/...", formats: ["markdown"] })

// OR search for specific API usage
mcp__firecrawl__firecrawl_search({ query: "FAL.ai text to speech API parameters 2025", limit: 3, sources: ["web"] })
```
**What to verify:**
- API endpoint URLs (changed?)
- Required parameters (new requirements?)
- Authentication method (API key format, headers)
- Rate limits (updated?)
- Response structure (fields changed?)
- Error codes (new error types?)
**Common API issues in DreamReal:**
- FAL: 5 requests/min limit, webhook vs polling
- ElevenLabs: Different limits per tier, voice IDs change
- Vertex AI: GCS URI requirements, region restrictions
- Supabase Storage: Signed URL expiry (default 3600s)
After the Firecrawl check, compare with the codebase usage:

```bash
# Find where the API is called
grep -r "fal.ai" app/ lib/ --include="*.ts" --include="*.tsx"
```
### 7. Assess Logging Coverage & Add If Needed
Check if sufficient logging exists for the affected area:
```apl
// Check log density for the endpoint
['nonlinear-editor'] | where ['_time'] > ago(1h) | where ['endpoint'] == "/api/problematic/endpoint" | summarize count() by bin(['_time'], 5m)

// Expected: multiple logs per request (entry, operation, exit)
// If < 3 per request: ADD MORE LOGGING
```
Add logging if it's missing (follow the hybrid logger pattern):

```typescript
// Client-side logging
import { useHybridLogger } from '@/lib/logger';
const log = useHybridLogger();

log.info('Operation starting', { operation: 'saveTimeline', projectId });
// ... operation ...
log.error('Operation failed', { operation: 'saveTimeline', error: error.message });

// Server-side logging (API routes)
import { createServerLogger, extractCorrelationId } from '@/lib/logger';
const log = createServerLogger();
const correlationId = extractCorrelationId(request.headers);

log.info('API request received', { endpoint: '/api/assets', method: 'POST', correlationId });
// ... operation ...
log.error('API request failed', { endpoint: '/api/assets', error: error.message, correlationId });

// CRITICAL: Always flush before return!
await log.flush();
return Response.json({ data });
```
**Logging checklist:**
- Entry point logged (with correlationId on server)
- Key operations logged (DB queries, API calls, file operations)
- Error paths logged (with stack traces)
- Exit point logged (success/failure)
- Server logs flushed before return (`await log.flush()`)
### 8. Use CLI/MCP Tools Directly

NEVER ask the user to check dashboards. Execute the commands yourself:

```bash
# Vercel (deployments & logs)
vercel ls --limit 5
vercel logs [deployment-url] | grep "error"
vercel env ls | grep -E "SUPABASE|FAL|ELEVENLABS"

# Supabase (database & storage)
supabase migration list
supabase storage ls assets
supabase db remote  # Connect to the DB

# GitHub Actions (CI/CD)
gh run list --limit 5
gh run view [run-id] --log-failed
gh secret list | grep -E "SUPABASE|FAL"

# Chrome DevTools MCP (frontend)
mcp__chrome_devtools__list_console_messages({ pageIdx: 0 })
mcp__chrome_devtools__list_network_requests({ resourceTypes: ["xhr", "fetch"] })

# Firecrawl MCP (API docs)
mcp__firecrawl__firecrawl_scrape({ url: "https://docs.api.com" })
```
### 9. Check DreamReal-Specific Systems

**Database (Supabase):**
- RLS policies: `supabase db remote` → check policies
- Migration status: `supabase migration list`
- User-scoped queries: Ensure `user_id` in WHERE clause
**Storage (Supabase)** (a path-helper sketch follows this list):
- Signed URLs: Always use `createSignedUrl()`, never direct `storage_url`
- User-scoped paths: `{userId}/{projectId}/{type}/{filename}`
- Cleanup: Temp files in `os.tmpdir()`, cleanup in `finally` blocks
- Buckets: `assets` (500MB), `frames` (50MB), `frame-edits` (100MB)
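A trivial helper sketch for the path convention above; illustrative only, since the repo may already provide one:

```typescript
// Build a user-scoped storage path: {userId}/{projectId}/{type}/{filename}
function storagePath(userId: string, projectId: string, type: string, filename: string): string {
  return `${userId}/${projectId}/${type}/${filename}`; // e.g. "user_1/proj_2/frames/f0001.png"
}
```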
**Auth:**
- `withAuth` middleware applied to protected routes
- JWT token validity
- RLS policies match `user_id`
**API Routes:**
- Rate limiting configured
- Input validation: `assertString`, `assertUserId`, etc.
- Service layer used (not direct DB access in route)
- Error handling: Standardized error responses
- Log flushing: `await log.flush()` before return
**External APIs** (a rate-limit sketch follows this list):
- FAL: 5 req/min, webhook preferred over polling
- ElevenLabs: Check tier limits
- Vertex AI: Requires GCS URIs (temp upload to `dreamreal-video-editor-uploads`)
- Firecrawl: Check API key, rate limits
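Given FAL's 5 req/min limit, a minimal client-side throttle sketch; the limiter is an illustration, not an existing repo utility:

```typescript
// Naive sliding-window limiter: allow at most `max` calls per `windowMs`.
function createLimiter(max: number, windowMs: number) {
  const timestamps: number[] = [];
  return async function acquire(): Promise<void> {
    const now = Date.now();
    // Drop calls that have aged out of the window
    while (timestamps.length && timestamps[0] <= now - windowMs) timestamps.shift();
    if (timestamps.length >= max) {
      // Wait until the oldest call leaves the window, then re-check
      await new Promise((r) => setTimeout(r, timestamps[0] + windowMs - now));
      return acquire();
    }
    timestamps.push(now);
  };
}

const falLimit = createLimiter(5, 60_000); // FAL: 5 requests/min
// Call `await falLimit()` before each FAL request.
```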
### 10. Root Cause → Solution

Document explicitly:

- **Root Cause:** What actually caused the issue?
- **Solution:** What specific change will fix it?
- **Files Modified:** Which files need changes?
- **Testing Plan:** How will you verify the fix?

**Example:**

- Root Cause: Missing keyframe interval in the FFmpeg export
- Solution: Add the `-g 30` flag to the FFmpeg command
- Files: `lib/export/ffmpegExporter.ts`
- Testing: Export 3 test videos, verify playback
### 11. Implement Fix with Best Practices

Follow DreamReal repo standards (see `/docs/CODING_BEST_PRACTICES.md`):

- **TypeScript:** Branded IDs (`UserId`, `ProjectId`), no `any`, explicit return types
- **API Routes:** `withAuth` middleware, input validation, service layer, log flushing
- **Storage:** User-scoped paths, signed URLs, cleanup in `finally`
- **Error Handling:** Proper try/catch, standardized error responses
- **Logging:** Correlation IDs, entry/exit points, flush before return
- **Testing:** Write/update tests if applicable
### 12. Validate with Subagent (MANDATORY)
After implementing the fix, launch a validation subagent:
```typescript
Task({
  subagent_type: "general-purpose",
  description: "Validate debug fix follows DreamReal best practices",
  prompt: `
Review the fix I just implemented for [describe issue].
Validate against DreamReal repo best practices:
TypeScript:
- Branded IDs used? (UserId, ProjectId, AssetId - not plain string)
- No 'any' types?
- Explicit return types?
- Discriminated unions for state?
API Routes:
- withAuth middleware applied?
- Input validation (assertString, assertUserId)?
- Service layer used (not direct DB in route)?
- Correlation ID extracted (extractCorrelationId)?
- Logs flushed before return (await log.flush())?
Storage (Supabase):
- User-scoped paths ({userId}/{projectId}/...)?
- Signed URLs used (not direct storage_url)?
- Cleanup in finally blocks?
- Correct bucket (assets/frames/frame-edits)?
- NO localStorage for persistent data?
External APIs:
- Rate limiting handled (FAL: 5 req/min)?
- Error handling for timeouts?
- API docs checked via Firecrawl?
- Correct parameters per latest docs?
Logging:
- Entry/exit points logged?
- Correlation IDs used?
- Server logs flushed (await log.flush())?
- Error paths logged with context?
Security:
- RLS policies enforced?
- Ownership verification?
- user_id in queries?
Testing:
- Tests written/updated?
- formData() mocks if needed?
- AbortController cleanup?
Files modified: [list files]
Report:
- ✅ Passes best practices
- ⚠️ Issues found (with specific recommendations)
- 🚨 Critical violations (MUST fix before deploy)
Reference: /docs/CODING_BEST_PRACTICES.md, /docs/STORAGE_GUIDE.md, CLAUDE.md
`
})
```
**Why:** Ensures fixes don't introduce technical debt, violate DreamReal standards, or repeat common bugs.
### 13. Verify Fix in Production
After deploying:
```bash
# Build locally first
npm run build

# Commit & push
git add . && git commit -m "fix: [description]" && git push

# Check the deployment
vercel ls --limit 5

# Query Axiom to verify the error has stopped
mcp__axiom__queryApl("['nonlinear-editor'] | where ['_time'] > ago(30m) | where ['message'] contains 'previous-error' | count")
```
If errors persist: Return to Step 3, investigate deeper.
## Success Criteria
- ✅ Root cause identified via production logs (not guessing)
- ✅ Common DreamReal bugs checked (signed URLs, RLS, cleanup, log flush)
- ✅ Upstream/downstream impacts analyzed
- ✅ External API usage verified via Firecrawl (if applicable)
- ✅ Logging added/improved if coverage was insufficient
- ✅ Fix implemented following DreamReal best practices
- ✅ Subagent validation passed (or issues addressed)
- ✅ Verified in production (Axiom shows error resolved)
- ✅ Tests pass (`npm test`)
- ✅ Build succeeds (`npm run build`)
## Common DreamReal Bugs - Quick Reference

**Storage:**
- Using `storage_url` directly → Use `createSignedUrl()`
- Missing cleanup in `finally` → Add cleanup
- Non-user-scoped paths → Use `{userId}/{projectId}/...`
- localStorage for data → Use Supabase
**API Routes:**
- Missing `withAuth` → Add middleware
- No input validation → Add `assertString`, etc.
- No `log.flush()` → Add `await log.flush()` before return
- Missing correlation ID → Use `extractCorrelationId()`
- Plain `string` types → Use `UserId`, `ProjectId`
**External APIs** (a timeout/retry sketch follows this list):
- Ignoring rate limits → Check FAL (5/min), ElevenLabs limits
- Wrong parameters → Firecrawl docs first
- No timeout handling → Add timeout + retry logic
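For the timeout item above, a sketch of `fetch` with a hard timeout and simple exponential-backoff retry; the helper name is illustrative:

```typescript
async function fetchWithRetry(url: string, init: RequestInit = {}, retries = 2): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 30_000); // 30s hard timeout
    try {
      const res = await fetch(url, { ...init, signal: controller.signal });
      if (res.ok || attempt >= retries) return res; // stop retrying on the last attempt
    } catch (err) {
      if (attempt >= retries) throw err;
    } finally {
      clearTimeout(timer);
    }
    await new Promise((r) => setTimeout(r, 2 ** attempt * 1000)); // exponential backoff
  }
}
```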
**Database:**
- RLS blocking → Check `user_id` scoping
- Missing `user_id` → Add to WHERE/INSERT
**Frontend** (a localStorage guard sketch follows this list):
- Missing `formData()` mock → Add to tests
- AbortController leak → Cleanup on unmount
- localStorage NaN → Add try/catch + `isNaN()` check
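For the localStorage NaN item, a defensive read sketch; the helper name is illustrative:

```typescript
// Guard against missing keys, non-numeric strings, and disabled storage.
function readStoredNumber(key: string, fallback: number): number {
  try {
    const raw = localStorage.getItem(key);
    if (raw === null) return fallback;
    const value = Number(raw);
    return Number.isNaN(value) ? fallback : value;
  } catch {
    return fallback; // e.g., storage unavailable in this context
  }
}
```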
## Quick Reference

```bash
# 1. Production telemetry (ALWAYS START HERE)
mcp__axiom__queryApl("['nonlinear-editor'] | where ['_time'] > ago(1h) | where ['level'] == 'error' | summarize count() by ['message'], ['source']")
npx sentry-cli issues list --status unresolved --limit 20

# 2. Check common DreamReal bugs in the logs
mcp__axiom__queryApl("['nonlinear-editor'] | where ['message'] contains 'signed' or ['message'] contains 'storage_url' | where ['level'] == 'error'")

# 3. Verify API usage (if the bug is API-related)
mcp__firecrawl__firecrawl_scrape({ url: "https://docs.fal.ai/...", formats: ["markdown"] })

# 4. Check logging coverage
mcp__axiom__queryApl("['nonlinear-editor'] | where ['endpoint'] == '/api/endpoint' | where ['_time'] > ago(1h) | summarize count() by bin(['_time'], 5m)")

# 5. Recent changes
git log --since="3 days ago" --oneline -- path/to/file

# 6. Verify deployment/env
vercel ls && vercel env ls
supabase migration list

# 7. Validate the fix
npm run build && npm test

# 8. Launch the validation subagent (MANDATORY)
Task(subagent_type: "general-purpose", description: "Validate DreamReal fix")
```
**Remember:** Production logs → Common bugs → Root cause → Firecrawl API docs (if needed) → Add logging → Best practices → Validation → Verification