---
name: debug
description: Production-first debugging workflow specific to DreamReal's infrastructure (Axiom, Sentry, Supabase, Vercel, FAL, ElevenLabs). ALWAYS checks Sentry/Axiom before code, adds logging if gaps found, uses Firecrawl to verify API usage, checks common bugs (signed URLs, cleanup, RLS, log.flush), then validates fixes via subagent. Use when user says "debug", mentions "debugging", reports bugs, errors, unexpected behavior, or asks to investigate/troubleshoot issues.
allowed-tools: mcp__axiom__*, mcp__chrome_devtools__*, mcp__firecrawl__*, Bash, Read, Grep, Glob, Edit, Task
---
# DreamReal Production-First Debugging Workflow

**Purpose:** Rapid root-cause identification through production telemetry in DreamReal's specific infrastructure, followed by validated fixes that adhere to repo best practices.

**Infrastructure:** Axiom (next-axiom hybrid logging), Sentry, Supabase (DB + Storage), Vercel, FAL.ai, ElevenLabs, Google Vertex AI, Firecrawl, Chrome DevTools MCP
## When to Use
- User reports a bug, error, or unexpected behavior
- Production issues need investigation
- Feature not working as expected
- API integration issues (FAL, ElevenLabs, Vertex AI)
- Storage/database access problems
## Debugging Checklist

### 1. Production Telemetry FIRST (MANDATORY)
Before touching any code, query production logs:

```bash
# Axiom: Check recent errors (note: the field is 'level', not 'severity')
mcp__axiom__queryApl("['nonlinear-editor'] | where ['_time'] > ago(1h) | where ['level'] == 'error' | summarize count() by ['message'], ['source']")

# Axiom: Get the dataset schema first if unsure of fields
mcp__axiom__getDatasetSchema("nonlinear-editor")

# Sentry: Check unresolved issues
npx sentry-cli issues list --status unresolved --limit 20
```
**What to extract:**
- Error frequency & patterns (group by message, source)
- Stack traces & affected files
- User IDs & request correlation IDs
- Timestamp of first occurrence
- Source: "browser" vs "server" logs
**DreamReal-specific fields** (nested `data` payloads are flattened with a `data_` prefix; a hypothetical event is sketched after this list):
- `correlationId` - End-to-end request tracing
- `userId` - User-scoped debugging
- `source` - "browser" or "server"
- `data_operation` - Business operation (e.g., "saveTimeline", "uploadAsset")
- `data_projectId`, `data_assetId` - Entity references
- `data_metric`, `data_value`, `data_rating` - Web Vitals
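For orientation, a single flattened event might look like this. This is an illustrative sketch only; the exact field set depends on the logger configuration:

```typescript
// Hypothetical shape of one flattened Axiom event (values are made up)
const exampleEvent = {
  _time: '2025-01-15T10:32:01Z',
  level: 'error',
  message: 'Failed to save timeline',
  source: 'server',            // "browser" or "server"
  correlationId: 'req_abc123', // end-to-end request tracing
  userId: 'user_789',          // user-scoped debugging
  data_operation: 'saveTimeline',
  data_projectId: 'proj_456',
};
```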
**Why:** Production logs reveal the ACTUAL problem users face, not assumptions.
### 2. Check for Common DreamReal Bugs (Quick Scan)
Before deep investigation, check these frequent issues:
**Storage Issues:**

- ❌ Accessing `storage_url` directly instead of getting a signed URL:

```typescript
// WRONG: Using storage_url directly
// CORRECT: Get a signed URL first
const { data } = await supabase.storage.from('assets').createSignedUrl(path, 3600)
```

- ❌ Missing cleanup in `finally` blocks for temp files (see the sketch after this list)
- ❌ Non-user-scoped storage paths (missing `{userId}/{projectId}/`)
- ❌ localStorage for persistent data (use Supabase instead)
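A minimal sketch of the `finally` cleanup pattern, assuming a temp file under `os.tmpdir()`; the function and file names are illustrative:

```typescript
import { writeFile, unlink } from 'fs/promises';
import { tmpdir } from 'os';
import { join } from 'path';

async function processWithTempFile(buffer: Buffer): Promise<void> {
  const tempPath = join(tmpdir(), `dreamreal-${Date.now()}.tmp`);
  try {
    await writeFile(tempPath, buffer);
    // ... process the file ...
  } finally {
    // Always remove the temp file, even when processing throws
    await unlink(tempPath).catch(() => {}); // ignore if already deleted
  }
}
```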
**API Route Issues** (a route skeleton tying these together follows the list):
- ❌ Missing `withAuth` middleware (`app/api/*/route.ts`)
- ❌ Missing input validation (no `assertString`, `assertUserId`, etc.)
- ❌ Not flushing logs before return (`await log.flush()` missing)
- ❌ Missing correlation ID (use `addCorrelationHeader()`)
- ❌ Plain string types instead of branded types (`UserId`, `ProjectId`)
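A minimal route skeleton showing the items above in one place. `withAuth`, `assertUserId`, and the logger helpers are the repo's names, but the exact signatures and import paths shown here are assumptions:

```typescript
import { withAuth } from '@/lib/auth';             // import path assumed
import { assertUserId } from '@/lib/validation';   // import path assumed
import { createServerLogger, extractCorrelationId } from '@/lib/logger';

export const POST = withAuth(async (request: Request) => {
  const log = createServerLogger();
  const correlationId = extractCorrelationId(request.headers);

  const body = await request.json();
  const userId = assertUserId(body.userId); // branded UserId, not a plain string

  log.info('API request received', { endpoint: '/api/example', correlationId });
  // ... call the service layer, never the DB directly from the route ...

  await log.flush(); // CRITICAL: flush before returning
  return Response.json({ ok: true });
});
```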
**External API Issues:**
- ❌ Rate limiting (FAL: 5 req/min, ElevenLabs: limits per tier)
- ❌ Wrong API usage (check docs via Firecrawl - see Step 6)
- ❌ Missing error handling for API timeouts
- ❌ Not checking API response structure (API changes)
**Database Issues** (see the insert sketch after this list):
- ❌ RLS policies blocking access (check `user_id` scoping)
- ❌ Missing `user_id` in inserts (fails RLS)
- ❌ Not handling null/undefined from queries
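A sketch of a correctly scoped insert covering the items above; the table and column names are hypothetical, and `supabase` is assumed to be an authenticated client:

```typescript
async function createProject(userId: string, title: string) {
  const { data, error } = await supabase
    .from('projects')                   // hypothetical table
    .insert({ user_id: userId, title }) // user_id present, so RLS can match auth.uid()
    .select()
    .single();

  if (error) throw new Error(`Insert failed (check RLS policy): ${error.message}`);
  if (!data) return null; // guard against null before dereferencing
  return data;
}
```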
**Frontend Issues** (AbortController cleanup sketch after this list):
- ❌ Missing formData() mock in tests
- ❌ AbortController not cleaned up (memory leaks)
- ❌ localStorage NaN values (missing try/catch + NaN checks)
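For the AbortController leak, the standard React cleanup pattern; the hook itself is illustrative, not an existing repo utility:

```typescript
import { useEffect, useState } from 'react';

function useAsset(url: string) {
  const [data, setData] = useState<unknown>(null);

  useEffect(() => {
    const controller = new AbortController();
    fetch(url, { signal: controller.signal })
      .then((res) => res.json())
      .then(setData)
      .catch((err) => {
        if (err.name !== 'AbortError') throw err; // real failures still surface
      });
    // Abort the in-flight request when the component unmounts
    return () => controller.abort();
  }, [url]);

  return data;
}
```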
Check Axiom for these patterns:

```apl
// Storage URL issues
['nonlinear-editor'] | where ['_time'] > ago(2h) | where ['message'] contains "signed" or ['message'] contains "storage_url" | where ['level'] == "error"

// RLS policy failures
['nonlinear-editor'] | where ['message'] contains "RLS" or ['message'] contains "policy" | where ['level'] == "error"

// Missing log flush (API returns before logs are sent)
['nonlinear-editor'] | where ['source'] == "server" | where ['endpoint'] == "/api/specific" | where ['_time'] > ago(1h) | summarize count() by bin(['_time'], 5m)
```
### 3. Understand Context Deeply
Ask these questions explicitly:
- What is the expected vs actual behavior?
- When did this start? (recent deploy? time correlation?)
- What triggers it? (user workflow, API endpoint, UI interaction)
- Is it reproducible? (always, sometimes, conditions)
- Which component is affected? (timeline, export, auth, etc.)
**Trace the data flow:** follow the error from where it manifests back to its upstream cause.
### 4. Think Upstream & Downstream

**Critical:** Don't fix symptoms. Investigate the chain:

**Upstream (Root Causes):**
- What dependencies does this rely on? (Supabase, FAL, ElevenLabs, Vertex AI, FFmpeg)
- What changed recently? (`git log --since="3 days ago" -- path/to/file`)
- Are there config/env issues? (`vercel env ls`, `supabase migration list`)
- Is this a timing/race condition?
- Are external APIs rate-limited or down?
- Are storage signed URLs expired?
**Downstream (Side Effects):**
- What depends on this component?
- Will fixing this break other features?
- Are there related errors in logs?
- What's the blast radius?
- Will this affect other users' data?
**DreamReal-specific dependencies:**
- Timeline → Assets → Supabase Storage (signed URLs!)
- Export → FFmpeg → Vercel serverless limits
- AI Generation → FAL/ElevenLabs → Rate limits
- Scene Detection → GCS → Vertex AI
**Example:** Export fails → Upstream: FFmpeg config, memory limits, Vercel timeout → Downstream: affects timeline, AI generations, user projects
### 5. Query Axiom with Precision
Get the schema first, then query the specific issue:

```apl
// Schema (if unfamiliar with the fields)
['nonlinear-editor'] | take 1 | project-keep *

// Specific error pattern with DreamReal fields
['nonlinear-editor'] | where ['_time'] > ago(2h) | where ['message'] contains "specific-error" | project ['_time'], ['source'], ['userId'], ['correlationId'], ['data_operation'], ['data_projectId']

// Trace a request end-to-end (correlation ID)
['nonlinear-editor'] | where ['correlationId'] == "xyz" | project ['_time'], ['source'], ['level'], ['message'], ['endpoint'] | order by ['_time'] asc

// Check for missing logs (indicates a log.flush() issue)
['nonlinear-editor'] | where ['_time'] > ago(1h) | where ['endpoint'] == "/api/specific/endpoint" | summarize count() by bin(['_time'], 5m)

// External API issues
['nonlinear-editor'] | where ['_time'] > ago(2h) | where ['message'] contains "fal" or ['message'] contains "elevenlabs" or ['message'] contains "vertex" | where ['level'] in ("error", "warn")
```
**Focus:** Query the specific timeframe, component, and error type, not everything.
### 6. Verify External API Usage with Firecrawl (If API-Related)
If the bug involves FAL, ElevenLabs, Vertex AI, or any other external API:
```typescript
// Use Firecrawl to fetch the latest API documentation
mcp__firecrawl__firecrawl_scrape({ url: "https://fal.ai/docs/model-endpoints/...", formats: ["markdown"] })

// OR search for specific API usage
mcp__firecrawl__firecrawl_search({ query: "FAL.ai text to speech API parameters 2025", limit: 3, sources: ["web"] })
```
**What to verify:**
- API endpoint URLs (changed?)
- Required parameters (new requirements?)
- Authentication method (API key format, headers)
- Rate limits (updated?)
- Response structure (fields changed?)
- Error codes (new error types?)
**Common API issues in DreamReal:**
- FAL: 5 requests/min limit, webhook vs polling
- ElevenLabs: Different limits per tier, voice IDs change
- Vertex AI: GCS URI requirements, region restrictions
- Supabase Storage: Signed URL expiry (default 3600s)
After the Firecrawl check, compare with the codebase usage:

```bash
# Find where the API is called
grep -r "fal.ai" app/ lib/ --include="*.ts" --include="*.tsx"
```
### 7. Assess Logging Coverage & Add If Needed
Check if sufficient logging exists for the affected area:
```apl
// Check log density for the endpoint
['nonlinear-editor'] | where ['_time'] > ago(1h) | where ['endpoint'] == "/api/problematic/endpoint" | summarize count() by bin(['_time'], 5m)

// Expected: multiple logs per request (entry, operation, exit)
// If < 3 per request: ADD MORE LOGGING
```
Add logging if it's missing (follow the hybrid logger pattern):

```typescript
// Client-side logging
import { useHybridLogger } from '@/lib/logger';
const log = useHybridLogger();

log.info('Operation starting', { operation: 'saveTimeline', projectId });
// ... operation ...
log.error('Operation failed', { operation: 'saveTimeline', error: error.message });

// Server-side logging (API routes)
import { createServerLogger, extractCorrelationId } from '@/lib/logger';
const log = createServerLogger();
const correlationId = extractCorrelationId(request.headers);

log.info('API request received', { endpoint: '/api/assets', method: 'POST', correlationId });
// ... operation ...
log.error('API request failed', { endpoint: '/api/assets', error: error.message, correlationId });

// CRITICAL: Always flush before return!
await log.flush();
return Response.json({ data });
```
**Logging checklist:**
- Entry point logged (with correlationId on server)
- Key operations logged (DB queries, API calls, file operations)
- Error paths logged (with stack traces)
- Exit point logged (success/failure)
- Server logs flushed before return (`await log.flush()`)
### 8. Use CLI/MCP Tools Directly

NEVER ask the user to check dashboards. Execute the commands yourself:

```bash
# Vercel (deployments & logs)
vercel ls --limit 5
vercel logs [deployment-url] | grep "error"
vercel env ls | grep -E "SUPABASE|FAL|ELEVENLABS"

# Supabase (database & storage)
supabase migration list
supabase storage ls assets
supabase db remote  # Connect to the DB

# GitHub Actions (CI/CD)
gh run list --limit 5
gh run view [run-id] --log-failed
gh secret list | grep -E "SUPABASE|FAL"

# Chrome DevTools MCP (frontend)
mcp__chrome_devtools__list_console_messages({ pageIdx: 0 })
mcp__chrome_devtools__list_network_requests({ resourceTypes: ["xhr", "fetch"] })

# Firecrawl MCP (API docs)
mcp__firecrawl__firecrawl_scrape({ url: "https://docs.api.com" })
```
### 9. Check DreamReal-Specific Systems

**Database (Supabase):**
- RLS policies: `supabase db remote` → check policies
- Migration status: `supabase migration list`
- User-scoped queries: Ensure `user_id` in WHERE clause
**Storage (Supabase)** (a path-helper sketch follows this list):
- Signed URLs: Always use `createSignedUrl()`, never direct `storage_url`
- User-scoped paths: `{userId}/{projectId}/{type}/{filename}`
- Cleanup: Temp files in `os.tmpdir()`, cleanup in `finally` blocks
- Buckets: `assets` (500MB), `frames` (50MB), `frame-edits` (100MB)
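A trivial helper sketch for the path convention above; illustrative only, since the repo may already provide one:

```typescript
// Build a user-scoped storage path: {userId}/{projectId}/{type}/{filename}
function storagePath(userId: string, projectId: string, type: string, filename: string): string {
  return `${userId}/${projectId}/${type}/${filename}`; // e.g. "user_1/proj_2/frames/f0001.png"
}
```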
**Auth:**
- `withAuth` middleware applied to protected routes
- JWT token validity
- RLS policies match `user_id`
**API Routes:**
- Rate limiting configured
- Input validation: `assertString`, `assertUserId`, etc.
- Service layer used (not direct DB access in route)
- Error handling: Standardized error responses
- Log flushing: `await log.flush()` before return
**External APIs** (a rate-limit sketch follows this list):
- FAL: 5 req/min, webhook preferred over polling
- ElevenLabs: Check tier limits
- Vertex AI: Requires GCS URIs (temp upload to `dreamreal-video-editor-uploads`)
- Firecrawl: Check API key, rate limits
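Given FAL's 5 req/min limit, a minimal client-side throttle sketch; the limiter is an illustration, not an existing repo utility:

```typescript
// Naive sliding-window limiter: allow at most `max` calls per `windowMs`.
function createLimiter(max: number, windowMs: number) {
  const timestamps: number[] = [];
  return async function acquire(): Promise<void> {
    const now = Date.now();
    // Drop calls that have aged out of the window
    while (timestamps.length && timestamps[0] <= now - windowMs) timestamps.shift();
    if (timestamps.length >= max) {
      // Wait until the oldest call leaves the window, then re-check
      await new Promise((r) => setTimeout(r, timestamps[0] + windowMs - now));
      return acquire();
    }
    timestamps.push(now);
  };
}

const falLimit = createLimiter(5, 60_000); // FAL: 5 requests/min
// Call `await falLimit()` before each FAL request.
```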
### 10. Root Cause → Solution

Document explicitly:

- **Root Cause:** What actually caused the issue?
- **Solution:** What specific change will fix it?
- **Files Modified:** Which files need changes?
- **Testing Plan:** How will you verify the fix?

**Example:**

- Root Cause: Missing keyframe interval in the FFmpeg export
- Solution: Add the `-g 30` flag to the FFmpeg command
- Files: `lib/export/ffmpegExporter.ts`
- Testing: Export 3 test videos, verify playback
### 11. Implement Fix with Best Practices

Follow DreamReal repo standards (see `/docs/CODING_BEST_PRACTICES.md`):

- **TypeScript:** Branded IDs (`UserId`, `ProjectId`), no `any`, explicit return types
- **API Routes:** `withAuth` middleware, input validation, service layer, log flushing
- **Storage:** User-scoped paths, signed URLs, cleanup in `finally`
- **Error Handling:** Proper try/catch, standardized error responses
- **Logging:** Correlation IDs, entry/exit points, flush before return
- **Testing:** Write/update tests if applicable
### 12. Validate with Subagent (MANDATORY)
After implementing the fix, launch a validation subagent:
```typescript
Task({
  subagent_type: "general-purpose",
  description: "Validate debug fix follows DreamReal best practices",
  prompt: `
Review the fix I just implemented for [describe issue].
Validate against DreamReal repo best practices:
TypeScript:
- Branded IDs used? (UserId, ProjectId, AssetId - not plain string)
- No 'any' types?
- Explicit return types?
- Discriminated unions for state?
API Routes:
- withAuth middleware applied?
- Input validation (assertString, assertUserId)?
- Service layer used (not direct DB in route)?
- Correlation ID extracted (extractCorrelationId)?
- Logs flushed before return (await log.flush())?
Storage (Supabase):
- User-scoped paths ({userId}/{projectId}/...)?
- Signed URLs used (not direct storage_url)?
- Cleanup in finally blocks?
- Correct bucket (assets/frames/frame-edits)?
- NO localStorage for persistent data?
External APIs:
- Rate limiting handled (FAL: 5 req/min)?
- Error handling for timeouts?
- API docs checked via Firecrawl?
- Correct parameters per latest docs?
Logging:
- Entry/exit points logged?
- Correlation IDs used?
- Server logs flushed (await log.flush())?
- Error paths logged with context?
Security:
- RLS policies enforced?
- Ownership verification?
- user_id in queries?
Testing:
- Tests written/updated?
- formData() mocks if needed?
- AbortController cleanup?
Files modified: [list files]
Report:
- ✅ Passes best practices
- ⚠️ Issues found (with specific recommendations)
- 🚨 Critical violations (MUST fix before deploy)
Reference: /docs/CODING_BEST_PRACTICES.md, /docs/STORAGE_GUIDE.md, CLAUDE.md
`
})
```
**Why:** Ensures fixes don't introduce technical debt, violate DreamReal standards, or repeat common bugs.
### 13. Verify Fix in Production
After deploying:
```bash
# Build locally first
npm run build

# Commit & push
git add . && git commit -m "fix: [description]" && git push

# Check the deployment
vercel ls --limit 5

# Query Axiom to verify the error has stopped
mcp__axiom__queryApl("['nonlinear-editor'] | where ['_time'] > ago(30m) | where ['message'] contains 'previous-error' | count")
```
If errors persist: Return to Step 3, investigate deeper.
## Success Criteria
- ✅ Root cause identified via production logs (not guessing)
- ✅ Common DreamReal bugs checked (signed URLs, RLS, cleanup, log flush)
- ✅ Upstream/downstream impacts analyzed
- ✅ External API usage verified via Firecrawl (if applicable)
- ✅ Logging added/improved if coverage was insufficient
- ✅ Fix implemented following DreamReal best practices
- ✅ Subagent validation passed (or issues addressed)
- ✅ Verified in production (Axiom shows error resolved)
- ✅ Tests pass (`npm test`)
- ✅ Build succeeds (`npm run build`)
## Common DreamReal Bugs - Quick Reference

**Storage:**
- Using `storage_url` directly → Use `createSignedUrl()`
- Missing cleanup in `finally` → Add cleanup
- Non-user-scoped paths → Use `{userId}/{projectId}/...`
- localStorage for data → Use Supabase
**API Routes:**
- Missing `withAuth` → Add middleware
- No input validation → Add `assertString`, etc.
- No `log.flush()` → Add `await log.flush()` before return
- Missing correlation ID → Use `extractCorrelationId()`
- Plain `string` types → Use `UserId`, `ProjectId`
**External APIs** (a timeout/retry sketch follows this list):
- Ignoring rate limits → Check FAL (5/min), ElevenLabs limits
- Wrong parameters → Firecrawl docs first
- No timeout handling → Add timeout + retry logic
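For the timeout item above, a sketch of `fetch` with a hard timeout and simple exponential-backoff retry; the helper name is illustrative:

```typescript
async function fetchWithRetry(url: string, init: RequestInit = {}, retries = 2): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 30_000); // 30s hard timeout
    try {
      const res = await fetch(url, { ...init, signal: controller.signal });
      if (res.ok || attempt >= retries) return res; // stop retrying on the last attempt
    } catch (err) {
      if (attempt >= retries) throw err;
    } finally {
      clearTimeout(timer);
    }
    await new Promise((r) => setTimeout(r, 2 ** attempt * 1000)); // exponential backoff
  }
}
```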
**Database:**
- RLS blocking → Check `user_id` scoping
- Missing `user_id` → Add to WHERE/INSERT
**Frontend** (a localStorage guard sketch follows this list):
- Missing `formData()` mock → Add to tests
- AbortController leak → Cleanup on unmount
- localStorage NaN → Add try/catch + `isNaN()` check
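For the localStorage NaN item, a defensive read sketch; the helper name is illustrative:

```typescript
// Guard against missing keys, non-numeric strings, and disabled storage.
function readStoredNumber(key: string, fallback: number): number {
  try {
    const raw = localStorage.getItem(key);
    if (raw === null) return fallback;
    const value = Number(raw);
    return Number.isNaN(value) ? fallback : value;
  } catch {
    return fallback; // e.g., storage unavailable in this context
  }
}
```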
## Quick Reference

```bash
# 1. Production telemetry (ALWAYS START HERE)
mcp__axiom__queryApl("['nonlinear-editor'] | where ['_time'] > ago(1h) | where ['level'] == 'error' | summarize count() by ['message'], ['source']")
npx sentry-cli issues list --status unresolved --limit 20

# 2. Check common DreamReal bugs in the logs
mcp__axiom__queryApl("['nonlinear-editor'] | where ['message'] contains 'signed' or ['message'] contains 'storage_url' | where ['level'] == 'error'")

# 3. Verify API usage (if the bug is API-related)
mcp__firecrawl__firecrawl_scrape({ url: "https://docs.fal.ai/...", formats: ["markdown"] })

# 4. Check logging coverage
mcp__axiom__queryApl("['nonlinear-editor'] | where ['endpoint'] == '/api/endpoint' | where ['_time'] > ago(1h) | summarize count() by bin(['_time'], 5m)")

# 5. Recent changes
git log --since="3 days ago" --oneline -- path/to/file

# 6. Verify deployment/env
vercel ls && vercel env ls
supabase migration list

# 7. Validate the fix
npm run build && npm test

# 8. Launch the validation subagent (MANDATORY)
Task(subagent_type: "general-purpose", description: "Validate DreamReal fix")
```
**Remember:** Production logs → Common bugs → Root cause → Firecrawl API docs (if needed) → Add logging → Best practices → Validation → Verification