name	tzurot-deployment
description	Railway deployment operations for Tzurot v3 - Service management, log analysis, environment variables, health checks, and troubleshooting. Use when deploying, debugging production issues, or managing Railway infrastructure.
lastUpdated	2025-12-08

Deployment Skill - Tzurot v3

Critical Context: Tzurot v3 is deployed on Railway with BYOK (Bring Your Own Key) implemented. Public beta is ready for users.

🎯 Use This Skill When

Deploying changes to Railway
Checking service logs
Managing environment variables
Debugging production issues
Verifying service health
Rolling back deployments
Managing Railway infrastructure

🚂 Railway CLI Reference

IMPORTANT: Always consult docs/reference/RAILWAY_CLI_REFERENCE.md before running Railway commands. AI training data may be outdated for Railway CLI 4.5.3.

Core Deployment Operations

1. Service Status Checks

# Check all services status
railway status

# Check specific service
railway status --service api-gateway
railway status --service ai-worker
railway status --service bot-client

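# Check deployment history: use the Deployments section of the Railway dashboard.
# Do NOT use `railway deploy` for this; it provisions a template into the project
# rather than listing past deployments. Confirm alternatives in docs/reference/RAILWAY_CLI_REFERENCE.md.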

2. Viewing Logs

Pattern: Always use correlation IDs to trace requests across services.

# Tail logs for a service (last 50 lines, follow mode)
railway logs --service api-gateway --tail 50

# Tail logs for ai-worker
railway logs --service ai-worker --tail 50

# Tail logs for bot-client
railway logs --service bot-client --tail 50

# Search logs for specific pattern
railway logs --service api-gateway | grep "ERROR"

# Find logs with correlation ID (trace request across services)
railway logs --service api-gateway | grep "requestId:abc123"
railway logs --service ai-worker | grep "requestId:abc123"

Log Analysis Tips:

Look for correlation IDs to trace requests end-to-end
Check for ERROR level logs first
Use timestamps to correlate events across services
See tzurot-observability skill for log structure details
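
The tips above can be sketched as a small helper that merges saved log output from several services into one timeline for a single request. This is a minimal sketch, not the project's tooling: it assumes each log line is a JSON object with `time`, `level`, `requestId`, and `msg` fields. Those field names are hypothetical, so check the tzurot-observability skill for the real structure.

```typescript
// Hypothetical log shape: one JSON object per line. Field names are assumptions.
interface LogEntry {
  service: string;
  time: string; // ISO-8601 timestamp
  level: string;
  requestId?: string;
  msg: string;
}

// Parse raw log output from one service, skipping lines that are not valid JSON.
function parseLogLines(service: string, raw: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of raw.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    try {
      const parsed = JSON.parse(trimmed);
      if (
        typeof parsed === 'object' &&
        parsed !== null &&
        typeof parsed.time === 'string' &&
        typeof parsed.msg === 'string'
      ) {
        entries.push({
          service,
          time: parsed.time,
          level: typeof parsed.level === 'string' ? parsed.level : 'unknown',
          requestId: typeof parsed.requestId === 'string' ? parsed.requestId : undefined,
          msg: parsed.msg,
        });
      }
    } catch {
      // Non-JSON line (for example a build banner): ignore it.
    }
  }
  return entries;
}

// Merge entries from several services into one time-ordered trace for a request.
function traceRequest(requestId: string, logsByService: Record<string, string>): LogEntry[] {
  return Object.entries(logsByService)
    .flatMap(([service, raw]) => parseLogLines(service, raw))
    .filter((entry) => entry.requestId === requestId)
    .sort((a, b) => Date.parse(a.time) - Date.parse(b.time));
}
```

Pass it the captured output of each `railway logs --service <service-name>` call keyed by service name; the result lists every matching entry in timestamp order, which makes the hop from api-gateway to ai-worker easy to follow.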

3. Environment Variables

Pattern: Use Railway dashboard for sensitive values, CLI for non-sensitive.

# List all environment variables for a service
railway variables --service api-gateway

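# Set a non-sensitive environment variable
# (secrets such as OPENROUTER_API_KEY belong in the Railway dashboard, per the pattern above)
# Railway CLI 4.x uses the --set flag; confirm in docs/reference/RAILWAY_CLI_REFERENCE.md
railway variables --set "LOG_LEVEL=debug" --service ai-worker

# Set multiple variables
railway variables \
  --set "AI_PROVIDER=openrouter" \
  --set "LOG_LEVEL=info" \
  --service ai-worker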

# Delete environment variable
railway variables delete OLD_VAR_NAME --service ai-worker

Security Reminder: See tzurot-security skill for secret management best practices.

4. Health Checks

# API Gateway health endpoint
curl https://api-gateway-development-83e8.up.railway.app/health

# Expected response:
# {
#   "status": "healthy",
#   "timestamp": "2025-11-19T14:30:00.000Z",
#   "services": {
#     "database": "connected",
#     "redis": "connected"
#   }
# }
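
A scripted check of that response might look like the sketch below. It relies only on the response shape shown above; treating any `services` value other than "connected" as a problem is an assumption, and `findHealthProblems` is a hypothetical helper name.

```typescript
// Shape taken from the expected /health response shown above.
interface HealthResponse {
  status: string;
  timestamp: string;
  services: Record<string, string>;
}

// Return a list of problems; an empty list means the response looks healthy.
function findHealthProblems(body: unknown): string[] {
  if (typeof body !== 'object' || body === null) {
    return ['response is not a JSON object'];
  }
  const candidate = body as Partial<HealthResponse>;
  const problems: string[] = [];
  if (candidate.status !== 'healthy') {
    problems.push(`status is ${JSON.stringify(candidate.status)}`);
  }
  const services = candidate.services;
  if (typeof services !== 'object' || services === null) {
    problems.push('services block is missing');
    return problems;
  }
  // Assumption: every dependency reports "connected" when it is usable.
  for (const [name, state] of Object.entries(services)) {
    if (state !== 'connected') {
      problems.push(`${name} is ${JSON.stringify(state)}`);
    }
  }
  return problems;
}
```

Feed it the parsed JSON body of the health endpoint; a non-empty result names the failing dependency and tells you which troubleshooting step to start with.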

Troubleshooting Unhealthy Services:

Check logs: railway logs --service <service-name> --tail 100
Verify environment variables: railway variables --service <service-name>
Check database/Redis connectivity
Review recent deployments: railway status --service <service-name>

5. Deployment Workflow

Standard Deployment Process:

Merge PR to develop (Railway auto-deploys):
```
gh pr merge <PR-number> --rebase
```

Verify deployment started:

railway status --service api-gateway
# Look for "Deploying" status

Monitor deployment logs:

railway logs --service api-gateway --tail 100
# Watch for "Server started" or deployment errors

Verify health endpoint:

curl https://api-gateway-development-83e8.up.railway.app/health

Check all services are healthy:

railway status  # Should show all services as "Running"

Auto-Deploy Configuration:

Branch: develop (feature branches do NOT auto-deploy)
Trigger: Push to develop branch on GitHub
Services: All 3 services (bot-client, api-gateway, ai-worker) deploy independently

6. Rolling Back Deployments

If deployment breaks production:

# 1. Check deployment history
railway status --service api-gateway

# 2. Identify last known good commit
git log --oneline -10

# 3. Revert to previous commit (creates new commit)
git revert HEAD
git push origin develop

# 4. Railway will auto-deploy the revert

# OR: Force deploy a specific commit (use with caution)
# See docs/reference/RAILWAY_CLI_REFERENCE.md for correct syntax

Alternative: Use GitHub to revert the PR merge commit and push to develop.

7. Manual Deployments

When auto-deploy fails or you need to redeploy:

# Redeploy the latest deployment without code changes (useful for env var updates)
railway redeploy --service api-gateway

# Upload and deploy the current local directory (use with caution - check out develop first)
railway up --service ai-worker

⚠️ Warning: Manual deployments from feature branches can cause inconsistencies. Always deploy from develop in production.

Common Operations

Restarting a Service

# Restart a service (useful for picking up new env vars)
railway restart --service bot-client

When to restart:

After changing environment variables
Service is stuck (check logs first!)
Memory leak suspected (check metrics)

Database Operations

# Connect to PostgreSQL database
railway run psql

# Run Prisma migrations
railway run npx prisma migrate deploy

# Generate Prisma client (after schema changes)
railway run npx prisma generate

# View database in Prisma Studio (local only, connects to Railway DB)
npx prisma studio

Redis Operations

# Connect to Redis CLI (if Redis CLI is installed)
railway run redis-cli

# Common Redis commands:
# - PING (test connection)
# - KEYS * (list all keys - don't use in production!)
# - GET key_name
# - DEL key_name
# - FLUSHDB (clear all keys - DANGEROUS!)

Troubleshooting Guide

Service Won't Start

Symptoms: Service shows "Crashed" or "Failed" status

Steps:

Check logs for errors:

railway logs --service <service-name> --tail 100

Common issues:
- Missing environment variables: Check railway variables --service <service-name>
- Database connection failed: Verify DATABASE_URL is set
- Redis connection failed: Verify REDIS_URL is set
- Port already in use: Check for duplicate deployments
- Build failed: Check build logs for TypeScript/dependency errors

Verify environment variables are set:

railway variables --service <service-name> | grep -E "(DATABASE_URL|REDIS_URL|DISCORD_TOKEN)"
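
The same check can run inside the service at startup so that a missing variable fails fast with a clear message instead of a crash later on. This is a minimal sketch: the variable names come from the grep above, but which service needs which variable is an assumption, and both function names are hypothetical.

```typescript
// Names taken from the grep above; the per-service requirements are an assumption.
const REQUIRED_VARS = ['DATABASE_URL', 'REDIS_URL', 'DISCORD_TOKEN'] as const;

// Return the names of required variables that are unset or blank.
function findMissingVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Throw one error naming every missing variable. Values are never printed,
// so secrets cannot leak into the logs.
function assertRequiredVars(env: Record<string, string | undefined>): void {
  const missing = findMissingVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}
```

Calling `assertRequiredVars(process.env)` as the first step of a service's entry point would surface the problem in the first lines of the deployment logs.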

Slow Response Times

Symptoms: API responses taking >5 seconds, timeouts

Steps:

Check service logs for slow operations:

railway logs --service api-gateway | grep "duration"

Check database query performance:

railway logs --service ai-worker | grep "prisma"

Check BullMQ job processing times:

railway logs --service ai-worker | grep "job completed"

Verify Railway service resources (use Railway dashboard):
- CPU usage
- Memory usage
- Active connections
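
When the grep output is too noisy, a short script can rank the slow operations instead. The sketch below assumes JSON log lines with a numeric `duration` field in milliseconds and a `msg` field; both names are hypothetical. The 5000 ms default mirrors the symptom threshold above.

```typescript
interface SlowOperation {
  msg: string;
  durationMs: number;
}

// Extract log entries slower than the threshold, slowest first.
// Assumption: each line is a JSON object with `duration` (ms) and `msg`.
function findSlowOperations(raw: string, thresholdMs = 5000): SlowOperation[] {
  const slow: SlowOperation[] = [];
  for (const line of raw.split('\n')) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // Skip blank or non-JSON lines.
    }
    if (typeof parsed !== 'object' || parsed === null) continue;
    const record = parsed as Record<string, unknown>;
    const duration = record['duration'];
    if (typeof duration === 'number' && duration > thresholdMs) {
      const msg = record['msg'];
      slow.push({
        msg: typeof msg === 'string' ? msg : '(no message)',
        durationMs: duration,
      });
    }
  }
  return slow.sort((a, b) => b.durationMs - a.durationMs);
}
```

Run it over saved log output from api-gateway and ai-worker separately; whichever service owns the slowest entries is the place to look first.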

Discord Bot Not Responding

Symptoms: Bot appears online but doesn't respond to commands

Steps:

Check bot-client logs:

railway logs --service bot-client --tail 50

Verify webhook creation (should see "Webhook created" in logs)

Check API Gateway is reachable:

curl https://api-gateway-development-83e8.up.railway.app/health

Verify Discord token is set:

railway variables --service bot-client | grep DISCORD_TOKEN

Check if bot has proper permissions in Discord server

Database Migrations Failed

Symptoms: Service crashes after deployment with Prisma errors

Steps:

Check migration status:
```
railway run npx prisma migrate status
```
Apply missing migrations:
```
railway run npx prisma migrate deploy
```
If migrations are corrupted, see docs/migration/ for recovery procedures

Memory Leaks

Symptoms: Service memory usage gradually increases, eventual crash

Steps:

Monitor memory usage in Railway dashboard

Check for unclosed connections:

railway logs --service ai-worker | grep "connection"

Look for growing caches or queues:

railway logs --service bot-client | grep "cache"

Temporary fix: Restart service

railway restart --service <service-name>

Long-term fix: Investigate code for resource leaks

Railway-Specific Patterns

Private Networking

Railway services communicate via private networking (no public internet):

// ✅ CORRECT - Use Railway-provided URLs (internal networking)
const GATEWAY_URL = process.env.GATEWAY_URL; // e.g., "http://api-gateway.railway.internal"

// ❌ WRONG - Don't use public URLs for internal communication
const GATEWAY_URL = 'https://api-gateway-development-83e8.up.railway.app';
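
A startup guard can catch the wrong case before any request is sent. This is a minimal sketch that assumes internal hostnames end in `.railway.internal`, as in the example above; `isPrivateRailwayUrl` and `resolveGatewayUrl` are hypothetical helper names.

```typescript
// Return true when the URL points at a Railway private-network hostname.
function isPrivateRailwayUrl(rawUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // Not a parseable absolute URL.
  }
  return parsed.hostname.endsWith('.railway.internal');
}

// Read GATEWAY_URL and refuse anything that would leave the private network.
function resolveGatewayUrl(env: Record<string, string | undefined>): string {
  const url = env['GATEWAY_URL'];
  if (url === undefined || !isPrivateRailwayUrl(url)) {
    throw new Error('GATEWAY_URL must be set to a *.railway.internal address');
  }
  return url;
}
```

This guard would also reject `localhost`, so apply it only when running on Railway, not in local development.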

Environment Variable Injection

Railway automatically injects service URLs:

// Railway provides these automatically:
const DATABASE_URL = process.env.DATABASE_URL; // PostgreSQL addon
const REDIS_URL = process.env.REDIS_URL; // Redis addon
const GATEWAY_URL = process.env.GATEWAY_URL; // Service reference

No need to manually configure these - Railway handles it.

Service Dependencies

Startup Order: Services start in parallel, so each service must retry its connections on startup instead of assuming its dependencies are already up:

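// ✅ GOOD - Retry database connection on startup
// Assumes `prisma` (a PrismaClient instance) and `logger` are already initialized in this module.
async function connectWithRetry(maxAttempts = 5): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await prisma.$connect();
      logger.info('Database connected');
      return;
    } catch (error) {
      // Include the underlying error so the failure cause shows up in the logs
      logger.warn({ err: error, attempt, maxAttempts }, 'Database connection failed');
      if (attempt === maxAttempts) break; // No point waiting after the final attempt
      // Linear backoff: 2s, 4s, 6s, ...
      await new Promise(resolve => setTimeout(resolve, 2000 * attempt));
    }
  }
  throw new Error(`Database connection failed after ${maxAttempts} attempts`);
}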

See tzurot-async-flow skill for retry patterns.
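
To judge whether the retry budget covers a slow database start, it helps to compute the wait schedule. The sketch below uses the linear rule from the example above (2000 ms × attempt number) and counts only the waits between attempts; a final wait before giving up would add another 2000 × maxAttempts ms. The function names are hypothetical.

```typescript
// Delay after a failed attempt, using the linear rule baseMs × attempt.
function retryDelayMs(attempt: number, baseMs = 2000): number {
  return baseMs * attempt;
}

// Waits that occur between attempts; nothing is counted after the final attempt.
function retrySchedule(maxAttempts = 5, baseMs = 2000): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(retryDelayMs(attempt, baseMs));
  }
  return delays;
}

// Total time spent waiting across the whole retry loop.
function totalWaitMs(maxAttempts = 5, baseMs = 2000): number {
  return retrySchedule(maxAttempts, baseMs).reduce((sum, delay) => sum + delay, 0);
}
```

With the defaults, the waits are 2000, 4000, 6000, and 8000 ms, for 20 seconds in total. If the database routinely needs longer than that to accept connections, raise `maxAttempts` or the base delay.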

Cost Optimization

Development Environment

Plan: Hobby (free tier with credits)
Services: 3 services (bot-client, api-gateway, ai-worker)
Addons: PostgreSQL, Redis

Cost Monitoring:

Check usage in Railway dashboard
Set up billing alerts (if available)
Monitor AI API costs (OpenRouter/Gemini) separately

Note: With BYOK, users provide their own API keys. Guest mode uses free models for users without keys.

Scaling Considerations

Monitor usage with admin commands (/admin usage)
Rate limiting implemented per guild/user
Consider autoscaling for api-gateway and ai-worker if load increases

Deployment Checklist

Before Every Deployment:

✅ Tests passing (pnpm test)
✅ Linting passing (pnpm lint)
✅ PR approved and merged to develop
✅ Verified no secrets committed (see tzurot-security)

After Deployment:

✅ All services show "Running" status
✅ Health endpoint returns 200 OK
✅ No ERROR logs in first 5 minutes
✅ Discord bot responds to test command
✅ Database migrations applied successfully (if any)

Railway Dashboard

Useful Sections:

Deployments: View build logs and deployment history
Metrics: CPU, memory, bandwidth usage
Variables: Manage environment variables (easier than CLI for viewing)
Logs: Alternative to CLI for log viewing (with filtering)
Settings: Service configuration, custom domains, sleep settings

Dashboard URL: https://railway.app/project/[project-id]

Related Skills

tzurot-observability - Log analysis and correlation IDs
tzurot-security - Secret management and environment variables
tzurot-git-workflow - Deployment triggers and branch strategy
tzurot-docs - Update CURRENT_WORK.md after deployments

References

Railway CLI Reference: docs/reference/RAILWAY_CLI_REFERENCE.md
Railway deployment guide: docs/deployment/RAILWAY_DEPLOYMENT.md
Project README: README.md#deployment
Railway official docs: https://docs.railway.app/

Red Flags - When to Consult This Skill

About to deploy to Railway
Service is down or unhealthy
Need to check production logs
Environment variables need updating
Database migration needed
Performance issues in production
Cost concerns or billing alerts
