| name | tzurot-deployment |
| description | Railway deployment operations for Tzurot v3 - Service management, log analysis, environment variables, health checks, and troubleshooting. Use when deploying, debugging production issues, or managing Railway infrastructure. |
| lastUpdated | 2025-12-08 |
Deployment Skill - Tzurot v3
Critical Context: Tzurot v3 is deployed on Railway with BYOK (Bring Your Own Key) implemented. Public beta is ready for users.
🎯 Use This Skill When
- Deploying changes to Railway
- Checking service logs
- Managing environment variables
- Debugging production issues
- Verifying service health
- Rolling back deployments
- Managing Railway infrastructure
🚂 Railway CLI Reference
IMPORTANT: Always consult
docs/reference/RAILWAY_CLI_REFERENCE.mdbefore running Railway commands. AI training data may be outdated for Railway CLI 4.5.3.
Core Deployment Operations
1. Service Status Checks
# Check all services status
railway status
# Check specific service
railway status --service api-gateway
railway status --service ai-worker
railway status --service bot-client
# Check deployment history
railway deploy --service api-gateway
2. Viewing Logs
Pattern: Always use correlation IDs to trace requests across services.
# Tail logs for a service (last 50 lines, follow mode)
railway logs --service api-gateway --tail 50
# Tail logs for ai-worker
railway logs --service ai-worker --tail 50
# Tail logs for bot-client
railway logs --service bot-client --tail 50
# Search logs for specific pattern
railway logs --service api-gateway | grep "ERROR"
# Find logs with correlation ID (trace request across services)
railway logs --service api-gateway | grep "requestId:abc123"
railway logs --service ai-worker | grep "requestId:abc123"
Log Analysis Tips:
- Look for correlation IDs to trace requests end-to-end
- Check for ERROR level logs first
- Use timestamps to correlate events across services
- See
tzurot-observabilityskill for log structure details
3. Environment Variables
Pattern: Use Railway dashboard for sensitive values, CLI for non-sensitive.
# List all environment variables for a service
railway variables --service api-gateway
# Set environment variable
railway variables set OPENROUTER_API_KEY=sk-or-v1-... --service ai-worker
# Set multiple variables
railway variables set \
AI_PROVIDER=openrouter \
LOG_LEVEL=info \
--service ai-worker
# Delete environment variable
railway variables delete OLD_VAR_NAME --service ai-worker
Security Reminder: See tzurot-security skill for secret management best practices.
4. Health Checks
# API Gateway health endpoint
curl https://api-gateway-development-83e8.up.railway.app/health
# Expected response:
# {
# "status": "healthy",
# "timestamp": "2025-11-19T14:30:00.000Z",
# "services": {
# "database": "connected",
# "redis": "connected"
# }
# }
Troubleshooting Unhealthy Services:
- Check logs:
railway logs --service <service-name> --tail 100 - Verify environment variables:
railway variables --service <service-name> - Check database/Redis connectivity
- Review recent deployments:
railway status --service <service-name>
5. Deployment Workflow
Standard Deployment Process:
Merge PR to develop (Railway auto-deploys):
gh pr merge <PR-number> --rebaseVerify deployment started:
railway status --service api-gateway # Look for "Deploying" statusMonitor deployment logs:
railway logs --service api-gateway --tail 100 # Watch for "Server started" or deployment errorsVerify health endpoint:
curl https://api-gateway-development-83e8.up.railway.app/healthCheck all services are healthy:
railway status # Should show all services as "Running"
Auto-Deploy Configuration:
- Branch:
develop(feature branches do NOT auto-deploy) - Trigger: Push to
developbranch on GitHub - Services: All 3 services (bot-client, api-gateway, ai-worker) deploy independently
6. Rolling Back Deployments
If deployment breaks production:
# 1. Check deployment history
railway status --service api-gateway
# 2. Identify last known good commit
git log --oneline -10
# 3. Revert to previous commit (creates new commit)
git revert HEAD
git push origin develop
# 4. Railway will auto-deploy the revert
# OR: Force deploy a specific commit (use with caution)
# See docs/reference/RAILWAY_CLI_REFERENCE.md for correct syntax
Alternative: Use GitHub to revert the PR merge commit and push to develop.
7. Manual Deployments
When auto-deploy fails or you need to redeploy:
# Redeploy without code changes (useful for env var updates)
railway up --service api-gateway
# Deploy from current branch (use with caution - usually deploy from develop)
railway up --service ai-worker
⚠️ Warning: Manual deployments from feature branches can cause inconsistencies. Always deploy from develop in production.
Common Operations
Restarting a Service
# Restart a service (useful for picking up new env vars)
railway restart --service bot-client
When to restart:
- After changing environment variables
- Service is stuck (check logs first!)
- Memory leak suspected (check metrics)
Database Operations
# Connect to PostgreSQL database
railway run psql
# Run Prisma migrations
railway run npx prisma migrate deploy
# Generate Prisma client (after schema changes)
railway run npx prisma generate
# View database in Prisma Studio (local only, connects to Railway DB)
npx prisma studio
Redis Operations
# Connect to Redis CLI (if Redis CLI is installed)
railway run redis-cli
# Common Redis commands:
# - PING (test connection)
# - KEYS * (list all keys - don't use in production!)
# - GET key_name
# - DEL key_name
# - FLUSHDB (clear all keys - DANGEROUS!)
Troubleshooting Guide
Service Won't Start
Symptoms: Service shows "Crashed" or "Failed" status
Steps:
Check logs for errors:
railway logs --service <service-name> --tail 100Common issues:
- Missing environment variables: Check
railway variables --service <service-name> - Database connection failed: Verify
DATABASE_URLis set - Redis connection failed: Verify
REDIS_URLis set - Port already in use: Check for duplicate deployments
- Build failed: Check build logs for TypeScript/dependency errors
- Missing environment variables: Check
Verify environment variables are set:
railway variables --service <service-name> | grep -E "(DATABASE_URL|REDIS_URL|DISCORD_TOKEN)"
Slow Response Times
Symptoms: API responses taking >5 seconds, timeouts
Steps:
Check service logs for slow operations:
railway logs --service api-gateway | grep "duration"Check database query performance:
railway logs --service ai-worker | grep "prisma"Check BullMQ job processing times:
railway logs --service ai-worker | grep "job completed"Verify Railway service resources (use Railway dashboard):
- CPU usage
- Memory usage
- Active connections
Discord Bot Not Responding
Symptoms: Bot appears online but doesn't respond to commands
Steps:
Check bot-client logs:
railway logs --service bot-client --tail 50Verify webhook creation (should see "Webhook created" in logs)
Check API Gateway is reachable:
curl https://api-gateway-development-83e8.up.railway.app/healthVerify Discord token is set:
railway variables --service bot-client | grep DISCORD_TOKENCheck if bot has proper permissions in Discord server
Database Migrations Failed
Symptoms: Service crashes after deployment with Prisma errors
Steps:
Check migration status:
railway run npx prisma migrate statusApply missing migrations:
railway run npx prisma migrate deployIf migrations are corrupted, see
docs/migration/for recovery procedures
Memory Leaks
Symptoms: Service memory usage gradually increases, eventual crash
Steps:
Monitor memory usage in Railway dashboard
Check for unclosed connections:
railway logs --service ai-worker | grep "connection"Look for growing caches or queues:
railway logs --service bot-client | grep "cache"Temporary fix: Restart service
railway restart --service <service-name>Long-term fix: Investigate code for resource leaks
Railway-Specific Patterns
Private Networking
Railway services communicate via private networking (no public internet):
// ✅ CORRECT - Use Railway-provided URLs (internal networking)
const GATEWAY_URL = process.env.GATEWAY_URL; // e.g., "http://api-gateway.railway.internal"
// ❌ WRONG - Don't use public URLs for internal communication
const GATEWAY_URL = 'https://api-gateway-development-83e8.up.railway.app';
Environment Variable Injection
Railway automatically injects service URLs:
// Railway provides these automatically:
const DATABASE_URL = process.env.DATABASE_URL; // PostgreSQL addon
const REDIS_URL = process.env.REDIS_URL; // Redis addon
const GATEWAY_URL = process.env.GATEWAY_URL; // Service reference
No need to manually configure these - Railway handles it.
Service Dependencies
Startup Order: Services start in parallel, handle connection retries:
// ✅ GOOD - Retry database connection on startup
async function connectWithRetry(maxAttempts = 5) {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
await prisma.$connect();
logger.info('Database connected');
return;
} catch (error) {
logger.warn({ attempt, maxAttempts }, 'Database connection failed, retrying...');
await new Promise(resolve => setTimeout(resolve, 2000 * attempt));
}
}
throw new Error('Database connection failed after max attempts');
}
See tzurot-async-flow skill for retry patterns.
Cost Optimization
Development Environment
- Plan: Hobby (free tier with credits)
- Services: 3 services (bot-client, api-gateway, ai-worker)
- Addons: PostgreSQL, Redis
Cost Monitoring:
- Check usage in Railway dashboard
- Set up billing alerts (if available)
- Monitor AI API costs (OpenRouter/Gemini) separately
Note: With BYOK, users provide their own API keys. Guest mode uses free models for users without keys.
Scaling Considerations
- Monitor usage with admin commands (
/admin usage) - Rate limiting implemented per guild/user
- Consider autoscaling for api-gateway and ai-worker if load increases
Deployment Checklist
Before Every Deployment:
- ✅ Tests passing (
pnpm test) - ✅ Linting passing (
pnpm lint) - ✅ PR approved and merged to
develop - ✅ Verified no secrets committed (see
tzurot-security)
After Deployment:
- ✅ All services show "Running" status
- ✅ Health endpoint returns 200 OK
- ✅ No ERROR logs in first 5 minutes
- ✅ Discord bot responds to test command
- ✅ Database migrations applied successfully (if any)
Railway Dashboard
Useful Sections:
- Deployments: View build logs and deployment history
- Metrics: CPU, memory, bandwidth usage
- Variables: Manage environment variables (easier than CLI for viewing)
- Logs: Alternative to CLI for log viewing (with filtering)
- Settings: Service configuration, custom domains, sleep settings
Dashboard URL: https://railway.app/project/[project-id]
Related Skills
- tzurot-observability - Log analysis and correlation IDs
- tzurot-security - Secret management and environment variables
- tzurot-git-workflow - Deployment triggers and branch strategy
- tzurot-docs - Update CURRENT_WORK.md after deployments
References
- Railway CLI Reference:
docs/reference/RAILWAY_CLI_REFERENCE.md - Railway deployment guide:
docs/deployment/RAILWAY_DEPLOYMENT.md - Project README:
README.md#deployment - Railway official docs: https://docs.railway.app/
Red Flags - When to Consult This Skill
- About to deploy to Railway
- Service is down or unhealthy
- Need to check production logs
- Environment variables need updating
- Database migration needed
- Performance issues in production
- Cost concerns or billing alerts