| name | backend-dev-guidelines |
| description | Comprehensive backend development guide for Cheerful codebase. Covers Temporal.io workflows, SQLAlchemy 2.0 patterns, Gmail/Sheets APIs, FastAPI, Supabase, testing with pytest, and Fly.io deployment. Use when working with backend code, debugging issues, implementing features, database queries, API endpoints, durable workflows, email processing, or deployment. |
| version | 1.0.0 |
| license | MIT |
| allowed-tools | Read, Write, Edit, Bash, Grep, Glob |
Backend Development Guide for Cheerful
This skill provides critical gotchas, pre-flight checklists, and quick fixes for backend development in the Cheerful codebase. The system implements an email campaign management platform using FastAPI, Temporal.io for durable workflows, SQLAlchemy 2.0 Core for database access, Gmail API for email processing, and Supabase for PostgreSQL, auth, and storage.
When to Use This Skill
- Working with Temporal.io workflows or activities
- Writing SQLAlchemy database queries or models
- Implementing Gmail API integrations
- Working with Google Sheets data
- Writing or debugging tests
- Deploying to Fly.io (staging/production)
- Building FastAPI endpoints
- Configuring Supabase (PostgreSQL, Storage, Auth with RLS)
Critical Gotchas (Top 10)
1. Temporal Activities Retry INDEFINITELY by Default
Activities retry forever without explicit limits. ALWAYS set retry_policy=RetryPolicy(maximum_attempts=N).
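# BAD - No retry_policy, so failures retry forever
await workflow.execute_activity(
    my_activity,
    params,
    start_to_close_timeout=timedelta(seconds=30),
)
# GOOD - Explicit retry limit
await workflow.execute_activity(
    my_activity,
    params,
    start_to_close_timeout=timedelta(seconds=30),
    retry_policy=RetryPolicy(maximum_attempts=3),
)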
See {baseDir}/references/temporal.md for retry patterns.
2. Database Sessions Must Be Short-Lived
NEVER hold database sessions during LLM calls, API requests, or long operations. Extract scalars before closing session.
# BAD - Session held during 30+ second LLM call
with get_db_session_context() as db:
    campaign = CampaignRepository(db).get_by_id(id)
    llm_result = llm_service.generate(campaign.goal)  # Session stays open!
# GOOD - Extract scalar, close session, then call LLM
with get_db_session_context() as db:
    campaign = CampaignRepository(db).get_by_id(id)
    campaign__goal = campaign.goal  # Extract scalar
# Session closed
llm_result = llm_service.generate(campaign__goal)  # No session held
See {baseDir}/references/sqlalchemy.md for session patterns.
3. gmail_thread_id is NOT Globally Unique
gmail_thread_id is unique per Gmail account, NOT globally. Always scope by gmail_account_id.
# BAD - Can return wrong thread from different account
stmt = select(GmailThreadState).where(
    GmailThreadState.gmail_thread_id == thread_id
)
# GOOD - Scoped to account
stmt = select(GmailThreadState).where(
    GmailThreadState.gmail_account_id == account_id,
    GmailThreadState.gmail_thread_id == thread_id,
)
See {baseDir}/references/gmail-api.md for Gmail patterns.
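To see why the account scope matters, the collision can be reproduced with the standard library alone. This is a minimal sketch, not the codebase's schema: the table layout, the `status` column, and the account/thread values are hypothetical, and sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gmail_thread_state (
        gmail_account_id TEXT NOT NULL,
        gmail_thread_id  TEXT NOT NULL,
        status           TEXT NOT NULL
    )
    """
)
# Two different accounts that both hold a thread with the same Gmail thread id.
conn.executemany(
    "INSERT INTO gmail_thread_state VALUES (?, ?, ?)",
    [
        ("account-a", "thread-1", "replied"),
        ("account-b", "thread-1", "pending"),
    ],
)

# Unscoped lookup: matches rows from both accounts.
unscoped = conn.execute(
    "SELECT gmail_account_id, status FROM gmail_thread_state "
    "WHERE gmail_thread_id = ?",
    ("thread-1",),
).fetchall()

# Scoped lookup: exactly the row that belongs to account-b.
scoped = conn.execute(
    "SELECT gmail_account_id, status FROM gmail_thread_state "
    "WHERE gmail_account_id = ? AND gmail_thread_id = ?",
    ("account-b", "thread-1"),
).fetchall()
```

With the unscoped query, a `scalar_one_or_none()`-style call would either fail on multiple rows or, with a limit, return whichever account's row happens to come first.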
4. Gmail API Requires format='raw'
Fetch messages with format='raw' to get the base64url-encoded RFC 2822 email, which can then be parsed with the standard library.
# GOOD - Returns raw RFC 2822 email
response = (
    self.service.users()
    .messages()
    .get(userId="me", id=message_id, format="raw")
    .execute()
)
A missing or wrong format parameter causes a ValueError in create_gmail_message_from_raw().
See {baseDir}/references/gmail-api.md for details.
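The decoding step that format='raw' enables can be sketched as follows. This is not the codebase's create_gmail_message_from_raw(); `parse_raw_gmail_message` is a hypothetical helper, and it assumes the response is the dict returned by `.execute()` with the payload under the `raw` key.

```python
import base64
from email import message_from_bytes, policy
from email.message import EmailMessage


def parse_raw_gmail_message(response: dict) -> EmailMessage:
    """Decode the base64url 'raw' field of a messages.get response."""
    raw = response.get("raw")
    if raw is None:
        # Responses fetched with another format carry no "raw" field.
        raise ValueError("response has no 'raw' field; was format='raw' used?")
    # The payload uses the URL-safe alphabet; restore padding if it is absent.
    padded = raw + "=" * (-len(raw) % 4)
    data = base64.urlsafe_b64decode(padded)
    return message_from_bytes(data, policy=policy.default)


# Build a fake API response from a tiny RFC 2822 message.
source = b"From: a@example.com\r\nSubject: Hi\r\n\r\nBody text\r\n"
fake_response = {
    "id": "m1",
    "raw": base64.urlsafe_b64encode(source).decode().rstrip("="),
}
message = parse_raw_gmail_message(fake_response)
```

Using `policy.default` returns an `EmailMessage`, so headers and the body are available through `message["Subject"]` and `message.get_content()`.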
5. Tests Cannot Run in Parallel
Tests share one database and one storage bucket, and each test drops all tables, so pytest-xdist is not supported.
# GOOD - Sequential execution
(cd apps/backend && uv run pytest)
# BAD - Will fail with conflicts
(cd apps/backend && uv run pytest -n auto)
See {baseDir}/references/testing.md for test setup.
6. Graceful Shutdown Timeout Must Be Less Than Kill Timeout
Worker graceful_shutdown_timeout (4min 30sec) must be less than Fly.io kill_timeout (5min), or workers will be SIGKILLed mid-operation.
Current configuration: correctly set at 4min 30sec < 5min.
See {baseDir}/references/fly-deployment.md for deployment config.
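The ordering constraint could also be asserted when the worker starts. This is a minimal sketch, assuming both values are available as `timedelta` objects; `check_shutdown_budget` and its safety margin are hypothetical and not part of the codebase.

```python
from datetime import timedelta


def check_shutdown_budget(
    graceful_shutdown_timeout: timedelta,
    kill_timeout: timedelta,
    margin: timedelta = timedelta(seconds=10),
) -> None:
    """Raise if the worker could still be draining when the platform kills it."""
    if graceful_shutdown_timeout + margin > kill_timeout:
        raise ValueError(
            f"graceful_shutdown_timeout ({graceful_shutdown_timeout}) plus margin "
            f"({margin}) exceeds kill_timeout ({kill_timeout})"
        )


# Values from this guide: 4min 30sec graceful shutdown, 5min kill timeout.
check_shutdown_budget(timedelta(minutes=4, seconds=30), timedelta(minutes=5))
```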
7. USE_MOCK_WORKFLOW_TOOLS Must Be False in Production
Production MUST set USE_MOCK_WORKFLOW_TOOLS=false in .production.env or real API calls won't happen. Default is true (mock mode).
# In .production.env
USE_MOCK_WORKFLOW_TOOLS=false
See {baseDir}/references/fly-deployment.md for secrets management.
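Because the default is mock mode, an unset variable is as dangerous as an explicit `true`. The sketch below shows one way to make that failure loud at startup. How the codebase actually parses the variable is not shown in this guide, so the accepted "false" spellings, both function names, and the `"production"` environment label are assumptions.

```python
import os
from typing import Mapping, Optional


def use_mock_workflow_tools(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless the variable is explicitly set to a false value."""
    env = os.environ if env is None else env
    raw = env.get("USE_MOCK_WORKFLOW_TOOLS", "true")  # Default: mock mode
    return raw.strip().lower() not in {"false", "0", "no"}


def assert_real_tools_in_production(
    environment: str, env: Optional[Mapping[str, str]] = None
) -> None:
    """Fail fast if production would silently run with mocked tools."""
    if environment == "production" and use_mock_workflow_tools(env):
        raise RuntimeError(
            "USE_MOCK_WORKFLOW_TOOLS must be 'false' in production"
        )


# An unset variable falls back to mock mode.
mock_when_unset = use_mock_workflow_tools({})
mock_when_false = use_mock_workflow_tools({"USE_MOCK_WORKFLOW_TOOLS": "false"})
```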
8. Use workflow.logger in Workflows, structlog in Activities
CRITICAL: Use workflow.logger in workflows, NOT structlog.get_logger().
# In workflows - GOOD
workflow.logger.info(f"Processing: {status}")
# In activities - GOOD
log = structlog.get_logger()
log.info("Activity started", state_id=params.state__id)
See {baseDir}/references/temporal.md for logging patterns.
9. Database Sessions via Context Manager, NOT Depends()
DB sessions use with get_db_session_context(), not FastAPI's Depends(). Auto-commits on success, auto-rollback on exception.
# GOOD - Context manager pattern
with get_db_session_context() as db:
    repo = CampaignRepository(db)
    campaign = repo.get_by_id(id)
# Auto-commits here
# BAD - Don't use Depends() for DB sessions
async def endpoint(db: Session = Depends(get_db)):
    ...  # Not the pattern used in this codebase
See {baseDir}/references/fastapi.md for API patterns.
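The commit-on-success, rollback-on-exception behavior described above can be sketched with the standard library. This is not the codebase's get_db_session_context(): `session_context` is a hypothetical stand-in, the `campaign` table is illustrative, and sqlite3 replaces PostgreSQL and SQLAlchemy so the example runs on its own.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def session_context(conn: sqlite3.Connection):
    """Commit when the block succeeds, roll back when it raises."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaign (id INTEGER PRIMARY KEY, goal TEXT)")
conn.commit()

# Success path: the insert is committed when the block exits.
with session_context(conn) as db:
    db.execute("INSERT INTO campaign (goal) VALUES (?)", ("launch",))

# Failure path: the insert is rolled back and the exception propagates.
try:
    with session_context(conn) as db:
        db.execute("INSERT INTO campaign (goal) VALUES (?)", ("discarded",))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

goals = [row[0] for row in conn.execute("SELECT goal FROM campaign")]
```

Only the row written in the successful block survives, which is why callers never commit or roll back by hand inside the `with` body.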
10. SQLAlchemy Models Cannot Be Passed Through Temporal
Never pass SQLAlchemy models as Temporal workflow/activity parameters. Use Pydantic models or extract scalars.
# BAD - SQLAlchemy model through Temporal
await workflow.execute_activity(process_campaign, campaign) # campaign is SQLAlchemy model
# GOOD - Pydantic model
campaign_dto = CampaignDto.model_validate(campaign)
await workflow.execute_activity(process_campaign, campaign_dto)
# ALSO GOOD - Extract scalars (pass multiple arguments via args=[...])
campaign__id = campaign.id
campaign__goal = campaign.goal
await workflow.execute_activity(process_campaign, args=[campaign__id, campaign__goal])
See {baseDir}/references/temporal.md and {baseDir}/references/sqlalchemy.md.
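The underlying requirement is that anything crossing the workflow/activity boundary must serialize cleanly. The sketch below illustrates the conversion step with standard-library pieces only: a dataclass stands in for a Pydantic model, and `FakeCampaignRow`, `CampaignPayload`, and `to_payload` are hypothetical names, not classes from the codebase.

```python
import json
import uuid
from dataclasses import asdict, dataclass


class FakeCampaignRow:
    """Stand-in for a session-bound SQLAlchemy model instance."""

    def __init__(self, id: uuid.UUID, goal: str) -> None:
        self.id = id
        self.goal = goal


@dataclass(frozen=True)
class CampaignPayload:
    """Plain, serializable data that is safe to pass across the boundary."""

    campaign_id: str
    goal: str


def to_payload(row: FakeCampaignRow) -> CampaignPayload:
    # Copy scalars out while the row is still attached to its session.
    return CampaignPayload(campaign_id=str(row.id), goal=row.goal)


row = FakeCampaignRow(uuid.UUID(int=1), "Book 10 demos")
payload = to_payload(row)
encoded = json.dumps(asdict(payload))  # Round-trips without touching the database
```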
Pre-Flight Checklists
Before Writing Temporal Workflow
- Workflow must be deterministic (no random, no datetime.now(), no direct I/O)
- All I/O operations in activities, not workflows
- Set retry_policy on ALL activity executions
- Use Pydantic models for inputs/outputs (never SQLAlchemy models)
- Use workflow.logger for logging (not structlog)
- Set timeouts on activity executions
- Activities must be idempotent (safe to retry)
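One way to keep workflow logic deterministic is to pass the current time in as an argument instead of reading the clock inside the decision. A minimal sketch of that idea follows; `is_follow_up_due` and the follow-up scenario are hypothetical, chosen only to illustrate the checklist item.

```python
from datetime import datetime, timedelta, timezone


def is_follow_up_due(last_sent_at: datetime, now: datetime, wait: timedelta) -> bool:
    """Pure decision logic: the same inputs give the same answer on replay."""
    return now - last_sent_at >= wait


# Inside a workflow, `now` would come from workflow.now() rather than
# datetime.now(), so a replay sees the same timestamp as the original run.
last_sent = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
replay_time = datetime(2024, 1, 4, 9, 0, tzinfo=timezone.utc)
due = is_follow_up_due(last_sent, replay_time, wait=timedelta(days=3))
```

Keeping the function free of clock reads and I/O also makes it trivial to unit test without a Temporal test environment.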
Before Database Operations
- Use SQLAlchemy 2.0 Core (not ORM methods like .query())
- Extract scalars before long operations: campaign__goal = campaign.goal
- Use with get_db_session_context() for auto-commit/rollback
- Never pass SQLAlchemy models through Temporal
- Use on_conflict_do_nothing() for idempotent inserts
- Never hold session during LLM/API calls
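The idempotent-insert item can be illustrated without SQLAlchemy. The sketch below uses sqlite3 (which accepts the same `ON CONFLICT DO NOTHING` clause in SQLite 3.24 and later) in place of PostgreSQL; the `processed_message` table and `record_message` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE processed_message (
        gmail_account_id TEXT NOT NULL,
        gmail_message_id TEXT NOT NULL,
        PRIMARY KEY (gmail_account_id, gmail_message_id)
    )
    """
)


def record_message(conn: sqlite3.Connection, account_id: str, message_id: str) -> None:
    """Insert the row; a retry with the same key is silently ignored."""
    conn.execute(
        "INSERT INTO processed_message (gmail_account_id, gmail_message_id) "
        "VALUES (?, ?) ON CONFLICT DO NOTHING",
        (account_id, message_id),
    )


# Simulate an activity that runs twice because of a retry.
record_message(conn, "account-a", "msg-1")
record_message(conn, "account-a", "msg-1")
row_count = conn.execute("SELECT COUNT(*) FROM processed_message").fetchone()[0]
```

Without the conflict clause, the second call would raise an integrity error and the retried activity would fail instead of converging on the same state.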
Before Deployment
- Verify USE_MOCK_WORKFLOW_TOOLS=false in .production.env
- Check graceful shutdown timeout < kill timeout (4.5min < 5min)
- Verify all required secrets in .production.env or .staging.env
- Confirm encryption keys are exact hex length (64 and 32 chars)
- Test health check endpoint returns correct environment
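The hex-length item is easy to get wrong when a key is pasted with a trailing newline or a stray character. A minimal validation sketch follows; `is_hex_key` is a hypothetical helper, and which secret requires which length is not specified here.

```python
import string

_HEX_DIGITS = frozenset(string.hexdigits)


def is_hex_key(value: str, expected_chars: int) -> bool:
    """True only if value is exactly expected_chars hexadecimal characters."""
    return len(value) == expected_chars and all(ch in _HEX_DIGITS for ch in value)


valid_64 = is_hex_key("ab" * 32, 64)
# A trailing newline from copy-paste changes the length and fails the check.
pasted_with_newline = is_hex_key("ab" * 16 + "\n", 32)
```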
Before Gmail API Integration
- Always scope gmail_thread_id queries by gmail_account_id
- Use format='raw' for message fetching
- Implement Gmail email normalization (dots, plus addressing, googlemail.com)
- Handle Optional returns (drafts return None from processor)
- Use idempotent operations: ON CONFLICT DO NOTHING
- Test with actual API call (not just credentials.refresh())
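The normalization item in the checklist above covers three Gmail behaviors: dots in the local part are ignored, anything after a plus sign is a tag, and googlemail.com is an alias for gmail.com. A minimal sketch is shown below; `normalize_gmail_address` is a hypothetical helper, and limiting the dot and plus rules to Gmail domains is an assumption about the desired behavior.

```python
def normalize_gmail_address(address: str) -> str:
    """Return a canonical form so equivalent Gmail addresses compare equal."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    if domain == "googlemail.com":
        domain = "gmail.com"
    if domain == "gmail.com":
        # Drop the +tag, then the dots, which Gmail ignores in the local part.
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


canonical = normalize_gmail_address("First.Last+promo@GoogleMail.com")
other_provider = normalize_gmail_address("Jane.Doe+x@Example.com")
```

Addresses on other domains are only lowercased, since dots and plus tags can be significant for other providers.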
Quick Fixes for Common Issues
"Session is already closed" Error
# WRONG - Session closed after context exit
with get_db_session_context() as db:
    campaign = CampaignRepository(db).get_by_id(id)
# Session closed!
llm_result = llm_service.generate(campaign.goal)  # Error: Session closed
# CORRECT - Extract scalar before session closes
with get_db_session_context() as db:
    campaign = CampaignRepository(db).get_by_id(id)
    campaign__goal = campaign.goal  # Extract to local variable
# Session closed
llm_result = llm_service.generate(campaign__goal)  # Works!
Temporal Activity Retries Forever
# WRONG - No retry limit, will retry forever
await workflow.execute_activity(
    check_is_latest_activity,
    state_id,
    start_to_close_timeout=timedelta(seconds=30),
)
# CORRECT - Explicit retry limit
await workflow.execute_activity(
    check_is_latest_activity,
    state_id,
    start_to_close_timeout=timedelta(seconds=30),
    retry_policy=RetryPolicy(maximum_attempts=1),
)
Wrong Gmail Thread Returned
# WRONG - gmail_thread_id not scoped to account
def get_latest_by_gmail_thread_id(self, gmail_thread_id: str):
    stmt = select(GmailThreadState).where(
        GmailThreadState.gmail_thread_id == gmail_thread_id
    )
    return self.db.execute(stmt).scalar_one_or_none()
# CORRECT - Scope by gmail_account_id
def get_latest_by_gmail_thread_id(
    self, gmail_account_id: uuid.UUID, gmail_thread_id: str
):
    stmt = select(GmailThreadState).where(
        GmailThreadState.gmail_account_id == gmail_account_id,
        GmailThreadState.gmail_thread_id == gmail_thread_id,
    )
    return self.db.execute(stmt).scalar_one_or_none()
Tests Failing Due to Missing auth.users
# WRONG - Insert UserGmailAccount without auth.users entry
with get_db_session_context() as db:
    account = UserGmailAccount(user_id=user_id, ...)
    db.add(account)
# Fails: Foreign key constraint on auth.users
# CORRECT - Insert into auth.users first
with engine.connect() as conn:
    conn.execute(text("""
        INSERT INTO auth.users (id, email)
        VALUES (:user_id, :email)
        ON CONFLICT DO NOTHING
    """), {"user_id": str(user_id), "email": test_email})
    conn.commit()
with get_db_session_context() as db:
    account = UserGmailAccount(user_id=user_id, ...)
    db.add(account)
Mock Tools Enabled in Production
# Check if mock tools are enabled
fly ssh console -a prd-cheerful
echo $USE_MOCK_WORKFLOW_TOOLS # Should output: false
# If it outputs "true" or empty, update secrets:
# In .production.env:
USE_MOCK_WORKFLOW_TOOLS=false
# Apply secrets (will trigger deployment):
flyctl secrets import --app prd-cheerful < ./infra/prd/.production.env
Core Patterns by Technology
Temporal.io
Durable workflow orchestration for long-running operations (email processing, LLM calls).
Key Files:
- Worker setup: apps/backend/src/temporal/worker.py
- Workflows: apps/backend/src/temporal/workflow/ (all files end with _workflow.py)
- Activities: apps/backend/src/temporal/activity/ (all files end with _activity.py)
- Models: apps/backend/src/models/temporal/ (Pydantic only)
Full guide: {baseDir}/references/temporal.md
SQLAlchemy 2.0
Database access using SQLAlchemy 2.0 Core (not ORM). Short-lived sessions with auto-commit/rollback.
Key Files:
- Database config: apps/backend/src/core/database.py
- Models: apps/backend/src/models/database/
- Repositories: apps/backend/src/repositories/
Full guide: {baseDir}/references/sqlalchemy.md
Gmail API
Email fetching, thread reconstruction, and state management with event-sourced versioning.
Key Files:
- Gmail service: apps/backend/src/services/external/gmail.py
- Email processor: apps/backend/src/services/email/processor.py
- Thread state repository: apps/backend/src/repositories/gmail_thread_state.py
Full guide: {baseDir}/references/gmail-api.md
Google Sheets
Recipient data import and metrics export with rate-limit protection via separate Temporal task queue.
Key Files:
- Sheets service: apps/backend/src/services/external/gsheet.py
- Metrics activity: apps/backend/src/temporal/activity/thread_metrics_activity.py
Full guide: {baseDir}/references/google-sheets.md
Testing
Pytest with a shared database (no parallel execution). Integration tests use real services; unit tests use mocks.
Key Files:
- Test config: apps/backend/tests/conftest.py
- Test directory: apps/backend/tests/
Full guide: {baseDir}/references/testing.md
Fly.io Deployment
Process groups (web + worker) with shared secrets. Staging and production environments.
Key Files:
- Staging config: infra/stg/fly.toml
- Production config: infra/prd/fly.toml
- Deploy scripts: infra/stg/deploy.sh, infra/prd/deploy.sh
Full guide: {baseDir}/references/fly-deployment.md
FastAPI
API routes with JWT authentication, context-managed DB sessions, and Pydantic request/response models.
Key Files:
- Main app: apps/backend/main.py
- Routes: apps/backend/src/api/route/
- Auth dependencies: apps/backend/src/api/dependencies/auth.py
Full guide: {baseDir}/references/fastapi.md
Supabase
PostgreSQL with Row Level Security (RLS), Storage for email content, and Auth with JWT verification.
Key Files:
- Database config: apps/backend/src/core/database.py
- Storage service: apps/backend/src/services/storage/storage.py
- Migrations: supabase/migrations/
Full guide: {baseDir}/references/supabase.md
Architecture Insights
For comprehensive architectural patterns see:
- Dual High Water Mark Pattern: {baseDir}/references/architecture-patterns.md
- Event-sourced state management: {baseDir}/references/architecture-patterns.md
- Idempotent operations: {baseDir}/references/architecture-patterns.md
- AI Features as Pure Functions: {baseDir}/references/architecture-patterns.md
- DTO Pattern: {baseDir}/references/sqlalchemy.md
- Service Organization: {baseDir}/references/architecture-patterns.md
Testing Commands
# Prerequisites: Start local dev cluster
./infra/dev.sh start --fresh
# Check services are running
docker ps
# Run all tests (from repo root)
(cd apps/backend && uv run pytest)
# Run specific test file
(cd apps/backend && uv run pytest tests/repositories/test_campaign.py)
# Run specific test function
(cd apps/backend && uv run pytest tests/repositories/test_campaign.py::TestCampaignRepository::test_empty_database)
# Skip integration tests (faster)
(cd apps/backend && uv run pytest -m "not integration")
# Only integration tests (requires API keys)
(cd apps/backend && uv run pytest -m integration)
# Verbose output
(cd apps/backend && uv run pytest -v)
Deployment Commands
# Check deployment status
fly status -a prd-cheerful
fly status -a stg-cheerful
# View logs
fly logs -a prd-cheerful # All processes
fly logs -a prd-cheerful --process web # Web only
fly logs -a prd-cheerful --process worker # Worker only
# Deploy staging
./infra/stg/deploy.sh
# Deploy production
./infra/prd/deploy.sh
# Stage secrets without triggering a deployment (they take effect on the next deploy)
flyctl secrets import --stage --app prd-cheerful < ./infra/prd/.production.env
# Apply secrets (triggers deployment!)
flyctl secrets import --app prd-cheerful < ./infra/prd/.production.env
# List current secrets
fly secrets list -a prd-cheerful
# Scale processes
fly scale count web=2 worker=3 -a prd-cheerful
# SSH into machine
fly ssh console -a prd-cheerful
# Health check
curl https://prd-cheerful.fly.dev/health