name	Omnistrate Solutions Architect
description	Guide users through designing application architectures from scratch for SaaS deployment on Omnistrate. Focuses on technology selection, domain-specific architecture patterns, compliance and SLA requirements, and iterative compose spec development. The output is a production-ready compose spec that can be handed off to the FDE skill for Omnistrate-native onboarding.

Omnistrate Solutions Architect

When to Use This Skill

Use this skill when:

Designing new SaaS applications from scratch and choosing technology stacks
Architecting microservices and selecting databases, caches, message queues
Understanding domain-specific requirements (AI/ML, analytics, APIs, data platforms)
Evaluating compliance needs (SOC2, HIPAA, GDPR, data residency)
Determining customer SLA requirements and availability zones
Making architectural decisions informed by Omnistrate's tenancy and deployment models
Iteratively developing and refining a Docker Compose specification
User has a compose file with build: contexts that only runs locally
Converting local development compose (build contexts) to cloud-ready compose (image registries)
Setting up container image registries and authentication for private images

Do NOT use this skill when:

User already has a compose spec with ALL services using image: references (no build: contexts) AND images are accessible in registries → Use FDE skill instead
User needs to debug failed deployments → Use SRE skill instead

Relationship to Other Skills

SA Skill                    FDE Skill                   SRE Skill
┌─────────────────┐        ┌──────────────────┐       ┌──────────────┐
│ Design app from │   →    │ Transform compose│   →   │ Debug failed │
│ scratch         │        │ to Omnistrate    │       │ deployments  │
│                 │        │ native           │       │              │
│ • Tech choices  │        │ • x-omnistrate-* │       │ • Workflows  │
│ • Architecture  │        │   extensions     │       │ • Logs       │
│ • Compose spec  │        │ • API params     │       │ • kubectl    │
│ • Domain needs  │        │ • Service plans  │       │              │
└─────────────────┘        └──────────────────┘       └──────────────┘
     This skill            Handoff to FDE              If issues arise

Output: A vanilla Docker Compose spec optimized for Omnistrate's capabilities (tenancy, deployment models, scaling) but WITHOUT x-omnistrate-* extensions yet.

Core Responsibilities

As a Solutions Architect, you will:

Understand domain and requirements - Ask questions about business model, target customers, compliance, SLAs
Select appropriate technologies - Choose databases, frameworks, languages, infrastructure components
Design service architecture - Define microservices, data flow, dependencies, state management
Consider Omnistrate deployment models - Design for SaaS, BYOC, BYOC Copilot, or On-Premise from the start
Plan for tenancy - Architecture decisions that support shared, siloed, or hybrid tenancy
Build compose spec iteratively - Start simple, validate, add complexity, refine
Prepare for FDE handoff - Ensure compose spec is ready for Omnistrate-native transformation

Architectural Workflow

Phase 1: Discovery & Requirements

Ask clarifying questions to understand the user's needs:

Business Context

What problem does your SaaS solve? (domain: AI/ML, analytics, APIs, databases, etc.)
Who are your target customers? (startups, mid-market, enterprise, developers)
What is your pricing model? (freemium, usage-based, tiered plans)
What customer segments need different deployment models? (SaaS, BYOC, On-Premise)

Technical Requirements

What is your expected scale? (users, requests/sec, data volume)
What are your performance requirements? (latency, throughput)
Do you have existing infrastructure, or are you starting from scratch?
What programming languages/frameworks does your team know?
Any existing codebases to integrate?

Compliance & Security

What compliance certifications do you need? (SOC2, HIPAA, GDPR, ISO 27001)
Any data residency requirements? (EU data in EU, etc.)
What industries are you targeting? (healthcare, finance, etc.)
Do customers need data isolation? (dedicated infrastructure, encryption)

SLA & Availability

What uptime SLA do you promise? (99.9%, 99.99%)
What is acceptable downtime? (planned maintenance windows)
Do you need multi-region deployment for disaster recovery?
What are your RTO (Recovery Time Objective) and RPO (Recovery Point Objective)?

Phase 2: Technology Selection

Based on the requirements, recommend an appropriate technology stack.

Application Framework Selection

API/Web Services:

Node.js/Express: Fast I/O, JavaScript ecosystem, good for APIs
Python/FastAPI: ML/AI workloads, data science, rapid development
Go: High performance, concurrent workloads, system services
Java/Spring Boot: Enterprise, complex business logic, banking/finance
.NET/ASP.NET: Microsoft ecosystem, Windows integration, enterprise

Considerations:

Team expertise (choose familiar stack for faster iteration)
Performance requirements (Go/Rust for low latency, Python for ML)
Ecosystem maturity (npm, PyPI, Maven availability)
Containerization ease (Alpine base images, build times)

Database Selection

Relational (ACID, structured data):

PostgreSQL: General purpose, JSON support, extensions, most versatile
MySQL/MariaDB: High read throughput, WordPress/PHP ecosystems
SQL Server: Microsoft stack, enterprise features
CockroachDB: Distributed SQL, global scale, Postgres-compatible

Document/NoSQL:

MongoDB: Flexible schema, rapid iteration, JSON documents
DynamoDB: Serverless, AWS-native, predictable performance
Cassandra: Write-heavy, time-series, high availability

Time-Series:

TimescaleDB: PostgreSQL extension, SQL interface
InfluxDB: Purpose-built, high ingestion rates
Prometheus: Metrics, monitoring data

Graph:

Neo4j: Relationships, social networks, recommendations
ArangoDB: Multi-model, graph + document

Selection criteria:

Data model fit (relational vs document vs graph)
Query patterns (complex joins vs key-value lookups)
Consistency requirements (ACID vs eventual consistency)
Scale expectations (GB vs TB vs PB)
Operational complexity (managed vs self-hosted)

Cache/Session Store Selection

In-Memory Cache:

Redis: Versatile, pub/sub, data structures, most common
Memcached: Simple key-value, high performance, fewer features
Valkey: Redis fork, open-source alternative

Use cases:

Session storage (user login sessions)
Database query caching (reduce DB load)
Rate limiting (API throttling)
Real-time leaderboards, counters

Message Queue/Streaming Selection

Message Queues:

RabbitMQ: AMQP protocol, reliable, work queues
Apache Kafka: High throughput, event streaming, log aggregation
NATS: Lightweight, low latency, microservices
Amazon SQS: Serverless, AWS-native

Use cases:

Asynchronous processing (email sending, report generation)
Event-driven architectures (microservices communication)
Log aggregation (centralized logging)
Real-time analytics (stream processing)

Storage Selection

Object Storage:

S3/GCS/Azure Blob: Media files, backups, data lakes
MinIO: Self-hosted S3-compatible

File Storage:

NFS: Shared filesystems
EFS/Cloud Filestore: Managed network filesystems

Use cases:

User uploads (images, documents)
Backups and archives
ML model storage
Static assets (CDN origin)

Phase 3: Architecture Design

Design the service architecture based on domain patterns.

Pattern 1: Simple API Service

Domain: REST APIs, microservices, webhooks

Internet → API Server → Database
              ↓
            Cache (optional)

Components:

API server (Node.js/Python/Go/Java)
PostgreSQL/MySQL (relational data)
Redis (optional: caching, rate limiting)

Tenancy considerations:

Shared tenancy: One API service, logical tenant isolation in DB (tenant_id column)
Hybrid tenancy: Separate database per tenant, shared application tier
Siloed tenancy: Separate API service and database per tenant

Pattern 2: Three-Tier Web Application

Domain: SaaS apps, dashboards, admin panels

Internet → Load Balancer → Web Tier (static) → App Tier (API) → Database
                                                     ↓
                                                  Cache

Components:

Web tier: NGINX/Apache (static assets, reverse proxy)
App tier: Backend API (Node.js/Python/Java)
Database: PostgreSQL/MySQL
Cache: Redis

Tenancy considerations:

Shared: Shared app tier + database, tenant routing by subdomain
Siloed: Separate app + DB per tenant (enterprise customers)

Pattern 3: Data Processing Pipeline

Domain: ETL, analytics, data warehousing

Data Sources → Ingestion API → Message Queue → Workers → Database/Data Warehouse
                                     ↓
                                 Object Storage

Components:

Ingestion: FastAPI/Go service (data collection)
Queue: Kafka/RabbitMQ (buffering, reliability)
Workers: Python/Java (data transformation)
Storage: PostgreSQL + S3 (structured + raw data)

Tenancy considerations:

Shared queue with tenant partitioning
Isolated workers per tenant for security
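
One way to realize a shared queue with tenant partitioning is to derive the partition from a stable hash of the tenant ID, so that all events for one tenant land on the same partition and keep their order. The sketch below is illustrative only: `partition_for_tenant` and the partition count are hypothetical names, not part of Kafka, RabbitMQ, or Omnistrate.

```python
import hashlib


def partition_for_tenant(tenant_id: str, num_partitions: int) -> int:
    """Map a tenant ID to a stable partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Use a digest rather than the built-in hash(): Python salts str hashes
    # per process, so hash() would not agree across workers.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Kafka producers achieve a similar effect by default when the tenant ID is used as the message key, so a helper like this is mainly useful for queues without key-based partitioning.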

Pattern 4: AI/ML Service

Domain: Model serving, inference APIs, ML platforms

Internet → API Gateway → Inference Service (GPU) → Model Storage (S3)
                              ↓
                         Result Database

Components:

API: FastAPI/Flask (REST endpoints)
Inference: GPU-enabled containers (CUDA, TensorFlow, PyTorch)
Storage: S3/GCS (model weights)
Database: PostgreSQL (metadata, results)
Cache: Redis (model caching, request dedup)

Tenancy considerations:

Dedicated GPU per tenant (isolation and predictable latency, at higher cost)
Shared inference tier with request queuing (cost optimization)

Pattern 5: Real-Time Analytics

Domain: Dashboards, metrics, monitoring

Events → Stream Processor → Time-Series DB → Query API → Visualization
            ↓
         Object Storage (archives)

Components:

Stream: Kafka/NATS
Processor: Flink/custom workers
Database: TimescaleDB/InfluxDB
API: GraphQL/REST (query layer)

Tenancy considerations:

Tenant data partitioning in time-series DB
Shared stream with tenant tagging

Phase 4: Deployment Model Planning

Design for Omnistrate's deployment models from the start.

SaaS Provider Account (Most Common)

Architecture:

All infrastructure in provider's cloud accounts
Shared or dedicated resources per tenant
Provider manages everything

Design decisions:

Use shared databases with tenant_id isolation (cost-effective)
Load balancers for multi-tenant access
Consider "Customer Networks" for enhanced security (VPC per customer)

Best for: Startups, mid-market, most B2B SaaS

BYOC (Bring Your Own Cloud)

Architecture:

Deploy into customer's cloud account (AWS/GCP/Azure)
Customer owns infrastructure, provider manages service
Data stays in customer's environment

Design decisions:

Minimize cross-account dependencies
Use customer's IAM roles for permissions
Plan for network connectivity (VPC peering, private links)
Automate provisioning (Terraform/CloudFormation for customer account setup)

Best for: Enterprise customers, data sovereignty, regulated industries

BYOC Copilot (Maximum Security)

Architecture:

Runs completely offline in customer environment
Provider connects on-demand for support
Temporary, secure connections only

Design decisions:

Fully self-contained (no external dependencies)
Local license management
Support tooling for remote troubleshooting
Offline documentation/runbooks

Best for: Government, defense, ultra-secure environments

On-Premise

Architecture:

Customer's own data center/hardware
Fully self-managed by customer

Design decisions:

Simplify deployment (fewer moving parts)
Clear hardware requirements
Extensive documentation
Update/patch mechanisms

Best for: Legacy enterprises, air-gapped environments

Multi-Model Strategy

Support multiple deployment models in the same architecture:

# Same compose spec, different plans
services:
  app:
    image: myapp:latest
    # Works for SaaS, BYOC, On-Premise

Design principles:

Externalize configuration (12-factor app)
No hard-coded cloud-specific logic
Support air-gapped deployments (container registries)
Identical functionality across models

Phase 5: Tenancy Architecture

Omnistrate supports multiple tenancy models - design for flexibility.

Shared Tenancy

Architecture: Single infrastructure, logical isolation

Customer A ─┐
Customer B ─┤→ Shared App → Shared DB (tenant_id partitioning)
Customer C ─┘

Pros:

Cost-effective (resource sharing)
Simple operations (one deployment)
Easy scaling (horizontal app scaling)

Cons:

"Noisy neighbor" risks
Limited customization per tenant
Shared security boundary

Best for: Freemium, small/medium customers, standardized offerings

Compose design:

Single database service
App environment variables include tenant routing logic
Shared cache (tenant key prefixes)
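
The "tenant key prefixes" idea for a shared cache can be sketched as a small helper that namespaces every key by tenant. The function name and the `tenant:<id>:<key>` layout are assumptions chosen for illustration; any consistent scheme works as long as one tenant can never produce another tenant's key.

```python
def tenant_cache_key(tenant_id: str, key: str) -> str:
    """Namespace a cache key by tenant so tenants sharing one cache cannot collide."""
    # Reject IDs containing the separator; otherwise tenant "a:b" with key "c"
    # would collide with tenant "a" and key "b:c".
    if not tenant_id or ":" in tenant_id:
        raise ValueError("tenant_id must be non-empty and must not contain ':'")
    return f"tenant:{tenant_id}:{key}"
```

For example, `tenant_cache_key("acme", "session:42")` yields `tenant:acme:session:42`, which also makes per-tenant cleanup possible by scanning for the `tenant:acme:` prefix.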

Siloed Tenancy

Architecture: Dedicated infrastructure per tenant

Customer A → App A → DB A
Customer B → App B → DB B
Customer C → App C → DB C

Pros:

Complete isolation (security, performance)
Per-tenant customization
Easier compliance (HIPAA, PCI)

Cons:

Higher cost (no sharing)
More complex operations (many deployments)
Scaling overhead

Best for: Enterprise, regulated industries, high-value customers

Compose design:

Full stack per tenant instance
Omnistrate manages multiple instances
Each instance is an isolated deployment

Hybrid Tenancy

Architecture: Shared app tier, isolated data tier

Customer A ─┐
Customer B ─┤→ Shared App → DB A, DB B, DB C (dedicated)
Customer C ─┘

Pros:

Balance cost and isolation
Shared compute, isolated data
Flexible per-tier decisions

Cons:

More complex architecture
Connection pooling challenges

Best for: Mixed customer base (SMB + Enterprise)

Compose design:

Shared app service (scales horizontally)
Database connection routing to tenant-specific DB instances
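
Connection routing in a hybrid design can be as simple as a lookup from tenant ID to connection string inside the shared app tier. The class below is a minimal sketch under that assumption; `TenantDatabaseRouter` and the example DSNs are hypothetical, and a real service would load the mapping from configuration and pair it with a connection pool per tenant.

```python
class TenantDatabaseRouter:
    """Resolve the connection string for a tenant's dedicated database."""

    def __init__(self, tenant_dsns: dict[str, str]):
        # Copy the mapping so later changes by the caller do not leak in.
        self._tenant_dsns = dict(tenant_dsns)

    def dsn_for(self, tenant_id: str) -> str:
        try:
            return self._tenant_dsns[tenant_id]
        except KeyError:
            # Fail closed: never fall back to another tenant's database.
            raise LookupError(f"no database registered for tenant {tenant_id!r}") from None
```

Failing on an unknown tenant, rather than defaulting to a shared database, keeps the data tier isolated even when the routing table is incomplete.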

Phase 6: Compliance & Security Architecture

Design for compliance requirements from the start.

SOC2 (Security, Availability, Confidentiality)

Requirements:

Encryption at rest and in transit
Access logging and audit trails
Multi-factor authentication
Regular backups
Incident response procedures

Compose decisions:

Use TLS/SSL for all services
Enable database encryption
Log all API requests
Backup volumes daily

HIPAA (Healthcare)

Requirements:

PHI (Protected Health Information) encryption
Access controls and audit logs
Business Associate Agreements (BAA)
Dedicated infrastructure for PHI (a design choice that simplifies isolation; HIPAA itself does not mandate single tenancy)

Compose decisions:

Siloed tenancy for healthcare customers
Encrypted databases (PostgreSQL with encryption)
No caching of PHI data
Detailed access logging

GDPR (European Data Privacy)

Requirements:

Data residency (EU data in EU regions)
Right to deletion (data purging)
Data portability
Consent management

Compose decisions:

Multi-region deployments (EU, US)
Clear data retention policies
Data export APIs
Customer data deletion workflows

PCI DSS (Payment Card Data)

Requirements:

No storage of CVV after authorization; a stored PAN must be rendered unreadable (truncated, tokenized, or encrypted)
Encrypted card data
Network segmentation
Regular security scans

Compose decisions:

Use payment gateways (Stripe, no card storage)
Isolate payment processing services
TLS everywhere

Phase 7: SLA & Availability Architecture

Design for target SLA from the start.

99.9% Uptime (8.76 hours downtime/year)

Architecture:

Single region, single zone
Basic health checks
Manual failover acceptable

Compose design:

Single replica per service
Database with persistent volumes
Basic health endpoints

99.95% Uptime (4.38 hours downtime/year)

Architecture:

Single region, multi-zone
Automated health checks
Load balancing across zones

Compose design:

Multiple replicas per service (2-3)
Load balancer configuration ready
Multi-zone volume replication (plan for it)

99.99% Uptime (52.6 minutes downtime/year)

Architecture:

Multi-region active-passive
Automated failover
Redundant databases

Compose design:

Replicated services (3+ replicas)
Database replication ready
Health checks with quick failover

99.999% Uptime (5.26 minutes downtime/year)

Architecture:

Multi-region active-active
Global load balancing
Distributed databases

Compose design:

Highly replicated services
Distributed databases (CockroachDB, Cassandra)
Multiple cloud providers
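
The downtime figures in the headings of this phase follow directly from the uptime percentage. A short sketch of the arithmetic, assuming a 365-day year (the function name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_budget_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime allowed per period for a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


for target in (99.9, 99.95, 99.99, 99.999):
    hours = downtime_budget_hours(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.2f} minutes) per year")
```

Passing `period_hours=30 * 24` gives the monthly budget instead, which is often the unit an SLA is actually measured in.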

Phase 8: Compose Spec Development (Iterative)

Build the Docker Compose spec iteratively - start simple, validate, add complexity.

Iteration 1: Core Services (MVP)

Goal: Get basic architecture working

version: '3.8'
services:
  app:
    image: mycompany/api:latest
    ports:
      - "8080:8080"
    environment:
      - DATABASE_URL=postgresql://postgres:password@database:5432/app
    depends_on:
      - database

  database:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=app
    volumes:
      - db_data:/var/lib/postgresql/data

volumes:
  db_data:

Validate:

Run docker-compose up locally
Test API endpoints
Verify database connectivity
Check logs for errors

Iteration 2: Add Caching & Dependencies

Goal: Add performance and reliability layers

services:
  app:
    image: mycompany/api:latest
    environment:
      - DATABASE_URL=postgresql://postgres:password@database:5432/app
      - REDIS_URL=redis://cache:6379
    depends_on:
      - database
      - cache

  cache:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

  database:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=app
    volumes:
      - db_data:/var/lib/postgresql/data

volumes:
  db_data:

Validate:

Test cache hit/miss
Verify performance improvement
Check memory usage

Iteration 3: Add Health Checks & Readiness

Goal: Production-grade reliability

services:
  app:
    image: mycompany/api:latest
    healthcheck:
      # Assumes curl is installed in the app image
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    environment:
      - DATABASE_URL=postgresql://postgres:password@database:5432/app
      - REDIS_URL=redis://cache:6379
    depends_on:
      database:
        condition: service_healthy
      cache:
        condition: service_started

  database:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - db_data:/var/lib/postgresql/data

  cache:
    image: redis:7-alpine

volumes:
  db_data:

Validate:

Test startup order
Verify health check responses
Test graceful degradation
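
The app health check above expects the service to answer on `/health`. A minimal standard-library sketch of such an endpoint is shown here, assuming the app listens on port 8080; `health_payload`, `HealthHandler`, `serve`, and the hard-coded probe results are hypothetical placeholders for real database and cache checks.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_payload(checks: dict) -> tuple:
    """Return (HTTP status, JSON body) summarizing dependency checks."""
    healthy = all(checks.values())
    body = json.dumps({"status": "ok" if healthy else "degraded", "checks": checks}, sort_keys=True)
    # curl -f treats any status >= 400 as a failure, so 503 marks the container unhealthy.
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # Placeholder probes; a real service would ping its database and cache here.
        status, body = health_payload({"database": True, "cache": True})
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Block forever serving the health endpoint."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Returning 503 when a dependency is down is what lets the compose health check distinguish "running" from "ready".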

Iteration 4: Multi-Service (If Needed)

Goal: Microservices architecture

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api

  api:
    image: mycompany/api:latest
    environment:
      - DATABASE_URL=postgresql://postgres:password@database:5432/app
      - REDIS_URL=redis://cache:6379
      - WORKER_URL=http://worker:8081
    depends_on:
      - database
      - cache

  worker:
    image: mycompany/worker:latest
    environment:
      - DATABASE_URL=postgresql://postgres:password@database:5432/app
      - REDIS_URL=redis://cache:6379
    depends_on:
      - database
      - cache

  database:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=app
    volumes:
      - db_data:/var/lib/postgresql/data

  cache:
    image: redis:7-alpine

volumes:
  db_data:

Validate:

Test service-to-service communication
Verify load balancing
Check worker job processing

Iteration 5: Parameterization & Configuration

Goal: Prepare for Omnistrate's multi-tenancy

services:
  app:
    image: mycompany/api:${APP_VERSION:-latest}
    environment:
      - DATABASE_URL=postgresql://${DB_USER:-postgres}:${DB_PASSWORD}@database:5432/${DB_NAME:-app}
      - REDIS_URL=redis://cache:6379
      - LOG_LEVEL=${LOG_LEVEL:-info}
      - MAX_CONNECTIONS=${MAX_CONNECTIONS:-100}
    depends_on:
      - database
      - cache

  database:
    image: postgres:${POSTGRES_VERSION:-15}
    environment:
      # ${VAR:?message} makes Compose fail fast when the variable is unset or empty
      - POSTGRES_PASSWORD=${DB_PASSWORD:?DB_PASSWORD must be set}
      - POSTGRES_DB=${DB_NAME:-app}
      - POSTGRES_USER=${DB_USER:-postgres}
    volumes:
      - db_data:/var/lib/postgresql/data

  cache:
    image: redis:7-alpine
    command: redis-server --maxmemory ${CACHE_SIZE:-256mb} --maxmemory-policy allkeys-lru

volumes:
  db_data:

Validate:

Test with different parameter values
Verify .env file support
Check parameter validation
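
Parameter validation is easiest to check when the application validates the same variables the compose file interpolates. The sketch below mirrors the `${VAR:-default}` behaviour (an unset or empty variable falls back to the default) for the variables used above. `load_settings` and the accepted log levels are assumptions for illustration, not something the compose file enforces.

```python
import os


def load_settings(env=None) -> dict:
    """Read and validate settings, mirroring the compose defaults."""
    env = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        # Like ${VAR:-default}: unset or empty falls back to the default.
        return env.get(name) or default

    password = env.get("DB_PASSWORD")
    if not password:
        raise ValueError("DB_PASSWORD is required and has no default")

    max_connections = int(get("MAX_CONNECTIONS", "100"))
    if max_connections < 1:
        raise ValueError("MAX_CONNECTIONS must be a positive integer")

    log_level = get("LOG_LEVEL", "info").lower()
    if log_level not in {"debug", "info", "warning", "error"}:
        raise ValueError(f"unsupported LOG_LEVEL: {log_level}")

    return {
        "db_user": get("DB_USER", "postgres"),
        "db_name": get("DB_NAME", "app"),
        "db_password": password,
        "log_level": log_level,
        "max_connections": max_connections,
    }
```

Passing a plain dict instead of `os.environ` makes it straightforward to exercise different parameter values in tests.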

Iteration 6: Container Image Registry Setup

Goal: Ensure all services have image references (not build contexts)

Check for build contexts:

services:
  app:
    build: ./app  # ❌ Won't work on Omnistrate
  backend:
    # OR the long form
    build:
      context: ./backend
      dockerfile: Dockerfile  # ❌ Won't work on Omnistrate

If build contexts exist, you MUST work with customer to convert them:

Build and push images to a registry:

# Option 1: Docker Hub
docker build -t mycompany/api:v1.0.0 ./app
docker push mycompany/api:v1.0.0

# Option 2: GitHub Container Registry
docker build -t ghcr.io/mycompany/api:v1.0.0 ./app
docker push ghcr.io/mycompany/api:v1.0.0

# Option 3: Private registry
docker build -t registry.company.com/api:v1.0.0 ./app
docker push registry.company.com/api:v1.0.0

Replace build context with image reference:

services:
  app:
    image: mycompany/api:v1.0.0  # ✅ Now cloud-deployable
    # build: ./app  # Remove this

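Registry authentication (if using a private registry):

Do not add this section yourself. As Phase 9 describes, the FDE skill works with the customer to create Omnistrate secrets in the Dev and Prod environments and then adds a block like the following to the compose file. It is shown here so you can tell the customer what to expect: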

# Add at top level of compose file
x-omnistrate-image-registry-attributes:
  docker.io:
    auth:
      username: mycompany
      password: {{ $secret.DOCKERHUB_PASSWORD }}
  ghcr.io:
    auth:
      username: {{ $secret.GITHUB_USERNAME }}
      password: {{ $secret.GITHUB_TOKEN }}
  registry.company.com:
    auth:
      username: {{ $secret.PRIVATE_REGISTRY_USERNAME }}
      password: {{ $secret.PRIVATE_REGISTRY_PASSWORD }}

Customer must create secrets in Omnistrate:

Navigate to Omnistrate console → Service → Environment Settings → Secrets
Create secrets: DOCKERHUB_PASSWORD, GITHUB_TOKEN, etc.
Secrets are environment-specific (Dev, Staging, Prod)

Validate: All services have image: field with registry reference

Iteration 7: Resource Sizing Hints

Goal: Guide Omnistrate resource allocation

services:
  app:
    image: mycompany/api:v1.0.0  # Must have image reference
    deploy:
      replicas: ${APP_REPLICAS:-2}
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
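    environment:
      - DATABASE_URL=postgresql://${DB_USER:-postgres}:${DB_PASSWORD}@database:5432/${DB_NAME:-app}

  database:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=${DB_PASSWORD}
      - POSTGRES_DB=${DB_NAME:-app}
      - POSTGRES_USER=${DB_USER:-postgres}
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G
    volumes:
      - db_data:/var/lib/postgresql/data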

volumes:
  db_data:
    driver: local
    driver_opts:
      type: none
      device: /data/postgres
      o: bind

Note: These are hints for FDE transformation, not strict Omnistrate syntax yet.

Phase 9: Container Image Registry Validation

Critical: Omnistrate cannot build images from source. All services must have image: references to pre-built container images.

Check for build contexts:

grep -n "build:" docker-compose.yaml

If any service uses build: instead of image:, work through the following steps.

Identify all services with build contexts:

services:
  api:
    build: ./backend  # ❌ Not supported by Omnistrate
  worker:
    build:
      context: ./worker
      dockerfile: Dockerfile  # ❌ Not supported

Ask customer where to host images:

Question: "I see these services need container images: [list services with build contexts]. Where would you like to host these images?"

Options to present:
- Docker Hub (docker.io) - public or private
- GitHub Container Registry (ghcr.io) - public or private
- AWS ECR (<account-id>.dkr.ecr.<region>.amazonaws.com)
- GCP Artifact Registry (<region>-docker.pkg.dev/<project>/<repo>)
- Azure Container Registry (company.azurecr.io)
- Custom private registry

Guide customer to build and push images:

# Example: Docker Hub
docker build -t mycompany/api:v1.0.0 ./backend
docker push mycompany/api:v1.0.0

# Example: GitHub Container Registry
docker build -t ghcr.io/mycompany/worker:v1.0.0 ./worker
docker push ghcr.io/mycompany/worker:v1.0.0

Replace build contexts with image references in compose:

services:
  api:
    image: mycompany/api:v1.0.0  # ✅ Now has registry reference
    # build: ./backend  # ❌ Remove build context entirely

  worker:
    image: ghcr.io/mycompany/worker:v1.0.0  # ✅ Registry reference
    # build:  # ❌ Remove build section
    #   context: ./worker
    #   dockerfile: Dockerfile

Document registry information for FDE handoff:

Create a list for FDE skill:
- Custom images: mycompany/api:v1.0.0 (docker.io), ghcr.io/mycompany/worker:v1.0.0 (ghcr.io)
- Public images: nginx:alpine, postgres:15, redis:7-alpine
- Registries used: docker.io, ghcr.io

Do NOT add x-omnistrate-image-registry-attributes. The FDE skill will:
- Test if images are publicly accessible using docker pull
- Guide customer through PAT/token creation for private registries
- Collect credentials and create Omnistrate secrets
- Add the x-omnistrate-image-registry-attributes section to the compose file
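
The "registries used" line of the inventory can be derived from the image names themselves. The sketch below applies Docker's naming convention: the first path component is a registry host only if it contains a dot or a port, or is exactly `localhost`; otherwise the image resolves to docker.io. The function names are illustrative.

```python
def registry_host(image: str) -> str:
    """Return the registry hostname an image reference resolves to."""
    first, slash, _rest = image.partition("/")
    # Without a slash the whole string is "name[:tag]", so any colon is a tag
    # separator, not a port, and the image lives on Docker Hub.
    if slash and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def registries_used(images: list[str]) -> list[str]:
    """Return the sorted, de-duplicated registry hosts for a list of images."""
    return sorted({registry_host(image) for image in images})
```

Applied to the example inventory above, `registries_used(["mycompany/api:v1.0.0", "ghcr.io/mycompany/worker:v1.0.0", "nginx:alpine"])` returns `["docker.io", "ghcr.io"]`.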

Validate before moving to next phase:

✅ Every service has image: field with valid registry reference
✅ NO build: contexts remain in compose file
✅ Customer has pushed all custom images to registries
✅ Registry information documented (image names, registry hostnames, public/private if known)
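
The first two checks can be automated once the compose file has been parsed into a dictionary. This is a hedged sketch: `find_image_problems` is a hypothetical helper, and it assumes the input has the usual top-level `services` mapping.

```python
def find_image_problems(compose: dict) -> dict:
    """Return {service name: [issues]} for services that are not cloud-deployable."""
    problems = {}
    for name, service in (compose.get("services") or {}).items():
        service = service or {}
        issues = []
        if "build" in service:
            issues.append("uses build: context")
        if not service.get("image"):
            issues.append("missing image: reference")
        if issues:
            problems[name] = issues
    return problems
```

An empty result means every service has an image: reference and no build: context. With PyYAML installed (an assumption; it is not part of the standard library), the input could come from `yaml.safe_load(open("docker-compose.yaml"))`.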

Phase 10: Omnistrate-Aware Design Decisions

While building the compose spec, consider Omnistrate features (even though you won't add x-omnistrate-* extensions yet).

Design for Autoscaling

Compose consideration: Make app tier stateless

services:
  app:
    # Stateless - no local file storage
    # Session in Redis, not in-memory
    image: mycompany/api:latest
    depends_on:
      - cache  # For session storage

Design for Multi-Zone HA

Compose consideration: Multiple replicas, load balancer ready

services:
  app:
    deploy:
      replicas: 3  # Spread across zones later

Design for Backups

Compose consideration: Clear volume paths

services:
  database:
    volumes:
      - db_data:/var/lib/postgresql/data  # FDE will add backup config here

Design for Observability

Compose consideration: Metrics endpoints, structured logging

services:
  app:
    environment:
      - METRICS_PORT=9090  # Prometheus endpoint
      - LOG_FORMAT=json     # Structured logs
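
On the application side, `LOG_FORMAT=json` could be honored with a small formatter built on the standard `logging` module. This is a sketch under assumptions: the field names and the optional `tenant_id` attribute are illustrative choices, not an Omnistrate requirement.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Include the tenant when the caller passed extra={"tenant_id": ...}.
        tenant_id = getattr(record, "tenant_id", None)
        if tenant_id is not None:
            payload["tenant_id"] = tenant_id
        return json.dumps(payload, sort_keys=True)


def configure_logging(log_format: str = "json") -> logging.Logger:
    """Return an 'app' logger that emits JSON when log_format is 'json'."""
    handler = logging.StreamHandler()
    if log_format == "json":
        handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `configure_logging().info("request served", extra={"tenant_id": "acme"})` then emits a single JSON line that log aggregators can filter by tenant.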

Design for Multi-Tenant Routing

Compose consideration: Tenant ID in requests

services:
  app:
    environment:
      - TENANT_HEADER=X-Tenant-ID  # Header-based routing
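
Header-based routing means the app resolves the tenant from the configured header on every request. A minimal sketch follows, assuming tenant IDs are lowercase DNS-style labels (that pattern and the function name are assumptions, not something the compose file defines):

```python
import re

TENANT_HEADER = "X-Tenant-ID"
# Assumed ID format: 1-63 characters, lowercase letters, digits, and hyphens.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def tenant_from_headers(headers: dict, header_name: str = TENANT_HEADER) -> str:
    """Extract and validate the tenant ID from request headers."""
    wanted = header_name.lower()
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == wanted:
            tenant_id = value.strip().lower()
            if _TENANT_ID.fullmatch(tenant_id):
                return tenant_id
            raise ValueError(f"invalid tenant ID in {header_name} header")
    raise LookupError(f"missing {header_name} header")
```

Rejecting requests without a valid tenant ID up front keeps every downstream query and cache key scoped to exactly one tenant.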

Phase 11: Handoff to FDE Skill

Once the compose spec is validated and working, prepare for FDE handoff.

Pre-Handoff Checklist

Compose spec runs successfully with docker-compose up
All services start in correct order (depends_on)
Health checks pass
Inter-service communication works
Database migrations run successfully
All services have image: references (no build: contexts remain)
Container images pushed to registry (customer completed this)
Registry information documented (which images, which registries, public/private)
Environment variables parameterized
Resource limits documented
Volumes clearly defined
Multi-service architecture decision finalized (single vs multi-service)
Tenancy model documented (shared, siloed, hybrid)
Deployment model preferences noted (SaaS, BYOC, etc.)
SLA requirements documented
Compliance requirements noted

Handoff Documentation

Provide to FDE skill:

Compose spec file (vanilla, WITHOUT x-omnistrate-image-registry-attributes - FDE will add if needed)
Container image inventory:
- List all custom images with full registry URLs (e.g., mycompany/api:v1.0.0, ghcr.io/myorg/worker:v1.0.0)
- Mark which are public vs private (if known)
- List public images (postgres, redis, nginx, etc.) separately
Registry information: Hostnames of registries used (docker.io, ghcr.io, custom registries)
Architecture diagram (ASCII or description)
Service plan requirements:
- Free tier: What features/limits?
- Pro tier: What features/limits?
- Enterprise tier: What features/limits?
Deployment model preferences:
- SaaS only?
- BYOC for enterprise?
Compliance requirements: SOC2, HIPAA, GDPR, etc.
SLA targets: 99.9%, 99.95%, 99.99%
Scaling expectations: Fixed replicas, manual, or autoscaling?
Backup requirements: Daily, retention period?
Observability preferences: NewRelic, Datadog, Omnistrate native?

Example Handoff Message

Ready for Omnistrate onboarding. Here's the summary:

Architecture: Three-tier web app (NGINX → API → PostgreSQL + Redis)
Tenancy: Hybrid (shared API, isolated databases for enterprise)
Deployment models: SaaS (starter/pro), BYOC (enterprise)
Compliance: SOC2, GDPR data residency
SLA: 99.95% (multi-zone)

Container Images:
Custom images (customer pushed):
- API: company/api:v1.0.0 (docker.io registry)
- Worker: company/worker:v1.0.0 (docker.io registry)

Public images (no auth needed):
- NGINX: nginx:alpine
- PostgreSQL: postgres:15
- Redis: redis:7-alpine

Registry Info:
- docker.io used for custom images (FDE will test if authentication needed)

Service plans:
- Starter: 1 API replica, 20GB DB, no backups
- Pro: 3 API replicas, 100GB DB, daily backups, autoscaling
- Enterprise: Custom sizing, BYOC option, multi-region

Compose spec attached (vanilla, without x-omnistrate-image-registry-attributes; FDE will add it if needed).
Ready for FDE transformation.

Domain-Specific Guidance

AI/ML Platforms

Key decisions:

GPU requirements (inference: T4, training: A100)
Model storage (S3/GCS for weights)
Batch vs real-time inference
Model versioning strategy

Compose architecture:

services:
  api:
    image: fastapi-app
  inference:
    image: pytorch-gpu:latest
    # FDE will map to GPU instance types
# Model storage: S3 bucket (external, not a compose service)

Data Analytics Platforms

Key decisions:

Query engine (Presto, Spark, custom)
Data lake architecture (S3 + metadata)
Streaming vs batch processing
Column storage (Parquet, ORC)

Compose architecture:

services:
  query-api:
    image: query-engine
  workers:
    image: spark-workers
  metadata-db:
    image: postgres

API Platforms

Key decisions:

Gateway pattern (Kong, Envoy, custom)
Rate limiting strategy
API versioning
Documentation (OpenAPI/Swagger)

Compose architecture:

services:
  gateway:
    image: kong
  api-v1:
    image: api:v1
  api-v2:
    image: api:v2

Database-as-a-Service

Key decisions:

Which DB to offer (PostgreSQL, MySQL, MongoDB)
Backup and restore strategy
Replication topology (primary-replica, multi-primary)
Connection pooling (PgBouncer)

Compose architecture:

services:
  primary:
    image: postgres:15
  replica:
    image: postgres:15
  pooler:
    image: pgbouncer

Iterative Refinement Workflow

1. Discovery → 2. Tech Selection → 3. Simple Compose → 4. Validate
                                            ↓
                                         Issues? → Refine
                                            ↓
                                           No issues
                                            ↓
5. Add Complexity → 6. Validate → 7. Image Registry Setup → 8. Omnistrate-Aware Adjustments
                        ↓
                     Issues? → Refine
                        ↓
                      No issues
                        ↓
9. Document → 10. Handoff to FDE

Key principle: Validate at each step before adding complexity.

Success Criteria

✅ User's domain and requirements clearly understood
✅ Technology stack selected with clear rationale
✅ Service architecture designed (single vs multi-service)
✅ Tenancy model selected (shared, siloed, hybrid)
✅ Deployment models planned (SaaS, BYOC, etc.)
✅ Compliance requirements addressed in architecture
✅ SLA targets mapped to architecture decisions
✅ Docker Compose spec validated locally (docker-compose up works)
✅ All services start and communicate correctly
✅ Health checks defined and passing
✅ All services have image: references (no build: contexts)
✅ Custom images pushed to registry (Docker Hub, GHCR, ECR, etc.)
✅ Registry information documented (for FDE to test accessibility and configure auth if needed)
✅ Environment variables parameterized
✅ Resource sizing hints documented
✅ Omnistrate-aware design decisions made (autoscaling, backups, multi-zone)
✅ Handoff documentation prepared for FDE skill

Reference

See SOLUTIONS_ARCHITECT_REFERENCE.md for:

Technology comparison matrices
Domain-specific architecture patterns
Compliance requirement checklists
SLA architecture guidelines
Compose spec best practices
Common architectural anti-patterns
