Microservices Architecture
Comprehensive guide for designing and implementing microservices-based systems.
Microservices Fundamentals
What are Microservices?
MICROSERVICES = Independently deployable services
that do one thing well
Characteristics:
┌─────────────────────────────────────────┐
│ ✓ Single responsibility │
│ ✓ Own their data │
│ ✓ Independently deployable │
│ ✓ Communicate via APIs │
│ ✓ Technology agnostic │
│ ✓ Owned by small teams │
└─────────────────────────────────────────┘
Monolith vs Microservices
MONOLITH:
┌─────────────────────────────────────────┐
│ Single Application │
│ ┌──────┬──────┬──────┬──────┐ │
│ │ Users│Orders│ Cart │Search│ │
│ └──────┴──────┴──────┴──────┘ │
│ Single Database │
└─────────────────────────────────────────┘
MICROSERVICES:
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Users │ │Orders│ │ Cart │ │Search│
│ DB │ │ DB │ │ DB │ │ DB │
└──────┘ └──────┘ └──────┘ └──────┘
│ │ │ │
└─────────┴─────────┴─────────┘
API Gateway
When to Use Microservices
USE WHEN:
✓ Large, complex domain
✓ Need independent scaling
✓ Multiple teams working in parallel
✓ Different technology needs per service
✓ Fault isolation is critical
✓ Frequent, independent deployments
DON'T USE WHEN:
✗ Small team/application
✗ Simple domain
✗ Tight latency requirements
✗ Limited DevOps maturity
✗ Unclear domain boundaries
Service Design
Domain-Driven Design
BOUNDED CONTEXTS:
┌─────────────────┐ ┌─────────────────┐
│ Orders │ │ Shipping │
│ Context │ │ Context │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ Order │ │ │ │ Shipment │ │
│ │ LineItem │ │ │ │ Carrier │ │
│ │ Customer(ID)│ │ │ │ Address │ │
│ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘
Each context has its own:
- Ubiquitous language
- Data model
- Business rules
Service Boundaries
GOOD boundaries follow:
┌─────────────────────────────────────────┐
│ Business Capability │
│ - What the business does │
│ - e.g., Payment Processing, Inventory │
├─────────────────────────────────────────┤
│ Subdomain │
│ - Area of expertise │
│ - e.g., Pricing, Catalog, Customer │
├─────────────────────────────────────────┤
│ Single Responsibility │
│ - Does one thing well │
│ - Can explain in one sentence │
└─────────────────────────────────────────┘
BAD boundaries:
✗ Technical layers (UI service, DB service)
✗ CRUD operations (User CRUD service)
✗ Too granular (EmailSender service)
Service Size Guidelines
Right-sized service:
- 2-pizza team can own it (5-8 people)
- Rewrite in 2-4 weeks if needed
- Clear, single business purpose
- Minimal external dependencies
- Own its data completely
Too big: Multiple teams needed, mixed concerns
Too small: Can't function independently
Communication Patterns
Synchronous (Request/Response)
REST:
┌──────┐ HTTP GET /users/123 ┌──────┐
│Client│ ─────────────────────>│Server│
│ │<───────────────────── │ │
└──────┘ { "name": "John" } └──────┘
gRPC:
┌──────┐ Binary/Protobuf ┌──────┐
│Client│ ─────────────────────>│Server│
│ │<───────────────────── │ │
└──────┘ Strongly typed └──────┘
GraphQL:
┌──────┐ POST /graphql ┌──────┐
│Client│ ─────────────────────>│Server│
│ │<───────────────────── │ │
└──────┘ Flexible queries └──────┘
Asynchronous (Event-Driven)
MESSAGE QUEUE:
┌────────┐ ┌─────────┐ ┌────────┐
│Producer│ ───> │ Queue │ ───> │Consumer│
└────────┘ │(RabbitMQ│ └────────┘
│ SQS) │
└─────────┘
EVENT STREAMING:
┌────────┐ ┌─────────┐ ┌────────┐
│Producer│ ───> │ Topic │ ───> │Consumer│
└────────┘ │ (Kafka) │ ───> │Consumer│
└─────────┘ ───> │Consumer│
└────────┘
Communication Comparison
| Pattern |
Use Case |
Trade-offs |
| REST |
CRUD, simple queries |
Simple, but chatty |
| gRPC |
High performance, internal |
Fast, but complex |
| GraphQL |
Flexible client needs |
Flexible, but overhead |
| Message Queue |
Task processing |
Decoupled, but delay |
| Event Streaming |
Event sourcing, analytics |
Scalable, but complex |
Data Management
Database per Service
SEPARATE DATABASES:
┌────────┐ ┌────────┐ ┌────────┐
│Users │ │Orders │ │Products│
│Service │ │Service │ │Service │
├────────┤ ├────────┤ ├────────┤
│PostgreSQL│ │MongoDB │ │MySQL │
└────────┘ └────────┘ └────────┘
Benefits:
✓ Independent scaling
✓ Technology freedom
✓ No shared schema coupling
✓ Fault isolation
Challenges:
- No joins across services
- Eventual consistency
- Data duplication
Data Consistency Patterns
SAGA PATTERN (Choreography):
┌──────┐ ┌──────┐ ┌──────┐
│Order │─────>│Payment│─────>│Ship │
│Create│ │Process│ │Order │
└──────┘ └──────┘ └──────┘
│ │ │
└─────────────┴─────────────┘
Each service publishes events
that trigger next step
SAGA PATTERN (Orchestration):
┌────────────┐
│Orchestrator│
└────────────┘
/ │ \
↓ ↓ ↓
┌──────┐ ┌──────┐ ┌──────┐
│Order │ │Payment│ │Ship │
└──────┘ └──────┘ └──────┘
Central coordinator manages flow
Event Sourcing
Instead of storing current state:
┌────────────────────────────┐
│ Account: $500 │
└────────────────────────────┘
Store all events:
┌────────────────────────────┐
│ 1. AccountCreated $0 │
│ 2. Deposited $1000 │
│ 3. Withdrawn $300 │
│ 4. Withdrawn $200 │
│ Current: $500 │
└────────────────────────────┘
Benefits:
✓ Complete audit trail
✓ Temporal queries
✓ Event replay
✓ Natural fit for CQRS
API Gateway
Gateway Pattern
┌─────────────┐
│ API Gateway │
└──────┬──────┘
┌──────────┬──┴──┬──────────┐
↓ ↓ ↓ ↓
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Users │ │Orders │ │Products│ │Auth │
│Service │ │Service │ │Service │ │Service │
└────────┘ └────────┘ └────────┘ └────────┘
Gateway responsibilities:
- Request routing
- Authentication/Authorization
- Rate limiting
- Load balancing
- Request/Response transformation
- Caching
- Monitoring
BFF (Backend for Frontend)
Mobile App Web App
│ │
↓ ↓
┌────────────┐ ┌────────────┐
│ Mobile BFF │ │ Web BFF │
└─────┬──────┘ └─────┬──────┘
│ │
┌──────┴────────────────┴──────┐
│ Internal APIs │
└───────────────────────────────┘
Each BFF:
- Optimized for its client
- Aggregates multiple services
- Handles client-specific logic
Service Discovery
Client-Side Discovery
┌──────┐ ┌──────────────┐
│Client│────>│Service │────> Service A (192.168.1.10)
│ │ │Registry │────> Service A (192.168.1.11)
│ │<────│(Consul, etcd)│────> Service A (192.168.1.12)
└──────┘ └──────────────┘
Client queries registry, then calls service directly.
Client handles load balancing.
Server-Side Discovery
┌──────┐ ┌─────────────┐ ┌──────────────┐
│Client│────>│Load Balancer│────>│Service │
│ │ │ │ │Registry │
└──────┘ └─────────────┘ └──────────────┘
│
┌────────┴────────┐
↓ ↓ ↓
Service Service Service
Load balancer handles discovery and routing.
Simpler for clients.
Resilience Patterns
Circuit Breaker
States:
┌────────┐ Failures ┌────────┐
│ CLOSED │───────────────>│ OPEN │
│(normal)│ │(reject)│
└────────┘ └────────┘
↑ │
│ Timeout │
│ ┌────────────┐ │
└───│HALF-OPEN │<────────┘
│(test) │
└────────────┘
Implementation:
- Track failure count
- Open circuit after threshold
- Reject calls while open
- Periodically test with half-open
- Close circuit on success
Retry Pattern
async function withRetry<T>(
fn: () => Promise<T>,
maxAttempts: number = 3,
backoff: number = 1000
): Promise<T> {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await fn();
} catch (error) {
if (attempt === maxAttempts) throw error;
// Exponential backoff with jitter
const delay = backoff * Math.pow(2, attempt - 1);
const jitter = delay * 0.1 * Math.random();
await sleep(delay + jitter);
}
}
}
Bulkhead Pattern
Isolate resources to prevent cascade:
┌─────────────────────────────────────┐
│ Service A │
│ ┌─────────┐ ┌─────────┐ │
│ │Thread │ │Thread │ │
│ │Pool 1 │ │Pool 2 │ │
│ │(Service │ │(Service │ │
│ │ B) │ │ C) │ │
│ └─────────┘ └─────────┘ │
└─────────────────────────────────────┘
If Service C is slow, only Pool 2 is affected.
Service B calls continue normally.
Timeout Pattern
async function withTimeout<T>(
fn: () => Promise<T>,
ms: number
): Promise<T> {
return Promise.race([
fn(),
new Promise<T>((_, reject) =>
setTimeout(() => reject(new Error('Timeout')), ms)
),
]);
}
// Always set timeouts on external calls
const user = await withTimeout(
() => userService.getUser(id),
5000 // 5 second timeout
);
Observability
The Three Pillars
LOGS:
┌─────────────────────────────────────────┐
│ 2024-01-15 10:30:45 [INFO] OrderService │
│ Order created: { id: 123, user: 456 } │
└─────────────────────────────────────────┘
METRICS:
┌─────────────────────────────────────────┐
│ order_created_total: 1523 │
│ order_processing_seconds: 0.234 │
│ active_connections: 45 │
└─────────────────────────────────────────┘
TRACES:
┌─────────────────────────────────────────┐
│ [Request ID: abc123] │
│ ├─ Gateway: 2ms │
│ ├─ OrderService: 150ms │
│ │ ├─ UserService: 45ms │
│ │ └─ InventoryService: 80ms │
│ └─ Total: 152ms │
└─────────────────────────────────────────┘
Distributed Tracing
Trace Context Propagation:
┌──────┐ X-Trace-ID: abc ┌──────┐
│API │─────────────────>│Order │
│GW │ │Svc │
└──────┘ └──────┘
│
X-Trace-ID: abc │
↓
┌──────┐
│User │
│Svc │
└──────┘
Tools: Jaeger, Zipkin, AWS X-Ray, Datadog
Health Checks
// Liveness: Is the service running?
app.get('/health/live', (req, res) => {
res.status(200).json({ status: 'alive' });
});
// Readiness: Is the service ready to handle traffic?
app.get('/health/ready', async (req, res) => {
const dbHealthy = await checkDatabase();
const cacheHealthy = await checkCache();
if (dbHealthy && cacheHealthy) {
res.status(200).json({ status: 'ready' });
} else {
res.status(503).json({
status: 'not ready',
checks: { database: dbHealthy, cache: cacheHealthy },
});
}
});
Deployment
Containerization
# Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY dist ./dist
USER node
EXPOSE 3000
CMD ["node", "dist/main.js"]
Kubernetes Basics
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
spec:
replicas: 3
selector:
matchLabels:
app: order-service
template:
metadata:
labels:
app: order-service
spec:
containers:
- name: order-service
image: order-service:1.0.0
ports:
- containerPort: 3000
resources:
limits:
cpu: '500m'
memory: '256Mi'
livenessProbe:
httpGet:
path: /health/live
port: 3000
readinessProbe:
httpGet:
path: /health/ready
port: 3000
Testing Strategies
Test Pyramid for Microservices
/\
/ \ E2E Tests
/────\ (Few, slow, brittle)
/ \
/────────\ Contract Tests
/ \ (Service boundaries)
/────────────\
/ \ Integration Tests
/────────────────\ (With dependencies)
/ \
/────────────────────\ Unit Tests
/ \ (Many, fast, isolated)
/________________________\
Contract Testing
// Consumer test (Order Service)
describe('User Service Contract', () => {
it('returns user by ID', async () => {
// Define expected interaction
await provider.addInteraction({
state: 'user 123 exists',
uponReceiving: 'a request for user 123',
withRequest: {
method: 'GET',
path: '/users/123',
},
willRespondWith: {
status: 200,
body: {
id: '123',
name: like('John'),
email: like('john@example.com'),
},
},
});
// Test passes if consumer expectations match
});
});
Best Practices
DO:
- Start with a monolith, extract services later
- Define clear service boundaries
- Use asynchronous communication where possible
- Implement circuit breakers
- Centralize logging and monitoring
- Automate everything
- Design for failure
- Version your APIs
DON'T:
- Create too many, too small services
- Share databases between services
- Make synchronous chains too deep
- Ignore distributed system complexities
- Couple services through shared libraries
- Skip contract testing
- Deploy without monitoring
Migration Checklist
Monolith to Microservices