microservices-architecture

@tachyon-beep/skillpacks

Use when designing microservices, splitting monoliths, handling distributed data consistency, choosing communication patterns, or implementing service boundaries - covers domain-driven design, saga patterns, API gateways, service mesh

SKILL.md

name: microservices-architecture
description: Use when designing microservices, splitting monoliths, handling distributed data consistency, choosing communication patterns, or implementing service boundaries - covers domain-driven design, saga patterns, API gateways, service mesh

Microservices Architecture

Overview

Microservices architecture specialist covering service boundaries, communication patterns, data consistency, and operational concerns.

Core principle: Microservices decompose applications into independently deployable services organized around business capabilities - enabling team autonomy and technology diversity at the cost of operational complexity and distributed system challenges.

When to Use This Skill

Use when encountering:

  • Service boundaries: Defining service scope, applying domain-driven design
  • Monolith decomposition: Strategies for splitting existing systems
  • Data consistency: Sagas, event sourcing, eventual consistency patterns
  • Communication: Sync (REST/gRPC) vs async (events/messages)
  • API gateways: Routing, authentication, rate limiting
  • Service discovery: Registry patterns, DNS, configuration
  • Resilience: Circuit breakers, retries, timeouts, bulkheads
  • Observability: Distributed tracing, logging aggregation, metrics
  • Deployment: Containers, orchestration, blue-green deployments

Do NOT use for:

  • Monolithic architectures (microservices aren't always better)
  • Single-team projects < 5 services (overhead exceeds benefits)
  • Simple CRUD applications (microservices add unnecessary complexity)

When NOT to Use Microservices

Stay monolithic if:

  • Team < 10 engineers
  • Domain is not well understood yet
  • Strong consistency required everywhere
  • Network latency is critical
  • You can't invest in observability/DevOps infrastructure

Microservices require: Mature DevOps, monitoring, distributed systems expertise, organizational support.

Service Boundary Patterns (Domain-Driven Design)

1. Bounded Contexts

Pattern: One microservice = One bounded context

❌ Too fine-grained (anemic services):
- UserService (just CRUD)
- OrderService (just CRUD)
- PaymentService (just CRUD)

✅ Business capability alignment:
- CustomerManagementService (user profiles, preferences, history)
- OrderFulfillmentService (order lifecycle, inventory, shipping)
- PaymentProcessingService (payment, billing, invoicing, refunds)

Identifying boundaries:

  1. Ubiquitous language - When the same term means different things to different teams, or different terms name the same concept, you are likely looking at different contexts
  2. Change patterns - Services that change together should stay together
  3. Team ownership - One team should own one service
  4. Data autonomy - Each service owns its data, no shared databases

2. Strategic DDD Patterns

| Pattern | Use When | Example |
|---|---|---|
| Separate Ways | Contexts are independent | Analytics service, main app service |
| Partnership | Teams must collaborate closely | Order + Inventory services |
| Customer-Supplier | Upstream/downstream relationship | Payment gateway (upstream) → Order service |
| Conformist | Accept upstream model as-is | Third-party API integration |
| Anti-Corruption Layer | Isolate from legacy/external systems | ACL between new microservices and legacy monolith |
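
As a rough illustration of the Anti-Corruption Layer row, the sketch below translates one legacy record into a model owned by the new service, so legacy naming and codes stop at the boundary. The legacy field names (`CUST_NO`, `CUST_NM`, `STAT_CD`), the status codes, and the `Customer` class are all invented for the example.

```python
from dataclasses import dataclass

# Hypothetical legacy status codes; a real monolith will have its own.
_LEGACY_STATUS = {"A": "active", "S": "suspended", "C": "closed"}


@dataclass(frozen=True)
class Customer:
    """Model owned by the new service, free of legacy naming."""
    customer_id: str
    name: str
    status: str


def from_legacy(record: dict) -> Customer:
    """Translate one legacy row into the new model at the boundary."""
    code = record["STAT_CD"]
    if code not in _LEGACY_STATUS:
        raise ValueError(f"unknown legacy status code: {code!r}")
    return Customer(
        customer_id=str(record["CUST_NO"]),
        name=record["CUST_NM"].strip().title(),
        status=_LEGACY_STATUS[code],
    )
```

Rejecting unknown codes at the boundary, instead of passing them through, is one way to keep legacy surprises out of the new model.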

3. Service Sizing Guidelines

Too small (Nanoservices):

  • Excessive network calls
  • Distributed monolith
  • Coordination overhead exceeds benefits

Too large (Minimonoliths):

  • Multiple teams modifying same service
  • Mixed deployment frequencies
  • Tight coupling re-emerges

Right size indicators:

  • Single team can own it
  • Deployable independently
  • Changes don't ripple to other services
  • Clear business capability
  • 100-10,000 LOC (highly variable)

Communication Patterns

Synchronous Communication

REST APIs:

# Order service calling Payment service
async def create_order(order: Order):
    # Synchronous REST call
    payment = await payment_service.charge(
        amount=order.total,
        customer_id=order.customer_id
    )

    if payment.status == "success":
        order.status = "confirmed"
        await db.save(order)
        return order
    else:
        raise PaymentFailedException()

Pros: Simple, request-response, easy to debug
Cons: Tight coupling, availability dependency, latency cascades

gRPC:

// Proto definition (order.proto)
service OrderService {
    rpc CreateOrder (OrderRequest) returns (OrderResponse);
}

# Implementation (Python)
class OrderServicer(order_pb2_grpc.OrderServiceServicer):
    async def CreateOrder(self, request, context):
        # Type-safe, efficient binary protocol
        payment = await payment_stub.Charge(
            PaymentRequest(amount=request.total)
        )
        # Persist the order before replying; save_order is a placeholder
        # for this service's own storage code
        order = await save_order(request, payment)
        return OrderResponse(order_id=order.id)

Pros: Type-safe, efficient, streaming support
Cons: HTTP/2 required, less human-readable, proto dependencies

Asynchronous Communication

Event-Driven (Pub/Sub):

# Order service publishes event
await event_bus.publish("order.created", {
    "order_id": order.id,
    "customer_id": order.customer_id,
    "total": order.total
})

# Inventory service subscribes
@event_bus.subscribe("order.created")
async def reserve_inventory(event):
    await inventory.reserve(event["order_id"])
    await event_bus.publish("inventory.reserved", {...})

# Notification service subscribes
@event_bus.subscribe("order.created")
async def send_confirmation(event):
    await email.send_order_confirmation(event)

Pros: Loose coupling, services independent, scalable
Cons: Eventual consistency, harder to trace, ordering challenges

Message Queues (Point-to-Point):

# Producer
await queue.send("payment-processing", {
    "order_id": order.id,
    "amount": order.total
})

# Consumer
@queue.consumer("payment-processing")
async def process_payment(message):
    result = await payment_gateway.charge(message["amount"])
    if result.success:
        await message.ack()
    else:
        await message.nack(requeue=True)

Pros: Guaranteed delivery, work distribution, retry handling
Cons: Queue becomes bottleneck, requires message broker

Communication Pattern Decision Matrix

| Scenario | Pattern | Why |
|---|---|---|
| User-facing request/response | Sync (REST/gRPC) | Low latency, immediate feedback |
| Background processing | Async (queue) | Don't block user, retry support |
| Cross-service notifications | Async (pub/sub) | Loose coupling, multiple consumers |
| Real-time updates | WebSocket/SSE | Streaming; WebSocket is bidirectional, SSE is server-to-client only |
| Data replication | Event sourcing | Audit trail, rebuild state |
| High throughput | Async (messaging) | Buffer spikes, backpressure |

Data Consistency Patterns

1. Saga Pattern (Distributed Transactions)

Choreography (Event-Driven):

# Order Service
async def create_order(order):
    order.status = "pending"
    await db.save(order)
    await events.publish("order.created", order)

# Payment Service
@events.subscribe("order.created")
async def handle_order(event):
    try:
        await charge_customer(event["total"])
        await events.publish("payment.completed", event)
    except PaymentError:
        await events.publish("payment.failed", event)

# Inventory Service
@events.subscribe("payment.completed")
async def reserve_items(event):
    try:
        await reserve(event["items"])
        await events.publish("inventory.reserved", event)
    except InventoryError:
        await events.publish("inventory.failed", event)

# Order Service (Compensation)
@events.subscribe("payment.failed")
async def cancel_order(event):
    order = await db.get(event["order_id"])
    order.status = "cancelled"
    await db.save(order)

@events.subscribe("inventory.failed")
async def refund_payment(event):
    await payment.refund(event["order_id"])
    await cancel_order(event)

Orchestration (Coordinator):

class OrderSaga:
    def __init__(self, order):
        self.order = order
        self.completed_steps = []

    async def execute(self):
        try:
            # Step 1: Reserve inventory
            await self.reserve_inventory()
            self.completed_steps.append("inventory")

            # Step 2: Process payment
            await self.process_payment()
            self.completed_steps.append("payment")

            # Step 3: Confirm order
            await self.confirm_order()

        except Exception:
            # Compensate in reverse order
            await self.compensate()
            raise

    async def compensate(self):
        for step in reversed(self.completed_steps):
            if step == "inventory":
                await inventory_service.release(self.order.id)
            elif step == "payment":
                await payment_service.refund(self.order.id)

Choreography vs Orchestration:

| Aspect | Choreography | Orchestration |
|---|---|---|
| Coordination | Decentralized (events) | Centralized (orchestrator) |
| Coupling | Loose | Tight to orchestrator |
| Complexity | Distributed across services | Concentrated in orchestrator |
| Tracing | Harder (follow events) | Easier (single coordinator) |
| Failure handling | Implicit (event handlers) | Explicit (orchestrator logic) |
| Best for | Simple workflows | Complex workflows |

2. Event Sourcing

Pattern: Store events, not state

# Traditional approach (storing state)
class Order:
    id: int
    status: str  # "pending" → "confirmed" → "shipped"
    total: float

# Event sourcing (storing events)
class OrderCreated(Event):
    order_id: int
    total: float

class OrderConfirmed(Event):
    order_id: int

class OrderShipped(Event):
    order_id: int

# Rebuild state from events
def rebuild_order(order_id):
    events = event_store.get_events(order_id)
    order = Order()
    for event in events:
        order.apply(event)  # Apply each event to rebuild state
    return order

Pros: Complete audit trail, time travel, event replay
Cons: Complexity, eventual consistency, schema evolution challenges
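
The rebuild step relies on an `apply` method that the snippet above leaves undefined. One possible shape for it is sketched below, using plain dataclasses and a list in place of an event store. The `"new"` initial status and the `rebuild` helper are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderCreated:
    order_id: int
    total: float


@dataclass(frozen=True)
class OrderConfirmed:
    order_id: int


@dataclass(frozen=True)
class OrderShipped:
    order_id: int


@dataclass
class Order:
    id: Optional[int] = None
    status: str = "new"  # assumed placeholder before any event is applied
    total: float = 0.0

    def apply(self, event) -> None:
        """Mutate state according to one event."""
        if isinstance(event, OrderCreated):
            self.id, self.total, self.status = event.order_id, event.total, "pending"
        elif isinstance(event, OrderConfirmed):
            self.status = "confirmed"
        elif isinstance(event, OrderShipped):
            self.status = "shipped"
        else:
            raise TypeError(f"unhandled event: {event!r}")


def rebuild(events) -> Order:
    """Fold a sequence of events into the current state."""
    order = Order()
    for event in events:
        order.apply(event)
    return order
```

Replaying only a prefix of the history, for example `rebuild(history[:2])`, yields the state as of that point, which is what "time travel" refers to.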

3. CQRS (Command Query Responsibility Segregation)

Separate read and write models:

# Write model (commands)
class CreateOrder:
    async def execute(self, data):
        order = Order(**data)
        await db.save(order)
        await event_bus.publish("order.created", order)

# Read model (projections)
class OrderReadModel:
    # Denormalized for fast reads
    def __init__(self):
        self.cache = {}

    @event_bus.subscribe("order.created")
    async def on_order_created(self, event):
        self.cache[event["order_id"]] = {
            "id": event["order_id"],
            "customer_name": await get_customer_name(event["customer_id"]),
            "status": "pending",
            "total": event["total"]
        }

    def get_order(self, order_id):
        return self.cache.get(order_id)  # Fast read, no joins

Use when: Read/write patterns differ significantly (e.g., analytics dashboards)

Resilience Patterns

1. Circuit Breaker

from circuitbreaker import circuit

@circuit(failure_threshold=5, recovery_timeout=60)
async def call_payment_service(amount):
    response = await http.post("http://payment-service/charge", json={"amount": amount})
    if response.status >= 500:
        raise PaymentServiceError()
    return response.json()

# Circuit states:
# CLOSED → normal operation
# OPEN → fails fast after threshold
# HALF_OPEN → test if service recovered
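
The three states can be seen more directly in a hand-rolled version. This is a simplified synchronous sketch for illustration, not how the `circuitbreaker` package is implemented. The class name, the `clock` parameter, and `CircuitOpenError` are all made up here.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock  # injectable so the timing can be tested
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "CLOSED"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            raise CircuitOpenError("failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed HALF_OPEN trial, or too many failures, (re)opens it.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

This version counts consecutive failures and lets every caller through while half-open. Real implementations often limit the half-open state to a single trial call.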

2. Retry with Exponential Backoff

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def call_with_retry(url):
    return await http.get(url)

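# At most 3 attempts in total, so at most 2 waits between them; each wait
# grows exponentially and is clamped to the 2-10 second range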
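
For intuition about the wait schedule, the delays can be computed by hand. The helper below is a standalone illustration and does not reproduce tenacity's internals. `backoff_delays` and its parameters are hypothetical, and the optional jitter is a common addition for keeping many clients from retrying in lockstep.

```python
import random


def backoff_delays(retries, base=1.0, cap=10.0, jitter=False, rng=random.random):
    """Return the wait before each retry: base * 2**n, capped at `cap`."""
    delays = []
    for n in range(retries):
        delay = min(cap, base * 2 ** n)
        if jitter:
            # "Full jitter": pick uniformly in [0, delay) to spread out clients.
            delay = rng() * delay
        delays.append(delay)
    return delays
```

With the defaults, `backoff_delays(5)` gives `[1.0, 2.0, 4.0, 8.0, 10.0]`: the delay doubles until it reaches the cap.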

3. Timeout

import asyncio

async def call_with_timeout(url):
    try:
        return await asyncio.wait_for(
            http.get(url),
            timeout=5.0  # 5 second timeout
        )
    except asyncio.TimeoutError:
        return {"error": "Service timeout"}

4. Bulkhead

Isolate resources to prevent cascade failures:

import asyncio
from concurrent.futures import ThreadPoolExecutor

# Separate thread pools for different services
payment_pool = ThreadPoolExecutor(max_workers=10)
inventory_pool = ThreadPoolExecutor(max_workers=5)

async def call_payment():
    # Run the blocking client call on the payment pool only
    return await asyncio.get_running_loop().run_in_executor(
        payment_pool,
        payment_service.call
    )

# If payment service is slow, it only exhausts payment_pool,
# inventory calls still work

API Gateway Pattern

Centralized entry point for client requests:

Client → API Gateway → [Order, Payment, Inventory services]

Responsibilities:

  • Routing requests to services
  • Authentication/authorization
  • Rate limiting
  • Request/response transformation
  • Caching
  • Logging/monitoring

Example (Kong, AWS API Gateway, Nginx):

# API Gateway config
routes:
  - path: /orders
    service: order-service
    auth: jwt
    ratelimit: 100/minute

  - path: /payments
    service: payment-service
    auth: oauth2
    ratelimit: 50/minute
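
A setting such as `100/minute` has to be enforced by some counting scheme inside the gateway. The sketch below shows one of the simplest, a fixed-window counter per client. It is an illustration only and does not describe how Kong, AWS API Gateway, or Nginx implement their limits. `FixedWindowLimiter` and its parameters are invented names.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allows at most `limit` requests per client in each `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock  # injectable so the timing can be tested
        self._counts = defaultdict(int)  # (client, window index) -> count

    def allow(self, client_id):
        key = (client_id, int(self._clock() // self.window))
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

Old windows are never evicted here, and a client can burst up to twice the limit across a window boundary. Sliding-window or token-bucket schemes address the second issue.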

Backend for Frontend (BFF) Pattern:

Web Client → Web BFF → Services
Mobile App → Mobile BFF → Services

Each client type gets its own gateway, optimized for that client's needs.

Service Discovery

1. Client-Side Discovery

# Service registry (Consul, Eureka)
registry = ServiceRegistry("http://consul:8500")

# Client looks up service
instances = registry.get_instances("payment-service")
instance = load_balancer.choose(instances)
response = await http.get(f"http://{instance.host}:{instance.port}/charge")
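
The `load_balancer.choose` call above is left abstract. One simple strategy it could use is round-robin, sketched here with an in-memory list standing in for registry results. `Instance` and `RoundRobinBalancer` are hypothetical names, and unlike the call above this `choose` also takes the service name so each service keeps its own rotation.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    host: str
    port: int


class RoundRobinBalancer:
    """Cycles through the instances of each service in turn."""

    def __init__(self):
        self._counters = {}  # service name -> itertools.count

    def choose(self, service, instances):
        if not instances:
            raise LookupError(f"no healthy instances for {service}")
        counter = self._counters.setdefault(service, itertools.count())
        return instances[next(counter) % len(instances)]
```

Because the index is taken modulo the current list length, the rotation keeps working when the registry returns more or fewer instances between calls.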

2. Server-Side Discovery (Load Balancer)

Client → Load Balancer → [Service Instance 1, Instance 2, Instance 3]

DNS-based: Kubernetes services, AWS ELB

Observability

Distributed Tracing

from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

async def create_order(order):
    with tracer.start_as_current_span("create-order") as span:
        span.set_attribute("order.id", order.id)
        span.set_attribute("order.total", order.total)

        # Inject the current trace context into outgoing headers so the
        # payment service can continue the same trace
        headers = {}
        inject(headers)
        payment = await payment_service.charge(
            amount=order.total,
            headers=headers
        )

        span.add_event("payment-completed")
        return order

Tools: Jaeger, Zipkin, AWS X-Ray, Datadog APM

Log Aggregation

Structured logging with correlation IDs:

import logging
import uuid

logger = logging.getLogger(__name__)

async def handle_request(request):
    correlation_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())

    logger.info("Processing request", extra={
        "correlation_id": correlation_id,
        "service": "order-service",
        "user_id": request.user_id
    })

Tools: ELK stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog
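
Fields passed through `extra=` only reach the aggregator if the formatter writes them out, and the default formatter ignores them. A minimal sketch of a JSON formatter using only the standard library follows. The `JsonFormatter` class and its `FIELDS` list are assumptions for the example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON object per line."""

    FIELDS = ("correlation_id", "service", "user_id")

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        for field in self.FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)
```

To use it, attach the formatter to a handler, for example `handler = logging.StreamHandler(); handler.setFormatter(JsonFormatter())`, then add that handler to the logger.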

Monolith Decomposition Strategies

1. Strangler Fig Pattern

Gradually replace monolith with microservices:

Phase 1: Monolith handles everything
Phase 2: Extract service, proxy some requests to it
Phase 3: More services extracted, proxy more requests
Phase 4: Monolith retired
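
The proxy in phases 2 and 3 boils down to a routing decision per request. A rough sketch of that decision follows, with made-up hostnames and a hypothetical `upstream_for` helper. Extracting another service then amounts to adding one entry to the table.

```python
MONOLITH = "http://monolith.internal"  # hypothetical upstream

# Path prefixes that have already been extracted (hypothetical addresses).
EXTRACTED = {
    "/payments": "http://payment-service.internal",
    "/orders": "http://order-service.internal",
}


def upstream_for(path, extracted=EXTRACTED, default=MONOLITH):
    """Pick the backend for a request path; longest matching prefix wins."""
    matches = [
        prefix for prefix in extracted
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return default
    return extracted[max(matches, key=len)]
```

Anything not listed falls through to the monolith, so phase 1 corresponds to an empty table and phase 4 to a table that covers every path.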

2. Branch by Abstraction

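  1. Create an abstraction layer in the monolith around the code to be replaced
  2. Build the new implementation behind that abstraction
  3. Gradually migrate callers to the new implementation
  4. Remove the old implementation
  5. Extract the new implementation as a microservice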

3. Extract by Bounded Context

Priority order:

  1. Services with clear boundaries (authentication, payments)
  2. Services changing frequently
  3. Services with different scaling needs
  4. Services with technology mismatches (e.g., Java monolith, Python ML service)

Anti-Patterns

| Anti-Pattern | Why Bad | Fix |
|---|---|---|
| Distributed Monolith | Services share database, deploy together | One DB per service, independent deployment |
| Nanoservices | Too fine-grained, excessive network calls | Merge related services, follow DDD |
| Shared Database | Tight coupling, schema changes break multiple services | Database per service |
| Synchronous Chains | A→B→C→D, latency adds up, cascading failures | Async events, parallelize where possible |
| Chatty Services | N+1 calls, excessive network overhead | Batch APIs, caching, coarser boundaries |
| No Circuit Breakers | Cascading failures bring down system | Circuit breakers + timeouts + retries |
| No Distributed Tracing | Impossible to debug cross-service issues | OpenTelemetry, correlation IDs |

Cross-References

Related skills:

  • Message queues → message-queues (RabbitMQ, Kafka patterns)
  • REST APIs → rest-api-design (service interface design)
  • gRPC → Check if gRPC skill exists
  • Security → ordis-security-architect (service-to-service auth, zero trust)
  • Database → database-integration (per-service databases, migrations)
  • Deployment → backend-deployment (Docker, Kubernetes, CI/CD)
  • Testing → api-testing (contract testing, integration testing)

Further Reading

  • Building Microservices by Sam Newman
  • Domain-Driven Design by Eric Evans
  • Release It! by Michael Nygard (resilience patterns)
  • Microservices Patterns by Chris Richardson