| name | ml-deployment-helper |
| description | Prepares ML models for production deployment with containerization, API creation, monitoring setup, and A/B testing. Activates for "deploy model", "production deployment", "model API", "containerize model", "docker ml", "serving ml model", "model monitoring", "A/B test model". Generates deployment artifacts and ensures models are production-ready with monitoring, versioning, and rollback capabilities. |
ML Deployment Helper
Overview
Bridges the gap between trained models and production systems. Generates deployment artifacts, APIs, monitoring, and A/B testing infrastructure following MLOps best practices.
Deployment Checklist
Before deploying any model, this skill ensures:
- ✅ Model versioned and tracked
- ✅ Dependencies documented (requirements.txt/Dockerfile)
- ✅ API endpoint created
- ✅ Input validation implemented
- ✅ Monitoring configured
- ✅ A/B testing ready
- ✅ Rollback plan documented
- ✅ Performance benchmarked
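Most of these checks can be scripted before a deploy. A minimal verification sketch (the file paths and health-check URL below are illustrative assumptions, not artifacts produced by the skill):

# Hypothetical pre-deployment check; paths and the health URL are assumptions.
from pathlib import Path
import urllib.request

def verify_deployment_ready(model_path="models/model-v3.pkl",
                            health_url="http://localhost:8000/health"):
    """Fail fast if basic deployment prerequisites are missing."""
    checks = {
        "model artifact exists": Path(model_path).exists(),
        "requirements.txt present": Path("requirements.txt").exists(),
        "Dockerfile present": Path("Dockerfile").exists(),
    }
    try:
        with urllib.request.urlopen(health_url, timeout=3) as resp:
            checks["health endpoint responds"] = resp.status == 200
    except OSError:
        checks["health endpoint responds"] = False
    for name, ok in checks.items():
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
    return all(checks.values())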
Deployment Patterns
Pattern 1: REST API (FastAPI)
from specweave import create_model_api
# Generates production-ready API
api = create_model_api(
    model_path="models/model-v3.pkl",
    increment="0042",
    framework="fastapi"
)
# Creates:
# - api/
# ├── main.py (FastAPI app)
# ├── models.py (Pydantic schemas)
# ├── predict.py (Prediction logic)
# ├── Dockerfile
# ├── requirements.txt
# └── tests/
Generated main.py:
from datetime import datetime

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib

app = FastAPI(title="Recommendation Model API", version="0042-v3")
model = joblib.load("model-v3.pkl")

class PredictionRequest(BaseModel):
    user_id: int
    context: dict

@app.post("/predict")
async def predict(request: PredictionRequest):
    try:
        prediction = model.predict([request.dict()])
        return {
            "recommendations": prediction.tolist(),
            "model_version": "0042-v3",
            "timestamp": datetime.now()
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
Pattern 2: Batch Prediction
from specweave import create_batch_predictor
# For offline scoring
batch_predictor = create_batch_predictor(
    model_path="models/model-v3.pkl",
    increment="0042",
    input_path="s3://bucket/data/",
    output_path="s3://bucket/predictions/"
)
# Creates:
# - batch/
# ├── predictor.py
# ├── scheduler.yaml (Airflow/Kubernetes CronJob)
# └── monitoring.py
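The generated predictor.py is not shown here, but its core loop typically amounts to load, score in chunks, write. A rough sketch, assuming Parquet input and a scikit-learn-style model (column layout and paths are assumptions):

# Illustrative batch scoring loop; generated code may differ.
import joblib
import pandas as pd

def run_batch(model_path, input_path, output_path, batch_size=10_000):
    model = joblib.load(model_path)
    df = pd.read_parquet(input_path)  # e.g. s3://bucket/data/ via s3fs
    predictions = []
    for start in range(0, len(df), batch_size):
        chunk = df.iloc[start:start + batch_size]
        predictions.extend(model.predict(chunk))
    df["prediction"] = predictions
    df.to_parquet(output_path, index=False)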
Pattern 3: Real-Time Streaming
from specweave import create_streaming_predictor
# For Kafka/Kinesis streams
streaming = create_streaming_predictor(
    model_path="models/model-v3.pkl",
    increment="0042",
    input_topic="user-events",
    output_topic="predictions"
)
# Creates:
# - streaming/
# ├── consumer.py
# ├── predictor.py
# ├── producer.py
# └── docker-compose.yaml
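A stripped-down version of the consumer/predictor/producer trio, sketched with kafka-python (broker address, topics, and message layout are assumptions, not the generated files):

# Illustrative streaming prediction loop.
import json
import joblib
from kafka import KafkaConsumer, KafkaProducer

model = joblib.load("models/model-v3.pkl")
consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    prediction = model.predict([event["features"]])  # assumed message shape
    producer.send("predictions", {"user_id": event.get("user_id"),
                                  "prediction": prediction.tolist()})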
Containerization
from specweave import containerize_model
# Generates optimized Dockerfile
dockerfile = containerize_model(
    model_path="models/model-v3.pkl",
    framework="sklearn",
    python_version="3.10",
    increment="0042"
)
Generated Dockerfile:
FROM python:3.10-slim
WORKDIR /app
# Copy model and dependencies
COPY models/model-v3.pkl /app/model-v3.pkl
COPY requirements.txt /app/
# Install curl for the health check (not included in the slim base image) and dependencies
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY api/ /app/api/
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
    CMD curl -f http://localhost:8000/health || exit 1
# Run API
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
Monitoring Setup
from specweave import setup_model_monitoring
# Configures monitoring for production
monitoring = setup_model_monitoring(
    model_name="recommendation-model",
    increment="0042",
    metrics=[
        "prediction_latency",
        "throughput",
        "error_rate",
        "prediction_distribution",
        "feature_drift"
    ]
)
# Creates:
# - monitoring/
# ├── prometheus.yaml
# ├── grafana-dashboard.json
# ├── alerts.yaml
# └── drift-detector.py
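The drift detector boils down to comparing live feature distributions against a training-time reference. A minimal sketch using a two-sample Kolmogorov-Smirnov test (the 0.05 threshold is an assumption; the generated drift-detector.py may use a different test):

# Minimal feature-drift check on a single numeric feature.
from scipy.stats import ks_2samp

def detect_drift(reference_values, live_values, p_threshold=0.05):
    """Return (drifted, details) comparing live data to the training reference."""
    statistic, p_value = ks_2samp(reference_values, live_values)
    return p_value < p_threshold, {"ks_statistic": statistic, "p_value": p_value}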
A/B Testing Infrastructure
from specweave import create_ab_test
# Sets up A/B test framework
ab_test = create_ab_test(
    control_model="model-v2.pkl",
    treatment_model="model-v3.pkl",
    traffic_split=0.1,  # 10% to new model
    success_metric="click_through_rate",
    increment="0042"
)
# Creates:
# - ab-test/
# ├── router.py (traffic splitting)
# ├── metrics.py (success tracking)
# ├── statistical-tests.py (significance testing)
# └── dashboard.py (real-time monitoring)
A/B Test Router:
import hashlib

def route_prediction(user_id, features, control_model, treatment_model):
    """Route to control or treatment based on a hash of user_id."""
    # Consistent hashing: the same user always gets the same model across processes
    user_bucket = int(hashlib.sha256(str(user_id).encode()).hexdigest(), 16) % 100
    if user_bucket < 10:  # 10% to treatment
        return treatment_model.predict(features), "treatment"
    return control_model.predict(features), "control"
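statistical-tests.py decides whether the treatment actually moves the success metric. For a rate metric like click-through rate, a two-proportion z-test is a reasonable sketch (this is illustrative, not the generated file):

# Two-proportion z-test for treatment vs. control CTR.
from math import sqrt
from scipy.stats import norm

def ctr_significance(clicks_control, views_control, clicks_treatment, views_treatment):
    """Return the two-sided p-value for the difference in click-through rate."""
    p_c = clicks_control / views_control
    p_t = clicks_treatment / views_treatment
    p_pool = (clicks_control + clicks_treatment) / (views_control + views_treatment)
    se = sqrt(p_pool * (1 - p_pool) * (1 / views_control + 1 / views_treatment))
    z = (p_t - p_c) / se
    return 2 * norm.sf(abs(z))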
Model Versioning
from specweave import ModelVersion
# Register model version
version = ModelVersion.register(
    model_path="models/model-v3.pkl",
    increment="0042",
    metadata={
        "accuracy": 0.87,
        "training_date": "2024-01-15",
        "data_version": "v2024-01",
        "framework": "xgboost==1.7.0"
    }
)

# Easy rollback
if production_metrics["error_rate"] > threshold:
    ModelVersion.rollback(to_version="0042-v2")
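Behind an interface like this usually sits nothing more than a metadata store mapping versions to artifacts. A toy file-based sketch (not the specweave implementation; the JSON layout and file location are assumptions):

# Toy version registry illustrating register/rollback lookups.
import json
from pathlib import Path

REGISTRY = Path("models/registry.json")

def register_version(version, model_path, metadata):
    registry = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    registry[version] = {"model_path": model_path, **metadata}
    REGISTRY.write_text(json.dumps(registry, indent=2))

def get_model_path(version):
    """Look up the artifact for a version, e.g. when rolling back."""
    return json.loads(REGISTRY.read_text())[version]["model_path"]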
Load Testing
from specweave import load_test_model
# Benchmark model performance
results = load_test_model(
    api_url="http://localhost:8000/predict",
    requests_per_second=[10, 50, 100, 500, 1000],
    duration_seconds=60,
    increment="0042"
)
Output:
Load Test Results:
==================
| RPS | Latency P50 | Latency P95 | Latency P99 | Error Rate |
|------|-------------|-------------|-------------|------------|
| 10 | 35ms | 45ms | 50ms | 0.00% |
| 50 | 38ms | 52ms | 65ms | 0.00% |
| 100 | 45ms | 70ms | 95ms | 0.02% |
| 500 | 120ms | 250ms | 400ms | 1.20% |
| 1000 | 350ms | 800ms | 1200ms | 8.50% |
Recommendation: Deploy with max 100 RPS per instance
Target: <100ms P95 latency (achieved at 100 RPS)
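Outside the skill, a quick latency spot-check can be scripted directly against the API (the endpoint, payload, and request counts below are assumptions):

# Rough latency benchmark; not a substitute for the full load test.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
import requests

def measure_latencies(url, payload, n_requests=200, concurrency=10):
    def one_call(_):
        start = time.perf_counter()
        requests.post(url, json=payload, timeout=10)
        return (time.perf_counter() - start) * 1000  # milliseconds
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(one_call, range(n_requests)))
    cuts = statistics.quantiles(latencies, n=100)
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "p99_ms": cuts[98]}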
Deployment Commands
# Generate deployment artifacts
/ml:deploy-prepare 0042
# Create API
/ml:create-api --increment 0042 --framework fastapi
# Setup monitoring
/ml:setup-monitoring 0042
# Create A/B test
/ml:create-ab-test --control v2 --treatment v3 --split 0.1
# Load test
/ml:load-test 0042 --rps 100 --duration 60s
# Deploy to production
/ml:deploy 0042 --environment production
Deployment Increment
The skill creates a deployment increment:
.specweave/increments/0043-deploy-recommendation-model/
├── spec.md (deployment requirements)
├── plan.md (deployment strategy)
├── tasks.md
│ ├── [ ] Containerize model
│ ├── [ ] Create API
│ ├── [ ] Setup monitoring
│ ├── [ ] Configure A/B test
│ ├── [ ] Load test
│ ├── [ ] Deploy to staging
│ ├── [ ] Validate staging
│ └── [ ] Deploy to production
├── api/ (FastAPI app)
├── monitoring/ (Grafana dashboards)
├── ab-test/ (A/B testing logic)
└── load-tests/ (Performance benchmarks)
Best Practices
- Always load test before production
- Start with 1-5% traffic in A/B test
- Monitor model drift in production
- Version everything (model, data, code)
- Document rollback plan before deploying
- Set up alerts for anomalies
- Gradual rollout (canary deployment)
Integration with SpecWeave
# After training model (increment 0042)
/specweave:inc "0043-deploy-recommendation-model"
# Generates deployment increment with all artifacts
/specweave:do
# Deploy to production when ready
/ml:deploy 0043 --environment production
Model deployment is not the end—it's the beginning of the MLOps lifecycle.