SKILL.md

name: ai-mlops
description: Complete MLOps skill covering production ML lifecycle and security. Includes data ingestion, model deployment, drift detection, monitoring, plus ML security (prompt injection, jailbreak defense, RAG security, privacy, governance). Modern automation-first patterns with multi-layered defenses.

MLOps & ML Security — Complete Reference

Production ML lifecycle with modern security practices.

This skill covers:

  • Production: Data ingestion, deployment, drift detection, monitoring, incident response
  • Security: Prompt injection, jailbreak defense, RAG security, output filtering
  • Governance: Privacy protection, supply chain security, safety evaluation

The production lifecycle breaks down into three stages:

  1. Data ingestion (dlt): Load data from APIs and databases into warehouses
  2. Model deployment: Batch jobs, real-time APIs, hybrid systems, event-driven automation
  3. Operations: Real-time monitoring, 18-second drift detection, automated retraining, incident response

Key Advances:

  • Event-driven, modular, auditable pipelines automating every key phase
  • 18-second drift detection with F1 >0.99 post-attack recovery
  • Automated retraining triggers (drift, schema change, volume threshold, manual override)
  • Scalable architecture: >2,300 req/sec with sub-50ms latency
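The automated retraining triggers listed above (drift, schema change, volume threshold, manual override) reduce to a simple policy check. This is a minimal sketch of that logic; the names (`PipelineState`, `should_retrain`) and thresholds are illustrative, not part of any specific framework:

```python
from dataclasses import dataclass

@dataclass
class PipelineState:
    """Snapshot of the signals a retraining policy inspects."""
    drift_score: float        # e.g. PSI or KS statistic from the monitor
    schema_changed: bool      # upstream schema evolution detected
    rows_since_train: int     # data volume accumulated since last training
    manual_override: bool = False

def should_retrain(state: PipelineState,
                   drift_threshold: float = 0.2,
                   volume_threshold: int = 100_000) -> tuple[bool, str]:
    """Return (trigger, reason) for an automated retraining run."""
    if state.manual_override:
        return True, "manual override"
    if state.schema_changed:
        return True, "schema change"
    if state.drift_score > drift_threshold:
        return True, f"drift score {state.drift_score:.2f} > {drift_threshold}"
    if state.rows_since_train >= volume_threshold:
        return True, "volume threshold reached"
    return False, "no trigger"
```

In an event-driven setup, this check runs on every monitoring event and the `reason` string is attached to the retraining job for auditability.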

The skill is execution-focused, providing:

  • Data ingestion patterns (REST APIs, database replication, incremental loading)
  • Deployment patterns (batch, online, hybrid, streaming, event-driven)
  • Automated monitoring with real-time drift detection
  • Automated retraining pipelines (monitor → detect → trigger → validate → deploy)
  • Incident handling with rapid recovery (F1 >0.99 restoration)
  • Links to copy-paste templates in templates/

Quick Reference

| Task | Tool/Framework | Command | When to Use |
|------|----------------|---------|-------------|
| Data Ingestion | dlt (data load tool) | `dlt pipeline run`, `dlt init` | Loading from APIs and databases to warehouses |
| Batch Deployment | Airflow, Dagster, Prefect | `airflow dags trigger`, `dagster job launch` | Scheduled predictions on large datasets |
| API Deployment | FastAPI, Flask, TorchServe | `uvicorn app:app`, `torchserve --start` | Real-time inference (<500ms latency) |
| Model Registry | MLflow, W&B | `mlflow.register_model()`, `wandb.log_model()` | Versioning and promoting models |
| Drift Detection | Evidently, WhyLabs | `evidently.dashboard()`, monitor metrics | Automated drift monitoring (18s response) |
| Monitoring | Prometheus, Grafana | `prometheus.yml`, Grafana dashboards | Metrics, alerts, SLO tracking |
| Incident Response | Runbooks, PagerDuty | Documented playbooks, alert routing | Handling failures and degradation |

When to Use This Skill

Claude should invoke this skill when the user asks for deployment, operations, or data ingestion help, e.g.:

  • "How do I deploy this model to prod?"
  • "Design a batch + online scoring architecture."
  • "Add monitoring and drift detection to our model."
  • "Write an incident runbook for this ML service."
  • "Package this LLM/RAG pipeline as an API."
  • "Plan our retraining and promotion workflow."
  • "Load data from Stripe API to Snowflake."
  • "Set up incremental database replication with dlt."
  • "Build an ELT pipeline for warehouse loading."

If the user is asking only about EDA, modelling, or theory, prefer:

  • ai-ml-data-science (EDA, features, modelling, SQL transformation with SQLMesh)
  • ai-llm (prompting, fine-tuning, eval)
  • ai-rag (retrieval pipeline design)
  • ai-llm-inference (compression, spec decode, serving internals)

If the user is asking about SQL transformation (after data is loaded), prefer:

  • ai-ml-data-science (SQLMesh templates for staging, intermediate, marts layers)

Decision Tree: Choosing Deployment Strategy

User needs to deploy: [ML System]
    ├─ Data Ingestion?
    │   ├─ From REST APIs? → dlt REST API templates
    │   ├─ From databases? → dlt database sources (PostgreSQL, MySQL, MongoDB)
    │   └─ Incremental loading? → dlt incremental patterns (timestamp, ID-based)
    │
    ├─ Model Serving?
    │   ├─ Latency <500ms? → FastAPI real-time API
    │   ├─ Batch predictions? → Airflow/Dagster batch pipeline
    │   └─ Mix of both? → Hybrid (batch features + online scoring)
    │
    ├─ Monitoring & Ops?
    │   ├─ Drift detection? → Evidently + automated retraining triggers
    │   ├─ Performance tracking? → Prometheus + Grafana dashboards
    │   └─ Incident response? → Runbooks + PagerDuty alerts
    │
    └─ LLM/RAG Production?
        ├─ Cost optimization? → Caching, prompt templates, token budgets
        └─ Safety? → This skill's security patterns (Pattern 10)

Core Patterns Overview

This skill provides 13 production-ready patterns organized into comprehensive guides:

Data & Infrastructure Patterns

Pattern 0: Data Contracts, Ingestion & Lineage → See Data Ingestion Patterns

  • Data contracts with SLAs and versioning
  • Ingestion modes (CDC, batch, streaming)
  • Lineage tracking and schema evolution
  • Replay and backfill procedures
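A data contract can be as lightweight as a declared schema plus an SLA that both producer and consumer version together. This sketch shows the idea with stdlib dataclasses; the `DataContract` shape and the `orders` example are illustrative assumptions, not a specific contract framework:

```python
from dataclasses import dataclass

@dataclass
class DataContract:
    """A minimal data contract: required columns, types, and a freshness SLA."""
    name: str
    version: str
    columns: dict[str, type]          # column -> expected Python type
    freshness_sla_minutes: int = 60

    def validate(self, record: dict) -> list[str]:
        """Return a list of violations for one record (empty = valid)."""
        errors = []
        for col, typ in self.columns.items():
            if col not in record:
                errors.append(f"missing column: {col}")
            elif not isinstance(record[col], typ):
                errors.append(f"{col}: expected {typ.__name__}, "
                              f"got {type(record[col]).__name__}")
        return errors

# Hypothetical contract for an `orders` feed.
orders_v1 = DataContract(
    name="orders", version="1.0.0",
    columns={"order_id": str, "amount": float, "created_at": str},
)
```

Bumping `version` on any column change gives downstream consumers an explicit signal for schema evolution, and SLA fields make freshness checks mechanical.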

Pattern 1: Choose Deployment Mode → See Deployment Patterns

  • Decision table (batch, online, hybrid, streaming)
  • When to use each mode
  • Deployment mode selection checklist

Pattern 2: Standard Deployment Lifecycle → See Deployment Lifecycle

  • Pre-deploy, deploy, observe, operate, evolve phases
  • Environment promotion (dev → staging → prod)
  • Gradual rollout strategies (canary, blue-green)
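A canary rollout boils down to two decisions: which requests hit the candidate model, and whether to ramp up or roll back. This is a minimal sketch under assumed ramp steps (5% → 25% → 50% → 100%) and an assumed error-rate guardrail; real rollouts usually gate on latency and quality metrics too:

```python
def canary_router(traffic_fraction: float):
    """Route a sticky fraction of requests to the candidate model."""
    def route(request_id: str) -> str:
        # Hash-based assignment keeps routing deterministic per request/user ID.
        bucket = hash(request_id) % 100
        return "candidate" if bucket < traffic_fraction * 100 else "stable"
    return route

def next_stage(error_rate: float, current_fraction: float,
               max_error_rate: float = 0.01) -> float:
    """Ramp traffic 5% -> 25% -> 50% -> 100%; roll back on elevated errors."""
    if error_rate > max_error_rate:
        return 0.0  # roll back: send all traffic to the stable model
    for step in (0.05, 0.25, 0.5, 1.0):
        if current_fraction < step:
            return step
    return 1.0
```

Blue-green is the degenerate case: a single step from 0% to 100% with the old environment kept warm for instant rollback.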

Pattern 3: Packaging & Model Registry → See Model Registry Patterns

  • Model registry structure and metadata
  • Packaging strategies (Docker, ONNX, MLflow)
  • Promotion flows (experimental → production)
  • Versioning and governance
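The promotion flow (experimental → staging → production) is easiest to see stripped of any particular registry product. This toy in-memory registry sketches the mechanics — auto-incrementing versions, metadata, and stage transitions; MLflow and W&B expose the same concepts through their own APIs:

```python
from dataclasses import dataclass, field

STAGES = ("experimental", "staging", "production")

@dataclass
class ModelVersion:
    name: str
    version: int
    stage: str = "experimental"
    metadata: dict = field(default_factory=dict)

class Registry:
    """Toy in-memory registry illustrating versioning and promotion flow."""
    def __init__(self):
        self._models: dict[tuple[str, int], ModelVersion] = {}

    def register(self, name: str, metadata: dict) -> ModelVersion:
        # Versions auto-increment per model name.
        version = 1 + max((v for (n, v) in self._models if n == name), default=0)
        mv = ModelVersion(name, version, metadata=metadata)
        self._models[(name, version)] = mv
        return mv

    def promote(self, name: str, version: int) -> str:
        mv = self._models[(name, version)]
        idx = STAGES.index(mv.stage)
        if idx + 1 >= len(STAGES):
            raise ValueError("already in production")
        mv.stage = STAGES[idx + 1]
        return mv.stage
```

Governance lives in the metadata: attaching evaluation metrics, training-data lineage, and approver identity to each version makes promotions auditable.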

Serving Patterns

Pattern 4: Batch Scoring Pipeline → See Deployment Patterns

  • Orchestration with Airflow/Dagster
  • Idempotent scoring jobs
  • Validation and backfill procedures
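Idempotency is what makes batch scoring retry- and backfill-safe: one output per partition, overwritten (never appended) on re-run, written atomically. A minimal sketch with a placeholder score in place of a real model call:

```python
import json
from pathlib import Path

def score_partition(run_date: str, rows: list[dict], out_dir: Path) -> Path:
    """Idempotent batch scoring: one output file per partition date.

    Re-running the same date overwrites rather than appends, so orchestrator
    retries and backfills are safe. The constant score is a placeholder for
    a real model.predict(row) call.
    """
    out_path = out_dir / f"scores_dt={run_date}.jsonl"
    tmp_path = out_path.with_suffix(".tmp")
    with tmp_path.open("w") as f:
        for row in rows:
            score = 0.5  # placeholder for model.predict(row)
            f.write(json.dumps({"id": row["id"], "score": score}) + "\n")
    tmp_path.replace(out_path)  # atomic rename: readers never see partial files
    return out_path
```

In Airflow or Dagster, `run_date` comes from the scheduler's logical date, so a backfill over any date range is just re-invoking the same function.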

Pattern 5: Real-Time API Scoring → See API Design Patterns

  • Service design (HTTP/JSON, gRPC)
  • Input/output schemas
  • Rate limiting, timeouts, circuit breakers
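Of the reliability patterns above, the circuit breaker is the least obvious, so here is a minimal sketch: trip after consecutive failures, fail fast while open, then allow a trial request after a cooldown. Thresholds and the fallback behavior are illustrative defaults:

```python
import time

class CircuitBreaker:
    """Trip after `max_failures` consecutive errors; retry after `reset_after` s."""
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, fallback=None):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback  # fail fast while the circuit is open
            self.opened_at = None  # half-open: allow one trial request
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback
        self.failures = 0  # any success closes the circuit
        return result
```

Wrapped around a downstream dependency (feature store, embedding service), this keeps one slow dependency from exhausting the API's worker pool while still probing for recovery.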

Pattern 6: Hybrid & Feature Store Integration → See Feature Store Patterns

  • Batch vs online features
  • Feature store architecture
  • Training-serving consistency
  • Point-in-time correctness

Operations Patterns

Pattern 7: Monitoring & Alerting → See Monitoring Best Practices

  • Data, performance, and technical metrics
  • SLO definition and tracking
  • Dashboard design and alerting strategies
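SLO tracking usually reduces to two numbers: how much error budget remains, and how fast it is burning. A sketch of both calculations (the alerting thresholds you attach to burn rate are a policy choice, not shown here):

```python
def error_budget_remaining(slo: float, total: int, errors: int) -> float:
    """Fraction of the error budget left for an availability SLO.

    With slo=0.999 and 1M requests, the budget is ~1000 allowed errors.
    """
    budget = (1.0 - slo) * total
    if budget <= 0:
        return 0.0
    return max(0.0, 1.0 - errors / budget)

def burn_rate(slo: float, window_error_rate: float) -> float:
    """How fast the budget burns: 1.0 = exactly on budget, >1 = too fast."""
    return window_error_rate / (1.0 - slo)
```

Multi-window burn-rate alerts (e.g. fast window catching sharp spikes, slow window catching slow leaks) are the standard way to turn these numbers into pages versus tickets.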

Pattern 8: Drift Detection & Automated Retraining → See Drift Detection Guide

  • Real-time drift detection (18-second response)
  • Automated retraining triggers
  • Event-driven retraining pipelines
  • Performance targets (F1 >0.99 recovery)
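Evidently and WhyLabs compute drift statistics for you; to make the mechanism concrete, here is a self-contained sketch of one common statistic, the Population Stability Index (PSI), with smoothing so empty bins don't blow up the log. Bin count and thresholds are conventional choices, not mandated values:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample.

    Rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def histogram(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Additive smoothing keeps every bin nonzero so the log is defined.
        return [(c + 0.5) / (len(xs) + 0.5 * bins) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Running this on a sliding window of live inputs against the training reference, and feeding the result into the retraining trigger, is the essence of an automated drift → retrain loop.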

Pattern 9: Incidents & Runbooks → See Incident Response Playbooks

  • Common failure modes
  • Detection, diagnosis, resolution
  • Post-mortem procedures

Pattern 10: LLM / RAG in Production → See LLM & RAG Production Patterns

  • Prompt and configuration management
  • Safety and compliance (PII, jailbreaks)
  • Cost optimization (token budgets, caching)
  • Monitoring and fallbacks

Pattern 11: Cross-Region, Residency & Rollback → See Multi-Region Patterns

  • Multi-region deployment architectures
  • Data residency and tenant isolation
  • Disaster recovery and failover
  • Regional rollback procedures

Pattern 12: Online Evaluation & Feedback Loops → See Online Evaluation Patterns

  • Feedback signal collection (implicit, explicit)
  • Shadow and canary deployments
  • A/B testing with statistical significance
  • Human-in-the-loop labeling
  • Automated retraining cadence
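For the A/B testing step, the standard significance check on conversion-style metrics is a two-proportion z-test. A stdlib-only sketch (the survival function uses `math.erfc`; sample sizes and rates in the test are illustrative):

```python
import math

def ab_significance(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a two-proportion z-test on conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal survival function.
    return math.erfc(abs(z) / math.sqrt(2))
```

Deciding the minimum detectable effect and sample size *before* the experiment, and only then reading the p-value, is what keeps the promotion decision honest; peeking continuously inflates false positives.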

Resources (Detailed Guides)

For comprehensive operational guides, see:

Core Infrastructure:

Serving & APIs:

  • Deployment Patterns - Batch, online, hybrid, streaming deployment strategies and architectures
  • API Design Patterns - ML/LLM/RAG API patterns, input/output schemas, reliability patterns, versioning

Operations & Reliability:

Advanced Patterns:


Templates

Use these as copy-paste starting points for production artifacts:

Data Ingestion (dlt)

For loading data into warehouses and pipelines:

Use dlt when:

  • Loading data from APIs (Stripe, HubSpot, Shopify, custom APIs)
  • Replicating databases to warehouses
  • Building ELT pipelines with incremental loading
  • Managing data ingestion with Python
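dlt handles incremental loading out of the box (via its incremental cursor support), but the underlying pattern is worth seeing in plain Python: track the highest cursor value loaded so far, fetch only newer rows, advance the cursor. `fetch_page` here is a hypothetical source callable, not a dlt API:

```python
def incremental_extract(fetch_page, cursor_field: str, last_cursor):
    """Pull only rows newer than the stored cursor, then advance it.

    `fetch_page(after=...)` is a hypothetical source callable returning rows
    that carry `cursor_field` (a timestamp or monotonically increasing ID).
    dlt implements this pattern natively via its incremental loading support.
    """
    rows = fetch_page(after=last_cursor)
    new_rows = [r for r in rows if r[cursor_field] > last_cursor]
    new_cursor = max((r[cursor_field] for r in new_rows), default=last_cursor)
    return new_rows, new_cursor
```

The cursor must be persisted with the pipeline state (dlt does this automatically) so restarts resume where the last run stopped instead of re-loading history.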

For SQL transformation (after ingestion), use:

ai-ml-data-science skill (SQLMesh templates for staging/intermediate/marts layers)

Deployment & Packaging

Monitoring & Operations

Navigation

Resources

Templates

Data


External Resources

See data/sources.json for curated references on:

  • Serving frameworks (FastAPI, Flask, gRPC, TorchServe, KServe, Ray Serve)
  • Orchestration (Airflow, Dagster, Prefect)
  • Model registries and MLOps (MLflow, W&B, Vertex AI, Sagemaker)
  • Monitoring and observability (Prometheus, Grafana, OpenTelemetry, Evidently)
  • Feature stores (Feast, Tecton, Vertex, Databricks)
  • Streaming & messaging (Kafka, Pulsar, Kinesis)
  • LLMOps & RAG infra (vector DBs, LLM gateways, safety tools)

Data Lake & Lakehouse

For comprehensive data lake/lakehouse patterns (beyond dlt ingestion), see data-lake-platform:

  • Table formats: Apache Iceberg, Delta Lake, Apache Hudi
  • Query engines: ClickHouse, DuckDB, Apache Doris, StarRocks
  • Alternative ingestion: Airbyte (GUI-based connectors)
  • Transformation: dbt (alternative to SQLMesh)
  • Streaming: Apache Kafka patterns
  • Orchestration: Dagster, Airflow

This skill focuses on ML-specific deployment, monitoring, and security. Use data-lake-platform for general-purpose data infrastructure.


Related Skills

For adjacent topics, reference these skills:

  • ai-ml-data-science - EDA, feature engineering, modelling, evaluation, SQLMesh transformations
  • ai-llm - Prompting, fine-tuning, evaluation for LLMs
  • ai-agents - Agentic workflows, multi-agent systems, LLMOps
  • ai-rag - RAG pipeline design, chunking, retrieval, evaluation
  • ai-llm-inference - Model serving optimization, quantization, batching
  • ai-prompt-engineering - Prompt design patterns and best practices
  • data-lake-platform - Data lake/lakehouse infrastructure (ClickHouse, Iceberg, Kafka)

Use this skill to turn trained models into reliable services, not to derive the model itself.