| name | mlops-basics |
| version | 2.0.0 |
| sasmp_version | 1.3.0 |
| description | Master MLOps fundamentals - lifecycle, principles, tools, practices, and organizational adoption |
| bonded_agent | 01-mlops-fundamentals |
| bond_type | PRIMARY_BOND |
| category | fundamentals |
| difficulty | beginner_to_intermediate |
| estimated_hours | 40 |
| prerequisites | Basic ML concepts, Git |
MLOps Basics Skill
Learn: Master the foundations of Machine Learning Operations for production ML systems.
Skill Overview
| Attribute | Value |
|---|---|
| Bonded Agent | 01-mlops-fundamentals |
| Difficulty | Beginner to Intermediate |
| Duration | 40 hours |
| Prerequisites | Basic ML concepts, Git |
Learning Objectives
After completing this skill, you will be able to:
- Explain the complete ML lifecycle from data to production
- Assess organizational MLOps maturity levels (0-4)
- Compare and select appropriate MLOps tools
- Design basic ML pipelines following best practices
- Implement foundational MLOps practices in your team
Topics Covered
Module 1: ML Lifecycle Fundamentals (8 hours)
┌─────────────────────────────────────────────────────────────────┐
│ ML LIFECYCLE PHASES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Data │──▶│ Model │──▶│ Deploy │──▶│ Monitor │ │
│ │Collection│ │ Training │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ └──────────── Feedback Loop ◀──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Topics:
- Data Engineering vs Data Science vs ML Engineering
- Feature Engineering lifecycle
- Model training and evaluation
- Deployment patterns
- Continuous monitoring and retraining
Exercises:
- Map an existing ML project to lifecycle phases
- Identify gaps in current workflow
- Create lifecycle documentation template
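A minimal sketch for the first two exercises above; the phase names mirror the diagram, and the file paths are purely illustrative:
# sketch: map project assets to lifecycle phases to surface gaps
from enum import Enum

class Phase(Enum):
    DATA_COLLECTION = "data collection"
    MODEL_TRAINING = "model training"
    DEPLOY = "deploy"
    MONITOR = "monitor"

# hypothetical project layout - replace with your own files
PROJECT_MAP = {
    Phase.DATA_COLLECTION: ["etl/ingest.py", "etl/clean.py"],
    Phase.MODEL_TRAINING: ["notebooks/train.ipynb"],
    Phase.DEPLOY: [],   # nothing here yet
    Phase.MONITOR: [],  # nothing here yet
}

# an empty list marks a gap in the current workflow
gaps = [phase.value for phase, assets in PROJECT_MAP.items() if not assets]
print("lifecycle gaps:", gaps or "none")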
Module 2: MLOps Principles (8 hours)
Core Principles (2024-2025 Best Practices):
| Principle | Description | Implementation |
|---|---|---|
| Automation | Reduce manual intervention | CI/CD pipelines, auto-retraining |
| Reproducibility | Same inputs → same outputs | Version everything, seed randomness |
| Testing | Validate at every stage | Data tests, model tests, integration tests |
| Monitoring | Observe production behavior | Drift detection, performance metrics |
| Iteration | Continuous improvement | Feedback loops, experimentation |
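A minimal sketch of the Reproducibility row in practice, assuming only the Python standard library and NumPy; ML frameworks such as PyTorch or TensorFlow keep separate RNGs and need their own seed calls on top:
# sketch: pin every common source of randomness so reruns are comparable
import os
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    os.environ["PYTHONHASHSEED"] = str(seed)  # hash-based ordering
    random.seed(seed)                          # stdlib RNG
    np.random.seed(seed)                       # NumPy global RNG
    # frameworks need their own calls, e.g. torch.manual_seed(seed)

set_seed(42)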
Exercises:
- Audit current project against principles
- Create reproducibility checklist
- Design testing strategy
Module 3: Tool Ecosystem (12 hours)
Tool Categories & Recommendations:
MLOPS_TOOLS = {
"experiment_tracking": {
"open_source": ["MLflow", "DVC"],
"managed": ["Weights & Biases", "Neptune", "Comet"]
},
"feature_stores": {
"open_source": ["Feast"],
"managed": ["Tecton", "Hopsworks", "Vertex Feature Store"]
},
"orchestration": {
"open_source": ["Airflow", "Prefect", "Dagster"],
"managed": ["Kubeflow", "Vertex Pipelines", "SageMaker Pipelines"]
},
"model_serving": {
"open_source": ["TorchServe", "BentoML", "Seldon"],
"managed": ["Triton", "SageMaker Endpoints", "Vertex Endpoints"]
},
"monitoring": {
"open_source": ["Evidently", "NannyML"],
"managed": ["WhyLabs", "Arize", "Fiddler"]
}
}
Exercises:
- Install and configure MLflow locally
- Track an experiment with parameters and metrics
- Register a model in the model registry
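A starter sketch for these exercises, assuming MLflow 2.x and scikit-learn are installed (pip install mlflow scikit-learn); the experiment and model names are illustrative:
# sketch: track a run and register the resulting model locally
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# SQLite backend: the model registry requires a database-backed store
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("mlops-basics-quickstart")

X, y = make_classification(n_samples=200, random_state=42)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # registered_model_name creates the registry entry (or a new version)
    mlflow.sklearn.log_model(model, "model", registered_model_name="quickstart-model")
Afterwards, run mlflow ui --backend-store-uri sqlite:///mlflow.db to browse the tracked run and the registered model.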
Module 4: Best Practices & Patterns (12 hours)
Design Patterns:
Training Pipeline Pattern
Data → Validate → Transform → Train → Evaluate → Register
Serving Pipeline Pattern
Request → Preprocess → Predict → Postprocess → Response
Monitoring Pattern
Production Data → Compare vs Reference → Detect Drift → Alert/Retrain
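A toy sketch of the Training Pipeline Pattern, with stand-ins for each phase; the quality-gate threshold is an arbitrary example:
# sketch: training pipeline pattern - each phase is an explicit, testable step
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def validate(X: np.ndarray, y: np.ndarray) -> None:
    # Validate: fail fast on obvious data problems
    assert not np.isnan(X).any(), "NaNs in features"
    assert len(X) == len(y), "feature/label length mismatch"

def run_training_pipeline(threshold: float = 0.8) -> LogisticRegression:
    X, y = make_classification(n_samples=500, random_state=42)          # Data
    validate(X, y)                                                      # Validate
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)    # Transform
    model = LogisticRegression(max_iter=200).fit(X_tr, y_tr)            # Train
    accuracy = model.score(X_te, y_te)                                  # Evaluate
    if accuracy < threshold:                                            # gate before Register
        raise RuntimeError(f"accuracy {accuracy:.2f} below {threshold}")
    return model                                                        # Register (stub)

model = run_training_pipeline()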
Exercises:
- Design a training pipeline for a sample project
- Implement basic data validation with Great Expectations
- Set up a simple monitoring dashboard
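A minimal sketch of the Monitoring Pattern above as a two-sample statistical test, assuming SciPy is installed; dedicated tools like Evidently or NannyML wrap this idea with richer reports:
# sketch: detect drift by comparing production data against a reference sample
import numpy as np
from scipy import stats

def detect_drift(reference: np.ndarray, production: np.ndarray, alpha: float = 0.05) -> bool:
    # Kolmogorov-Smirnov two-sample test on a single feature
    _, p_value = stats.ks_2samp(reference, production)
    return p_value < alpha  # small p-value -> distributions differ -> drift

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)   # training-time sample
production = rng.normal(loc=0.5, scale=1.0, size=1000)  # shifted live sample
print("drift detected:", detect_drift(reference, production))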
Code Templates
Template 1: MLflow Experiment Setup
# templates/mlflow_setup.py
import mlflow
from mlflow.tracking import MlflowClient
def setup_experiment(
experiment_name: str,
tracking_uri: str = "sqlite:///mlflow.db"
) -> str:
"""Initialize MLflow experiment with best practices."""
mlflow.set_tracking_uri(tracking_uri)
# Create or get experiment
client = MlflowClient()
experiment = client.get_experiment_by_name(experiment_name)
if experiment is None:
experiment_id = client.create_experiment(
name=experiment_name,
tags={"version": "1.0", "team": "ml-platform"}
)
else:
experiment_id = experiment.experiment_id
mlflow.set_experiment(experiment_name)
return experiment_id
def log_run(params: dict, metrics: dict, model_path: str):
"""Log a complete training run."""
with mlflow.start_run():
mlflow.log_params(params)
mlflow.log_metrics(metrics)
mlflow.log_artifact(model_path)
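A usage example for the template above; the parameter values and file path are illustrative:
# example usage of the template above
setup_experiment("churn-model")
log_run(
    params={"learning_rate": 0.1, "n_estimators": 100},
    metrics={"auc": 0.87},
    model_path="model.pkl",  # any existing local file to attach as an artifact
)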
Template 2: MLOps Maturity Assessment
# templates/maturity_assessment.py
from dataclasses import dataclass
from enum import IntEnum
from typing import List
class MaturityLevel(IntEnum):
AD_HOC = 0
REPEATABLE = 1
RELIABLE = 2
SCALABLE = 3
OPTIMIZED = 4
@dataclass
class AssessmentQuestion:
dimension: str
question: str
level_0: str
level_1: str
level_2: str
level_3: str
level_4: str
ASSESSMENT_QUESTIONS = [
AssessmentQuestion(
dimension="Data Management",
question="How do you manage training data?",
level_0="Ad-hoc file storage",
level_1="Versioned with DVC/Git LFS",
level_2="Data validation in place",
level_3="Feature store implemented",
level_4="Automated data quality monitoring"
),
# Add more questions for each dimension
]
def calculate_maturity_score(responses: List[int]) -> tuple:
    """Average the responses and floor to the lowest fully achieved level."""
    if not responses:
        raise ValueError("no responses to score")
    avg_score = sum(responses) / len(responses)
    level = MaturityLevel(int(avg_score))  # int() floors: 1.8 -> REPEATABLE
    return avg_score, level
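Example usage with hypothetical responses, one per assessment question:
# example: five dimensions scored 1-3 average to 1.8 -> REPEATABLE (level 1)
score, level = calculate_maturity_score([1, 2, 2, 1, 3])
print(f"maturity: {score:.1f} -> {level.name}")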
Troubleshooting Guide
Common Issues
| Issue | Symptom | Solution |
|---|---|---|
| MLflow UI not loading | Connection refused | Check tracking URI, start server |
| Experiment not found | Experiment doesn't exist | Verify experiment name, create if needed |
| Artifact upload fails | Storage permission denied | Check artifact location permissions |
| Model registration fails | Model name conflict | Use unique names or versioning |
Debug Checklist
□ 1. Verify MLflow server is running
□ 2. Check tracking URI configuration
□ 3. Confirm artifact storage accessible
□ 4. Validate experiment exists
□ 5. Test with minimal example first
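A quick probe for checklist steps 1-2, assuming MLflow 2.x and a tracking server on the default local port (adjust the URI to your setup):
# sketch: verify the tracking server answers before debugging anything else
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # adjust to your server
try:
    mlflow.search_experiments(max_results=1)  # cheap call that hits the server
    print("tracking server reachable")
except Exception as exc:
    print(f"tracking server unreachable: {exc}")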
Knowledge Check
Self-Assessment Questions
- What are the 5 main phases of the ML lifecycle?
- How does MLOps differ from DevOps?
- When would you use MLflow vs Weights & Biases?
- What's the difference between Level 1 and Level 2 MLOps maturity?
- What should you version in an ML project?
Practical Exercises
- Beginner: Set up MLflow tracking for an existing notebook
- Intermediate: Create a reproducible training script with environment capture
- Advanced: Design an end-to-end pipeline diagram for a real use case
Resources
Official Documentation
- MLflow Documentation
- Great Expectations Documentation
- Evidently Documentation
Recommended Reading
- "Designing Machine Learning Systems" by Chip Huyen
- "Machine Learning Engineering" by Andriy Burkov
Related Skills
- [See: experiment-tracking] - Deep dive into experiment tracking
- [See: training-pipelines] - Build production training pipelines
- [See: ml-infrastructure] - Set up ML infrastructure
Skill Completion Criteria
To mark this skill as complete, you must:
- Complete all 4 modules
- Finish at least 80% of exercises
- Pass the knowledge check (4/5 correct)
- Complete one practical project
Version History
| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2024-12 | Production-grade upgrade with exercises and templates |
| 1.0.0 | 2024-11 | Initial release |