# ML Reviewer Skill

## Purpose
Reviews machine learning and deep learning code against PyTorch, TensorFlow, scikit-learn, and MLOps best practices.
## When to Use
- ML/DL project code review
- Mentions of "PyTorch", "TensorFlow", "Keras", "scikit-learn", or "model training"
- Inspecting model performance or training optimization
- Projects with ML framework dependencies
## Project Detection
- `torch`, `tensorflow`, `keras`, `sklearn` in requirements.txt/pyproject.toml
- `.pt`, `.pth`, `.h5`, `.pkl` model files
- `train.py`, `model.py`, `dataset.py` files
- Jupyter notebooks with ML imports
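The detection rules above can be sketched as a small helper. This is a minimal illustration, not part of any library; the `detect_ml_project` name and the package list are assumptions:

```python
import re
from pathlib import Path

# Dependency names that indicate an ML project (illustrative list)
ML_PACKAGES = {"torch", "tensorflow", "keras", "sklearn", "scikit-learn"}
MODEL_SUFFIXES = {".pt", ".pth", ".h5", ".pkl"}

def detect_ml_project(root: str) -> dict:
    """Return which ML signals are present under `root`."""
    root_path = Path(root)
    frameworks = set()
    for name in ("requirements.txt", "pyproject.toml"):
        f = root_path / name
        if f.is_file():
            text = f.read_text(encoding="utf-8", errors="ignore")
            for pkg in ML_PACKAGES:
                # Match the package name at a word boundary (e.g. "torch==2.1")
                if re.search(rf"\b{re.escape(pkg)}\b", text):
                    frameworks.add(pkg)
    model_files = [p for p in root_path.rglob("*") if p.suffix in MODEL_SUFFIXES]
    return {"frameworks": frameworks, "model_files": model_files}
```

A real skill would also scan notebook imports; this sketch covers only the dependency-file and model-file signals.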
## Workflow

### Step 1: Analyze Project
- **Framework**: PyTorch / TensorFlow / scikit-learn
- **Python**: 3.10+
- **CUDA**: 11.x / 12.x
- **Task**: Classification / Regression / NLP / CV
- **Stage**: Research / Production
### Step 2: Select Review Areas
Use AskUserQuestion:
- **Question**: "Which areas to review?"
- **Options**:
  - Full ML pattern check (recommended)
  - Model architecture review
  - Training loop optimization
  - Data pipeline efficiency
  - MLOps/deployment patterns
- **multiSelect**: true
## Detection Rules

### PyTorch Patterns

| Check | Problem | Severity |
|-------|---------|----------|
| Missing model.eval() | Inconsistent inference | HIGH |
| Missing torch.no_grad() | Memory leak in inference | HIGH |
| In-place operations in autograd | Gradient computation error | CRITICAL |
| DataLoader num_workers=0 | CPU bottleneck | MEDIUM |
| Missing gradient clipping | Exploding gradients | MEDIUM |
```python
# BAD: Missing eval() and no_grad()
def predict(model, x):
    return model(x)  # Dropout/BatchNorm inconsistent!

# GOOD: Proper inference mode
def predict(model, x):
    model.eval()
    with torch.no_grad():
        return model(x)

# BAD: In-place operation breaking autograd
x = torch.randn(10, requires_grad=True)
x += 1  # In-place on a leaf tensor that requires grad: raises a RuntimeError

# GOOD: Out-of-place operation
x = torch.randn(10, requires_grad=True)
x = x + 1

# BAD: DataLoader bottleneck
loader = DataLoader(dataset, batch_size=32)  # num_workers defaults to 0

# GOOD: Parallel data loading
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True,  # Speeds up host-to-GPU copies
    persistent_workers=True,
)

# BAD: No gradient clipping
loss.backward()
optimizer.step()

# GOOD: Clip gradients between backward() and step()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```
### TensorFlow/Keras Patterns

| Check | Problem | Severity |
|-------|---------|----------|
| Missing @tf.function | Performance loss | MEDIUM |
| Eager mode in production | Slow inference | HIGH |
| Large model in memory | OOM risk | HIGH |
| Missing mixed precision | Training inefficiency | MEDIUM |
```python
# BAD: Eager-mode train step (no @tf.function)
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# GOOD: Compile the step into a graph with @tf.function
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# BAD: Missing mixed precision
model.fit(x_train, y_train, epochs=10)

# GOOD: Enable mixed precision (set the policy before building the model)
tf.keras.mixed_precision.set_global_policy('mixed_float16')
model.fit(x_train, y_train, epochs=10)
```
### scikit-learn Patterns

| Check | Problem | Severity |
|-------|---------|----------|
| fit_transform on test data | Data leakage | CRITICAL |
| Missing cross-validation | Overfitting risk | HIGH |
| No feature scaling | Degraded model performance | MEDIUM |
| Missing random_state | Non-reproducible results | LOW |
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# BAD: Data leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)  # LEAK! Re-fits on test data

# GOOD: transform only on test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # Reuses train statistics

# BAD: No cross-validation
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# GOOD: Use cross-validation
scores = cross_val_score(model, X, y, cv=5)
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std():.3f})")

# BAD: No scaling before a scale-sensitive model
model = LogisticRegression()
model.fit(X_train, y_train)

# GOOD: Use a Pipeline with scaling
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X_train, y_train)
```
### Data Pipeline

| Check | Problem | Solution |
|-------|---------|----------|
| Loading full dataset into memory | OOM | Use generators/tf.data |
| No data augmentation | Overfitting | Add augmentation |
| Unbalanced classes | Biased model | Oversample/undersample/class weights |
| No validation split | No early stopping | Use a validation set |
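A validation split can be carved out with nothing but the standard library. A minimal sketch (the `train_val_split` helper is illustrative, not a library function):

```python
import random

def train_val_split(items, val_fraction=0.2, seed=42):
    """Shuffle indices deterministically, then split into train/val lists."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(items) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [items[i] for i in train_idx], [items[i] for i in val_idx]
```

In practice `sklearn.model_selection.train_test_split` (with `stratify=` for classification) does the same job with stratification support.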
```python
import numpy as np

# BAD: Full dataset in memory
images = []
for path in all_image_paths:
    images.append(load_image(path))  # OOM for large datasets!

# GOOD: Use a generator
def data_generator(paths, batch_size):
    for i in range(0, len(paths), batch_size):
        batch_paths = paths[i:i + batch_size]
        yield np.array([load_image(p) for p in batch_paths])

# GOOD: Use tf.data
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(load_and_preprocess)
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)

# BAD: No class weights for imbalanced data
model.fit(X_train, y_train)

# GOOD: Add class weights (Keras class_weight argument)
from sklearn.utils.class_weight import compute_class_weight
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weights = dict(enumerate(weights))
model.fit(X_train, y_train, class_weight=class_weights)
```
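The `'balanced'` heuristic used above boils down to `n_samples / (n_classes * count_c)` for each class `c`; a standard-library sketch of that formula (the `balanced_weights` name is illustrative):

```python
from collections import Counter

def balanced_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}
```

For a 90/10 split, the minority class ends up weighted 9x heavier than the majority, which is exactly what compensates a loss averaged over samples.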
### GPU/Performance

| Check | Problem | Severity |
|-------|---------|----------|
| Tensor operations on CPU | Slow computation | HIGH |
| Frequent GPU-CPU transfer | Sync overhead | HIGH |
| No gradient accumulation | OOM for large batch | MEDIUM |
| Missing torch.cuda.empty_cache() | Memory fragmentation | LOW |
```python
# BAD: CPU operations
x = torch.randn(1000, 1000)
y = torch.randn(1000, 1000)
z = x @ y  # CPU computation

# GOOD: GPU operations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(1000, 1000, device=device)
y = torch.randn(1000, 1000, device=device)
z = x @ y  # GPU computation

# BAD: Frequent CPU-GPU transfer
for x, y in dataloader:
    x = x.cuda()
    y = y.cuda()
    loss = model(x, y)
    print(loss.item())  # Forces a GPU sync every iteration!

# GOOD: Batch logging
losses = []
for step, (x, y) in enumerate(dataloader):
    x, y = x.to(device), y.to(device)
    loss = model(x, y)
    losses.append(loss.detach())  # detach() so logged losses don't hold the graph
    if step % log_interval == 0:
        print(torch.stack(losses).mean().item())

# Gradient accumulation for a large effective batch size
accumulation_steps = 4
for i, (x, y) in enumerate(dataloader):
    loss = model(x, y) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
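Dividing each micro-batch loss by `accumulation_steps` is what makes the accumulated gradient match a single large-batch step. A plain-Python check of that identity for a toy loss `mean((w*x - y)^2)` (all names here are illustrative):

```python
def grad(w, xs, ys):
    """Gradient of mean squared error (w*x - y)^2 over a batch."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

# One large batch of 8
full_grad = grad(w, xs, ys)

# Four micro-batches of 2, each gradient scaled by 1/accumulation_steps
accumulation_steps = 4
acc_grad = sum(
    grad(w, xs[i:i + 2], ys[i:i + 2]) / accumulation_steps
    for i in range(0, 8, 2)
)
assert abs(full_grad - acc_grad) < 1e-12
```

The identity holds exactly when the micro-batches are equal-sized; with a ragged last batch the two differ slightly, which is usually acceptable in practice.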
### MLOps/Experiment Tracking

| Check | Problem | Severity |
|-------|---------|----------|
| No experiment tracking | Results not comparable | HIGH |
| Hardcoded hyperparameters | No config management | MEDIUM |
| No model versioning | Deployment issues | MEDIUM |
| Missing seed setting | Non-reproducible runs | HIGH |
```python
# BAD: No seed setting
model = train_model(X, y)

# GOOD: Set all seeds
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # Reproducible, but can be slower

set_seed(42)

# BAD: Hardcoded hyperparameters
lr = 0.001
batch_size = 32
epochs = 100

# GOOD: Use a config file with hydra
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train")
def train(cfg: DictConfig):
    model = build_model(cfg.model)
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)

# GOOD: Use experiment tracking
import wandb

wandb.init(project="my-project", config=cfg)
for epoch in range(epochs):
    loss = train_epoch(model, dataloader)
    wandb.log({"loss": loss, "epoch": epoch})
wandb.finish()
```
## Response Template

```markdown
## ML Code Review Results

**Project**: [name]
**Framework**: PyTorch/TensorFlow/scikit-learn
**Task**: Classification/Regression/NLP/CV
**Files Analyzed**: X

### Model Architecture
| Status | File | Issue |
|--------|------|-------|
| MEDIUM | models/resnet.py | Missing dropout for regularization |
| LOW | models/transformer.py | Consider gradient checkpointing |

### Training Loop
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | Missing model.eval() in validation (line 45) |
| HIGH | train.py | No gradient clipping (line 67) |

### Data Pipeline
| Status | File | Issue |
|--------|------|-------|
| CRITICAL | data/dataset.py | fit_transform on test data (line 23) |
| HIGH | data/loader.py | DataLoader num_workers=0 |

### MLOps
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | No seed setting for reproducibility |
| MEDIUM | train.py | Hardcoded hyperparameters |

### Recommended Actions
1. [ ] Add model.eval() and torch.no_grad() for inference
2. [ ] Fix data leakage in preprocessing
3. [ ] Set random seeds for reproducibility
4. [ ] Add experiment tracking (wandb/mlflow)
```
## Best Practices
- **Training**: eval mode, no_grad, gradient clipping, mixed precision
- **Data**: no leakage, proper splits, augmentation, balanced classes
- **Performance**: GPU operations, batched transfers, gradient accumulation
- **MLOps**: seed setting, experiment tracking, config management
- **Testing**: unit tests for the data pipeline, model output-shape tests
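The output-shape tests mentioned above need no framework to illustrate. A hedged sketch with a stand-in model (the `dummy_model` function is illustrative; in a real suite it would be your torch/keras model):

```python
def dummy_model(batch):
    """Stand-in for a classifier: maps each input row to 10 class scores."""
    return [[0.0] * 10 for _ in batch]

def test_output_shape():
    batch = [[0.0] * 784 for _ in range(32)]  # e.g. 32 flattened 28x28 images
    out = dummy_model(batch)
    assert len(out) == 32                      # batch dimension preserved
    assert all(len(row) == 10 for row in out)  # one score per class

test_output_shape()
```

Shape tests like this catch broken reshapes and off-by-one class counts before a full training run does.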
## Integration
- python-reviewer skill: general Python code quality
- python-data-reviewer skill: data preprocessing patterns
- test-generator skill: ML test generation
- docker-reviewer skill: ML containerization
## Notes
- Based on PyTorch 2.x, TensorFlow 2.x, scikit-learn 1.x
- Supports distributed training patterns (DDP, FSDP)
- Includes MLOps patterns (wandb, mlflow, hydra)