SKILL.md

name: ml-reviewer
description: WHEN: Machine Learning/Deep Learning code review, PyTorch/TensorFlow patterns, Model training optimization, MLOps checks WHAT: Model architecture review + Training patterns + Data pipeline checks + GPU optimization + Experiment tracking WHEN NOT: Data analysis only → python-data-reviewer, General Python → python-reviewer

ML Reviewer Skill

Purpose

Reviews Machine Learning and Deep Learning code for PyTorch, TensorFlow, scikit-learn, and MLOps best practices.

When to Use

  • ML/DL project code review
  • "PyTorch", "TensorFlow", "Keras", "scikit-learn", "model training" mentions
  • Model performance, training optimization inspection
  • Projects with ML framework dependencies

Project Detection

  • torch, tensorflow, keras, sklearn in requirements.txt/pyproject.toml
  • .pt, .pth, .h5, .pkl model files
  • train.py, model.py, dataset.py files
  • Jupyter notebooks with ML imports
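
If helpful, this check can be automated with a quick file scan. A minimal sketch; the helper name and package list are illustrative, not part of the skill:

# Minimal project-detection sketch (helper name and package list are illustrative)
from pathlib import Path

ML_PACKAGES = ("torch", "tensorflow", "keras", "sklearn", "scikit-learn")
MODEL_SUFFIXES = {".pt", ".pth", ".h5", ".pkl"}

def looks_like_ml_project(root: str) -> bool:
    base = Path(root)
    # Check dependency files for ML framework names
    for name in ("requirements.txt", "pyproject.toml"):
        dep_file = base / name
        if dep_file.exists() and any(pkg in dep_file.read_text() for pkg in ML_PACKAGES):
            return True
    # Fall back to looking for serialized model files
    return any(p.suffix in MODEL_SUFFIXES for p in base.rglob("*"))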

Workflow

Step 1: Analyze Project

**Framework**: PyTorch / TensorFlow / scikit-learn
**Python**: 3.10+
**CUDA**: 11.x / 12.x
**Task**: Classification / Regression / NLP / CV
**Stage**: Research / Production

Step 2: Select Review Areas

AskUserQuestion:

"Which areas to review?"
Options:
- Full ML pattern check (recommended)
- Model architecture review
- Training loop optimization
- Data pipeline efficiency
- MLOps/deployment patterns
multiSelect: true

Detection Rules

PyTorch Patterns

| Check | Issue | Severity |
|-------|-------|----------|
| Missing model.eval() | Inconsistent inference (Dropout/BatchNorm stay in training mode) | HIGH |
| Missing torch.no_grad() | Memory growth during inference | HIGH |
| In-place operations in autograd | Gradient computation error | CRITICAL |
| DataLoader with num_workers=0 | CPU bottleneck | MEDIUM |
| Missing gradient clipping | Exploding gradients | MEDIUM |

# BAD: Missing eval() and no_grad()
def predict(model, x):
    return model(x)  # Dropout/BatchNorm inconsistent!

# GOOD: Proper inference mode
def predict(model, x):
    model.eval()
    with torch.no_grad():
        return model(x)

# BAD: In-place operation on a leaf tensor that requires grad
x = torch.randn(10, requires_grad=True)
x += 1  # RuntimeError: in-place operation on a leaf tensor that requires grad

# GOOD: Out-of-place operation
x = torch.randn(10, requires_grad=True)
x = x + 1

# BAD: DataLoader bottleneck
loader = DataLoader(dataset, batch_size=32)  # num_workers=0

# GOOD: Parallel data loading
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True,  # For GPU
    persistent_workers=True,
)

# BAD: No gradient clipping
loss.backward()
optimizer.step()

# GOOD: Clip gradients after backward(), before step()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
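
Mixed precision appears under Best Practices but has no PyTorch example here. A minimal sketch with torch.cuda.amp, assuming a standard training loop (model, dataloader, optimizer, and loss_fn as above):

# Mixed-precision training sketch (loop structure assumed, not from the original)
scaler = torch.cuda.amp.GradScaler()
for x, y in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale so clipping sees the true gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)
    scaler.update()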

TensorFlow/Keras Patterns

| Check | Issue | Severity |
|-------|-------|----------|
| Missing @tf.function | Performance loss | MEDIUM |
| Eager mode in production | Slow inference | HIGH |
| Large model fully in memory | OOM risk | HIGH |
| Missing mixed precision | Training inefficiency | MEDIUM |

# BAD: No @tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# GOOD: Use @tf.function
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# BAD: Missing mixed precision
model.fit(x_train, y_train, epochs=10)

# GOOD: Enable mixed precision (set the policy before building the model;
# keep the final layer in float32 for numerical stability)
tf.keras.mixed_precision.set_global_policy('mixed_float16')
model.fit(x_train, y_train, epochs=10)
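
With a custom training loop (like train_step above), mixed precision also needs loss scaling. A minimal sketch using TensorFlow's LossScaleOptimizer; the inner Adam optimizer is an assumption:

# Loss scaling for a custom loop under mixed_float16 (inner optimizer is illustrative)
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
        scaled_loss = optimizer.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
    grads = optimizer.get_unscaled_gradients(scaled_grads)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))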

scikit-learn Patterns

| Check | Issue | Severity |
|-------|-------|----------|
| fit_transform on test data | Data leakage | CRITICAL |
| Missing cross-validation | Overfitting risk | HIGH |
| No feature scaling | Degraded model performance | MEDIUM |
| Hardcoded random_state values | Reproducibility/config concern | LOW |

# BAD: Data leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)  # LEAK! Re-fitting

# GOOD: transform only on test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # No re-fit

# BAD: No cross-validation
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# GOOD: Use cross-validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std():.3f})")

# BAD: No feature scaling before a scale-sensitive model
model = LogisticRegression()
model.fit(X_train, y_train)

# GOOD: Use Pipeline with scaling
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])
pipeline.fit(X_train, y_train)
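
Hyperparameter search should cross-validate the whole pipeline so scaling is re-fit inside each fold, which also prevents leakage. A short sketch building on the Pipeline above; the parameter grid is illustrative:

# GOOD: Tune the full pipeline with cross-validation (grid values are illustrative)
from sklearn.model_selection import GridSearchCV

param_grid = {'classifier__C': [0.1, 1.0, 10.0]}  # step name matches the Pipeline above
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)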

Data Pipeline

| Check | Problem | Solution |
|-------|---------|----------|
| Loading full dataset into memory | OOM | Use generators or tf.data |
| No data augmentation | Overfitting | Add augmentation |
| Imbalanced classes | Biased model | Oversample, undersample, or class weights |
| No validation split | No early stopping | Use a validation set |

# BAD: Full dataset in memory
images = []
for path in all_image_paths:
    images.append(load_image(path))  # OOM for large datasets!

# GOOD: Use generator
def data_generator(paths, batch_size):
    for i in range(0, len(paths), batch_size):
        batch_paths = paths[i:i+batch_size]
        yield np.array([load_image(p) for p in batch_paths])

# GOOD: Use tf.data
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(load_and_preprocess)
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)

# BAD: No class weights for imbalanced data
model.fit(X_train, y_train)

# GOOD: Add class weights (Keras fit() accepts class_weight)
from sklearn.utils.class_weight import compute_class_weight
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weights = dict(enumerate(weights))
model.fit(X_train, y_train, class_weight=class_weights)
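
The "No data augmentation" row has no example above; a minimal torchvision sketch (the dataset path and transform values are illustrative):

# GOOD: Add augmentation (torchvision; path and transform values are illustrative)
from torchvision import transforms
from torchvision.datasets import ImageFolder

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_dataset = ImageFolder('data/train', transform=train_transform)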

GPU/Performance

| Check | Issue | Severity |
|-------|-------|----------|
| Tensor operations on CPU | GPU left idle | HIGH |
| Frequent GPU-CPU transfers | Synchronization stalls | HIGH |
| No gradient accumulation | OOM with large effective batches | MEDIUM |
| Missing torch.cuda.empty_cache() | Memory fragmentation | LOW |

# BAD: CPU operations
x = torch.randn(1000, 1000)
y = torch.randn(1000, 1000)
z = x @ y  # CPU computation

# GOOD: GPU operations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(1000, 1000, device=device)
y = torch.randn(1000, 1000, device=device)
z = x @ y  # GPU computation

# BAD: Frequent CPU-GPU transfer
for x, y in dataloader:
    x = x.cuda()
    y = y.cuda()
    loss = model(x, y)
    print(loss.item())  # Sync every iteration!

# GOOD: Batch logging (synchronize only every log_interval steps)
losses = []
for step, (x, y) in enumerate(dataloader):
    x, y = x.to(device), y.to(device)
    loss = model(x, y)
    losses.append(loss.detach())  # detach so the autograd graph is not kept alive
    if (step + 1) % log_interval == 0:
        print(torch.stack(losses).mean().item())
        losses.clear()

# Gradient accumulation for large effective batch
accumulation_steps = 4
for i, (x, y) in enumerate(dataloader):
    loss = model(x, y) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
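
The torch.cuda.empty_cache() row above has no example; a minimal sketch (the variable being dropped is illustrative):

# Release cached blocks between phases (e.g., after validation); use sparingly,
# since it only frees unreferenced memory and adds allocator overhead
del val_outputs  # drop references first (variable name is illustrative)
torch.cuda.empty_cache()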

MLOps/Experiment Tracking

| Check | Issue | Severity |
|-------|-------|----------|
| No experiment tracking | Results hard to reproduce or compare | HIGH |
| Hardcoded hyperparameters | Unmanageable configuration | MEDIUM |
| No model versioning | Deployment issues | MEDIUM |
| Missing seed setting | Non-reproducible runs | HIGH |

# BAD: No seed setting
model = train_model(X, y)

# GOOD: Set all seeds
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True

set_seed(42)

# BAD: Hardcoded hyperparameters
lr = 0.001
batch_size = 32
epochs = 100

# GOOD: Use a config file (e.g., Hydra)
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train")
def train(cfg: DictConfig):
    model = build_model(cfg.model)
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)

if __name__ == "__main__":
    train()

# GOOD: Use experiment tracking
import wandb
wandb.init(project="my-project", config=cfg)
for epoch in range(epochs):
    loss = train_epoch(model, dataloader)
    wandb.log({"loss": loss, "epoch": epoch})
wandb.finish()
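
MLflow (mentioned in the Notes) works similarly. A minimal sketch; the parameter values and metric name are illustrative:

# GOOD: MLflow alternative (parameter values and metric name are illustrative)
import mlflow

with mlflow.start_run():
    mlflow.log_params({"lr": 0.001, "batch_size": 32})
    for epoch in range(epochs):
        loss = train_epoch(model, dataloader)
        mlflow.log_metric("loss", loss, step=epoch)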

Response Template

## ML Code Review Results

**Project**: [name]
**Framework**: PyTorch/TensorFlow/scikit-learn
**Task**: Classification/Regression/NLP/CV
**Files Analyzed**: X

### Model Architecture
| Status | File | Issue |
|--------|------|-------|
| MEDIUM | models/resnet.py | Missing dropout for regularization |
| LOW | models/transformer.py | Consider gradient checkpointing |

### Training Loop
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | Missing model.eval() in validation (line 45) |
| HIGH | train.py | No gradient clipping (line 67) |

### Data Pipeline
| Status | File | Issue |
|--------|------|-------|
| CRITICAL | data/dataset.py | fit_transform on test data (line 23) |
| HIGH | data/loader.py | DataLoader num_workers=0 |

### MLOps
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | No seed setting for reproducibility |
| MEDIUM | train.py | Hardcoded hyperparameters |

### Recommended Actions
1. [ ] Add model.eval() and torch.no_grad() for inference
2. [ ] Fix data leakage in preprocessing
3. [ ] Set random seeds for reproducibility
4. [ ] Add experiment tracking (wandb/mlflow)

Best Practices

  1. Training: eval mode, no_grad, gradient clipping, mixed precision
  2. Data: No leakage, proper splits, augmentation, balanced classes
  3. Performance: GPU operations, batch transfers, gradient accumulation
  4. MLOps: Seed setting, experiment tracking, config management
  5. Testing: Unit tests for data pipeline, model output shape tests
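
For item 5, an output-shape test might look like the following sketch; the model class and shapes are illustrative:

# Minimal output-shape test (pytest style; model class and shapes are illustrative)
import torch

def test_model_output_shape():
    model = MyModel(num_classes=10)  # hypothetical model class
    x = torch.randn(4, 3, 224, 224)  # batch of 4 RGB 224x224 images
    out = model(x)
    assert out.shape == (4, 10)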

Integration

  • python-reviewer skill: General Python code quality
  • python-data-reviewer skill: Data preprocessing patterns
  • test-generator skill: ML test generation
  • docker-reviewer skill: ML containerization

Notes

  • Based on PyTorch 2.x, TensorFlow 2.x, scikit-learn 1.x
  • Supports distributed training patterns (DDP, FSDP)
  • Includes MLOps patterns (wandb, mlflow, hydra)