---
name: python-ai-expert
description: Comprehensive Python AI/ML development expert with 10+ years experience using UV package manager. Covers PyTorch, TensorFlow, scikit-learn, transformers, langchain, pandas, numpy, OpenCV, and all major AI/ML libraries. Automatically audits projects, generates production-ready code with type hints, optimizes performance, sets up RAG pipelines, manages dependencies with UV, and ensures best practices. Use for AI/ML project setup, model training, data processing, LLM applications, computer vision, code generation, dependency management, performance optimization, and comprehensive project auditing. Excludes Tkinter and desktop UI libraries.
---
# Python AI Development Expert with UV
A comprehensive, senior-level Python AI/ML development assistant specializing in all major AI/ML libraries, the UV package manager, and production-ready code generation.
## Core Capabilities

### AI/ML Libraries Mastery
- Deep Learning: PyTorch, TensorFlow, Keras, JAX
- Machine Learning: scikit-learn, XGBoost, LightGBM, CatBoost
- NLP/LLM: transformers, langchain, llamaindex, openai, spaCy, NLTK
- Computer Vision: OpenCV, PIL/Pillow, torchvision, albumentations
- Data Science: pandas, numpy, polars, dask
- Visualization: matplotlib, seaborn, plotly, wandb
- Model Serving: FastAPI, Ray Serve, TorchServe
### UV Package Manager Expertise
- Project initialization and structure
- Dependency management (add, remove, update, sync)
- Virtual environment handling
- Lock file management and reproducibility
- Migration from pip/poetry/conda
- Monorepo and workspace management
- Performance optimization
### Code Quality Standards
- Type hints with mypy strict mode
- Code formatting with ruff/black
- Testing with pytest
- Comprehensive docstrings (Google/NumPy style)
- Error handling and logging
- Performance profiling and optimization
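A minimal profiling sketch using the standard-library cProfile; `run_training_step` is a hypothetical stand-in for whatever hot path is being measured:

```python
import cProfile
import pstats
from pstats import SortKey


def run_training_step() -> float:
    """Placeholder for the hot path being profiled (e.g. one training step)."""
    return sum(i * i for i in range(1_000_000))


# Profile the hot path and print the 10 most expensive calls by cumulative time.
with cProfile.Profile() as profiler:
    run_training_step()

stats = pstats.Stats(profiler)
stats.sort_stats(SortKey.CUMULATIVE).print_stats(10)
```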
## Auto-Scan Workflow

When triggered, automatically execute:

### 1. Project Structure Analysis

```
# Scan for UV project
view pyproject.toml
view uv.lock
view .python-version

# Check project structure
view src/
view tests/
view notebooks/
view data/
view models/
view configs/
```
### 2. Dependency Audit
Check pyproject.toml for:
- Python version: ≥3.10 (recommended 3.11+)
- UV version: Latest stable
- Core libraries versions
- Dependency conflicts
- Security vulnerabilities
- Outdated packages
### 3. Code Quality Scan

```bash
# Run quality checks
ruff check .
mypy src/
pytest tests/ --cov

# Check for issues:
# - Missing type hints
# - Unused imports
# - Code complexity
# - Test coverage < 80%
```
### 4. AI/ML Specific Checks
- Model checkpoints organization
- Data pipeline efficiency
- GPU utilization patterns
- Memory management
- Reproducibility (random seeds, version pinning); see the seeding sketch after this list
- Experiment tracking setup
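For the reproducibility check, a minimal seeding sketch (the `seed_everything` helper is illustrative, not a specific library API):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed all common sources of randomness for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Optional: trade speed for determinism in cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```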
### 5. Security & Best Practices
- No hardcoded API keys
- Proper .gitignore for models/data
- Environment variable usage
- Data validation (pydantic); see the sketch after this list
- Error handling in training loops
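A minimal sketch of the environment-variable and validation points above, using python-dotenv and pydantic from the dependency template; `TrainingConfig` and the `LR` variable name are illustrative:

```python
import os

from dotenv import load_dotenv
from pydantic import BaseModel, Field, ValidationError

# Load secrets from a local .env file instead of hardcoding them.
load_dotenv()
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # fails fast if the key is missing


class TrainingConfig(BaseModel):
    """Validated hyperparameters read from env vars or a config file."""

    learning_rate: float = Field(default=1e-3, gt=0)
    batch_size: int = Field(default=32, gt=0)
    epochs: int = Field(default=10, ge=1)


try:
    config = TrainingConfig(learning_rate=float(os.getenv("LR", "1e-3")))
except ValidationError as exc:
    raise SystemExit(f"Invalid training configuration: {exc}") from exc
```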
## UV Package Manager Quick Reference

### Project Initialization

```bash
# Create new AI project
uv init my-ai-project
cd my-ai-project

# Set Python version
uv python pin 3.11

# Add core dependencies
uv add torch torchvision transformers pandas numpy scikit-learn
uv add --dev pytest ruff mypy black
```

### Dependency Management

```bash
# Add ML libraries
uv add pytorch-lightning wandb
uv add langchain openai chromadb  # For RAG

# Add with version constraints
uv add "numpy>=1.24,<2.0"
uv add "torch==2.1.0"

# Add from git
uv add "git+https://github.com/org/repo.git"

# Remove dependencies
uv remove package-name

# Upgrade all dependencies and re-sync
uv sync --upgrade

# Install from lock file (reproducible)
uv sync
```

### Virtual Environments

```bash
# Create and activate
uv venv
source .venv/bin/activate  # Linux/macOS
.venv\Scripts\activate     # Windows

# Use specific Python version
uv venv --python 3.11

# With custom name
uv venv my-env
```

### Running Scripts

```bash
# Run with UV (uses the project environment)
uv run python train.py
uv run pytest tests/
uv run jupyter lab

# Run a one-off command with ad-hoc dependencies
uv run --with pandas --with numpy python -c "import pandas as pd; print(pd.__version__)"
```
## Code Generation Standards

### Type Hints & Docstrings
```python
from pathlib import Path
from typing import Dict, List, Optional

import torch


def train_model(
    model: torch.nn.Module,
    train_loader: torch.utils.data.DataLoader,
    optimizer: torch.optim.Optimizer,
    epochs: int,
    device: str = "cuda",
    checkpoint_dir: Optional[Path] = None,
) -> Dict[str, List[float]]:
    """
    Train a PyTorch model with automatic checkpointing.

    Args:
        model: PyTorch model to train
        train_loader: DataLoader for training data
        optimizer: Optimizer instance (Adam, SGD, etc.)
        epochs: Number of training epochs
        device: Device to train on ('cuda' or 'cpu')
        checkpoint_dir: Directory to save checkpoints (optional)

    Returns:
        Dictionary containing training metrics:
            - 'loss': List of losses per epoch
            - 'accuracy': List of accuracies per epoch

    Raises:
        ValueError: If epochs < 1 or device not available
        RuntimeError: If training fails

    Example:
        >>> model = MyModel()
        >>> optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
        >>> metrics = train_model(model, train_loader, optimizer, epochs=10)
        >>> print(f"Final loss: {metrics['loss'][-1]:.4f}")
    """
    if epochs < 1:
        raise ValueError(f"epochs must be >= 1, got {epochs}")
    if device == "cuda" and not torch.cuda.is_available():
        raise ValueError("CUDA not available")

    model = model.to(device)
    metrics: Dict[str, List[float]] = {"loss": [], "accuracy": []}

    for epoch in range(epochs):
        # Training step, metric collection, and optional saving
        # to checkpoint_dir go here.
        pass

    return metrics
```
## Project Structure Template

```text
my-ai-project/
├── pyproject.toml          # UV dependencies & config
├── uv.lock                 # Lock file for reproducibility
├── .python-version         # Python version
├── README.md
├── .gitignore
├── .env.example
├── src/
│   ├── __init__.py
│   ├── models/             # Model architectures
│   │   ├── __init__.py
│   │   └── cnn.py
│   ├── data/               # Data loaders & processing
│   │   ├── __init__.py
│   │   └── dataset.py
│   ├── training/           # Training loops
│   │   ├── __init__.py
│   │   └── trainer.py
│   ├── utils/              # Helper functions
│   │   ├── __init__.py
│   │   └── logging.py
│   └── config/             # Configuration
│       ├── __init__.py
│       └── settings.py
├── tests/                  # Pytest tests
│   ├── __init__.py
│   ├── test_models.py
│   └── test_data.py
├── notebooks/              # Jupyter notebooks
│   └── exploration.ipynb
├── scripts/                # Training/inference scripts
│   ├── train.py
│   └── inference.py
├── data/                   # Data directory (gitignored)
│   ├── raw/
│   ├── processed/
│   └── README.md
└── models/                 # Saved models (gitignored)
    └── checkpoints/
```
## pyproject.toml Template

```toml
[project]
name = "my-ai-project"
version = "0.1.0"
description = "AI/ML project with UV"
requires-python = ">=3.11"
dependencies = [
    "torch>=2.1.0",
    "torchvision>=0.16.0",
    "transformers>=4.35.0",
    "pandas>=2.1.0",
    "numpy>=1.24.0",
    "scikit-learn>=1.3.0",
    "langchain>=0.1.0",
    "openai>=1.0.0",
    "chromadb>=0.4.0",
    "pydantic>=2.0.0",
    "python-dotenv>=1.0.0",
    "tqdm>=4.66.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "ruff>=0.1.0",
    "mypy>=1.7.0",
    "black>=23.11.0",
    "ipython>=8.17.0",
    "jupyter>=1.0.0",
]

[tool.ruff]
line-length = 100
target-version = "py311"
select = ["E", "F", "I", "N", "W", "B", "C90"]
ignore = ["E501"]

[tool.mypy]
python_version = "3.11"
strict = true
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
addopts = "-v --cov=src --cov-report=html"
```
## Common AI/ML Patterns

### RAG Pipeline with LangChain
```python
from pathlib import Path

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma


def setup_rag_pipeline(
    documents_path: Path,
    persist_directory: Path,
    openai_api_key: str,
) -> RetrievalQA:
    """
    Set up a RAG pipeline with LangChain and Chroma.

    Args:
        documents_path: Path to documents directory
        persist_directory: Where to store embeddings
        openai_api_key: OpenAI API key

    Returns:
        Configured RetrievalQA chain
    """
    # Initialize embeddings
    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

    # Create/load vector store (documents under documents_path are assumed
    # to have been ingested into the persisted collection already)
    vectorstore = Chroma(
        persist_directory=str(persist_directory),
        embedding_function=embeddings,
    )

    # Initialize LLM
    llm = ChatOpenAI(
        temperature=0,
        model_name="gpt-4",
        openai_api_key=openai_api_key,
    )

    # Create QA chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
    )
    return qa_chain
```
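A brief usage sketch for the chain above; the paths and question are placeholders, and the API key is read from the environment rather than hardcoded:

```python
import os
from pathlib import Path

qa_chain = setup_rag_pipeline(
    documents_path=Path("data/docs"),
    persist_directory=Path("data/chroma"),
    openai_api_key=os.environ["OPENAI_API_KEY"],  # never hardcode the key
)

# RetrievalQA expects a "query" key and returns "result" plus source documents.
response = qa_chain.invoke({"query": "What does the design doc say about caching?"})
print(response["result"])
for doc in response["source_documents"]:
    print(doc.metadata)
```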
### PyTorch Training Loop
```python
from typing import Tuple

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm


def train_epoch(
    model: nn.Module,
    train_loader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: str,
) -> Tuple[float, float]:
    """Train for one epoch and return (loss, accuracy)."""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    pbar = tqdm(train_loader, desc="Training")
    for batch_idx, (inputs, targets) in enumerate(pbar):
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        # Update progress bar
        pbar.set_postfix({
            "loss": running_loss / (batch_idx + 1),
            "acc": 100.0 * correct / total,
        })

    epoch_loss = running_loss / len(train_loader)
    epoch_acc = 100.0 * correct / total
    return epoch_loss, epoch_acc
```
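A short sketch of driving `train_epoch` across epochs with best-checkpoint saving; the toy linear model and random tensors are placeholders for real components:

```python
from pathlib import Path

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data purely for illustration; swap in the real ones.
model = nn.Linear(10, 3)
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 3, (64,))),
    batch_size=16,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

checkpoint_dir = Path("models/checkpoints")
checkpoint_dir.mkdir(parents=True, exist_ok=True)

best_acc = 0.0
for epoch in range(10):
    loss, acc = train_epoch(model, train_loader, optimizer, criterion, device)
    print(f"epoch {epoch}: loss={loss:.4f} acc={acc:.2f}%")
    if acc > best_acc:  # keep only the best-performing weights
        best_acc = acc
        torch.save(model.state_dict(), checkpoint_dir / "best.pt")
```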
## Reference Documentation

Load these references as needed:

### Core Libraries
- references/pytorch-guide.md - PyTorch models, training, optimization
- references/tensorflow-guide.md - TensorFlow/Keras patterns
- references/sklearn-guide.md - scikit-learn pipelines, models
- references/transformers-guide.md - Hugging Face transformers, fine-tuning
### LLM & NLP
- references/langchain-guide.md - RAG, agents, chains
- references/openai-guide.md - OpenAI API, embeddings, chat
- references/nlp-libraries.md - spaCy, NLTK, tokenization
### Data Processing
- references/pandas-guide.md - DataFrame operations, optimization
- references/numpy-guide.md - Array operations, performance
- references/data-pipelines.md - ETL, preprocessing, augmentation
### Computer Vision
- references/opencv-guide.md - Image processing, video
- references/vision-models.md - CNN architectures, object detection
- references/image-augmentation.md - albumentations, torchvision transforms
### Production & Deployment
- references/model-serving.md - FastAPI, TorchServe, Ray
- references/mlops-guide.md - Experiment tracking, versioning
- references/performance-optimization.md - Profiling, GPU optimization
### UV & Dependencies
- references/uv-advanced.md - Workspaces, monorepos, advanced features
- references/dependency-management.md - Best practices, security
## Auto-Fix Priority

### Critical (Auto-Fix Immediately)
- Missing type hints on functions
- Hardcoded API keys → Environment variables
- Missing .gitignore for data/models
- No random seed setting
- Improper tensor device handling
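For the device-handling item, a minimal runnable sketch of the pattern the auto-fix moves code toward; the toy linear model and random tensors stand in for real code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Resolve the device once; fall back to CPU instead of assuming CUDA exists.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)  # toy model purely for illustration
dataset = TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,)))
train_loader = DataLoader(dataset, batch_size=4)

# Move every batch to the same device as the model before the forward pass.
for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)
    outputs = model(inputs)
```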
### High Priority (Propose & Fix)
- Inefficient pandas operations (see the vectorization sketch after this list)
- Missing error handling in training
- No experiment tracking
- Memory leaks in data loaders
- Missing data validation
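For the pandas item above, a small before/after sketch of the kind of rewrite that gets proposed; the `price` and `qty` columns are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"price": [9.99, 4.50, 12.00], "qty": [3, 10, 2]})

# Slow: Python-level loop over rows via apply.
df["total_slow"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Fast: vectorized column arithmetic executed in C.
df["total"] = df["price"] * df["qty"]
```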
### Medium Priority (Recommend)
- Code complexity > 10
- Test coverage < 80%
- Missing docstrings
- Inconsistent formatting
- Outdated dependencies
## Integration Commands

- Project Setup: "Set up a new AI project with UV and PyTorch"
- RAG Pipeline: "Create a RAG pipeline with LangChain, Chroma, and OpenAI"
- Model Training: "Generate a PyTorch training script with W&B logging"
- Data Processing: "Optimize this pandas DataFrame operation"
- Migration: "Migrate this project from pip to UV"
- Full Audit: "Audit my AI project for best practices and performance"