---
name: ai-integration
description: AI model integration with Local-First strategy. Use FREE local models (Ollama) before API calls.
allowed-tools: Read, Glob, Grep, Edit, Write, Bash(python:*), Bash(pytest:*)
---
# AI Integration - Local-First Strategy (Dec 2025)

## Core Principle: FREE Before PAID

**Use local models first, API only when necessary!**
| Tier | Model | Cost (per 1M tokens) | When to Use |
|------|-------|----------------------|-------------|
| 0 | Local Ollama | FREE | Try FIRST for everything |
| 1 | Gemini Flash | $0.10 | When local fails |
| 2 | DeepSeek API | $0.14 | Code fallback |
| 3 | Sonnet 4.5 | $3.00 | Quality when needed |
| 4 | Opus 4.5 | $15.00 | NEVER unless critical |
## Local Models (via Ollama)

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull recommended models
ollama pull llama3.2:3b        # 2GB - Fast, simple
ollama pull qwen2.5-coder:7b   # 4GB - Code tasks
ollama pull deepseek-r1:8b     # 5GB - Reasoning
```
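Once the models are pulled, a quick smoke test through LiteLLM confirms the FREE tier actually responds before any routing depends on it. A minimal sketch; the prompt is arbitrary, and it assumes Ollama is serving on its default port:

```python
import litellm

# Smoke test the local tier (FREE - no API key needed).
response = litellm.completion(
    model="ollama/llama3.2:3b",
    messages=[{"role": "user", "content": "Reply with 'ok'."}],
    api_base="http://localhost:11434",  # default Ollama endpoint
)
print(response.choices[0].message.content)
```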
## LiteLLM Model IDs

```python
MODELS = {
    # Tier 0: LOCAL FREE
    "local-llama": "ollama/llama3.2:3b",
    "local-qwen": "ollama/qwen2.5-coder:7b",
    "local-deepseek": "ollama/deepseek-r1:8b",
    # Tier 1-2: CHEAP API
    "gemini-flash": "vertex_ai/gemini-3-flash-preview",
    "deepseek-api": "deepseek/deepseek-chat",
    # Tier 3-4: QUALITY API (use sparingly!)
    "claude-sonnet": "vertex_ai/claude-sonnet-4@20250514",
    "claude-opus": "vertex_ai/claude-opus-4-5@20250514",
}
```
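The routing table in the next section keys on a `TaskType` enum that is not defined in this file. A hypothetical sketch, with only the member names taken from the table below:

```python
from enum import Enum, auto

class TaskType(Enum):
    # Hypothetical definition - member names match TASK_ROUTING below.
    SIMPLE_TASK = auto()
    VALIDATION = auto()
    WORKFLOW_UNDERSTANDING = auto()
    CODE_GENERATION = auto()
    PIPELINE_GENERATION = auto()
```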
## Task Routing (Local-First)

```python
TASK_ROUTING = {
    # Simple → Local FREE
    TaskType.SIMPLE_TASK: ["local-llama", "gemini-flash"],
    # Validation → Local coder FREE
    TaskType.VALIDATION: ["local-qwen", "deepseek-api"],
    # Understanding → Local reasoning FREE
    TaskType.WORKFLOW_UNDERSTANDING: ["local-deepseek", "gemini-flash", "claude-sonnet"],
    # Code gen → Local coder FREE
    TaskType.CODE_GENERATION: ["local-qwen", "gemini-flash", "claude-sonnet"],
    # Pipeline → Quality matters
    # Note: "gemini-pro" has no entry in MODELS above.
    TaskType.PIPELINE_GENERATION: ["local-qwen", "claude-sonnet", "gemini-pro"],
}
```
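Local-first only saves money if failures fall through to the next tier automatically. A minimal sketch of that loop, assuming the `MODELS` and `TASK_ROUTING` dicts above; `complete_with_fallback` is a hypothetical helper, not a LiteLLM API:

```python
import litellm

def complete_with_fallback(task_type, messages):
    """Walk the task's model chain cheapest-first; return the first success."""
    errors = {}
    for alias in TASK_ROUTING[task_type]:
        model_id = MODELS.get(alias, alias)  # unmapped aliases pass through as-is
        try:
            return litellm.completion(model=model_id, messages=messages)
        except Exception as exc:  # Ollama down, quota exhausted, timeout, ...
            errors[alias] = exc
    raise RuntimeError(f"All models in chain failed: {errors}")
```

Usage: `complete_with_fallback(TaskType.CODE_GENERATION, [{"role": "user", "content": "..."}])` hits `local-qwen` first and only escalates to paid tiers when the FREE call raises.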
## Regional Configuration

```python
# For API models only
if "vertex_ai" in model_id:
    if "claude" in model_id:
        # Claude on Vertex is served from us-east5
        litellm.vertex_location = "us-east5"
    else:
        litellm.vertex_location = "us-central1"
elif model_id.startswith("ollama/"):
    # Local models talk to the Ollama server instead
    litellm.api_base = "http://localhost:11434"
```
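Mutating `litellm` globals per request is racy when several models are in flight; LiteLLM also accepts `vertex_location` and `api_base` as per-call kwargs. A sketch of the same logic as a helper (`regional_kwargs` is a hypothetical name):

```python
import litellm

def regional_kwargs(model_id):
    """Per-call overrides mirroring the global configuration above."""
    if model_id.startswith("ollama/"):
        return {"api_base": "http://localhost:11434"}
    if "vertex_ai" in model_id:
        return {"vertex_location": "us-east5" if "claude" in model_id else "us-central1"}
    return {}

response = litellm.completion(
    model=MODELS["claude-sonnet"],
    messages=[{"role": "user", "content": "ping"}],
    **regional_kwargs(MODELS["claude-sonnet"]),
)
```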
## Cost Savings

| Scenario | API-Only | Local-First | Savings |
|----------|----------|-------------|---------|
| 100 simple tasks | $3.00 | $0 | 100% |
| 100 validations | $2.80 | $0 | 100% |
| 100 code gen | $30.00 | $0.30 | 99% |
## Environment

```bash
GOOGLE_CLOUD_PROJECT=gen-lang-client-0497834162
VERTEX_AI_LOCATION=us-central1
VERTEX_AI_CLAUDE_LOCATION=us-east5
CUSTOM_API_URL=http://localhost:11434
```
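At startup these variables can be wired into LiteLLM's globals so nothing downstream hardcodes a project or region. A minimal sketch, assuming only the variable names above:

```python
import os
import litellm

litellm.vertex_project = os.environ["GOOGLE_CLOUD_PROJECT"]
litellm.vertex_location = os.environ.get("VERTEX_AI_LOCATION", "us-central1")
# The Claude-specific region (VERTEX_AI_CLAUDE_LOCATION) is applied per model
# in the regional configuration above, not as a global default.
OLLAMA_API_BASE = os.environ.get("CUSTOM_API_URL", "http://localhost:11434")
```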