| name | embedding-models |
| description | Embedding model configurations and cost calculators |
| allowed-tools | Bash, Read, Write, Edit, WebFetch |
Embedding Models Skill
Embedding model selection, configuration, and cost optimization for RAG pipelines.
Use When
- Selecting embedding models for vector search
- Configuring OpenAI, Cohere, or HuggingFace embeddings
- Calculating embedding generation costs
- Optimizing embedding performance vs cost tradeoffs
- Setting up local vs cloud embedding models
- Implementing embedding caching strategies
- User mentions: "embeddings", "vector models", "embedding costs", "semantic search models"
Model Selection Guide
Commercial Models
OpenAI Embeddings:
text-embedding-3-small- 1536 dims, $0.02/1M tokens, balanced performancetext-embedding-3-large- 3072 dims, $0.13/1M tokens, highest qualitytext-embedding-ada-002- 1536 dims, $0.10/1M tokens, legacy model
Cohere Embeddings:
embed-english-v3.0- 1024 dims, multilingual supportembed-english-light-v3.0- 384 dims, faster/cheaperembed-multilingual-v3.0- 1024 dims, 100+ languages
Open Source Models (HuggingFace)
Sentence Transformers:
all-MiniLM-L6-v2- 384 dims, 80MB, fast and efficientall-mpnet-base-v2- 768 dims, 420MB, high qualitymulti-qa-mpnet-base-dot-v1- 768 dims, optimized for Q&Aparaphrase-multilingual-mpnet-base-v2- 768 dims, 50+ languages
Specialized Models:
BAAI/bge-small-en-v1.5- 384 dims, SOTA small modelBAAI/bge-base-en-v1.5- 768 dims, excellent retrievalBAAI/bge-large-en-v1.5- 1024 dims, top performanceintfloat/e5-base-v2- 768 dims, strong general purpose
Cost Calculator
Use the cost calculator script to estimate embedding costs:
# Calculate costs for different models and volumes
python scripts/calculate-embedding-costs.py \
--documents 100000 \
--avg-tokens 500 \
--model text-embedding-3-small
# Compare multiple models
python scripts/calculate-embedding-costs.py \
--documents 100000 \
--avg-tokens 500 \
--compare
Setup Scripts
OpenAI Embeddings
bash scripts/setup-openai-embeddings.sh
Configures OpenAI embedding client with API key management and retry logic.
HuggingFace Embeddings
bash scripts/setup-huggingface-embeddings.sh
Downloads and configures sentence-transformers models locally.
Cohere Embeddings
bash scripts/setup-cohere-embeddings.sh
Sets up Cohere embedding client with API credentials.
Configuration Templates
OpenAI Configuration
# templates/openai-embedding-config.py
from openai import OpenAI
client = OpenAI(api_key="your-key")
embeddings = client.embeddings.create(
model="text-embedding-3-small",
input=["Your text here"]
)
HuggingFace Configuration
# templates/huggingface-embedding-config.py
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(["Your text here"])
Custom Model Template
# templates/custom-embedding-model.py
# Wrapper for any embedding model with consistent interface
Optimization Strategies
Cost Optimization:
- Use smaller models for high-volume applications
- Implement embedding caching (see examples/embedding-cache.py)
- Batch embedding generation (see examples/batch-embedding-generation.py)
- Consider local models for sensitive data
Performance Optimization:
- Use GPU acceleration for local models
- Batch processing for throughput
- Dimension reduction for storage/speed
- Model distillation for faster inference
Model Comparison Matrix
| Model | Dimensions | Size | Speed | Quality | Cost |
|---|---|---|---|---|---|
| text-embedding-3-small | 1536 | API | Fast | Good | $0.02/1M |
| text-embedding-3-large | 3072 | API | Medium | Excellent | $0.13/1M |
| all-MiniLM-L6-v2 | 384 | 80MB | Very Fast | Good | Free |
| all-mpnet-base-v2 | 768 | 420MB | Fast | Excellent | Free |
| bge-base-en-v1.5 | 768 | 420MB | Fast | Excellent | Free |
| embed-english-v3.0 | 1024 | API | Fast | Excellent | $0.10/1M |
Examples
Batch Embedding Generation:
# examples/batch-embedding-generation.py
# Process large document collections efficiently
Embedding Cache:
# examples/embedding-cache.py
# Cache embeddings to avoid redundant API calls
Decision Framework
Use OpenAI when:
- Need highest quality embeddings
- Low to medium volume (<10M tokens/month)
- Prefer managed service over self-hosting
- Working with latest models
Use Cohere when:
- Need multilingual support
- Require production SLA
- Want embedding customization
- Need both embedding and reranking
Use HuggingFace/Local when:
- High volume (>10M tokens/month)
- Data privacy requirements
- Have GPU infrastructure
- Cost optimization priority
- Offline/air-gapped environments
References
- Sentence Transformers: https://www.sbert.net/
- OpenAI Embeddings: https://platform.openai.com/docs/guides/embeddings
- Cohere Embeddings: https://docs.cohere.com/docs/embeddings
- MTEB Leaderboard: https://huggingface.co/spaces/mteb/leaderboard