Claude Code Plugins

Community-maintained marketplace



Install Skill

1. Download the skill.
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section.
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file.

Note: Review the skill's instructions before using it.

SKILL.md

name: embedding-models
description: Embedding model configurations and cost calculators
allowed-tools: Bash, Read, Write, Edit, WebFetch

Embedding Models Skill

Embedding model selection, configuration, and cost optimization for RAG pipelines.

Use When

  • Selecting embedding models for vector search
  • Configuring OpenAI, Cohere, or HuggingFace embeddings
  • Calculating embedding generation costs
  • Optimizing embedding performance vs cost tradeoffs
  • Setting up local vs cloud embedding models
  • Implementing embedding caching strategies
  • The user mentions "embeddings", "vector models", "embedding costs", or "semantic search models"

Model Selection Guide

Commercial Models

OpenAI Embeddings:

  • text-embedding-3-small - 1536 dims, $0.02/1M tokens, balanced performance
  • text-embedding-3-large - 3072 dims, $0.13/1M tokens, highest quality
  • text-embedding-ada-002 - 1536 dims, $0.10/1M tokens, legacy model

Cohere Embeddings:

  • embed-english-v3.0 - 1024 dims, English-only
  • embed-english-light-v3.0 - 384 dims, faster and cheaper to store
  • embed-multilingual-v3.0 - 1024 dims, 100+ languages

Open Source Models (HuggingFace)

Sentence Transformers:

  • all-MiniLM-L6-v2 - 384 dims, 80MB, fast and efficient
  • all-mpnet-base-v2 - 768 dims, 420MB, high quality
  • multi-qa-mpnet-base-dot-v1 - 768 dims, optimized for Q&A
  • paraphrase-multilingual-mpnet-base-v2 - 768 dims, 50+ languages

Specialized Models:

  • BAAI/bge-small-en-v1.5 - 384 dims, strong retrieval quality for its size
  • BAAI/bge-base-en-v1.5 - 768 dims, excellent retrieval
  • BAAI/bge-large-en-v1.5 - 1024 dims, highest quality in the BGE English v1.5 family
  • intfloat/e5-base-v2 - 768 dims, strong general purpose; expects "query: " and "passage: " input prefixes

Cost Calculator

Use the cost calculator script to estimate embedding costs:

# Calculate costs for different models and volumes
python scripts/calculate-embedding-costs.py \
  --documents 100000 \
  --avg-tokens 500 \
  --model text-embedding-3-small

# Compare multiple models
python scripts/calculate-embedding-costs.py \
  --documents 100000 \
  --avg-tokens 500 \
  --compare
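The script's source isn't shown here, but the estimate itself is simple arithmetic: total tokens (documents × average tokens) divided by 1M, times the per-1M-token price. A minimal sketch, assuming the list prices quoted above; `estimate_cost` and `PRICE_PER_1M_TOKENS` are hypothetical names, not the script's actual code:

```python
# Hypothetical helper reproducing the arithmetic behind an embedding cost estimate.
# Prices are USD per 1M tokens, copied from the model list above; verify current pricing.
PRICE_PER_1M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def estimate_cost(documents: int, avg_tokens: int, model: str) -> float:
    """One-time cost in USD to embed `documents` docs averaging `avg_tokens` tokens each."""
    total_tokens = documents * avg_tokens
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model]


if __name__ == "__main__":
    # Same inputs as the CLI example: 100,000 docs x 500 tokens = 50M tokens
    for name in PRICE_PER_1M_TOKENS:
        print(f"{name}: ${estimate_cost(100_000, 500, name):.2f}")
```

For the CLI example's 50M tokens this gives $1.00 for text-embedding-3-small, $6.50 for text-embedding-3-large, and $5.00 for text-embedding-ada-002.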

Setup Scripts

OpenAI Embeddings

bash scripts/setup-openai-embeddings.sh

Configures the OpenAI embedding client with API key management and retry logic.
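The script's retry logic isn't reproduced here. As an illustration only, a generic exponential-backoff wrapper might look like the following; `with_retries` and its parameters are hypothetical, not part of the setup script or the OpenAI SDK. Note that the official `openai` client already retries some failures on its own (configurable via `OpenAI(max_retries=...)`), so a wrapper like this is mainly useful for custom policies.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying on `retry_on` exceptions with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # full jitter keeps concurrent clients from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be >= 1")


# Usage sketch (requires the openai package and OPENAI_API_KEY):
# from openai import OpenAI, RateLimitError
# client = OpenAI()
# response = with_retries(
#     lambda: client.embeddings.create(model="text-embedding-3-small", input=["hi"]),
#     retry_on=(RateLimitError,),
# )
```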

HuggingFace Embeddings

bash scripts/setup-huggingface-embeddings.sh

Downloads and configures sentence-transformers models locally.

Cohere Embeddings

bash scripts/setup-cohere-embeddings.sh

Sets up the Cohere embedding client with API credentials.

Configuration Templates

OpenAI Configuration

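# templates/openai-embedding-config.py
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable;
# avoid hard-coding keys in source files.
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Your text here"],
)
# One embedding (a list of floats) per input, in input order
embeddings = [item.embedding for item in response.data]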

HuggingFace Configuration

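# templates/huggingface-embedding-config.py
from sentence_transformers import SentenceTransformer

# Downloads the model on first use, then loads it from the local cache
model = SentenceTransformer("all-MiniLM-L6-v2")
# Returns a NumPy array of shape (num_texts, 384) for this model
embeddings = model.encode(["Your text here"])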

Custom Model Template

# templates/custom-embedding-model.py
# Wrapper for any embedding model with consistent interface
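The template's contents aren't shown here. One way to give every backend the same interface is an abstract base class that each provider adapter implements. This is a sketch; `EmbeddingModel`, `SentenceTransformerModel`, and `OpenAIModel` are hypothetical names, and the provider imports are deferred so only the backend you use needs to be installed.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence

Vector = List[float]


class EmbeddingModel(ABC):
    """Hypothetical common interface: one vector per input text, in input order."""

    dimensions: int

    @abstractmethod
    def embed(self, texts: Sequence[str]) -> List[Vector]:
        """Return one embedding per text."""


class SentenceTransformerModel(EmbeddingModel):
    """Local backend; requires the sentence-transformers package."""

    def __init__(self, name: str = "all-MiniLM-L6-v2") -> None:
        from sentence_transformers import SentenceTransformer  # lazy import

        self._model = SentenceTransformer(name)
        self.dimensions = self._model.get_sentence_embedding_dimension()

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        # encode() returns a NumPy array; convert to plain lists for a uniform return type
        return self._model.encode(list(texts)).tolist()


class OpenAIModel(EmbeddingModel):
    """Hosted backend; requires the openai package and OPENAI_API_KEY."""

    KNOWN_DIMS: Dict[str, int] = {
        "text-embedding-3-small": 1536,
        "text-embedding-3-large": 3072,
        "text-embedding-ada-002": 1536,
    }

    def __init__(self, name: str = "text-embedding-3-small") -> None:
        from openai import OpenAI  # lazy import

        self._client = OpenAI()
        self._name = name
        self.dimensions = self.KNOWN_DIMS[name]

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        response = self._client.embeddings.create(model=self._name, input=list(texts))
        return [item.embedding for item in response.data]
```

Code that depends only on `EmbeddingModel` can then switch between local and hosted models without changes.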

Optimization Strategies

Cost Optimization:

  1. Use smaller models for high-volume applications
  2. Implement embedding caching (see examples/embedding-cache.py)
  3. Batch embedding generation (see examples/batch-embedding-generation.py)
  4. Consider local models for sensitive data

Performance Optimization:

  1. Use GPU acceleration for local models
  2. Batch processing for throughput
  3. Dimension reduction for storage/speed
  4. Model distillation for faster inference
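For models trained to tolerate truncation, such as OpenAI's text-embedding-3 family (which also accepts a `dimensions` parameter on the embeddings request), dimension reduction can be as simple as keeping the leading components and re-normalizing. A sketch, assuming vectors are compared by cosine similarity or dot product; `truncate_and_normalize` is a hypothetical helper. For models not trained this way, truncation can degrade quality badly, and a learned projection such as PCA is the usual alternative.

```python
import math
from typing import List


def truncate_and_normalize(vec: List[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit L2 norm.

    Only appropriate for models trained for truncation (Matryoshka-style).
    """
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be between 1 and {len(vec)}")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]
```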

Model Comparison Matrix

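| Model | Dimensions | Size | Speed | Quality | Cost |
|---|---|---|---|---|---|
| text-embedding-3-small | 1536 | API | Fast | Good | $0.02/1M tokens |
| text-embedding-3-large | 3072 | API | Medium | Excellent | $0.13/1M tokens |
| all-MiniLM-L6-v2 | 384 | 80MB | Very fast | Good | Free (self-hosted) |
| all-mpnet-base-v2 | 768 | 420MB | Fast | Excellent | Free (self-hosted) |
| bge-base-en-v1.5 | 768 | 420MB | Fast | Excellent | Free (self-hosted) |
| embed-english-v3.0 | 1024 | API | Fast | Excellent | $0.10/1M tokens |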
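The Dimensions column drives storage cost as well as quality: each float32 component takes 4 bytes, so raw vector storage is vectors × dimensions × 4 bytes. A quick sketch of that arithmetic; `raw_vector_storage_gb` is a hypothetical helper, and it ignores index overhead, metadata, and any quantization your vector store applies.

```python
def raw_vector_storage_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw storage in decimal GB (1e9 bytes) for vectors stored as float32 by default."""
    return num_vectors * dims * bytes_per_value / 1e9


if __name__ == "__main__":
    # Dimensions taken from the comparison matrix above
    for model, dims in [
        ("all-MiniLM-L6-v2", 384),
        ("bge-base-en-v1.5", 768),
        ("embed-english-v3.0", 1024),
        ("text-embedding-3-small", 1536),
        ("text-embedding-3-large", 3072),
    ]:
        print(f"{model}: {raw_vector_storage_gb(1_000_000, dims):.2f} GB per 1M vectors")
```

At 1M vectors this comes to about 1.54 GB at 384 dims versus 12.29 GB at 3072 dims, an 8x difference before any index overhead.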

Examples

Batch Embedding Generation:

# examples/batch-embedding-generation.py
# Process large document collections efficiently
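The example file isn't reproduced here. The core idea is to send many texts per request instead of one, while preserving input order. A minimal sketch, assuming `embed_fn` is any callable that embeds a list of texts (for instance, the `encode` call or the OpenAI `embeddings.create` wrapper from the templates); `batched` and `embed_in_batches` are hypothetical helpers, and providers cap how many inputs one request may carry, so pick `batch_size` accordingly.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 100,
) -> List[Vector]:
    """Embed texts batch by batch; output order matches input order."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        result = embed_fn(batch)
        if len(result) != len(batch):
            raise RuntimeError("embed_fn returned a different number of vectors than inputs")
        vectors.extend(result)
    return vectors
```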

Embedding Cache:

# examples/embedding-cache.py
# Cache embeddings to avoid redundant API calls
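The example file isn't reproduced here. A minimal in-memory version keys each vector on a hash of the model name plus the text, so repeated texts are embedded once and vectors from different models never mix. `EmbeddingCache` is a hypothetical name; for persistence across runs, the dict could be swapped for a disk or key-value store.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class EmbeddingCache:
    """In-memory cache so each (model, text) pair is embedded at most once."""

    def __init__(self, model: str, embed_fn: Callable[[Sequence[str]], List[Vector]]) -> None:
        self.model = model
        self.embed_fn = embed_fn
        self._store: Dict[str, Vector] = {}

    def _key(self, text: str) -> str:
        # Include the model name: vectors from different models are not interchangeable.
        return hashlib.sha256(f"{self.model}\x00{text}".encode("utf-8")).hexdigest()

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        keys = [self._key(t) for t in texts]
        # Collect unique cache misses in first-seen order
        missing: Dict[str, str] = {}
        for key, text in zip(keys, texts):
            if key not in self._store and key not in missing:
                missing[key] = text
        if missing:
            # Embed all misses in a single call
            new_vectors = self.embed_fn(list(missing.values()))
            if len(new_vectors) != len(missing):
                raise RuntimeError("embed_fn returned a different number of vectors than inputs")
            self._store.update(zip(missing.keys(), new_vectors))
        return [self._store[key] for key in keys]
```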

Decision Framework

Use OpenAI when:

  • Need highest quality embeddings
  • Low to medium volume (<10M tokens/month)
  • Prefer managed service over self-hosting
  • Working with latest models

Use Cohere when:

  • Need multilingual support
  • Require production SLA
  • Want embedding customization
  • Need both embedding and reranking

Use HuggingFace/Local when:

  • High volume (>10M tokens/month)
  • Data privacy requirements
  • Have GPU infrastructure
  • Cost optimization priority
  • Offline/air-gapped environments
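The volume thresholds above are rules of thumb; whether self-hosting actually saves money depends on your fixed infrastructure cost. A break-even sketch, where the monthly hosting cost is a hypothetical input you supply and the helper names are illustrative:

```python
def monthly_api_cost(tokens_per_month: float, price_per_1m_tokens: float) -> float:
    """API spend in USD for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * price_per_1m_tokens


def break_even_tokens_per_month(fixed_monthly_cost_usd: float, price_per_1m_tokens: float) -> float:
    """Monthly token volume at which API spend equals a fixed self-hosting cost."""
    if price_per_1m_tokens <= 0:
        raise ValueError("price_per_1m_tokens must be positive")
    return fixed_monthly_cost_usd / price_per_1m_tokens * 1_000_000
```

For example, 10M tokens/month on text-embedding-3-small costs about $0.20 in API fees, and a hypothetical $100/month self-hosted GPU would break even only at about 5 billion tokens/month at that price. At moderate volumes, the privacy, offline, and existing-infrastructure criteria may therefore weigh more than direct API cost.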

References