Guidance for text embedding retrieval tasks using sentence transformers or similar embedding models. This skill should be used when the task involves loading documents, encoding text with embedding models, computing similarity scores (cosine similarity), and retrieving/ranking documents based on semantic similarity to a query. Applies to MTEB benchmark tasks, document retrieval, semantic search, and text similarity ranking.

Install Skill

  1. Download skill
  2. Enable skills in Claude: Open claude.ai/settings/capabilities and find the "Skills" section
  3. Upload to Claude: Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: mteb-retrieve
description: Guidance for text embedding retrieval tasks using sentence transformers or similar embedding models. This skill should be used when the task involves loading documents, encoding text with embedding models, computing similarity scores (cosine similarity), and retrieving/ranking documents based on semantic similarity to a query. Applies to MTEB benchmark tasks, document retrieval, semantic search, and text similarity ranking.

MTEB Retrieve

Overview

This skill provides guidance for text embedding retrieval tasks that involve encoding documents and queries using embedding models, computing similarity scores, and retrieving or ranking documents based on semantic similarity.

Workflow

Step 1: Inspect and Parse Data

Before writing any code, carefully inspect the raw data format:

  1. Read the data file and examine actual line contents
  2. Identify formatting artifacts such as:
    • Line number prefixes (e.g., 1→, 2→, 1., 1:)
    • Whitespace or tab characters
    • Quote characters or escape sequences
    • Header rows or metadata
  3. Design parsing logic that strips all non-content artifacts

Common data format issues:

  • Files with line numbers prepended (e.g., 1→Document text here)
  • CSV/TSV files with headers
  • JSON files with nested structures
  • Files with trailing whitespace or newlines

Verification: Print 2-3 parsed documents to confirm they contain only the actual text content.
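
A minimal parsing sketch, assuming a plain-text file with one document per line (documents.txt is a hypothetical name) and possible line-number prefixes; adjust the stripping pattern to whatever artifacts inspection actually reveals:

import re

documents = []
with open('documents.txt', encoding='utf-8') as f:
    for line in f:
        # Strip leading line-number prefixes such as "1→", "2.", or "3:" and surrounding whitespace
        text = re.sub(r'^\s*\d+\s*(?:→|\.|:)?\s*', '', line).strip()
        if text:
            documents.append(text)

# Verification: confirm parsed documents contain only the actual text content
for doc in documents[:3]:
    print(repr(doc))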

Step 2: Load the Embedding Model

  1. Identify the model specified in the task (e.g., sentence-transformers/all-MiniLM-L6-v2)
  2. Load the model using the appropriate library (typically sentence-transformers)
  3. Verify model loading succeeded before proceeding
from sentence_transformers import SentenceTransformer

# Replace 'model-name' with the exact model named in the task, e.g. sentence-transformers/all-MiniLM-L6-v2
model = SentenceTransformer('model-name')
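
To confirm the model loaded successfully (step 3 above), one option with sentence-transformers is to check the embedding dimension it reports:

# A loaded model reports its embedding dimension (384 for all-MiniLM-L6-v2)
print(model.get_sentence_embedding_dimension())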

Step 3: Encode Documents and Query

  1. Encode all documents using the model's encode method
  2. Encode the query using the same model
  3. Ensure consistent encoding - same model and parameters for both
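
A minimal encoding sketch, assuming documents is the parsed list from Step 1 and query holds the query string:

# Encode documents and the query with the same model so all embeddings share one vector space
document_embeddings = model.encode(documents)
query_embedding = model.encode(query)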

Step 4: Compute Similarities

  1. Use cosine similarity (most common for embedding retrieval)
  2. Compute similarity between query embedding and all document embeddings
  3. Store similarities with corresponding document indices
from sklearn.metrics.pairwise import cosine_similarity

# Similarity of the query to every document; result has shape (num_documents,)
similarities = cosine_similarity([query_embedding], document_embeddings)[0]

Step 5: Rank and Retrieve

  1. Sort documents by similarity score in descending order
  2. Handle the ranking request (e.g., "5th most similar" means index 4 after sorting)
  3. Extract the requested document(s)
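
A minimal ranking sketch, assuming similarities from Step 4 and a request for the 5th most similar document:

import numpy as np

# Document indices sorted by similarity score, highest first
ranked_indices = np.argsort(similarities)[::-1]

# "5th most similar" = position 5 (1-indexed) = index 4 after sorting
fifth_idx = ranked_indices[4]
print(documents[fifth_idx], similarities[fifth_idx])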

Step 6: Validate Results

Critical verification steps before finalizing:

  1. Print top 10 results with similarity scores and document text
  2. Semantic sanity check: Do top results relate to the query?
    • If query is "terminal-bench", expect documents containing "terminal", "bench", or "benchmark"
    • If results seem unrelated, investigate data parsing or encoding issues
  3. Check for anomalies:
    • Are similarity scores in a reasonable range (cosine similarity falls in [-1, 1]; scores for sentence embeddings are typically 0.0 to 1.0)?
    • Are there unexpected ties in similarity values?
    • Do document texts look properly parsed?
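
A minimal validation printout, assuming documents, similarities, and ranked_indices from the previous steps:

# Top 10 results with scores, for a semantic sanity check against the query
for rank, idx in enumerate(ranked_indices[:10], start=1):
    print(f'{rank:2d}. score={similarities[idx]:.4f}  {documents[idx][:80]}')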

Common Pitfalls

1. Data Format Parsing Errors

Problem: Document files often include line numbers, prefixes, or other formatting artifacts.

Example: A file might contain:

 1→Beyond the Imitation Game...
 2→MTEB: Massive Text Embedding Benchmark

If not properly parsed, embeddings are computed on 1→Beyond the Imitation Game... instead of just Beyond the Imitation Game....

Solution: Always inspect raw file contents and strip all formatting artifacts before encoding.

2. Skipping Validation

Problem: Accepting results without verification can lead to incorrect answers.

Solution: Always print intermediate results (top 10 documents with scores) and verify they make semantic sense given the query.

3. Off-by-One Errors in Ranking

Problem: Confusion between 0-indexed and 1-indexed rankings.

Example: "5th most similar" means:

  • Sort by similarity descending
  • Take index 4 (0-indexed) or position 5 (1-indexed)

Solution: Be explicit about indexing when retrieving ranked results.

4. Ignoring Semantic Reasonableness

Problem: Not questioning whether results make logical sense.

Example: If query is "terminal-bench" and the 5th result is "HumanEval: Benchmarking Python code generation", ask: Does this semantically relate to the query? If not, something may be wrong.

Solution: Apply domain knowledge to sanity-check results before finalizing.

Verification Checklist

Before submitting results, confirm:

  • Data was inspected for formatting artifacts
  • Documents were parsed to contain only actual text content
  • Embedding model loaded successfully
  • Query and documents encoded with same model
  • Similarity computation used correct metric (cosine similarity)
  • Top 10+ results printed and reviewed
  • Top results semantically relate to the query
  • Correct document extracted based on ranking request