
SKILL.md

name: hyde-retrieval
description: HyDE (Hypothetical Document Embeddings) for improved semantic retrieval. Use when queries don't match document vocabulary, retrieval quality is poor, or implementing advanced RAG patterns.
context: fork
agent: data-pipeline-engineer

HyDE (Hypothetical Document Embeddings)

Generate hypothetical answer documents to bridge vocabulary gaps in semantic search.

The Problem

Direct query embedding often fails due to vocabulary mismatch:

Query: "scaling async data pipelines"
Docs use: "event-driven messaging", "Apache Kafka", "message brokers"
→ Low similarity scores despite high relevance

The Solution

Instead of embedding the query, generate a hypothetical answer document:

Query: "scaling async data pipelines"
→ LLM generates: "To scale asynchronous data pipelines, use event-driven
   messaging with Apache Kafka. Message brokers provide backpressure..."
→ Embed the hypothetical document
→ Now matches docs using similar terminology

Implementation

from collections.abc import Awaitable, Callable

from openai import AsyncOpenAI
from pydantic import BaseModel

class HyDEResult(BaseModel):
    """Result of HyDE generation."""
    original_query: str
    hypothetical_doc: str
    embedding: list[float]

async def generate_hyde(
    query: str,
    llm: AsyncOpenAI,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    max_tokens: int = 150,
) -> HyDEResult:
    """Generate hypothetical document and embed it."""

    # Generate hypothetical answer
    response = await llm.chat.completions.create(
        model="gpt-4o-mini",  # Fast, cheap model
        messages=[
            {"role": "system", "content":
                "Write a short paragraph that would answer this query. "
                "Use technical terminology that documentation would use."},
            {"role": "user", "content": query}
        ],
        max_tokens=max_tokens,
        temperature=0.3,  # Low temp for consistency
    )

    hypothetical_doc = response.choices[0].message.content or ""  # content can be None per the SDK types

    # Embed the hypothetical document (not the query!)
    embedding = await embed_fn(hypothetical_doc)

    return HyDEResult(
        original_query=query,
        hypothetical_doc=hypothetical_doc,
        embedding=embedding,
    )
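As a usage sketch (not part of the skill itself): the embed_fn helper below wraps OpenAI's embeddings endpoint, and the embedding model name is an assumption; any async function with the same shape works.

import asyncio

from openai import AsyncOpenAI

async def main() -> None:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    async def embed_fn(text: str) -> list[float]:
        # Hypothetical helper: swap in whatever embedding backend you use
        resp = await client.embeddings.create(
            model="text-embedding-3-small",  # assumed model choice
            input=text,
        )
        return resp.data[0].embedding

    result = await generate_hyde("scaling async data pipelines", client, embed_fn)
    print(result.hypothetical_doc)

asyncio.run(main())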

With Caching

import hashlib

class HyDEService:
    def __init__(self, llm, embed_fn):
        self.llm = llm
        self.embed_fn = embed_fn
        self._cache: dict[str, HyDEResult] = {}

    def _cache_key(self, query: str) -> str:
        return hashlib.md5(query.lower().strip().encode()).hexdigest()

    async def generate(self, query: str) -> HyDEResult:
        key = self._cache_key(query)

        if key in self._cache:
            return self._cache[key]

        result = await generate_hyde(query, self.llm, self.embed_fn)
        self._cache[key] = result
        return result
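The dict above grows without bound if queries never repeat exactly. A minimal LRU-bounded variant, sketched here as an assumption rather than part of the original skill; it can replace the `_cache` dict in HyDEService:

from collections import OrderedDict

class BoundedHyDECache:
    """LRU cache keyed by the same md5 keys HyDEService uses."""

    def __init__(self, max_size: int = 1024):
        self.max_size = max_size
        self._items: OrderedDict[str, HyDEResult] = OrderedDict()

    def get(self, key: str) -> HyDEResult | None:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: HyDEResult) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict least recently used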

Per-Concept HyDE (Advanced)

For multi-concept queries, generate HyDE for each concept:

import asyncio

async def batch_hyde(
    concepts: list[str],
    hyde_service: HyDEService,
) -> list[HyDEResult]:
    """Generate HyDE embeddings for multiple concepts in parallel."""
    tasks = [hyde_service.generate(concept) for concept in concepts]
    return await asyncio.gather(*tasks)
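A usage sketch, inside an async function: the concept list is hand-written here, and vector_store.search is a hypothetical interface standing in for whatever retrieval backend you use.

concepts = ["Kafka consumer backpressure", "exactly-once delivery semantics"]
results = await batch_hyde(concepts, hyde_service)

for r in results:
    # vector_store.search is a placeholder for your retrieval backend
    hits = await vector_store.search(r.embedding, top_k=5)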

When to Use HyDE

| Scenario | Use HyDE? |
|---|---|
| Abstract/conceptual queries | Yes |
| Exact term searches | No (use keyword) |
| Code snippet searches | No |
| Natural language questions | Yes |
| Vocabulary mismatch suspected | Yes |
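The table can be turned into a simple router. A heuristic sketch; the patterns and thresholds below are assumptions to tune against your own query logs:

import re

def should_use_hyde(query: str) -> bool:
    """Route queries per the table above; heuristics are assumptions."""
    # Code-like queries: skip HyDE, prefer keyword/exact search
    if re.search(r"[(){};=]|::|->", query):
        return False
    # Quoted phrases signal an exact-term search
    if query.startswith('"') and query.endswith('"'):
        return False
    # Very short keyword queries rarely benefit from HyDE
    if len(query.split()) <= 2:
        return False
    # Natural-language / conceptual queries: use HyDE
    return True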

Fallback Strategy

import asyncio
from collections.abc import Awaitable, Callable

async def hyde_with_fallback(
    query: str,
    hyde_service: HyDEService,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    timeout: float = 3.0,
) -> list[float]:
    """HyDE with fallback to direct embedding on timeout."""
    try:
        # asyncio.timeout requires Python 3.11+; use asyncio.wait_for on older versions
        async with asyncio.timeout(timeout):
            result = await hyde_service.generate(query)
            return result.embedding
    except TimeoutError:
        # Fall back to embedding the raw query directly
        return await embed_fn(query)

Performance Tips

  • Use fast model (gpt-4o-mini, claude-3-haiku) for generation
  • Cache aggressively (queries often repeat)
  • Set tight timeouts (2-3s) with fallback
  • Keep hypothetical docs concise (100-200 tokens)
  • Combine with query decomposition for best results (see the end-to-end sketch below)
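Putting the tips together, a rough end-to-end sketch: decompose_query and vector_store are hypothetical helpers, not defined by this skill, and the merge step is left open.

async def hyde_search(
    query: str,
    hyde_service: HyDEService,
    vector_store,  # placeholder for your retrieval backend
) -> list:
    # decompose_query: hypothetical LLM-backed splitter, e.g.
    # "scaling kafka with backpressure" -> ["kafka scaling", "backpressure"]
    concepts = await decompose_query(query)
    results = await batch_hyde(concepts, hyde_service)

    hits = []
    for r in results:
        hits.extend(await vector_store.search(r.embedding, top_k=5))

    # Deduplicate and rerank across concepts before returning (merge strategy
    # left open; reciprocal rank fusion is a common choice)
    return hits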
