Claude Code Plugins

Community-maintained marketplace


Production hybrid search with PGVector 0.7+ and BM25 using Reciprocal Rank Fusion, metadata filtering, and HNSW indexing for semantic retrieval

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: pgvector-search
description: Production hybrid search with PGVector 0.7+ and BM25 using Reciprocal Rank Fusion, metadata filtering, and HNSW indexing for semantic retrieval
version: 2.0.0
author: YG Starter Template
tags: pgvector, hybrid-search, bm25, rrf, semantic-search, retrieval, 2025

PGVector Hybrid Search

Production-grade semantic + keyword search using PostgreSQL 16+ and PGVector 0.7+

Overview

Hybrid search combines semantic similarity (vector embeddings) with keyword matching (BM25) to achieve better retrieval than either alone.

Real-World Use Cases:

  • Technical Documentation Search: Hybrid retrieval over API docs, tutorials, guides
  • Code Search Engine: Find similar functions/classes across repositories
  • E-commerce Product Discovery: Search by image + text attributes
  • Legal Document Retrieval: Citation-aware semantic search
  • Medical Literature Search: Semantic search over research papers with MeSH terms

Architecture:

Query
  ↓
[Generate embedding] → Vector Search (PGVector) → Top 30 results
  ↓
[Generate ts_query]  → Keyword Search (BM25)    → Top 30 results
  ↓
[Reciprocal Rank Fusion (RRF)] → Merge & re-rank → Top 10 final results

When to use this skill:

  • Building semantic search (RAG, knowledge bases, recommendations)
  • Implementing hybrid retrieval (vector + keyword)
  • Optimizing PGVector performance
  • Working with large document collections (1M+ chunks)

Core Concepts

1. Semantic Search (Vector Similarity)

How it works:

  1. Embed query: "database indexing strategies" → [0.23, -0.15, ..., 0.42] (1024 dims)
  2. Find nearest neighbors: ORDER BY embedding <=> query_embedding LIMIT 30
  3. Returns: Conceptually similar documents (even with different words)
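
A minimal vector-only version of this lookup, as a sketch (it assumes the Chunk ORM model and async SQLAlchemy session used later in this skill):

from sqlalchemy import select

async def vector_search(session, query_embedding: list[float], limit: int = 30):
    # cosine_distance renders pgvector's <=> operator
    stmt = (
        select(Chunk)
        .where(Chunk.embedding.isnot(None))
        .order_by(Chunk.embedding.cosine_distance(query_embedding))
        .limit(limit)
    )
    return (await session.execute(stmt)).scalars().all()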

Example:

  • Query: "machine learning model training"
  • Matches: "neural network optimization", "deep learning techniques"
  • Misses: exact rare terms (e.g. a specific error code or version string) that the embedding model represents poorly

Strengths:

  • Captures semantic meaning
  • Works across languages
  • Handles synonyms ("car" matches "automobile")

Weaknesses:

  • Weak at exact keyword matches
  • Sensitive to embedding quality
  • Doesn't handle rare technical terms well

2. Keyword Search (BM25)

How it works:

  1. Tokenize query: "database indexing" → database & indexing
  2. Full-text search: WHERE content_tsvector @@ to_tsquery('database & indexing')
  3. Rank by relevance score (this skill uses Postgres ts_rank_cd, a cover-density ranking that approximates BM25's TF-IDF-with-length-normalization behavior)
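
The same lookup through SQLAlchemy, as a sketch (ts_rank_cd standing in for BM25 per the note above; assumes the Chunk ORM model defined later in this skill):

from sqlalchemy import func, select

async def keyword_search(session, query: str, limit: int = 30):
    # plainto_tsquery handles tokenization and AND-joining of terms
    ts_query = func.plainto_tsquery("english", query)
    stmt = (
        select(Chunk)
        .where(Chunk.content_tsvector.op("@@")(ts_query))
        .order_by(func.ts_rank_cd(Chunk.content_tsvector, ts_query).desc())
        .limit(limit)
    )
    return (await session.execute(stmt)).scalars().all()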

Example:

  • Query: "PostgreSQL B-tree index"
  • Matches: Documents with exact phrase "PostgreSQL B-tree index"
  • Misses: "Postgres tree-based indexing" (different words)

Strengths:

  • Fast exact matches
  • Handles technical terms well
  • Works for rare/specific phrases

Weaknesses:

  • No semantic understanding
  • Requires exact word matches
  • Sensitive to typos

3. Reciprocal Rank Fusion (RRF)

The Problem: How do you combine vector scores (0.85) with BM25 scores (42.7)?

The Solution: Use rank instead of score.

Algorithm:

def rrf_score(rank: int, k: int = 60) -> float:
    """
    Calculate RRF score for a document at given rank.

    Args:
        rank: Position in result list (1-indexed)
        k: Smoothing constant (typically 60)

    Returns:
        Score in (0, 1/(k+1)]; at most ≈0.0164 (rank 1 with k=60)
    """
    return 1.0 / (k + rank)

# Example:
# Document appears at rank 3 in vector search → score = 1/(60+3) = 0.0159
# Same document at rank 7 in BM25 search    → score = 1/(60+7) = 0.0149
# Combined RRF score = 0.0159 + 0.0149 = 0.0308
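
The same fusion can be done entirely in application code; a minimal sketch over two ranked lists of document IDs:

def rrf_merge(vector_ids: list, keyword_ids: list,
              k: int = 60, top_k: int = 10) -> list:
    """Fuse two ranked ID lists into one list ordered by combined RRF score."""
    scores: dict = {}
    for ranked in (vector_ids, keyword_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents appearing in both lists accumulate both contributions
    return sorted(scores, key=scores.get, reverse=True)[:top_k]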

Why it works:

  • Rank-based: Ignores absolute scores (no normalization needed)
  • Symmetric: Treats both searches equally
  • Robust: Top results from either search get high scores

Detailed Implementation: See references/hybrid-search-rrf.md


This Project's Hybrid Search Implementation

Database Schema

-- Chunks table with vector and full-text search
CREATE TABLE chunks (
    id UUID PRIMARY KEY,
    document_id UUID REFERENCES documents(id),
    content TEXT NOT NULL,

    -- Vector embedding (1024 dimensions for Voyage AI)
    embedding vector(1024),

    -- Pre-computed tsvector for full-text search
    content_tsvector tsvector GENERATED ALWAYS AS (
        to_tsvector('english', content)
    ) STORED,

    -- Metadata
    section_title TEXT,
    section_path TEXT,
    chunk_index INT,
    content_type TEXT,  -- 'code_block', 'paragraph', 'list', etc.

    -- Timestamps
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Indexes
CREATE INDEX idx_chunks_embedding ON chunks
    USING hnsw (embedding vector_cosine_ops);  -- Vector search

CREATE INDEX idx_chunks_content_tsvector ON chunks
    USING gin (content_tsvector);  -- Full-text search

CREATE INDEX idx_chunks_document_id ON chunks(document_id);
CREATE INDEX idx_chunks_content_type ON chunks(content_type);
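
For the SQLAlchemy snippets below, a matching ORM model might look like this (a sketch derived from the schema above; the project's actual model file isn't shown in this skill):

import uuid
from datetime import datetime

from pgvector.sqlalchemy import Vector
from sqlalchemy import Column, Computed, DateTime, ForeignKey, Integer, Text
from sqlalchemy.dialects.postgresql import TSVECTOR, UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Chunk(Base):
    __tablename__ = "chunks"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    document_id = Column(UUID(as_uuid=True), ForeignKey("documents.id"))
    content = Column(Text, nullable=False)
    embedding = Column(Vector(1024))
    # Mirrors the GENERATED ALWAYS AS ... STORED column; read-only in app code
    content_tsvector = Column(
        TSVECTOR, Computed("to_tsvector('english', content)", persisted=True)
    )
    section_title = Column(Text)
    section_path = Column(Text)
    chunk_index = Column(Integer)
    content_type = Column(Text)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow)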

Search Query

# backend/app/db/repositories/chunk_repository.py
from sqlalchemy import select, func, literal_column
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.models import Chunk  # assumed import path for the Chunk ORM model

async def hybrid_search(
    session: AsyncSession,
    query: str,
    query_embedding: list[float],
    top_k: int = 10,
    content_type_filter: list[str] | None = None
) -> list[Chunk]:
    """
    Perform hybrid search using RRF.

    Args:
        session: Async SQLAlchemy session
        query: Search query text
        query_embedding: Query embedding vector
        top_k: Number of results to return
        content_type_filter: Optional filter by content type

    Returns:
        List of chunks ranked by RRF score
    """

    # Fetch multiplier (retrieve more for better RRF)
    FETCH_MULTIPLIER = 3
    fetch_limit = top_k * FETCH_MULTIPLIER  # 30 for top_k=10

    # ===== 1. VECTOR SEARCH =====
    vector_subquery = (
        select(
            Chunk.id,
            (Chunk.embedding.cosine_distance(query_embedding)).label("vector_distance"),
            func.row_number().over(
                order_by=Chunk.embedding.cosine_distance(query_embedding)
            ).label("vector_rank")
        )
        .where(Chunk.embedding.isnot(None))
    )

    # Apply content type filter
    if content_type_filter:
        vector_subquery = vector_subquery.where(
            Chunk.content_type.in_(content_type_filter)
        )

    vector_subquery = vector_subquery.limit(fetch_limit).subquery("vector_results")

    # ===== 2. KEYWORD SEARCH (BM25) =====
    # Generate tsquery
    ts_query = func.plainto_tsquery("english", query)

    keyword_subquery = (
        select(
            Chunk.id,
            func.ts_rank_cd(Chunk.content_tsvector, ts_query).label("bm25_score"),
            func.row_number().over(
                order_by=func.ts_rank_cd(Chunk.content_tsvector, ts_query).desc()
            ).label("keyword_rank")
        )
        .where(Chunk.content_tsvector.op("@@")(ts_query))
    )

    # Apply content type filter
    if content_type_filter:
        keyword_subquery = keyword_subquery.where(
            Chunk.content_type.in_(content_type_filter)
        )

    keyword_subquery = keyword_subquery.limit(fetch_limit).subquery("keyword_results")

    # ===== 3. RECIPROCAL RANK FUSION =====
    K = 60  # RRF smoothing constant

    rrf_query = (
        select(
            func.coalesce(vector_subquery.c.id, keyword_subquery.c.id).label("chunk_id"),
            (
                func.coalesce(1.0 / (K + vector_subquery.c.vector_rank), 0.0) +
                func.coalesce(1.0 / (K + keyword_subquery.c.keyword_rank), 0.0)
            ).label("rrf_score")
        )
        .select_from(
            vector_subquery.outerjoin(
                keyword_subquery,
                vector_subquery.c.id == keyword_subquery.c.id,
                full=True
            )
        )
        .order_by(literal_column("rrf_score").desc())
        .limit(top_k)
    ).subquery("rrf_results")

    # ===== 4. FETCH FULL CHUNKS =====
    final_query = (
        select(Chunk)
        .join(rrf_query, Chunk.id == rrf_query.c.chunk_id)
        .order_by(rrf_query.c.rrf_score.desc())
    )

    result = await session.execute(final_query)
    chunks = result.scalars().all()

    return chunks

Key Features:

  1. 3x Fetch Multiplier: Retrieve 30 results from each search (better RRF coverage)
  2. Indexed tsvector: Uses content_tsvector column (5-10x faster than to_tsvector() on query)
  3. Full outer join: Includes results from either search (vector OR keyword)
  4. Content type filtering: Optional pre-filter by metadata
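
A call-site sketch (embed_text is taken from the test imports further below; session is an async SQLAlchemy session):

from app.shared.services.embeddings import embed_text

async def search_docs(session, question: str):
    embedding = await embed_text(question)
    return await hybrid_search(
        session,
        query=question,
        query_embedding=embedding,
        top_k=10,
        content_type_filter=None,  # or e.g. ["code_block"]
    )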

Performance Optimizations

1. Pre-Computed tsvector Column

Before (Slow):

-- Computes tsvector on every query (SLOW!)
WHERE to_tsvector('english', content) @@ to_tsquery('database')

After (Fast):

-- Uses pre-computed column with GIN index (FAST!)
WHERE content_tsvector @@ to_tsquery('database')

Speedup: 5-10x faster for keyword search


2. HNSW vs IVFFlat Indexes

IVFFlat (Older):

CREATE INDEX idx_embedding ON chunks
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
  • Faster indexing
  • Slower queries
  • Good for < 100k vectors

HNSW (Recommended):

CREATE INDEX idx_embedding ON chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
  • Slower indexing
  • Much faster queries at the same recall (often 10x or more)
  • Scales to millions of vectors

This project uses HNSW (415 chunks, with room to scale to 10M+).
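
HNSW also exposes a query-time knob, hnsw.ef_search (pgvector default 40): raising it improves recall at some latency cost. A per-transaction sketch (SET LOCAL requires an open transaction, which an async session normally has):

from sqlalchemy import text

async def set_hnsw_ef_search(session, ef_search: int = 100) -> None:
    # SET LOCAL scopes the setting to the current transaction only
    await session.execute(text(f"SET LOCAL hnsw.ef_search = {int(ef_search)}"))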

Detailed Comparison: See references/indexing-strategies.md


3. Metadata Filtering & Boosting

Problem: Some chunks are more valuable than others.

Solution: Boost by metadata:

# Illustrative post-retrieval boosting pass; the matching predicates
# (query_matches_section_title, query_matches_path, is_technical_query)
# are this project's helpers.
def apply_metadata_boosts(query: str, chunk, rrf_score: float) -> float:
    # Boost section titles (1.5x)
    if query_matches_section_title(chunk.section_title, query):
        rrf_score *= 1.5
    # Boost document path (1.15x)
    if query_matches_path(chunk.section_path, query):
        rrf_score *= 1.15
    # Boost code blocks for technical queries (1.2x)
    if is_technical_query(query) and chunk.content_type == "code_block":
        rrf_score *= 1.2
    return rrf_score

This project's evaluation:

  • Before boosting: 91.1% pass rate, 0.647 MRR (Hard queries)
  • After boosting: 91.6% pass rate, 0.686 MRR (Hard queries)
  • Improvement: +0.5% pass rate, +6% MRR

Detailed Implementation: See references/metadata-filtering.md


Common Patterns

Pattern 1: Filtered Search

# Search only code blocks
results = await hybrid_search(
    session,
    query="binary search implementation",
    query_embedding=embedding,
    content_type_filter=["code_block"]
)

Pattern 2: Similarity Threshold

# Only return results above a similarity threshold.
# Note: hybrid_search above returns bare chunks; this pattern assumes the
# repository is extended to expose each result's cosine distance as well.
MIN_SIMILARITY = 0.75

results = await hybrid_search(session, query, embedding, top_k=50)
filtered = [
    r for r in results
    if (1 - r.vector_distance) >= MIN_SIMILARITY
][:10]

Pattern 3: Multi-Query Retrieval

# Generate multiple query variations for better recall
queries = generate_query_variations("machine learning")
# ["machine learning", "ML algorithms", "neural networks"]

all_runs = []
for q in queries:
    emb = embed(q)  # embed() and generate_query_variations() are project helpers
    results = await hybrid_search(session, q, emb, top_k=5)
    all_runs.append(results)  # keep each run's ranking intact for RRF

# De-duplicate and re-rank across runs
final_results = rerank_by_rrf(all_runs, top_k=10)
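
rerank_by_rrf here is a stand-in; a minimal version that fuses the per-query runs with RRF and de-duplicates by chunk id could look like:

def rerank_by_rrf(runs: list[list], top_k: int = 10, k: int = 60) -> list:
    """Fuse several ranked result lists (one per query variation) via RRF."""
    scores: dict = {}
    by_id: dict = {}
    for run in runs:
        for rank, chunk in enumerate(run, start=1):
            scores[chunk.id] = scores.get(chunk.id, 0.0) + 1.0 / (k + rank)
            by_id.setdefault(chunk.id, chunk)  # de-duplicate, keep first seen
    top_ids = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [by_id[cid] for cid in top_ids]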

Testing Hybrid Search

Golden Dataset Evaluation

# backend/tests/integration/test_hybrid_search.py
import pytest
from app.db.repositories.chunk_repository import hybrid_search
from app.shared.services.embeddings import embed_text

@pytest.mark.asyncio
async def test_hybrid_search_golden_dataset(session):  # session: async DB fixture
    """Test hybrid search against golden queries."""

    golden_queries = load_golden_queries()  # 98 queries

    results = []
    for query_data in golden_queries:
        query = query_data["query"]
        expected_chunks = set(query_data["expected_chunk_ids"])  # set for the intersection below

        # Perform search
        embedding = await embed_text(query)
        retrieved = await hybrid_search(session, query, embedding, top_k=10)
        retrieved_ids = {c.id for c in retrieved}

        # Check if expected chunks are in top 10
        found = len(expected_chunks & retrieved_ids)
        results.append({
            "query": query,
            "expected": len(expected_chunks),
            "found": found,
            "pass": found == len(expected_chunks)
        })

    # Calculate metrics
    pass_rate = sum(r["pass"] for r in results) / len(results)
    mrr = calculate_mrr(results)

    print(f"Pass Rate: {pass_rate:.1%}")
    print(f"MRR: {mrr:.3f}")

    assert pass_rate >= 0.90, f"Pass rate {pass_rate:.1%} below 90% threshold"
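
calculate_mrr is a project helper; for reference, MRR averages 1/rank of the first relevant hit per query. A standalone sketch (assuming the loop above is extended to record that rank, which the dicts as written do not carry):

def calculate_mrr(first_relevant_ranks: list[int | None]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the first relevant result;
    queries with no relevant result contribute 0."""
    if not first_relevant_ranks:
        return 0.0
    return sum(1.0 / r for r in first_relevant_ranks if r) / len(first_relevant_ranks)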

References

PGVector Documentation

  • https://github.com/pgvector/pgvector - official pgvector repository and documentation

This Project's Implementation

  • backend/app/db/repositories/chunk_repository.py - Hybrid search implementation
  • backend/app/shared/services/search/search_service.py - Search service layer
  • backend/app/core/constants.py - Search constants (fetch multiplier, boosting factors)

Related Skills

  • ai-native-development - Embeddings and vector concepts
  • database-schema-designer - Schema design for vector search
  • performance-optimization - Query optimization strategies

Version: 1.0.0 (December 2025)
Status: Production-ready patterns from this project's 415-chunk golden dataset