Claude Code Plugins

Community-maintained marketplace


Production hybrid search with PGVector 0.7+ and BM25 using Reciprocal Rank Fusion, metadata filtering, and HNSW indexing for semantic retrieval

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: pgvector-search
description: Production hybrid search with PGVector 0.7+ and BM25 using Reciprocal Rank Fusion, metadata filtering, and HNSW indexing for semantic retrieval
version: 2.0.0
author: YG Starter Template
tags: pgvector, hybrid-search, bm25, rrf, semantic-search, retrieval, 2025

PGVector Hybrid Search

Production-grade semantic + keyword search using PostgreSQL 16+ and PGVector 0.7+

Overview

Hybrid search combines semantic similarity (vector embeddings) with keyword matching (BM25) to achieve better retrieval than either alone.

Real-World Use Cases:

  • Technical Documentation Search: Hybrid retrieval over API docs, tutorials, guides
  • Code Search Engine: Find similar functions/classes across repositories
  • E-commerce Product Discovery: Search by image + text attributes
  • Legal Document Retrieval: Citation-aware semantic search
  • Medical Literature Search: Semantic search over research papers with MeSH terms

Architecture:

Query
  ↓
[Generate embedding] → Vector Search (PGVector) → Top 30 results
  ↓
[Generate ts_query]  → Keyword Search (BM25)    → Top 30 results
  ↓
[Reciprocal Rank Fusion (RRF)] → Merge & re-rank → Top 10 final results

When to use this skill:

  • Building semantic search (RAG, knowledge bases, recommendations)
  • Implementing hybrid retrieval (vector + keyword)
  • Optimizing PGVector performance
  • Working with large document collections (1M+ chunks)

Core Concepts

1. Semantic Search (Vector Similarity)

How it works:

  1. Embed query: "database indexing strategies" → [0.23, -0.15, ..., 0.42] (1024 dims)
  2. Find nearest neighbors: ORDER BY embedding <=> query_embedding LIMIT 30
  3. Returns: Conceptually similar documents (even with different words)
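
A minimal vector-only version of this lookup, as a sketch (it assumes the Chunk ORM model and async SQLAlchemy session used later in this skill):

from sqlalchemy import select

async def vector_search(session, query_embedding: list[float], limit: int = 30):
    # cosine_distance renders pgvector's <=> operator
    stmt = (
        select(Chunk)
        .where(Chunk.embedding.isnot(None))
        .order_by(Chunk.embedding.cosine_distance(query_embedding))
        .limit(limit)
    )
    return (await session.execute(stmt)).scalars().all()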

Example:

  • Query: "machine learning model training"
  • Matches: "neural network optimization", "deep learning techniques"
  • Misses: exact rare terms (e.g. a specific error code or version string) that the embedding model represents poorly

Strengths:

  • Captures semantic meaning
  • Works across languages
  • Handles synonyms ("car" matches "automobile")

Weaknesses:

  • Weak at exact keyword matches
  • Sensitive to embedding quality
  • Doesn't handle rare technical terms well

2. Keyword Search (BM25)

How it works:

  1. Tokenize query: "database indexing" → database & indexing
  2. Full-text search: WHERE content_tsvector @@ to_tsquery('database & indexing')
  3. Rank by relevance score (this skill uses Postgres ts_rank_cd, a cover-density ranking that approximates BM25's TF-IDF-with-length-normalization behavior)
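
The same lookup through SQLAlchemy, as a sketch (ts_rank_cd standing in for BM25 per the note above; assumes the Chunk ORM model defined later in this skill):

from sqlalchemy import func, select

async def keyword_search(session, query: str, limit: int = 30):
    # plainto_tsquery handles tokenization and AND-joining of terms
    ts_query = func.plainto_tsquery("english", query)
    stmt = (
        select(Chunk)
        .where(Chunk.content_tsvector.op("@@")(ts_query))
        .order_by(func.ts_rank_cd(Chunk.content_tsvector, ts_query).desc())
        .limit(limit)
    )
    return (await session.execute(stmt)).scalars().all()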

Example:

  • Query: "PostgreSQL B-tree index"
  • Matches: Documents with exact phrase "PostgreSQL B-tree index"
  • Misses: "Postgres tree-based indexing" (different words)

Strengths:

  • Fast exact matches
  • Handles technical terms well
  • Works for rare/specific phrases

Weaknesses:

  • No semantic understanding
  • Requires exact word matches
  • Sensitive to typos

3. Reciprocal Rank Fusion (RRF)

The Problem: How do you combine vector scores (0.85) with BM25 scores (42.7)?

The Solution: Use rank instead of score.

Algorithm:

def rrf_score(rank: int, k: int = 60) -> float:
    """
    Calculate RRF score for a document at given rank.

    Args:
        rank: Position in result list (1-indexed)
        k: Smoothing constant (typically 60)

    Returns:
        Score in (0, 1/(k+1)]; at most ≈0.0164 (rank 1 with k=60)
    """
    return 1.0 / (k + rank)

# Example:
# Document appears at rank 3 in vector search → score = 1/(60+3) = 0.0159
# Same document at rank 7 in BM25 search    → score = 1/(60+7) = 0.0149
# Combined RRF score = 0.0159 + 0.0149 = 0.0308
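
The same fusion can be done entirely in application code; a minimal sketch over two ranked lists of document IDs:

def rrf_merge(vector_ids: list, keyword_ids: list,
              k: int = 60, top_k: int = 10) -> list:
    """Fuse two ranked ID lists into one list ordered by combined RRF score."""
    scores: dict = {}
    for ranked in (vector_ids, keyword_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents appearing in both lists accumulate both contributions
    return sorted(scores, key=scores.get, reverse=True)[:top_k]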

Why it works:

  • Rank-based: Ignores absolute scores (no normalization needed)
  • Symmetric: Treats both searches equally
  • Robust: Top results from either search get high scores

Detailed Implementation: See references/hybrid-search-rrf.md


This Project's Hybrid Search Implementation

Database Schema

-- Chunks table with vector and full-text search
CREATE TABLE chunks (
    id UUID PRIMARY KEY,
    document_id UUID REFERENCES documents(id),
    content TEXT NOT NULL,

    -- Vector embedding (1024 dimensions for Voyage AI)
    embedding vector(1024),

    -- Pre-computed tsvector for full-text search
    content_tsvector tsvector GENERATED ALWAYS AS (
        to_tsvector('english', content)
    ) STORED,

    -- Metadata
    section_title TEXT,
    section_path TEXT,
    chunk_index INT,
    content_type TEXT,  -- 'code_block', 'paragraph', 'list', etc.

    -- Timestamps
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Indexes
CREATE INDEX idx_chunks_embedding ON chunks
    USING hnsw (embedding vector_cosine_ops);  -- Vector search

CREATE INDEX idx_chunks_content_tsvector ON chunks
    USING gin (content_tsvector);  -- Full-text search

CREATE INDEX idx_chunks_document_id ON chunks(document_id);
CREATE INDEX idx_chunks_content_type ON chunks(content_type);
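
For the SQLAlchemy snippets below, a matching ORM model might look like this (a sketch derived from the schema above; the project's actual model file isn't shown in this skill):

import uuid
from datetime import datetime

from pgvector.sqlalchemy import Vector
from sqlalchemy import Column, Computed, DateTime, ForeignKey, Integer, Text
from sqlalchemy.dialects.postgresql import TSVECTOR, UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Chunk(Base):
    __tablename__ = "chunks"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    document_id = Column(UUID(as_uuid=True), ForeignKey("documents.id"))
    content = Column(Text, nullable=False)
    embedding = Column(Vector(1024))
    # Mirrors the GENERATED ALWAYS AS ... STORED column; read-only in app code
    content_tsvector = Column(
        TSVECTOR, Computed("to_tsvector('english', content)", persisted=True)
    )
    section_title = Column(Text)
    section_path = Column(Text)
    chunk_index = Column(Integer)
    content_type = Column(Text)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow)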

Search Query

# backend/app/db/repositories/chunk_repository.py
from sqlalchemy import select, func, literal_column
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.models import Chunk  # assumed import path for the Chunk ORM model

async def hybrid_search(
    session: AsyncSession,
    query: str,
    query_embedding: list[float],
    top_k: int = 10,
    content_type_filter: list[str] | None = None
) -> list[Chunk]:
    """
    Perform hybrid search using RRF.

    Args:
        session: Async SQLAlchemy session
        query: Search query text
        query_embedding: Query embedding vector
        top_k: Number of results to return
        content_type_filter: Optional filter by content type

    Returns:
        List of chunks ranked by RRF score
    """

    # Fetch multiplier (retrieve more for better RRF)
    FETCH_MULTIPLIER = 3
    fetch_limit = top_k * FETCH_MULTIPLIER  # 30 for top_k=10

    # ===== 1. VECTOR SEARCH =====
    vector_subquery = (
        select(
            Chunk.id,
            (Chunk.embedding.cosine_distance(query_embedding)).label("vector_distance"),
            func.row_number().over(
                order_by=Chunk.embedding.cosine_distance(query_embedding)
            ).label("vector_rank")
        )
        .where(Chunk.embedding.isnot(None))
    )

    # Apply content type filter
    if content_type_filter:
        vector_subquery = vector_subquery.where(
            Chunk.content_type.in_(content_type_filter)
        )

    vector_subquery = vector_subquery.limit(fetch_limit).subquery("vector_results")

    # ===== 2. KEYWORD SEARCH (BM25) =====
    # Generate tsquery
    ts_query = func.plainto_tsquery("english", query)

    keyword_subquery = (
        select(
            Chunk.id,
            func.ts_rank_cd(Chunk.content_tsvector, ts_query).label("bm25_score"),
            func.row_number().over(
                order_by=func.ts_rank_cd(Chunk.content_tsvector, ts_query).desc()
            ).label("keyword_rank")
        )
        .where(Chunk.content_tsvector.op("@@")(ts_query))
    )

    # Apply content type filter
    if content_type_filter:
        keyword_subquery = keyword_subquery.where(
            Chunk.content_type.in_(content_type_filter)
        )

    keyword_subquery = keyword_subquery.limit(fetch_limit).subquery("keyword_results")

    # ===== 3. RECIPROCAL RANK FUSION =====
    K = 60  # RRF smoothing constant

    rrf_query = (
        select(
            func.coalesce(vector_subquery.c.id, keyword_subquery.c.id).label("chunk_id"),
            (
                func.coalesce(1.0 / (K + vector_subquery.c.vector_rank), 0.0) +
                func.coalesce(1.0 / (K + keyword_subquery.c.keyword_rank), 0.0)
            ).label("rrf_score")
        )
        .select_from(
            vector_subquery.outerjoin(
                keyword_subquery,
                vector_subquery.c.id == keyword_subquery.c.id,
                full=True
            )
        )
        .order_by(literal_column("rrf_score").desc())
        .limit(top_k)
    ).subquery("rrf_results")

    # ===== 4. FETCH FULL CHUNKS =====
    final_query = (
        select(Chunk)
        .join(rrf_query, Chunk.id == rrf_query.c.chunk_id)
        .order_by(rrf_query.c.rrf_score.desc())
    )

    result = await session.execute(final_query)
    chunks = result.scalars().all()

    return chunks

Key Features:

  1. 3x Fetch Multiplier: Retrieve 30 results from each search (better RRF coverage)
  2. Indexed tsvector: Uses content_tsvector column (5-10x faster than to_tsvector() on query)
  3. Full outer join: Includes results from either search (vector OR keyword)
  4. Content type filtering: Optional pre-filter by metadata
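
A call-site sketch (embed_text is taken from the test imports further below; session is an async SQLAlchemy session):

from app.shared.services.embeddings import embed_text

async def search_docs(session, question: str):
    embedding = await embed_text(question)
    return await hybrid_search(
        session,
        query=question,
        query_embedding=embedding,
        top_k=10,
        content_type_filter=None,  # or e.g. ["code_block"]
    )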

Performance Optimizations

1. Pre-Computed tsvector Column

Before (Slow):

-- Computes tsvector on every query (SLOW!)
WHERE to_tsvector('english', content) @@ to_tsquery('database')

After (Fast):

-- Uses pre-computed column with GIN index (FAST!)
WHERE content_tsvector @@ to_tsquery('database')

Speedup: 5-10x faster for keyword search


2. HNSW vs IVFFlat Indexes

IVFFlat (Older):

CREATE INDEX idx_embedding ON chunks
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
  • Faster indexing
  • Slower queries
  • Good for < 100k vectors

HNSW (Recommended):

CREATE INDEX idx_embedding ON chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
  • Slower indexing
  • Much faster queries at the same recall (often 10x or more)
  • Scales to millions of vectors

This project uses HNSW (415 chunks, with room to scale to 10M+).
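
HNSW also exposes a query-time knob, hnsw.ef_search (pgvector default 40): raising it improves recall at some latency cost. A per-transaction sketch (SET LOCAL requires an open transaction, which an async session normally has):

from sqlalchemy import text

async def set_hnsw_ef_search(session, ef_search: int = 100) -> None:
    # SET LOCAL scopes the setting to the current transaction only
    await session.execute(text(f"SET LOCAL hnsw.ef_search = {int(ef_search)}"))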

Detailed Comparison: See references/indexing-strategies.md


3. Metadata Filtering & Boosting

Problem: Some chunks are more valuable than others.

Solution: Boost by metadata:

# Illustrative post-retrieval boosting pass; the matching predicates
# (query_matches_section_title, query_matches_path, is_technical_query)
# are this project's helpers.
def apply_metadata_boosts(query: str, chunk, rrf_score: float) -> float:
    # Boost section titles (1.5x)
    if query_matches_section_title(chunk.section_title, query):
        rrf_score *= 1.5
    # Boost document path (1.15x)
    if query_matches_path(chunk.section_path, query):
        rrf_score *= 1.15
    # Boost code blocks for technical queries (1.2x)
    if is_technical_query(query) and chunk.content_type == "code_block":
        rrf_score *= 1.2
    return rrf_score

This project's evaluation:

  • Before boosting: 91.1% pass rate, 0.647 MRR (Hard queries)
  • After boosting: 91.6% pass rate, 0.686 MRR (Hard queries)
  • Improvement: +0.5% pass rate, +6% MRR

Detailed Implementation: See references/metadata-filtering.md


Common Patterns

Pattern 1: Filtered Search

# Search only code blocks
results = await hybrid_search(
    session,
    query="binary search implementation",
    query_embedding=embedding,
    content_type_filter=["code_block"]
)

Pattern 2: Similarity Threshold

# Only return results above a similarity threshold.
# Note: hybrid_search above returns bare chunks; this pattern assumes the
# repository is extended to expose each result's cosine distance as well.
MIN_SIMILARITY = 0.75

results = await hybrid_search(session, query, embedding, top_k=50)
filtered = [
    r for r in results
    if (1 - r.vector_distance) >= MIN_SIMILARITY
][:10]

Pattern 3: Multi-Query Retrieval

# Generate multiple query variations for better recall
queries = generate_query_variations("machine learning")
# ["machine learning", "ML algorithms", "neural networks"]

all_runs = []
for q in queries:
    emb = embed(q)  # embed() and generate_query_variations() are project helpers
    results = await hybrid_search(session, q, emb, top_k=5)
    all_runs.append(results)  # keep each run's ranking intact for RRF

# De-duplicate and re-rank across runs
final_results = rerank_by_rrf(all_runs, top_k=10)
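
rerank_by_rrf here is a stand-in; a minimal version that fuses the per-query runs with RRF and de-duplicates by chunk id could look like:

def rerank_by_rrf(runs: list[list], top_k: int = 10, k: int = 60) -> list:
    """Fuse several ranked result lists (one per query variation) via RRF."""
    scores: dict = {}
    by_id: dict = {}
    for run in runs:
        for rank, chunk in enumerate(run, start=1):
            scores[chunk.id] = scores.get(chunk.id, 0.0) + 1.0 / (k + rank)
            by_id.setdefault(chunk.id, chunk)  # de-duplicate, keep first seen
    top_ids = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [by_id[cid] for cid in top_ids]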

Testing Hybrid Search

Golden Dataset Evaluation

# backend/tests/integration/test_hybrid_search.py
import pytest
from app.db.repositories.chunk_repository import hybrid_search
from app.shared.services.embeddings import embed_text

@pytest.mark.asyncio
async def test_hybrid_search_golden_dataset(session):  # session: async DB fixture
    """Test hybrid search against golden queries."""

    golden_queries = load_golden_queries()  # 98 queries

    results = []
    for query_data in golden_queries:
        query = query_data["query"]
        expected_chunks = set(query_data["expected_chunk_ids"])  # set for the intersection below

        # Perform search
        embedding = await embed_text(query)
        retrieved = await hybrid_search(session, query, embedding, top_k=10)
        retrieved_ids = {c.id for c in retrieved}

        # Check if expected chunks are in top 10
        found = len(expected_chunks & retrieved_ids)
        results.append({
            "query": query,
            "expected": len(expected_chunks),
            "found": found,
            "pass": found == len(expected_chunks)
        })

    # Calculate metrics
    pass_rate = sum(r["pass"] for r in results) / len(results)
    mrr = calculate_mrr(results)

    print(f"Pass Rate: {pass_rate:.1%}")
    print(f"MRR: {mrr:.3f}")

    assert pass_rate >= 0.90, f"Pass rate {pass_rate:.1%} below 90% threshold"
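
calculate_mrr is a project helper; for reference, MRR averages 1/rank of the first relevant hit per query. A standalone sketch (assuming the loop above is extended to record that rank, which the dicts as written do not carry):

def calculate_mrr(first_relevant_ranks: list[int | None]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the first relevant result;
    queries with no relevant result contribute 0."""
    if not first_relevant_ranks:
        return 0.0
    return sum(1.0 / r for r in first_relevant_ranks if r) / len(first_relevant_ranks)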

References

PGVector Documentation

  • https://github.com/pgvector/pgvector - official pgvector repository and documentation

This Project's Implementation

  • backend/app/db/repositories/chunk_repository.py - Hybrid search implementation
  • backend/app/shared/services/search/search_service.py - Search service layer
  • backend/app/core/constants.py - Search constants (fetch multiplier, boosting factors)

Related Skills

  • ai-native-development - Embeddings and vector concepts
  • database-schema-designer - Schema design for vector search
  • performance-optimization - Query optimization strategies

Version: 1.0.0 (December 2025)
Status: Production-ready patterns from this project's 415-chunk golden dataset