hybrid-search

@juanre/llmemory
SKILL.md

name: hybrid-search
description: Use when building search systems that need both semantic similarity and keyword matching - covers combining vector and BM25 search with Reciprocal Rank Fusion, alpha tuning for search weight control, and optimizing retrieval quality
version: 1.0.0

LLMemory Hybrid Search

Installation

uv add llmemory
# or
pip install llmemory

Overview

Hybrid search combines vector similarity search (semantic understanding) with full-text search (keyword matching), typically delivering better retrieval quality than either method alone. Results from the two searches are merged using Reciprocal Rank Fusion (RRF) to create a unified ranking.

When to use hybrid search:

  • Need both semantic similarity AND exact keyword matches
  • Queries contain specific terms, names, or technical jargon
  • Want best-of-both-worlds retrieval quality (recommended default)

When to use vector-only search:

  • Purely semantic/conceptual queries
  • Cross-lingual search
  • Queries with synonyms or paraphrasing

When to use text-only search:

  • Exact keyword/phrase matching required
  • Search in structured data or code
  • When embeddings are not available

Quick Start

from llmemory import LLMemory, SearchType

async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    # Hybrid search (default, recommended)
    results = await memory.search(
        owner_id="workspace-1",
        query_text="machine learning algorithms",
        search_type=SearchType.HYBRID,
        limit=10,
        alpha=0.5  # Equal weight to vector and text
    )

    for result in results:
        print(f"[RRF={result.rrf_score:.3f}] {result.content[:80]}...")

Complete API Documentation

SearchType Enum

class SearchType(str, Enum):
    VECTOR = "vector"   # Vector similarity only
    TEXT = "text"       # Full-text search only
    HYBRID = "hybrid"   # Combines vector + text (recommended)

search() - Hybrid Mode

Signature:

async def search(
    owner_id: str,
    query_text: str,
    search_type: Union[SearchType, str] = SearchType.HYBRID,
    limit: int = 10,
    alpha: float = 0.5,
    metadata_filter: Optional[Dict[str, Any]] = None,
    id_at_origins: Optional[List[str]] = None,
    date_from: Optional[datetime] = None,
    date_to: Optional[datetime] = None,
    include_parent_context: bool = False,
    context_window: int = 2
) -> List[SearchResult]

Hybrid Search Parameters:

  • search_type (SearchType, default: HYBRID): Set to SearchType.HYBRID for hybrid search
  • alpha (float, default: 0.5): Weight for vector vs text search
    • 0.0 = text search only
    • 0.3 = favor text search (good for keyword-heavy queries)
    • 0.5 = equal weight (balanced, recommended)
    • 0.7 = favor vector search (good for semantic queries)
    • 1.0 = vector search only

Returns:

  • List[SearchResult] with hybrid-specific fields:
    • rrf_score (float): Reciprocal Rank Fusion score (primary ranking)
    • similarity (float): Vector similarity score (0-1)
    • text_rank (float): Full-text search rank
    • score (float): Overall score (equals rrf_score for hybrid)

Example:

# Balanced hybrid search
results = await memory.search(
    owner_id="workspace-1",
    query_text="quarterly revenue growth",
    search_type=SearchType.HYBRID,
    alpha=0.5,  # Equal weight
    limit=20
)

for result in results:
    print(f"RRF Score: {result.rrf_score:.3f}")
    print(f"Vector Similarity: {result.similarity:.3f}")
    print(f"Text Rank: {result.text_rank:.3f}")
    print(f"Content: {result.content[:100]}...")
    print("---")

Understanding Alpha Parameter

The alpha parameter controls the balance between vector and text search in hybrid mode.

Alpha Values Guide

# Text-heavy (alpha = 0.0 to 0.3)
# Use when: Query has specific keywords, names, or technical terms
results = await memory.search(
    owner_id="workspace-1",
    query_text="Python asyncio gather timeout",
    search_type=SearchType.HYBRID,
    alpha=0.3  # Favor keyword matching
)

# Balanced (alpha = 0.4 to 0.6)
# Use when: General queries, uncertain which is better
results = await memory.search(
    owner_id="workspace-1",
    query_text="customer retention strategies",
    search_type=SearchType.HYBRID,
    alpha=0.5  # Equal weight (recommended default)
)

# Semantic-heavy (alpha = 0.7 to 1.0)
# Use when: Conceptual queries, synonyms, paraphrasing
results = await memory.search(
    owner_id="workspace-1",
    query_text="ways to keep customers happy",
    search_type=SearchType.HYBRID,
    alpha=0.7  # Favor semantic similarity
)

Choosing Alpha for Different Query Types

| Query Type | Example | Recommended Alpha | Reasoning |
|---|---|---|---|
| Specific keywords | "PostgreSQL CONNECTION_LIMIT error" | 0.2-0.3 | Need exact keyword matches |
| Product/person names | "iPhone 15 Pro specifications" | 0.3-0.4 | Names matter more than semantics |
| Technical jargon | "SOLID principles dependency injection" | 0.4-0.5 | Balance needed |
| General concepts | "improve team collaboration" | 0.5-0.6 | Balanced approach |
| Semantic queries | "how to motivate employees" | 0.6-0.7 | Semantic understanding key |
| Paraphrased questions | "what are good ways to retain staff" | 0.7-0.8 | Vector search excels |

Reciprocal Rank Fusion (RRF)

Hybrid search uses RRF to merge vector and text search results into a unified ranking.

How RRF Works

k = 50  # RRF constant, default (prevents early results from dominating)

# Initialize score accumulator for each chunk
rrf_scores = {}

# Process vector search results
for rank, result in enumerate(vector_results):
    chunk_id = result["chunk_id"]
    vector_contribution = alpha / (k + rank + 1)
    rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0) + vector_contribution

# Process text search results
for rank, result in enumerate(text_results):
    chunk_id = result["chunk_id"]
    text_contribution = (1 - alpha) / (k + rank + 1)
    rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0) + text_contribution

# Sort by accumulated RRF score descending
sorted_results = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)

Key points:

  • Alpha is inside the division: alpha / (k + rank + 1), not multiplied afterward
  • Rank is 1-indexed: rank + 1 where rank starts at 0
  • Chunks appearing in both result lists get contributions from both
  • k = 50 by default (configurable via SearchConfig.rrf_k)
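
For instance, with alpha=0.5 and the default k=50, a chunk ranked first (rank 0) by vector search and third (rank 2) by text search accumulates the following score, straight from the formula above:

alpha, k = 0.5, 50

vector_part = alpha / (k + 0 + 1)        # rank 0 in vector results: 0.5 / 51 ≈ 0.00980
text_part = (1 - alpha) / (k + 2 + 1)    # rank 2 in text results:   0.5 / 53 ≈ 0.00943

rrf_score = vector_part + text_part      # ≈ 0.01924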

RRF Benefits

  1. Handles different score scales: Vector similarities (0-1) and text ranks (varying scales) become comparable because RRF uses rank positions, not raw scores
  2. Position-based fusion: Emphasizes consensus across search methods
  3. Robust to score outliers: Single high score doesn't dominate
  4. Tunable with alpha: Control the balance between search methods

Example: RRF in Action

results = await memory.search(
    owner_id="workspace-1",
    query_text="machine learning neural networks",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=5
)

for i, result in enumerate(results, 1):
    print(f"Result #{i}")
    print(f"  RRF Score: {result.rrf_score:.4f}")
    print(f"  Vector Sim: {result.similarity:.4f} (semantic match)")
    print(f"  Text Rank: {result.text_rank:.4f} (keyword match)")
    print(f"  Content: {result.content[:80]}...")
    print()

# Output shows how RRF balances both signals:
# Result #1
#   RRF Score: 0.0245  (highest combined score)
#   Vector Sim: 0.85   (very semantically similar)
#   Text Rank: 12.5    (good keyword match)
#   Content: Deep learning uses neural networks with multiple layers...

Configuring Hybrid Search with SearchConfig

LLMemory's SearchConfig provides fine-grained control over hybrid search behavior, including HNSW vector index parameters and RRF fusion settings. You can configure these settings via environment variables or programmatically through LLMemoryConfig.

HNSW Index Configuration

The HNSW (Hierarchical Navigable Small World) index powers fast approximate nearest neighbor vector search. LLMemory provides three preset profiles and supports custom configuration.

HNSW Parameters

  • hnsw_m (int, default: 16): Number of bi-directional links per node

    • Higher values = better recall, larger index, slower construction
    • Range: 8-64, typical values: 8 (fast), 16 (balanced), 32 (accurate)
  • hnsw_ef_construction (int, default: 200): Size of dynamic candidate list during index construction

    • Higher values = better index quality, slower construction
    • Range: 80-1000, typical values: 80 (fast), 200 (balanced), 400 (accurate)
  • hnsw_ef_search (int, default: 100): Size of dynamic candidate list during search

    • Higher values = better recall, slower search
    • Range: 40-500, typical values: 40 (fast), 100 (balanced), 200 (accurate)

HNSW Presets

LLMemory includes three built-in presets for common use cases:

HNSW_PRESETS = {
    "fast": {
        "m": 8,
        "ef_construction": 80,
        "ef_search": 40
    },
    "balanced": {
        "m": 16,
        "ef_construction": 200,
        "ef_search": 100
    },
    "accurate": {
        "m": 32,
        "ef_construction": 400,
        "ef_search": 200
    }
}

Preset Recommendations:

  • fast: Latency-critical applications (40-60ms search, ~95% recall)
  • balanced: General-purpose use (80-120ms search, ~98% recall) - Default
  • accurate: High-precision requirements (150-250ms search, ~99.5% recall)
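
To make the build-time vs. query-time split concrete: m and ef_construction are baked into the index DDL (a reindex statement appears later in this section), while ef_search is applied at query time, which is why it can change without rebuilding the index. A small sketch, using a hypothetical helper that renders the DDL from the HNSW_PRESETS mapping above:

def reindex_ddl(profile: str) -> str:
    # Hypothetical helper: renders the CREATE INDEX statement for a preset,
    # using the HNSW_PRESETS dict shown above.
    p = HNSW_PRESETS[profile]
    return (
        "CREATE INDEX document_chunks_embedding_hnsw "
        "ON llmemory.document_chunks "
        "USING hnsw (embedding vector_cosine_ops) "
        f"WITH (m = {p['m']}, ef_construction = {p['ef_construction']})"
    )

print(reindex_ddl("accurate"))
# Note: ef_search never appears in the DDL; pgvector reads it per session
# when the query runs, not when the index is built.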

Using HNSW Presets via Environment Variable

Set the LLMEMORY_HNSW_PROFILE environment variable to use a preset:

# Use fast profile for low-latency applications
export LLMEMORY_HNSW_PROFILE=fast

# Use accurate profile for high-precision requirements
export LLMEMORY_HNSW_PROFILE=accurate

# Use balanced profile (default, can be omitted)
export LLMEMORY_HNSW_PROFILE=balanced

Then initialize LLMemory normally - the preset will be applied automatically:

from llmemory import LLMemory, SearchType

# Automatically uses HNSW preset from environment
async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="machine learning",
        search_type=SearchType.HYBRID,
        limit=10
    )

Programmatic HNSW Configuration

For more control, configure HNSW parameters programmatically:

from llmemory import LLMemory, SearchType
from llmemory.config import LLMemoryConfig

# Create custom configuration
config = LLMemoryConfig()

# Configure search parameters
config.search.hnsw_ef_search = 150  # Higher search accuracy

# Configure database/index parameters
config.database.hnsw_m = 24
config.database.hnsw_ef_construction = 300

# Initialize with custom config
async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="neural networks",
        search_type=SearchType.HYBRID,
        limit=10
    )

Note: Index construction parameters (hnsw_m, hnsw_ef_construction) only affect new indexes. To apply them to an existing index, you must recreate the index:

-- Recreate HNSW index with new parameters
DROP INDEX IF EXISTS llmemory.document_chunks_embedding_hnsw;
CREATE INDEX document_chunks_embedding_hnsw
ON llmemory.document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 300);

RRF Configuration

The rrf_k parameter controls the Reciprocal Rank Fusion constant used to merge vector and text search results.

RRF Parameter

  • rrf_k (int, default: 50): RRF constant that controls rank position sensitivity
    • Higher values = less weight on top positions, more democratic fusion
    • Lower values = more weight on top positions, favors high-ranking results
    • Range: 10-100, typical values: 30 (aggressive), 50 (balanced), 70 (democratic)

How rrf_k affects fusion:

# For a chunk at rank position r (0-indexed):
rrf_score_contribution = alpha / (rrf_k + r + 1)

# The examples below show a single method's contribution at full weight (1.0):
# Example with rrf_k=50:
# Rank 0: 1.0 / (50 + 0 + 1) = 0.0196
# Rank 1: 1.0 / (50 + 1 + 1) = 0.0192
# Rank 10: 1.0 / (50 + 10 + 1) = 0.0164

# Example with rrf_k=20 (favors top results):
# Rank 0: 1.0 / (20 + 0 + 1) = 0.0476
# Rank 1: 1.0 / (20 + 1 + 1) = 0.0455
# Rank 10: 1.0 / (20 + 10 + 1) = 0.0323

# Example with rrf_k=80 (more democratic):
# Rank 0: 1.0 / (80 + 0 + 1) = 0.0123
# Rank 1: 1.0 / (80 + 1 + 1) = 0.0122
# Rank 10: 1.0 / (80 + 10 + 1) = 0.0110
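
To see how this plays out in an actual ranking, compare a chunk X that is top-ranked by one method only against a chunk Y that sits deep in both lists. A small sketch with toy ranks, alpha=0.5, and the contribution formula above:

alpha = 0.5

def fused(k: int, vec_rank=None, txt_rank=None) -> float:
    """RRF score for a chunk given its 0-indexed rank in each result list."""
    score = 0.0
    if vec_rank is not None:
        score += alpha / (k + vec_rank + 1)
    if txt_rank is not None:
        score += (1 - alpha) / (k + txt_rank + 1)
    return score

# Chunk X: rank 0 in vector results only; chunk Y: rank 30 in both lists
for k in (20, 80):
    x, y = fused(k, vec_rank=0), fused(k, vec_rank=30, txt_rank=30)
    print(f"k={k}: X={x:.4f} Y={y:.4f} -> winner {'X' if x > y else 'Y'}")

# k=20: X=0.0238 Y=0.0196 -> winner X  (low k rewards a single top hit)
# k=80: X=0.0062 Y=0.0090 -> winner Y  (high k rewards consensus)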

Configuring rrf_k

Note: rrf_k is not currently exposed as an environment variable. Configure it programmatically through LLMemoryConfig; lower values favor top-ranked results, higher values give more weight to mid-ranked results:

from llmemory import LLMemory
from llmemory.config import LLMemoryConfig

config = LLMemoryConfig()
config.search.rrf_k = 30  # Favor top-ranked results

async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="search query",
        search_type=SearchType.HYBRID,
        limit=10
    )

Complete Configuration Example

Here's a complete example showing both environment variable and programmatic configuration:

import os
from llmemory import LLMemory, SearchType
from llmemory.config import LLMemoryConfig

# Option 1: Environment variable configuration
os.environ["LLMEMORY_HNSW_PROFILE"] = "accurate"
# HNSW will use: m=32, ef_construction=400, ef_search=200

async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
    results = await memory.search(
        owner_id="workspace-1",
        query_text="deep learning transformers",
        search_type=SearchType.HYBRID,
        alpha=0.6,
        limit=15
    )

# Option 2: Programmatic configuration with fine-tuning
config = LLMemoryConfig()

# HNSW search configuration
config.search.hnsw_ef_search = 150  # Higher accuracy than default

# HNSW index construction (for new indexes)
config.database.hnsw_m = 20
config.database.hnsw_ef_construction = 250

# RRF configuration
config.search.rrf_k = 40  # Favor top-ranked results slightly

# Other search settings
config.search.default_limit = 20
config.search.default_search_type = "hybrid"

async with LLMemory(
    connection_string="postgresql://localhost/mydb",
    config=config
) as memory:
    # Search with custom configuration
    results = await memory.search(
        owner_id="workspace-1",
        query_text="neural network architectures",
        search_type=SearchType.HYBRID,
        alpha=0.5,
        limit=20
    )

    for result in results:
        print(f"RRF: {result.rrf_score:.4f} | "
              f"Vector: {result.similarity:.4f} | "
              f"Text: {result.text_rank:.4f}")
        print(f"  {result.content[:80]}...")

Configuration Performance Impact

Different HNSW settings have measurable performance impacts:

| Profile | Index Size (100k docs) | Construction Time | Search Latency | Recall |
|---|---|---|---|---|
| fast | 150 MB | 5 min | 40-60ms | ~95% |
| balanced | 250 MB | 12 min | 80-120ms | ~98% |
| accurate | 450 MB | 30 min | 150-250ms | ~99.5% |

Tuning Guidelines:

  1. Start with balanced (default) for most applications
  2. Use fast if:
    • Search latency must be under 100ms
    • Recall around 95% is acceptable
    • Index size is a constraint
  3. Use accurate if:
    • High precision is critical (medical, legal, financial)
    • Search latency under 300ms is acceptable
    • Maximum recall is required
  4. Custom tune if:
    • You have specific latency/recall requirements
    • You've measured performance with your data
    • You're optimizing for your embedding model
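
One way to ground these choices in your own data is to sweep the presets over a representative query set and measure latency. A sketch, assuming LLMEMORY_HNSW_PROFILE is read when LLMemory initializes (as described above); note that on an existing index only the ef_search difference between profiles takes effect, since m and ef_construction require a reindex:

import os
import time

from llmemory import LLMemory, SearchType

async def sweep_profiles(queries: list[str]) -> None:
    for profile in ("fast", "balanced", "accurate"):
        os.environ["LLMEMORY_HNSW_PROFILE"] = profile
        async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
            start = time.time()
            for q in queries:
                await memory.search(
                    owner_id="workspace-1",
                    query_text=q,
                    search_type=SearchType.HYBRID,
                    limit=10,
                )
            per_query_ms = (time.time() - start) * 1000 / len(queries)
            print(f"{profile}: {per_query_ms:.1f}ms/query")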

Search Type Comparison

Vector Search Only

# Pure semantic similarity
results = await memory.search(
    owner_id="workspace-1",
    query_text="artificial intelligence",
    search_type=SearchType.VECTOR,
    limit=10
)

# Good for:
# - "AI" matching "machine learning" (synonym)
# - "dog" matching "puppy" (semantic)
# - Cross-lingual search
#
# Weak for:
# - Specific keywords ("PostgreSQL 14.2")
# - Exact phrases ("return on investment")
# - Technical terms ("ValueError exception")

Text Search Only

# Pure keyword matching
results = await memory.search(
    owner_id="workspace-1",
    query_text="PostgreSQL CONNECTION_LIMIT",
    search_type=SearchType.TEXT,
    limit=10
)

# Good for:
# - Exact keyword matches
# - Technical error messages
# - Code search
# - Structured data
#
# Weak for:
# - Synonyms ("automobile" vs "car")
# - Paraphrasing
# - Conceptual queries

Hybrid Search (Recommended)

# Combines both vector and text
results = await memory.search(
    owner_id="workspace-1",
    query_text="reduce server response time",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=10
)

# Strengths:
# - Finds semantically similar content ("optimize latency")
# - Also finds exact keywords ("response time")
# - Best overall retrieval quality
# - Robust to different query styles
#
# Use cases:
# - General-purpose search (recommended default)
# - Unknown query patterns
# - Mixed keyword + semantic needs

Practical Examples

E-commerce Product Search

# Product search benefits from hybrid
# - Vector: Understands "laptop for programming"
# - Text: Matches exact model numbers "MacBook Pro M3"

results = await memory.search(
    owner_id="store-1",
    query_text="fast laptop for developers",
    search_type=SearchType.HYBRID,
    alpha=0.6,  # Favor semantic understanding
    metadata_filter={"category": "computers"},
    limit=20
)

Technical Documentation Search

# Documentation needs both semantic and exact matches
# - Vector: Finds conceptually related docs
# - Text: Finds exact function/class names

results = await memory.search(
    owner_id="docs-site",
    query_text="authenticate users with OAuth2",
    search_type=SearchType.HYBRID,
    alpha=0.4,  # Slight favor to keywords ("OAuth2")
    metadata_filter={"doc_type": "api_reference"},
    limit=15
)

Customer Support Search

# Support tickets need semantic understanding
# - Vector: Matches similar issues ("can't log in" = "login failed")
# - Text: Matches error codes, product names

results = await memory.search(
    owner_id="support-team",
    query_text="error code 500 payment processing",
    search_type=SearchType.HYBRID,
    alpha=0.3,  # Favor exact error codes
    metadata_filter={"status": "resolved"},
    limit=10
)

Research Paper Search

# Academic search benefits from semantic understanding
# - Vector: Finds related concepts and methods
# - Text: Finds exact citations, author names

results = await memory.search(
    owner_id="research-db",
    query_text="transformer attention mechanism",
    search_type=SearchType.HYBRID,
    alpha=0.7,  # Favor semantic similarity
    date_from=datetime(2020, 1, 1),  # Recent papers
    limit=25
)

Performance Optimization

Hybrid Search Performance

Hybrid search runs vector and text searches in parallel for optimal performance:

# Both searches execute concurrently
# Total time ≈ max(vector_time, text_time) + rrf_fusion_time
# Typically: 50-150ms for hybrid search

import time

start = time.time()
results = await memory.search(
    owner_id="workspace-1",
    query_text="customer retention",
    search_type=SearchType.HYBRID,
    limit=20
)
elapsed = (time.time() - start) * 1000
print(f"Search completed in {elapsed:.2f}ms")

Tuning for Speed vs Quality

# Faster hybrid search (fewer candidates)
results = await memory.search(
    owner_id="workspace-1",
    query_text="query text",
    search_type=SearchType.HYBRID,
    limit=10,  # Lower limit = faster
    alpha=0.5
)

# Higher quality hybrid search (more candidates considered)
# Note: Uses internal candidate multiplier (typically limit * 2)
results = await memory.search(
    owner_id="workspace-1",
    query_text="query text",
    search_type=SearchType.HYBRID,
    limit=20,  # Higher limit for better recall
    alpha=0.5
)

Advanced Filtering with Hybrid Search

# Combine hybrid search with metadata filters
results = await memory.search(
    owner_id="workspace-1",
    query_text="financial performance analysis",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    metadata_filter={
        "department": "finance",
        "year": 2024,
        "confidential": False
    },
    date_from=datetime(2024, 1, 1),
    date_to=datetime(2024, 12, 31),
    limit=15
)

# Hybrid search finds:
# - Vector: Similar financial concepts
# - Text: Exact keyword "performance analysis"
# - Both filtered by metadata and date range

Common Mistakes

Wrong: Always using default alpha=0.5

# This works but may not be optimal
results = await memory.search(
    owner_id="workspace-1",
    query_text="iPhone 14 Pro specs",  # Specific product name
    search_type=SearchType.HYBRID,
    alpha=0.5  # Equal weight not ideal here
)

Right: Tune alpha for query type

# Product names and specific terms favor text search
results = await memory.search(
    owner_id="workspace-1",
    query_text="iPhone 14 Pro specs",
    search_type=SearchType.HYBRID,
    alpha=0.3  # Favor exact keyword matching
)

Wrong: Using VECTOR for exact keyword matching

results = await memory.search(
    owner_id="workspace-1",
    query_text="ERROR CODE 404",
    search_type=SearchType.VECTOR  # Won't find exact "404"
)

Right: Use HYBRID or TEXT for exact keywords

results = await memory.search(
    owner_id="workspace-1",
    query_text="ERROR CODE 404",
    search_type=SearchType.HYBRID,
    alpha=0.2  # Heavily favor exact keywords
)

Wrong: Using TEXT for conceptual queries

results = await memory.search(
    owner_id="workspace-1",
    query_text="how to improve customer satisfaction",
    search_type=SearchType.TEXT  # Misses semantic matches
)

Right: Use HYBRID for conceptual queries

results = await memory.search(
    owner_id="workspace-1",
    query_text="how to improve customer satisfaction",
    search_type=SearchType.HYBRID,
    alpha=0.7  # Favor semantic understanding
)

Alpha Tuning Strategies

A/B Testing Different Alpha Values

# Test different alpha values to find optimal setting
query = "product launch strategy roadmap"
alpha_values = [0.3, 0.5, 0.7]

for alpha in alpha_values:
    results = await memory.search(
        owner_id="workspace-1",
        query_text=query,
        search_type=SearchType.HYBRID,
        alpha=alpha,
        limit=10
    )

    print(f"\nAlpha = {alpha}")
    for i, result in enumerate(results[:3], 1):
        print(f"  #{i}: {result.content[:60]}... (RRF={result.rrf_score:.4f})")
    # Compare results quality and adjust

Dynamic Alpha Based on Query Analysis

def calculate_alpha(query_text: str) -> float:
    """Dynamically adjust alpha based on query characteristics."""
    # Check for exact phrases (quotes)
    if '"' in query_text:
        return 0.2  # Favor exact matching

    # Check for technical terms or codes: digits anywhere, or all-caps tokens
    # (iterate characters for digits; str.split() yields whole words)
    if any(ch.isdigit() for ch in query_text) or any(
        word.isupper() for word in query_text.split()
    ):
        return 0.3  # Favor keywords

    # Check for question words (semantic query)
    question_words = ["how", "why", "what", "when", "where", "who"]
    if any(word in query_text.lower() for word in question_words):
        return 0.7  # Favor semantic

    # Default balanced
    return 0.5

# Use dynamic alpha
query = "how to optimize database queries"
alpha = calculate_alpha(query)

results = await memory.search(
    owner_id="workspace-1",
    query_text=query,
    search_type=SearchType.HYBRID,
    alpha=alpha,
    limit=10
)

Monitoring and Debugging

Understanding Result Scores

results = await memory.search(
    owner_id="workspace-1",
    query_text="test query",
    search_type=SearchType.HYBRID,
    alpha=0.5,
    limit=5
)

for result in results:
    # Inspect individual scores
    print(f"Chunk ID: {result.chunk_id}")
    print(f"  RRF Score: {result.rrf_score:.4f} (overall ranking)")
    print(f"  Vector Similarity: {result.similarity:.4f}")
    print(f"  Text Rank: {result.text_rank:.4f}")
    print(f"  Content preview: {result.content[:80]}...")
    print()

# Look for:
# - High RRF but low similarity = text search dominated
# - High RRF but low text rank = vector search dominated
# - High in both = strong consensus (best results)

Related Skills

  • basic-usage - Core document and search operations
  • multi-query - Query expansion for better hybrid search results
  • rag - Using hybrid search in RAG systems with reranking
  • multi-tenant - Multi-tenant isolation patterns

Important Notes

HNSW Configuration: Hybrid search uses HNSW (Hierarchical Navigable Small World) index for fast vector similarity. Performance can be tuned with LLMEMORY_HNSW_PROFILE environment variable or programmatically via SearchConfig. See the "Configuring Hybrid Search with SearchConfig" section for comprehensive configuration details including:

  • Three presets: fast, balanced (default), accurate
  • Individual HNSW parameters (m, ef_construction, ef_search)
  • RRF tuning with rrf_k parameter
  • Performance impact comparison table

Language Support: Text search automatically detects document language and uses the appropriate full-text search configuration (14+ languages, including English, Spanish, French, and German).

Embedding Models: Vector search quality depends on embedding model. Default is OpenAI text-embedding-3-small (1536 dimensions). For local embeddings, use all-MiniLM-L6-v2 (384 dimensions).

Search Limits: Hybrid search internally retrieves limit * 2 candidates from each search method before RRF fusion. This ensures high-quality results even when vector and text return different chunks.
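
Putting that last note together with the RRF section: for limit=10, each method contributes up to 20 candidates, the fused pool holds up to 40 chunks (fewer when the two lists overlap), and only the top 10 survive. A toy sketch of the candidate-count mechanics:

limit = 10
k, alpha = 50, 0.5

# Up to limit * 2 = 20 candidates from each sub-search (disjoint IDs here for
# simplicity; a chunk appearing in both lists would accumulate both contributions)
vector_results = [{"chunk_id": f"v{i}"} for i in range(limit * 2)]
text_results = [{"chunk_id": f"t{i}"} for i in range(limit * 2)]

rrf_scores: dict[str, float] = {}
for rank, r in enumerate(vector_results):
    rrf_scores[r["chunk_id"]] = rrf_scores.get(r["chunk_id"], 0.0) + alpha / (k + rank + 1)
for rank, r in enumerate(text_results):
    rrf_scores[r["chunk_id"]] = rrf_scores.get(r["chunk_id"], 0.0) + (1 - alpha) / (k + rank + 1)

# 40 fused candidates, but only the top `limit` are returned
top = sorted(rrf_scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]
print(len(rrf_scores), "candidates fused ->", len(top), "returned")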