| name | hybrid-search |
| description | Use when building search systems that need both semantic similarity and keyword matching - covers combining vector and BM25 search with Reciprocal Rank Fusion, alpha tuning for search weight control, and optimizing retrieval quality |
| version | 1.0.0 |
LLMemory Hybrid Search
Installation
uv add llmemory
# or
pip install llmemory
Overview
Hybrid search combines vector similarity search (semantic understanding) with full-text search (keyword matching) to deliver superior retrieval quality. Results are merged using Reciprocal Rank Fusion (RRF) to create a unified ranking.
When to use hybrid search:
- Need both semantic similarity AND exact keyword matches
- Queries contain specific terms, names, or technical jargon
- Want best-of-both-worlds retrieval quality (recommended default)
When to use vector-only search:
- Purely semantic/conceptual queries
- Cross-lingual search
- Queries with synonyms or paraphrasing
When to use text-only search:
- Exact keyword/phrase matching required
- Search in structured data or code
- When embeddings are not available
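As a rough rule of thumb, the lists above can be collapsed into a tiny helper. This is an illustrative sketch only; choose_search_type is not part of the llmemory API:

from llmemory import SearchType

def choose_search_type(needs_keywords: bool, needs_semantics: bool) -> SearchType:
    """Illustrative mapping from retrieval needs to a SearchType."""
    if needs_keywords and needs_semantics:
        return SearchType.HYBRID  # best-of-both-worlds default
    if needs_keywords:
        return SearchType.TEXT    # exact phrases, code, structured data
    return SearchType.VECTOR      # conceptual or cross-lingual queries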
Quick Start
from llmemory import LLMemory, SearchType
async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
# Hybrid search (default, recommended)
results = await memory.search(
owner_id="workspace-1",
query_text="machine learning algorithms",
search_type=SearchType.HYBRID,
limit=10,
alpha=0.5 # Equal weight to vector and text
)
for result in results:
print(f"[RRF={result.rrf_score:.3f}] {result.content[:80]}...")
Complete API Documentation
SearchType Enum
class SearchType(str, Enum):
VECTOR = "vector" # Vector similarity only
TEXT = "text" # Full-text search only
HYBRID = "hybrid" # Combines vector + text (recommended)
search() - Hybrid Mode
Signature:
async def search(
owner_id: str,
query_text: str,
search_type: Union[SearchType, str] = SearchType.HYBRID,
limit: int = 10,
alpha: float = 0.5,
metadata_filter: Optional[Dict[str, Any]] = None,
id_at_origins: Optional[List[str]] = None,
date_from: Optional[datetime] = None,
date_to: Optional[datetime] = None,
include_parent_context: bool = False,
context_window: int = 2
) -> List[SearchResult]
Hybrid Search Parameters:
- search_type (SearchType, default: HYBRID): Set to SearchType.HYBRID for hybrid search
- alpha (float, default: 0.5): Weight for vector vs text search
  - 0.0 = text search only
  - 0.3 = favor text search (good for keyword-heavy queries)
  - 0.5 = equal weight (balanced, recommended)
  - 0.7 = favor vector search (good for semantic queries)
  - 1.0 = vector search only
Returns:
List[SearchResult] with hybrid-specific fields:
- rrf_score (float): Reciprocal Rank Fusion score (primary ranking)
- similarity (float): Vector similarity score (0-1)
- text_rank (float): Full-text search rank
- score (float): Overall score (equals rrf_score for hybrid)
Example:
# Balanced hybrid search
results = await memory.search(
owner_id="workspace-1",
query_text="quarterly revenue growth",
search_type=SearchType.HYBRID,
alpha=0.5, # Equal weight
limit=20
)
for result in results:
print(f"RRF Score: {result.rrf_score:.3f}")
print(f"Vector Similarity: {result.similarity:.3f}")
print(f"Text Rank: {result.text_rank:.3f}")
print(f"Content: {result.content[:100]}...")
print("---")
Understanding Alpha Parameter
The alpha parameter controls the balance between vector and text search in hybrid mode.
Alpha Values Guide
# Text-heavy (alpha = 0.0 to 0.3)
# Use when: Query has specific keywords, names, or technical terms
results = await memory.search(
owner_id="workspace-1",
query_text="Python asyncio gather timeout",
search_type=SearchType.HYBRID,
alpha=0.3 # Favor keyword matching
)
# Balanced (alpha = 0.4 to 0.6)
# Use when: General queries, uncertain which is better
results = await memory.search(
owner_id="workspace-1",
query_text="customer retention strategies",
search_type=SearchType.HYBRID,
alpha=0.5 # Equal weight (recommended default)
)
# Semantic-heavy (alpha = 0.7 to 1.0)
# Use when: Conceptual queries, synonyms, paraphrasing
results = await memory.search(
owner_id="workspace-1",
query_text="ways to keep customers happy",
search_type=SearchType.HYBRID,
alpha=0.7 # Favor semantic similarity
)
Choosing Alpha for Different Query Types
| Query Type | Example | Recommended Alpha | Reasoning |
|---|---|---|---|
| Specific keywords | "PostgreSQL CONNECTION_LIMIT error" | 0.2-0.3 | Need exact keyword matches |
| Product/person names | "iPhone 15 Pro specifications" | 0.3-0.4 | Names matter more than semantics |
| Technical jargon | "SOLID principles dependency injection" | 0.4-0.5 | Balance needed |
| General concepts | "improve team collaboration" | 0.5-0.6 | Balanced approach |
| Semantic queries | "how to motivate employees" | 0.6-0.7 | Semantic understanding key |
| Paraphrased questions | "what are good ways to retain staff" | 0.7-0.8 | Vector search excels |
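One way to apply this table in code is a plain lookup keyed by a query-type label; the labels below are hypothetical and the values are the midpoints of the recommended ranges above:

# Midpoints of the recommended alpha ranges from the table
ALPHA_BY_QUERY_TYPE = {
    "specific_keywords": 0.25,
    "product_person_names": 0.35,
    "technical_jargon": 0.45,
    "general_concepts": 0.55,
    "semantic_queries": 0.65,
    "paraphrased_questions": 0.75,
}

alpha = ALPHA_BY_QUERY_TYPE.get("technical_jargon", 0.5)  # fall back to balanced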
Reciprocal Rank Fusion (RRF)
Hybrid search uses RRF to merge vector and text search results into a unified ranking.
How RRF Works
k = 50  # RRF constant, default (prevents early results from dominating)
# Initialize score accumulator for each chunk
rrf_scores = {}
# Process vector search results
for rank, result in enumerate(vector_results):
chunk_id = result["chunk_id"]
vector_contribution = alpha / (k + rank + 1)
rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0) + vector_contribution
# Process text search results
for rank, result in enumerate(text_results):
chunk_id = result["chunk_id"]
text_contribution = (1 - alpha) / (k + rank + 1)
rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0) + text_contribution
# Sort by accumulated RRF score descending
sorted_results = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
Key points:
- Alpha sits inside the division: alpha / (k + rank + 1), not multiplied on afterward
- The denominator uses rank + 1, where rank is 0-indexed, so the top result contributes weight / (k + 1)
- Chunks appearing in both result lists accumulate contributions from both (worked example below)
- k = 50 by default (configurable via SearchConfig.rrf_k)
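A worked example with the default k = 50 and alpha = 0.5: suppose a chunk ranks first (rank 0) in the vector results and third (rank 2) in the text results.

k, alpha = 50, 0.5
vector_part = alpha / (k + 0 + 1)      # 0.5 / 51 ≈ 0.00980
text_part = (1 - alpha) / (k + 2 + 1)  # 0.5 / 53 ≈ 0.00943
rrf_score = vector_part + text_part    # ≈ 0.0192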
RRF Benefits
- Handles different score scales: RRF fuses on rank positions, so vector similarities (0-1) and text ranks (unbounded) need no normalization
- Position-based fusion: Emphasizes consensus across search methods
- Robust to score outliers: Single high score doesn't dominate
- Tunable with alpha: Control the balance between search methods
Example: RRF in Action
results = await memory.search(
owner_id="workspace-1",
query_text="machine learning neural networks",
search_type=SearchType.HYBRID,
alpha=0.5,
limit=5
)
for i, result in enumerate(results, 1):
print(f"Result #{i}")
print(f" RRF Score: {result.rrf_score:.4f}")
print(f" Vector Sim: {result.similarity:.4f} (semantic match)")
print(f" Text Rank: {result.text_rank:.4f} (keyword match)")
print(f" Content: {result.content[:80]}...")
print()
# Output shows how RRF balances both signals:
# Result #1
# RRF Score: 0.0192 (highest combined score)
# Vector Sim: 0.85 (very semantically similar)
# Text Rank: 12.5 (good keyword match)
# Content: Deep learning uses neural networks with multiple layers...
Configuring Hybrid Search with SearchConfig
LLMemory's SearchConfig provides fine-grained control over hybrid search behavior, including HNSW vector index parameters and RRF fusion settings. You can configure these settings via environment variables or programmatically through LLMemoryConfig.
HNSW Index Configuration
The HNSW (Hierarchical Navigable Small World) index powers fast approximate nearest neighbor vector search. LLMemory provides three preset profiles and supports custom configuration.
HNSW Parameters
- hnsw_m (int, default: 16): Number of bi-directional links per node
  - Higher values = better recall, larger index, slower construction
  - Range: 8-64, typical values: 8 (fast), 16 (balanced), 32 (accurate)
- hnsw_ef_construction (int, default: 200): Size of the dynamic candidate list during index construction
  - Higher values = better index quality, slower construction
  - Range: 80-1000, typical values: 80 (fast), 200 (balanced), 400 (accurate)
- hnsw_ef_search (int, default: 100): Size of the dynamic candidate list during search
  - Higher values = better recall, slower search
  - Range: 40-500, typical values: 40 (fast), 100 (balanced), 200 (accurate)
HNSW Presets
LLMemory includes three built-in presets for common use cases:
HNSW_PRESETS = {
"fast": {
"m": 8,
"ef_construction": 80,
"ef_search": 40
},
"balanced": {
"m": 16,
"ef_construction": 200,
"ef_search": 100
},
"accurate": {
"m": 32,
"ef_construction": 400,
"ef_search": 200
}
}
Preset Recommendations:
- fast: Latency-critical applications (40-60ms search, ~95% recall)
- balanced: General-purpose use (80-120ms search, ~98% recall) - Default
- accurate: High-precision requirements (150-250ms search, ~99.5% recall)
Using HNSW Presets via Environment Variable
Set the LLMEMORY_HNSW_PROFILE environment variable to use a preset:
# Use fast profile for low-latency applications
export LLMEMORY_HNSW_PROFILE=fast
# Use accurate profile for high-precision requirements
export LLMEMORY_HNSW_PROFILE=accurate
# Use balanced profile (default, can be omitted)
export LLMEMORY_HNSW_PROFILE=balanced
Then initialize LLMemory normally - the preset will be applied automatically:
from llmemory import LLMemory, SearchType
# Automatically uses HNSW preset from environment
async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
results = await memory.search(
owner_id="workspace-1",
query_text="machine learning",
search_type=SearchType.HYBRID,
limit=10
)
Programmatic HNSW Configuration
For more control, configure HNSW parameters programmatically:
from llmemory import LLMemory, SearchType
from llmemory.config import LLMemoryConfig
# Create custom configuration
config = LLMemoryConfig()
# Configure search parameters
config.search.hnsw_ef_search = 150 # Higher search accuracy
# Configure database/index parameters
config.database.hnsw_m = 24
config.database.hnsw_ef_construction = 300
# Initialize with custom config
async with LLMemory(
connection_string="postgresql://localhost/mydb",
config=config
) as memory:
results = await memory.search(
owner_id="workspace-1",
query_text="neural networks",
search_type=SearchType.HYBRID,
limit=10
)
Note: Index construction parameters (hnsw_m, hnsw_ef_construction) only affect new indexes. To apply them to an existing index, you must recreate the index:
-- Recreate HNSW index with new parameters
DROP INDEX IF EXISTS llmemory.document_chunks_embedding_hnsw;
CREATE INDEX document_chunks_embedding_hnsw
ON llmemory.document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 300);
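The search-time parameter, by contrast, requires no rebuild: pgvector reads it from the hnsw.ef_search session setting, so you can also experiment with it directly in SQL. (Whether llmemory applies config.search.hnsw_ef_search this way per query is an internal detail; treat the mapping as an assumption.)

-- Session-level override of search-time recall (pgvector)
SET hnsw.ef_search = 150;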
RRF Configuration
The rrf_k parameter controls the Reciprocal Rank Fusion constant used to merge vector and text search results.
RRF Parameter
- rrf_k (int, default: 50): RRF constant that controls rank position sensitivity
  - Higher values = less weight on top positions, more democratic fusion
  - Lower values = more weight on top positions, favors high-ranking results
  - Range: 10-100, typical values: 30 (aggressive), 50 (balanced), 70 (democratic)
How rrf_k affects fusion:
# For a chunk at rank position r (0-indexed):
rrf_score_contribution = weight / (rrf_k + r + 1)  # weight = alpha (vector list) or 1 - alpha (text list)
# The examples below use weight = 1.0 to isolate the effect of rrf_k.
# Example with rrf_k=50:
# Rank 0: 1.0 / (50 + 0 + 1) = 0.0196
# Rank 1: 1.0 / (50 + 1 + 1) = 0.0192
# Rank 10: 1.0 / (50 + 10 + 1) = 0.0164
# Example with rrf_k=20 (favors top results):
# Rank 0: 1.0 / (20 + 0 + 1) = 0.0476
# Rank 1: 1.0 / (20 + 1 + 1) = 0.0455
# Rank 10: 1.0 / (20 + 10 + 1) = 0.0323
# Example with rrf_k=80 (more democratic):
# Rank 0: 1.0 / (80 + 0 + 1) = 0.0123
# Rank 1: 1.0 / (80 + 1 + 1) = 0.0122
# Rank 10: 1.0 / (80 + 10 + 1) = 0.0110
Configuring rrf_k
rrf_k is not currently exposed via an environment variable; configure it programmatically through LLMemoryConfig:
from llmemory import LLMemory
from llmemory.config import LLMemoryConfig
config = LLMemoryConfig()
config.search.rrf_k = 30 # Favor top-ranked results
async with LLMemory(
connection_string="postgresql://localhost/mydb",
config=config
) as memory:
results = await memory.search(
owner_id="workspace-1",
query_text="search query",
search_type=SearchType.HYBRID,
limit=10
)
Complete Configuration Example
Here's a complete example showing both environment variable and programmatic configuration:
import os
from llmemory import LLMemory, SearchType
from llmemory.config import LLMemoryConfig
# Option 1: Environment variable configuration
os.environ["LLMEMORY_HNSW_PROFILE"] = "accurate"
# HNSW will use: m=32, ef_construction=400, ef_search=200
async with LLMemory(connection_string="postgresql://localhost/mydb") as memory:
results = await memory.search(
owner_id="workspace-1",
query_text="deep learning transformers",
search_type=SearchType.HYBRID,
alpha=0.6,
limit=15
)
# Option 2: Programmatic configuration with fine-tuning
config = LLMemoryConfig()
# HNSW search configuration
config.search.hnsw_ef_search = 150 # Higher accuracy than default
# HNSW index construction (for new indexes)
config.database.hnsw_m = 20
config.database.hnsw_ef_construction = 250
# RRF configuration
config.search.rrf_k = 40 # Favor top-ranked results slightly
# Other search settings
config.search.default_limit = 20
config.search.default_search_type = "hybrid"
async with LLMemory(
connection_string="postgresql://localhost/mydb",
config=config
) as memory:
# Search with custom configuration
results = await memory.search(
owner_id="workspace-1",
query_text="neural network architectures",
search_type=SearchType.HYBRID,
alpha=0.5,
limit=20
)
for result in results:
print(f"RRF: {result.rrf_score:.4f} | "
f"Vector: {result.similarity:.4f} | "
f"Text: {result.text_rank:.4f}")
print(f" {result.content[:80]}...")
Configuration Performance Impact
Different HNSW settings have measurable performance impacts:
| Profile | Index Size (100k docs) | Construction Time | Search Latency | Recall |
|---|---|---|---|---|
| fast | 150 MB | 5 min | 40-60ms | ~95% |
| balanced | 250 MB | 12 min | 80-120ms | ~98% |
| accurate | 450 MB | 30 min | 150-250ms | ~99.5% |
Tuning Guidelines:
- Start with balanced (default) for most applications
- Use fast if:
- Search latency must be under 100ms
- Recall around 95% is acceptable
- Index size is a constraint
- Use accurate if:
- High precision is critical (medical, legal, financial)
- Search latency under 300ms is acceptable
- Maximum recall is required
- Custom tune if:
- You have specific latency/recall requirements
- You've measured performance with your data (a measurement sketch follows this list)
- You're optimizing for your embedding model
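A minimal measurement sketch for the "measured with your data" case, reusing the imports from earlier examples: time a fixed query set and compute a simple hit rate against known-relevant chunk IDs (the labeled data and owner_id are hypothetical):

import time

async def benchmark_search(memory, queries, relevant_ids, limit=10):
    """Rough latency / hit-rate check over a labeled query set (illustrative)."""
    latencies, hits = [], 0
    for query, expected in zip(queries, relevant_ids):
        start = time.perf_counter()
        results = await memory.search(
            owner_id="workspace-1",
            query_text=query,
            search_type=SearchType.HYBRID,
            limit=limit,
        )
        latencies.append((time.perf_counter() - start) * 1000)
        if any(r.chunk_id in expected for r in results):
            hits += 1
    latencies.sort()
    print(f"p50 latency: {latencies[len(latencies) // 2]:.1f}ms")
    print(f"hit rate: {hits / len(queries):.1%}")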
Search Type Comparison
Vector Search Only
# Pure semantic similarity
results = await memory.search(
owner_id="workspace-1",
query_text="artificial intelligence",
search_type=SearchType.VECTOR,
limit=10
)
# Good for:
# - "AI" matching "machine learning" (synonym)
# - "dog" matching "puppy" (semantic)
# - Cross-lingual search
#
# Weak for:
# - Specific keywords ("PostgreSQL 14.2")
# - Exact phrases ("return on investment")
# - Technical terms ("ValueError exception")
Text Search Only
# Pure keyword matching
results = await memory.search(
owner_id="workspace-1",
query_text="PostgreSQL CONNECTION_LIMIT",
search_type=SearchType.TEXT,
limit=10
)
# Good for:
# - Exact keyword matches
# - Technical error messages
# - Code search
# - Structured data
#
# Weak for:
# - Synonyms ("automobile" vs "car")
# - Paraphrasing
# - Conceptual queries
Hybrid Search (Recommended)
# Combines both vector and text
results = await memory.search(
owner_id="workspace-1",
query_text="reduce server response time",
search_type=SearchType.HYBRID,
alpha=0.5,
limit=10
)
# Strengths:
# - Finds semantically similar content ("optimize latency")
# - Also finds exact keywords ("response time")
# - Best overall retrieval quality
# - Robust to different query styles
#
# Use cases:
# - General-purpose search (recommended default)
# - Unknown query patterns
# - Mixed keyword + semantic needs
Practical Examples
E-commerce Product Search
# Product search benefits from hybrid
# - Vector: Understands "laptop for programming"
# - Text: Matches exact model numbers "MacBook Pro M3"
results = await memory.search(
owner_id="store-1",
query_text="fast laptop for developers",
search_type=SearchType.HYBRID,
alpha=0.6, # Favor semantic understanding
metadata_filter={"category": "computers"},
limit=20
)
Technical Documentation Search
# Documentation needs both semantic and exact matches
# - Vector: Finds conceptually related docs
# - Text: Finds exact function/class names
results = await memory.search(
owner_id="docs-site",
query_text="authenticate users with OAuth2",
search_type=SearchType.HYBRID,
alpha=0.4, # Slight favor to keywords ("OAuth2")
metadata_filter={"doc_type": "api_reference"},
limit=15
)
Customer Support Search
# Support tickets need semantic understanding
# - Vector: Matches similar issues ("can't log in" = "login failed")
# - Text: Matches error codes, product names
results = await memory.search(
owner_id="support-team",
query_text="error code 500 payment processing",
search_type=SearchType.HYBRID,
alpha=0.3, # Favor exact error codes
metadata_filter={"status": "resolved"},
limit=10
)
Research Paper Search
# Academic search benefits from semantic understanding
# - Vector: Finds related concepts and methods
# - Text: Finds exact citations, author names
results = await memory.search(
owner_id="research-db",
query_text="transformer attention mechanism",
search_type=SearchType.HYBRID,
alpha=0.7, # Favor semantic similarity
date_from=datetime(2020, 1, 1), # Recent papers
limit=25
)
Performance Optimization
Hybrid Search Performance
Hybrid search runs vector and text searches in parallel for optimal performance:
# Both searches execute concurrently
# Total time ≈ max(vector_time, text_time) + rrf_fusion_time
# Typically: 50-150ms for hybrid search
import time
start = time.time()
results = await memory.search(
owner_id="workspace-1",
query_text="customer retention",
search_type=SearchType.HYBRID,
limit=20
)
elapsed = (time.time() - start) * 1000
print(f"Search completed in {elapsed:.2f}ms")
Tuning for Speed vs Quality
# Faster hybrid search (fewer candidates)
results = await memory.search(
owner_id="workspace-1",
query_text="query text",
search_type=SearchType.HYBRID,
limit=10, # Lower limit = faster
alpha=0.5
)
# Higher quality hybrid search (more candidates considered)
# Note: Uses internal candidate multiplier (typically limit * 2)
results = await memory.search(
owner_id="workspace-1",
query_text="query text",
search_type=SearchType.HYBRID,
limit=20, # Higher limit for better recall
alpha=0.5
)
Advanced Filtering with Hybrid Search
# Combine hybrid search with metadata filters
results = await memory.search(
owner_id="workspace-1",
query_text="financial performance analysis",
search_type=SearchType.HYBRID,
alpha=0.5,
metadata_filter={
"department": "finance",
"year": 2024,
"confidential": False
},
date_from=datetime(2024, 1, 1),
date_to=datetime(2024, 12, 31),
limit=15
)
# Hybrid search finds:
# - Vector: Similar financial concepts
# - Text: Exact keyword "performance analysis"
# - Both filtered by metadata and date range
Common Mistakes
❌ Wrong: Always using default alpha=0.5
# This works but may not be optimal
results = await memory.search(
owner_id="workspace-1",
query_text="iPhone 14 Pro specs", # Specific product name
search_type=SearchType.HYBRID,
alpha=0.5 # Equal weight not ideal here
)
✅ Right: Tune alpha for query type
# Product names and specific terms favor text search
results = await memory.search(
owner_id="workspace-1",
query_text="iPhone 14 Pro specs",
search_type=SearchType.HYBRID,
alpha=0.3 # Favor exact keyword matching
)
❌ Wrong: Using VECTOR for exact keyword matching
results = await memory.search(
owner_id="workspace-1",
query_text="ERROR CODE 404",
search_type=SearchType.VECTOR # Won't find exact "404"
)
✅ Right: Use HYBRID or TEXT for exact keywords
results = await memory.search(
owner_id="workspace-1",
query_text="ERROR CODE 404",
search_type=SearchType.HYBRID,
alpha=0.2 # Heavily favor exact keywords
)
❌ Wrong: Using TEXT for conceptual queries
results = await memory.search(
owner_id="workspace-1",
query_text="how to improve customer satisfaction",
search_type=SearchType.TEXT # Misses semantic matches
)
✅ Right: Use HYBRID for conceptual queries
results = await memory.search(
owner_id="workspace-1",
query_text="how to improve customer satisfaction",
search_type=SearchType.HYBRID,
alpha=0.7 # Favor semantic understanding
)
Alpha Tuning Strategies
A/B Testing Different Alpha Values
# Test different alpha values to find optimal setting
query = "product launch strategy roadmap"
alpha_values = [0.3, 0.5, 0.7]
for alpha in alpha_values:
results = await memory.search(
owner_id="workspace-1",
query_text=query,
search_type=SearchType.HYBRID,
alpha=alpha,
limit=10
)
print(f"\nAlpha = {alpha}")
for i, result in enumerate(results[:3], 1):
print(f" #{i}: {result.content[:60]}... (RRF={result.rrf_score:.4f})")
# Compare results quality and adjust
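To make "compare results quality" concrete, score each alpha against labeled relevant chunk IDs, for example with precision@k (the relevant-ID set here is hypothetical, not an llmemory feature):

def precision_at_k(results, relevant_ids, k=10):
    """Fraction of the top-k results whose chunk_id is labeled relevant."""
    top_k = results[:k]
    if not top_k:
        return 0.0
    return sum(1 for r in top_k if r.chunk_id in relevant_ids) / len(top_k)

# e.g. inside the alpha loop above, with a hand-labeled set per query
quality = precision_at_k(results, relevant_ids={"chunk-123", "chunk-456"})
print(f"  precision@10: {quality:.2f}")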
Dynamic Alpha Based on Query Analysis
def calculate_alpha(query_text: str) -> float:
"""Dynamically adjust alpha based on query characteristics."""
# Check for exact phrases (quotes)
if '"' in query_text:
return 0.2 # Favor exact matching
# Check for technical terms or codes
    if any(any(c.isdigit() for c in word) or word.isupper() for word in query_text.split()):
return 0.3 # Favor keywords
# Check for question words (semantic query)
question_words = ["how", "why", "what", "when", "where", "who"]
if any(word in query_text.lower() for word in question_words):
return 0.7 # Favor semantic
# Default balanced
return 0.5
# Use dynamic alpha
query = "how to optimize database queries"
alpha = calculate_alpha(query)
results = await memory.search(
owner_id="workspace-1",
query_text=query,
search_type=SearchType.HYBRID,
alpha=alpha,
limit=10
)
Monitoring and Debugging
Understanding Result Scores
results = await memory.search(
owner_id="workspace-1",
query_text="test query",
search_type=SearchType.HYBRID,
alpha=0.5,
limit=5
)
for result in results:
# Inspect individual scores
print(f"Chunk ID: {result.chunk_id}")
print(f" RRF Score: {result.rrf_score:.4f} (overall ranking)")
print(f" Vector Similarity: {result.similarity:.4f}")
print(f" Text Rank: {result.text_rank:.4f}")
print(f" Content preview: {result.content[:80]}...")
print()
# Look for:
# - High RRF but low similarity = text search dominated
# - High RRF but low text rank = vector search dominated
# - High in both = strong consensus (best results)
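Those score patterns can be checked automatically with a small heuristic over the per-result fields; the thresholds below are arbitrary examples to tune against your own data:

def classify_result(result, sim_threshold=0.5, text_threshold=1.0):
    """Label which search method drove a hybrid result (illustrative heuristic)."""
    strong_vector = result.similarity >= sim_threshold
    strong_text = result.text_rank >= text_threshold
    if strong_vector and strong_text:
        return "consensus"  # strong in both: usually the best results
    if strong_vector:
        return "vector-dominated"
    if strong_text:
        return "text-dominated"
    return "weak"

for result in results:
    print(f"{classify_result(result):>16} | {result.content[:60]}...")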
Related Skills
- basic-usage - Core document and search operations
- multi-query - Query expansion for better hybrid search results
- rag - Using hybrid search in RAG systems with reranking
- multi-tenant - Multi-tenant isolation patterns
Important Notes
HNSW Configuration:
Hybrid search uses an HNSW (Hierarchical Navigable Small World) index for fast vector similarity. Performance can be tuned with the LLMEMORY_HNSW_PROFILE environment variable or programmatically via SearchConfig. See the "Configuring Hybrid Search with SearchConfig" section for comprehensive configuration details, including:
- Three presets: fast, balanced (default), accurate
- Individual HNSW parameters (m, ef_construction, ef_search)
- RRF tuning with the rrf_k parameter
- A performance impact comparison table
Language Support: Text search automatically detects the document language and uses the appropriate full-text search configuration (14+ languages, including English, Spanish, French, and German).
Embedding Models:
Vector search quality depends on the embedding model. The default is OpenAI text-embedding-3-small (1536 dimensions); for local embeddings, use all-MiniLM-L6-v2 (384 dimensions).
Search Limits:
Hybrid search internally retrieves limit * 2 candidates from each search method before RRF fusion. This ensures high-quality results even when vector and text return different chunks.