name	semantic-search
description	Use this when deciding between semantic search and grep/glob for code discovery. Apply it to concept-based queries (find payment processing), intent-based searches (how is auth implemented), or when the user doesn't know exact class names. Use grep for exact matches such as specific function names.
categories	pattern, technique, search
tags	semantic-search, weaviate, embeddings, discovery
version	1.0.0

Semantic Search Technique

Purpose

Decision framework and execution guide for using semantic search effectively in CodeCompass.

When to Use Semantic Search

✅ Use Semantic Search When

1. Concept-based Queries

"Find code that handles payment processing"
"Where do we validate email addresses?"
"Show me error handling patterns"

2. Intent-based Queries

"How is user authentication implemented?"
"What code calculates shipping costs?"
"Find business rules for order approval"

3. Cross-language/Cross-file

Searching across PHP, TypeScript, and config files
Pattern discovery across multiple modules
Finding similar implementations

4. Fuzzy/Exploratory

The user doesn't know exact class/function names
Exploring unfamiliar codebase
"Code that does something like X"

5. Natural Language

"Show me all database migrations"
"Find controllers that handle file uploads"
"Where are API rate limits defined?"

❌ Use Grep/Glob When

1. Exact Matches

"Find class named PaymentController"
"Where is processPayment function defined?"
"Find all imports of UserService"

2. Syntax Patterns

"Find all functions starting with get"
"Show me all @Injectable() decorators"
"Find TypeScript interfaces"

3. Performance Critical

Quick lookups in known files
Repeated searches in tight loops
When you know the exact location

4. Structural Queries

"Find all .ts files in src/modules"
"List all test files"
"Show directory structure"

Execution Guide

Step 1: Formulate Effective Query

❌ Bad Queries (too vague):

"payment"
"code"
"function"

✅ Good Queries (specific context):

"business logic for processing customer payments and updating order status"
"validation rules for user email and password requirements"
"error handling patterns for database connection failures"

Why: more context gives the embedding model more signal, which improves semantic matching.

Formula:

[Action/Purpose] for [Specific Entity] with [Context/Constraints]

Examples:

"Extract business capabilities from Yii2 controllers"
"Validation logic for user registration with email verification"
"Database migration patterns for schema versioning"

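The formula is mechanical enough to sketch as a small TypeScript helper. `buildQuery` and `looksTooVague` are hypothetical names for illustration, not CodeCompass APIs, and the three-word vagueness threshold is an assumption rather than a documented rule.

```typescript
// Hypothetical helper (not part of CodeCompass): fills in the
// "[Action/Purpose] for [Specific Entity] with [Context/Constraints]" formula.
interface QueryParts {
  purpose: string; // Action/Purpose, e.g. "Validation logic"
  entity: string; // Specific Entity, e.g. "user registration"
  context?: string; // Optional Context/Constraints, e.g. "email verification"
}

function buildQuery({ purpose, entity, context }: QueryParts): string {
  const base = `${purpose.trim()} for ${entity.trim()}`;
  // Append the "with ..." clause only when a constraint was supplied.
  return context && context.trim() ? `${base} with ${context.trim()}` : base;
}

// Rough heuristic (an assumption): fewer than three words is probably
// too vague, like the "payment" / "code" / "function" examples above.
function looksTooVague(query: string): boolean {
  return query.trim().split(/\s+/).filter(Boolean).length < 3;
}

const example = buildQuery({
  purpose: "Validation logic",
  entity: "user registration",
  context: "email verification",
});
console.log(example); // prints: Validation logic for user registration with email verification
console.log(looksTooVague("payment")); // prints: true
```

The optional `context` field mirrors the formula: the "with" clause is the part most worth adding when a first query returns noisy results.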
Step 2: Verify Indexing

Before searching, ensure the codebase is indexed:

# Check if indexed
curl http://localhost:8081/v1/schema

# Should show collections like:
# - CodeContext
# - AtlasCode

If not indexed:

codecompass batch:index <path-to-codebase>
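The check above can also be scripted. The sketch below assumes the standard Weaviate `GET /v1/schema` response shape, where each collection appears under `classes` with its name in the `class` field. Treating both `CodeContext` and `AtlasCode` as required is an assumption, since the listing above only shows them as typical examples; `missingCollections` is a hypothetical helper.

```typescript
// Relevant part of Weaviate's GET /v1/schema response: { classes: [{ class: "CodeContext", ... }] }
interface WeaviateSchema {
  classes?: Array<{ class: string }>;
}

// Assumption: both collections from the listing above are expected.
const EXPECTED_COLLECTIONS = ["CodeContext", "AtlasCode"];

// Returns the expected collection names that are absent from the schema.
function missingCollections(
  schema: WeaviateSchema,
  expected: string[] = EXPECTED_COLLECTIONS,
): string[] {
  const present = new Set((schema.classes ?? []).map((c) => c.class));
  return expected.filter((collection) => !present.has(collection));
}

// In practice the schema would come from the same endpoint as the curl call, e.g.
//   const schema = await (await fetch("http://localhost:8081/v1/schema")).json();
// Here a hard-coded sample stands in for that response.
const sampleSchema: WeaviateSchema = { classes: [{ class: "CodeContext" }] };
const missing = missingCollections(sampleSchema);
if (missing.length > 0) {
  console.log(`Missing: ${missing.join(", ")}. Run: codecompass batch:index <path-to-codebase>`);
}
```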

Step 3: Execute Search

codecompass search:semantic "business logic for payment processing"

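Alternative (when using CodeCompass as a library):

// searchService: a SearchService instance from the search module
const results = await searchService.semanticSearch({
  query: "business logic for payment processing",
  limit: 10,      // Maximum number of results to return
  certainty: 0.7  // Minimum relevance score (0-1)
});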

Step 4: Interpret Results

Check relevance scores:

>0.8: Highly relevant (strong conceptual match)
0.7-0.8: Good match (related)
0.6-0.7: Moderate match (possibly relevant)
<0.6: Weak match (may be noise)

Verify context:

Does the returned code actually match the intent?
Are the results from the expected modules?
Multiple related files found? Good signal.
Only isolated, unrelated matches? Refine the query.
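The score bands above translate directly into code. This sketch assumes each hit carries a `score` in [0, 1]; the `SearchHit` shape, the function names, and the file names are hypothetical, and the inclusive lower bounds are a choice the table leaves open. If these scores are Weaviate `certainty` values (the `certainty` option in Step 3 suggests so), Weaviate defines certainty for cosine distance as (1 + cosine similarity) / 2, so a 0.7 cut-off corresponds to a raw cosine similarity of 0.4; `certaintyToCosine` shows that conversion.

```typescript
type Relevance = "high" | "good" | "moderate" | "weak";

interface SearchHit {
  filePath: string;
  score: number; // relevance score in [0, 1]
}

// Bands from the table above (>0.8, 0.7-0.8, 0.6-0.7, <0.6).
function classifyScore(score: number): Relevance {
  if (score > 0.8) return "high";
  if (score >= 0.7) return "good";
  if (score >= 0.6) return "moderate";
  return "weak";
}

// Drops weak matches and returns the rest best-first (0.7 default cut-off).
// filter() copies the array, so sort() does not mutate the caller's input.
function filterHits(hits: SearchHit[], minScore = 0.7): SearchHit[] {
  return hits
    .filter((hit) => hit.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Weaviate certainty (cosine distance) -> raw cosine similarity.
function certaintyToCosine(certainty: number): number {
  return 2 * certainty - 1;
}

const hits: SearchHit[] = [
  { filePath: "PaymentController.php", score: 0.84 },
  { filePath: "OrderService.php", score: 0.73 },
  { filePath: "helpers.php", score: 0.55 },
];
for (const hit of filterHits(hits)) {
  console.log(hit.filePath, classifyScore(hit.score));
}
```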

Step 5: Refine if Needed

Too many results (>50):

Add more specific context to query
Increase certainty threshold
Add domain constraints ("in authentication module")

Too few results (<3):

Broaden query (less specific)
Lower certainty threshold
Check that the area is actually indexed
Try related terms/synonyms

Wrong results:

Rephrase query with different terminology
Add negative constraints
Try breaking into multiple specific queries
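The refinement rules can be sketched as a function that looks at the result count and proposes the next certainty threshold. The >50 and <3 thresholds come from the lists above; the 0.05 step, the [0.5, 0.9] clamp, and the `adviseRefinement` name are illustrative assumptions. Rewording the query still has to be done by hand, so the function only returns reminders for it.

```typescript
interface RefinementAdvice {
  action: "narrow" | "broaden" | "ok";
  nextCertainty: number;
  tips: string[];
}

// Assumed tuning values, not CodeCompass defaults.
const STEP = 0.05;
const MIN_CERTAINTY = 0.5;
const MAX_CERTAINTY = 0.9;

// Clamp to the allowed range and round to two decimals to avoid float noise.
function clampCertainty(value: number): number {
  const clamped = Math.min(MAX_CERTAINTY, Math.max(MIN_CERTAINTY, value));
  return Math.round(clamped * 100) / 100;
}

function adviseRefinement(resultCount: number, certainty: number): RefinementAdvice {
  if (resultCount > 50) {
    return {
      action: "narrow",
      nextCertainty: clampCertainty(certainty + STEP),
      tips: ["Add more specific context", 'Add a domain constraint such as "in authentication module"'],
    };
  }
  if (resultCount < 3) {
    return {
      action: "broaden",
      nextCertainty: clampCertainty(certainty - STEP),
      tips: ["Make the query less specific", "Try related terms or synonyms", "Check that the area is indexed"],
    };
  }
  return { action: "ok", nextCertainty: certainty, tips: [] };
}

console.log(adviseRefinement(120, 0.7)); // suggests narrowing with certainty 0.75
```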

Behind the Scenes

Architecture

Query Text
    ↓
Ollama Embedding (mxbai-embed-large)
    ↓
1024-dimensional vector
    ↓
Weaviate Vector Search (cosine similarity)
    ↓
Ranked Results
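The ranking step can be sketched in a few lines. This brute-force version compares the query vector against every stored vector; Weaviate's HNSW index approximates the same cosine ordering without scanning everything. The three-dimensional vectors and file names are made up for readability, standing in for real 1024-dimensional mxbai-embed-large embeddings.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedChunk {
  filePath: string;
  vector: number[];
}

// Exhaustive scan: score every chunk, sort descending, keep the top `limit`.
function rankByCosine(query: number[], chunks: IndexedChunk[], limit = 10) {
  return chunks
    .map((chunk) => ({ filePath: chunk.filePath, similarity: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}

// Toy vectors (hypothetical) in place of real embeddings.
const queryVec = [0.9, 0.1, 0.0];
const docs: IndexedChunk[] = [
  { filePath: "PaymentService.php", vector: [0.8, 0.2, 0.1] },
  { filePath: "EmailValidator.ts", vector: [0.1, 0.9, 0.2] },
  { filePath: "config/payments.php", vector: [0.7, 0.0, 0.3] },
];
for (const r of rankByCosine(queryVec, docs, 2)) {
  console.log(r.filePath, r.similarity.toFixed(3));
}
```

With these toy vectors, PaymentService.php and config/payments.php rank above EmailValidator.ts because their vectors point in nearly the same direction as the query.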

Key Components

From .ai/capabilities.json:

Modules: search, vectorizer, weaviate
Embedding: Ollama mxbai-embed-large (1024 dimensions)
Vector DB: Weaviate with HNSW indexing
Collections: CodeContext, AtlasCode

Configuration (from .env):

EMBEDDING_SERVICE=ollama
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
OLLAMA_URL=http://localhost:11434
CODECOMPASS_WEAVIATE_URL=http://localhost:8081
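A minimal sketch of reading these variables into a typed config object. The defaults simply mirror the values above, and falling back to them when a variable is unset is an assumption for illustration, not documented CodeCompass behavior; `loadSearchConfig` is a hypothetical name. Passing the environment in as a plain object keeps the function testable; in Node.js you would pass `process.env`.

```typescript
interface SearchConfig {
  embeddingService: string;
  ollamaEmbeddingModel: string;
  ollamaUrl: string;
  weaviateUrl: string;
}

type Env = Record<string, string | undefined>;

// Fail fast on values that are clearly not http(s) URLs.
function requireHttpUrl(key: string, value: string): string {
  if (!/^https?:\/\/\S+$/.test(value)) {
    throw new Error(`${key} must be an http(s) URL, got "${value}"`);
  }
  return value;
}

function loadSearchConfig(env: Env): SearchConfig {
  return {
    embeddingService: env.EMBEDDING_SERVICE ?? "ollama",
    ollamaEmbeddingModel: env.OLLAMA_EMBEDDING_MODEL ?? "mxbai-embed-large",
    ollamaUrl: requireHttpUrl("OLLAMA_URL", env.OLLAMA_URL ?? "http://localhost:11434"),
    weaviateUrl: requireHttpUrl(
      "CODECOMPASS_WEAVIATE_URL",
      env.CODECOMPASS_WEAVIATE_URL ?? "http://localhost:8081",
    ),
  };
}

// Only OLLAMA_URL overridden; everything else falls back to the defaults.
const config = loadSearchConfig({ OLLAMA_URL: "http://ollama.internal:11434" });
console.log(config.weaviateUrl); // prints: http://localhost:8081
```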

Advanced Patterns

Pattern 1: Multi-Query Exploration

For complex questions, break into multiple searches:

# Instead of one overloaded query:
# codecompass search:semantic "authentication and authorization and session management"

# Do:
codecompass search:semantic "user authentication login process"
codecompass search:semantic "authorization and access control"
codecompass search:semantic "session management and tokens"
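After running several focused queries, the result lists still need to be combined. One way to do that, sketched below under the assumption that each hit has a `filePath` and a `score`, is to deduplicate by file, keep each file's best score, and rank files found by several queries first, echoing Step 4's point that multiple related files are a good signal. `mergeResults` and the file names are hypothetical.

```typescript
interface Hit {
  filePath: string;
  score: number;
}

interface MergedHit {
  filePath: string;
  bestScore: number;
  queries: string[]; // which queries returned this file
}

function mergeResults(resultsByQuery: Record<string, Hit[]>): MergedHit[] {
  const byFile = new Map<string, MergedHit>();
  for (const [query, hits] of Object.entries(resultsByQuery)) {
    for (const hit of hits) {
      const entry = byFile.get(hit.filePath);
      if (!entry) {
        byFile.set(hit.filePath, { filePath: hit.filePath, bestScore: hit.score, queries: [query] });
        continue;
      }
      entry.bestScore = Math.max(entry.bestScore, hit.score);
      if (!entry.queries.includes(query)) entry.queries.push(query);
    }
  }
  // Files matched by more queries first; ties broken by best score.
  return Array.from(byFile.values()).sort(
    (a, b) => b.queries.length - a.queries.length || b.bestScore - a.bestScore,
  );
}

const merged = mergeResults({
  "user authentication login process": [
    { filePath: "AuthController.php", score: 0.86 },
    { filePath: "User.php", score: 0.74 },
  ],
  "session management and tokens": [
    { filePath: "TokenService.php", score: 0.81 },
    { filePath: "AuthController.php", score: 0.72 },
  ],
});
console.log(merged.map((m) => m.filePath)); // order: AuthController.php, TokenService.php, User.php
```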

Pattern 2: Iterative Refinement

# 1. Broad search
codecompass search:semantic "payment processing"

# 2. Review results, identify specific module
# 3. Narrow search
codecompass search:semantic "payment gateway integration in PaymentController"

# 4. Pinpoint implementation
codecompass search:semantic "Stripe API call for processing credit cards"

Pattern 3: Cross-Domain Search

Search across different aspects:

# Code implementation
codecompass search:semantic "email validation logic"

# Tests
codecompass search:semantic "test cases for email validation"

# Configuration
codecompass search:semantic "email service configuration"

Common Pitfalls

❌ Pitfall 1: Searching Before Indexing

Symptom: No results, or an error
Solution: Run codecompass batch:index first

❌ Pitfall 2: Too Vague Queries

Symptom: Returns everything, or nothing useful
Solution: Add specific context and intent

❌ Pitfall 3: Expecting Exact Matches

Symptom: "Why didn't it find function processPayment?"
Reason: Semantic search matches concepts, not exact names
Solution: Use grep for exact matches

❌ Pitfall 4: Ignoring Relevance Scores

Symptom: Reading irrelevant results
Solution: Filter by score >0.7 and ignore weak matches

❌ Pitfall 5: Single Query for Complex Questions

Symptom: Poor results for multi-faceted questions
Solution: Break them into multiple targeted queries

Decision Tree

┌─────────────────────────────────────┐
│ I need to find code that...         │
└─────────────────────────────────────┘
              ↓
        ┌─────────┐
        │ Know    │ Exact class/function name?
        │ exact   │
        │ name?   │
        └─────────┘
          ↙     ↘
        YES      NO
         ↓        ↓
    Use Grep  ┌─────────┐
              │ Concept │ Searching by meaning/purpose?
              │ search? │
              └─────────┘
                ↙     ↘
              YES      NO
               ↓        ↓
          Semantic  ┌─────────┐
          Search    │ Pattern │ Looking for a syntax or file-name pattern?
                    │ match?  │
                    └─────────┘
                      ↙     ↘
                    YES      NO
                     ↓        ↓
                Grep/Glob  Use both
                          (Glob + Semantic)
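The decision tree can be encoded as a small function, which is handy if a script or agent needs to route queries automatically. Following the "Use Grep/Glob When" list, the sketch sends syntax patterns to grep and file-path patterns to glob; the `SearchNeed` shape and the `chooseTool` name are hypothetical.

```typescript
type Tool = "grep" | "glob" | "semantic" | "glob+semantic";

interface SearchNeed {
  knowsExactName: boolean; // exact class/function name known?
  searchingByMeaning: boolean; // searching by purpose or concept?
  pattern: "none" | "syntax" | "filePath"; // kind of pattern, if any
}

function chooseTool(need: SearchNeed): Tool {
  if (need.knowsExactName) return "grep";
  if (need.searchingByMeaning) return "semantic";
  if (need.pattern === "syntax") return "grep"; // e.g. all @Injectable() decorators
  if (need.pattern === "filePath") return "glob"; // e.g. all .ts files in src/modules
  return "glob+semantic"; // unsure: narrow by path, then search by meaning
}

console.log(chooseTool({ knowsExactName: false, searchingByMeaning: true, pattern: "none" })); // prints: semantic
```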

Performance Considerations

Speed

Grep: Milliseconds (fast, synchronous)
Semantic Search: 100-500ms (embedding + vector search)

Tradeoff: Semantic is slower but finds conceptually related code

Token Cost (Embeddings)

Each query → 1 embedding generation
Ollama local → No API cost
But consumes local compute

Scaling

Small codebase (<1K files): Either method fine
Medium codebase (1K-10K files): Semantic search advantage grows
Large codebase (>10K files): Semantic search essential

Integration with Other Tools

With Yii2 Analysis

# 1. Analyze Yii2 project
codecompass analyze:yii2 <path>

# 2. Index results
codecompass batch:index <path>

# 3. Explore with semantic search
codecompass search:semantic "Yii2 controller actions for user management"

With Requirements Extraction

# 1. Extract requirements
codecompass requirements:extract

# 2. Search extracted requirements
codecompass search:semantic "business rules for order validation"

With Weaviate Direct Query

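# Alternative: query Weaviate's GraphQL API directly.
# Keep the JSON string on one line: raw newlines are not allowed inside JSON strings.
# nearText only works if the collection has a Weaviate vectorizer module configured;
# if CodeCompass stores its own Ollama vectors, use nearVector with a query embedding instead.
curl -X POST http://localhost:8081/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ Get { CodeContext(nearText: {concepts: [\"payment processing\"]}, limit: 10) { content filePath _additional { certainty } } } }"}'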
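Hand-escaping GraphQL inside a JSON string is error-prone, so a script can build the request body with `JSON.stringify`, which escapes quotes and newlines correctly. The helpers below are a hypothetical sketch: the `Get`, `nearText`, `nearVector`, `limit`, and `_additional { certainty }` syntax is standard Weaviate GraphQL, while the `content` and `filePath` fields are taken from the example above. Note that `nearText` only works when the collection has a Weaviate vectorizer module configured; otherwise embed the query yourself (for example with Ollama's mxbai-embed-large) and use `nearVector`.

```typescript
// Wraps a GraphQL query string in the JSON body expected by POST /v1/graphql.
function toGraphqlBody(graphql: string): string {
  return JSON.stringify({ query: graphql });
}

// Selection set shared by both search variants.
function selection(fields: string[]): string {
  return `{ ${fields.join(" ")} _additional { certainty } }`;
}

// nearText: Weaviate embeds the concepts itself (requires a vectorizer module).
function buildNearTextBody(
  collection: string,
  concepts: string[],
  limit = 10,
  fields: string[] = ["content", "filePath"],
): string {
  const graphql = `{ Get { ${collection}(nearText: { concepts: ${JSON.stringify(concepts)} }, limit: ${limit}) ${selection(fields)} } }`;
  return toGraphqlBody(graphql);
}

// nearVector: the caller supplies the query embedding (e.g. 1024 floats).
function buildNearVectorBody(
  collection: string,
  vector: number[],
  limit = 10,
  fields: string[] = ["content", "filePath"],
): string {
  const graphql = `{ Get { ${collection}(nearVector: { vector: ${JSON.stringify(vector)} }, limit: ${limit}) ${selection(fields)} } }`;
  return toGraphqlBody(graphql);
}

const requestBody = buildNearTextBody("CodeContext", ["payment processing"]);
console.log(requestBody);
// To send it (Node 18+ provides a global fetch):
//   await fetch("http://localhost:8081/v1/graphql", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: requestBody,
//   });
```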

Related Skills

0-discover-capabilities.md - How to discover modules
analyze-yii2-project.md - Uses semantic search in workflow

Related Modules

From .ai/capabilities.json:

search - SearchService, IntegratedSearchService
vectorizer - Ollama embedding generation
weaviate - Vector database client
indexing - File indexing pipeline

Remember: Semantic search finds code by meaning, not by name. Choose the right tool for the job.
