| name | doc-search |
| description | Token-efficient documentation search using Serena Document Index. Roughly 90% token savings vs reading full files. Use BEFORE reading README.md or docs/ files. Triggers on architecture questions, pattern lookups, and project-specific documentation needs. |
Document Search
Search project documentation efficiently using the Serena Document Index system.
Why This Matters
| Approach | Tokens | Use Case |
|---|---|---|
| Read full README.md | 3000-8000 | Last resort only (wasteful) |
| Read docs/*.md | 2000-5000 each | Rarely needed |
| Document Index Search | 100-500 | Always prefer |
| Section Retrieval | 200-800 | After finding relevant section |
Rule: Do not read documentation files in full unless the document index cannot answer the question.
Document Index Location
.serena/cache/documents/document_index.json
Index Types Available:
- tag_index: Search by tags (architecture, api, testing, etc.)
- title_index: Search by section titles
- project_index: Filter by project (basecamp-server, interface-cli, etc.)
- doc_type_index: Filter by document type (readme, guide, api-reference, etc.)
- content_index: Keyword-based content search
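The lookups used later in this document (`index['tag_index'].get(...)`, `ref['section_title']`, `ref['line_start']`) suggest that each index maps a key to a list of section references. The following is a minimal sketch of that shape using invented sample data; the file the indexer actually writes may carry more fields or be laid out differently, and `missing_indexes` is a hypothetical helper, not part of the project.

```python
# Hypothetical sample of the index layout, inferred from the field names
# used in this document. The real document_index.json may differ.
sample_ref = {
    "section_title": "Module Placement Rules",
    "relative_path": "project-basecamp-server/docs/PATTERNS.md",
    "line_start": 45,
    "line_end": 80,
}

sample_index = {
    "tag_index": {"architecture": [sample_ref]},
    "title_index": {"module placement rules": [sample_ref]},
    "project_index": {"project-basecamp-server": [sample_ref]},
    "doc_type_index": {"guide": [sample_ref]},
    "content_index": {"hexagonal": [sample_ref]},
}

EXPECTED_INDEXES = (
    "tag_index",
    "title_index",
    "project_index",
    "doc_type_index",
    "content_index",
)


def missing_indexes(index: dict) -> list[str]:
    """Return the names of expected top-level indexes absent from `index`."""
    return [name for name in EXPECTED_INDEXES if name not in index]


print(missing_indexes(sample_index))       # every expected index is present
print(missing_indexes({"tag_index": {}}))  # the other four are reported
```

A check like this can run before any query so that a partially built index fails loudly instead of silently returning no results.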
Workflow Pattern
Step 1: Search Document Index (Python CLI)
# Run from the repository root (adjust the path to your own checkout),
# then search for relevant documentation sections
cd /Users/kun/github/1ambda/dataops-platform
python3 scripts/serena/document_indexer.py --search "hexagonal architecture" --max-results 5
Step 2: Read Specific Section Only
After finding a relevant section in the search results:
# Use section coordinates from search result
# Example: project-basecamp-server/docs/PATTERNS.md#module-placement-rules
# Read only that section (lines 45-80) instead of entire file
Read(file_path="project-basecamp-server/docs/PATTERNS.md", offset=45, limit=35)
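Outside an agent that provides a `Read` tool, the same "section only" read can be approximated in plain Python. This is a sketch under the assumption that the index's `line_start` and `line_end` are 1-indexed and inclusive; `read_section` is a hypothetical helper, and the demo uses a throwaway file rather than a real project document.

```python
import tempfile
from itertools import islice
from pathlib import Path


def read_section(path: Path, line_start: int, line_end: int) -> str:
    """Return lines line_start..line_end (assumed 1-indexed, inclusive)."""
    if line_start < 1 or line_end < line_start:
        raise ValueError("expected 1 <= line_start <= line_end")
    with path.open(encoding="utf-8") as handle:
        # islice stops consuming the file once line_end is reached,
        # so the remainder of the document is never read.
        return "".join(islice(handle, line_start - 1, line_end))


# Demo on a generated 100-line file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.md"
    demo.write_text("".join(f"line {n}\n" for n in range(1, 101)), encoding="utf-8")
    section = read_section(demo, 45, 80)
    print(section.splitlines()[0])    # first line of the section
    print(len(section.splitlines()))  # number of lines returned
```

Note that an inclusive range of lines 45 through 80 covers 36 lines, which matters when translating a line range into an offset and a line count.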
Step 3: Alternative - Direct JSON Query
# For programmatic access in agent workflows
import json
from pathlib import Path
cache_path = Path(".serena/cache/documents/document_index.json")
index = json.loads(cache_path.read_text())
# Search by tag
architecture_docs = index['tag_index'].get('architecture', [])
# Search by project
server_docs = index['project_index'].get('project-basecamp-server', [])
# Print section metadata (title, file, line range) for the first three hits
for ref in architecture_docs[:3]:
    print(f"Section: {ref['section_title']}")
    print(f"File: {ref['relative_path']}")
    print(f"Lines: {ref['line_start']}-{ref['line_end']}")
Decision Tree
Need documentation?
|
+-- What patterns exist for X?
|   +-- doc-search: tag_index["patterns"] or tag_index["architecture"]
|
+-- How to implement feature in project Y?
|   +-- doc-search: project_index["project-Y"] + tag_index["implementation"]
|
+-- What does README say about Z?
|   +-- doc-search: title_index["Z"] or content_index["keyword"]
|
+-- Full context needed?
    +-- Read specific section (lines from search result)
    +-- LAST RESORT: Read full file
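The "project_index + tag_index" branch of the tree amounts to intersecting two lookups. The sketch below shows one way to do that against the loaded JSON, assuming each reference carries `relative_path` and `line_start` and that this pair identifies a section uniquely. The helper names and the sample data are invented for illustration.

```python
def ref_key(ref: dict) -> tuple:
    """Identify a section by file and starting line (assumed to be unique)."""
    return (ref["relative_path"], ref["line_start"])


def filter_by_project_and_tag(index: dict, project: str, tag: str) -> list[dict]:
    """Return refs listed under both project_index[project] and tag_index[tag]."""
    tagged = {ref_key(r) for r in index.get("tag_index", {}).get(tag, [])}
    in_project = index.get("project_index", {}).get(project, [])
    return [r for r in in_project if ref_key(r) in tagged]


# Invented sample data, for illustration only.
use_case = {
    "section_title": "Adding a Use Case",
    "relative_path": "project-Y/docs/GUIDE.md",
    "line_start": 10,
    "line_end": 40,
}
demo_index = {
    "tag_index": {
        "implementation": [
            use_case,
            {
                "section_title": "CLI Plugins",
                "relative_path": "project-Z/docs/GUIDE.md",
                "line_start": 5,
                "line_end": 30,
            },
        ]
    },
    "project_index": {
        "project-Y": [
            use_case,
            {
                "section_title": "Overview",
                "relative_path": "project-Y/README.md",
                "line_start": 1,
                "line_end": 20,
            },
        ]
    },
}

matches = filter_by_project_and_tag(demo_index, "project-Y", "implementation")
print([m["section_title"] for m in matches])
```

Unknown projects or tags yield an empty list rather than an error, which lets a caller fall through to the next branch of the tree.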
Integration with mcp-efficiency
Document search is the first step before Serena symbol queries:
# 1. Search docs for patterns/context
doc_search("hexagonal architecture", max_results=3)
# 2. Use Serena for code structure
serena.get_symbols_overview("module-core-domain/")
# 3. Find specific symbols
serena.find_symbol("RepositoryJpa", depth=1)
Common Search Queries
| Need | Search Query |
|---|---|
| Architecture patterns | "hexagonal" OR "architecture" |
| API endpoints | "api" OR "endpoint" OR "controller" |
| Testing patterns | "test" OR "testing" OR "fixture" |
| Entity relationships | "entity" OR "repository" OR "jpa" |
| CLI commands | "command" OR "cli" OR "dli" |
| Configuration | "config" OR "environment" OR "settings" |
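
The table writes alternatives with OR; whether the CLI parses that operator itself is not stated here. When querying the JSON directly, the same effect can be had by taking the union of several `content_index` lookups. This sketch assumes `content_index` maps a lowercase keyword to a list of section references, and `search_any` is a hypothetical helper.

```python
def search_any(index: dict, keywords: list[str], max_results: int = 5) -> list[dict]:
    """Union of content_index hits for the given keywords, without duplicates."""
    seen = set()
    results = []
    for keyword in keywords:
        for ref in index.get("content_index", {}).get(keyword.lower(), []):
            key = (ref["relative_path"], ref["line_start"])
            if key not in seen:  # the same section may match several keywords
                seen.add(key)
                results.append(ref)
    return results[:max_results]


# Invented sample data, for illustration only.
ref_a = {"section_title": "REST API", "relative_path": "docs/API.md", "line_start": 1}
ref_b = {"section_title": "Endpoints", "relative_path": "docs/API.md", "line_start": 30}
ref_c = {"section_title": "Routing", "relative_path": "docs/WEB.md", "line_start": 12}
demo_content_index = {
    "content_index": {
        "api": [ref_a, ref_b],
        "endpoint": [ref_b, ref_c],
    }
}

hits = search_any(demo_content_index, ["API", "endpoint", "controller"])
print([h["section_title"] for h in hits])
```

Results keep the order in which keywords are given, so listing the most specific keyword first puts its sections at the top.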
Token Savings Examples
| Task | Without Doc Search | With Doc Search | Savings |
|---|---|---|---|
| Find architecture pattern | 5000 tokens (full PATTERNS.md) | 300 tokens | 94% |
| Check entity rules | 3000 tokens (full README) | 400 tokens | 87% |
| Find API reference | 4000 tokens (full docs) | 250 tokens | 94% |
| Implementation guide | 6000 tokens (multiple files) | 500 tokens | 92% |
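
The savings column follows from the two token counts in each row. A short sketch of the arithmetic, using the figures from the table above (`savings_percent` is a name chosen here for illustration):

```python
def savings_percent(without_tokens: int, with_tokens: int) -> int:
    """Percentage of tokens saved, rounded to the nearest whole percent."""
    return round(100 * (without_tokens - with_tokens) / without_tokens)


# (tokens without doc search, tokens with doc search), taken from the table
rows = {
    "Find architecture pattern": (5000, 300),
    "Check entity rules": (3000, 400),
    "Find API reference": (4000, 250),
    "Implementation guide": (6000, 500),
}

for task, (before, after) in rows.items():
    print(f"{task}: {savings_percent(before, after)}%")
```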
Updating the Index
# Rebuild after documentation changes
python3 scripts/serena/update-symbols.py --with-docs
# Incremental update (changed files only)
python3 scripts/serena/update-symbols.py --changed-only --with-docs
# Full rebuild
python3 scripts/serena/document_indexer.py --project-root . --rebuild
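To decide whether a rebuild is needed at all, one option is to compare modification times. This is only a sketch: it assumes the indexed documents are the `*.md` files under the project root, and `index_is_stale` is a hypothetical helper rather than something the indexer scripts provide.

```python
import os
import tempfile
from pathlib import Path


def index_is_stale(project_root: Path, index_path: Path) -> bool:
    """True if the index is missing or older than any Markdown file under the root."""
    if not index_path.exists():
        return True
    index_mtime = index_path.stat().st_mtime
    return any(
        doc.stat().st_mtime > index_mtime for doc in project_root.rglob("*.md")
    )


# Demo in a temporary directory with explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    doc = root / "docs" / "GUIDE.md"
    index_file = root / ".serena" / "cache" / "documents" / "document_index.json"
    doc.parent.mkdir(parents=True)
    index_file.parent.mkdir(parents=True)
    doc.write_text("# Guide\n", encoding="utf-8")
    index_file.write_text("{}", encoding="utf-8")

    os.utime(doc, (1000, 1000))         # document last modified at t=1000
    os.utime(index_file, (2000, 2000))  # index built later, at t=2000
    print(index_is_stale(root, index_file))

    os.utime(doc, (3000, 3000))         # document edited after the index was built
    print(index_is_stale(root, index_file))
```

Modification times are a heuristic; a fresh checkout or a copied cache can make them misleading, in which case a full rebuild is the safer choice.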
Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Read full README.md first | 3000+ tokens wasted | Search index, read section |
| Read all docs/*.md | 10000+ tokens wasted | Search by tag/title |
| Skip doc search, use web | Slower, less relevant | Use indexed local docs |
| Guess file locations | Miss relevant docs | Use project_index filter |
Quick Reference
# CLI search (recommended)
python3 scripts/serena/document_indexer.py --search "QUERY" --max-results 5
# Build/rebuild index
python3 scripts/serena/update-symbols.py --with-docs
# Check index stats
python3 -c "import json; d=json.load(open('.serena/cache/documents/document_index.json')); print(d['metadata'])"