| name | rag-implementation |
| description | Build Retrieval-Augmented Generation (RAG) systems for AI applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases. |
| allowed-tools | Read, Write, Bash |
| category | ai-engineering |
| tags | rag, vector-databases, embeddings, retrieval, semantic-search |
| version | 1.0.0 |
RAG Implementation
Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.
Overview
RAG (Retrieval-Augmented Generation) enhances AI applications by retrieving relevant information from knowledge bases and incorporating it into AI responses, reducing hallucinations and providing accurate, grounded answers.
When to Use
Use this skill when:
- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling AI systems to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation
- Developing knowledge management systems
Core Components
Vector Databases
Store and efficiently retrieve document embeddings for semantic search.
Key Options:
- Pinecone: Managed, scalable, production-ready
- Weaviate: Open-source, hybrid search capabilities
- Milvus: High performance, on-premise deployment
- Chroma: Lightweight, easy local development
- Qdrant: Fast, advanced filtering
- FAISS: Meta's library, full control
Embedding Models
Convert text to numerical vectors for similarity search.
Popular Models:
- text-embedding-ada-002 (OpenAI): General purpose, 1536 dimensions
- all-MiniLM-L6-v2: Fast, lightweight, 384 dimensions
- e5-large-v2: High quality; use multilingual-e5-large for multilingual text
- bge-large-en-v1.5: State-of-the-art performance
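Whichever model you choose, the workflow is the same: embed text, then compare vectors. A minimal sketch using LangChain4j's bundled all-MiniLM-L6-v2 model; package paths vary across LangChain4j versions, so treat the imports as illustrative:
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.store.embedding.CosineSimilarity;

// Runs locally via ONNX; no API key required
AllMiniLmL6V2EmbeddingModel model = new AllMiniLmL6V2EmbeddingModel();
Embedding a = model.embed("How do I reset my password?").content();
Embedding b = model.embed("Steps for password recovery").content();

// Values close to 1.0 indicate semantically similar text
double similarity = CosineSimilarity.between(a, b);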
Retrieval Strategies
Find relevant content based on user queries.
Approaches:
- Dense Retrieval: Semantic similarity via embeddings
- Sparse Retrieval: Keyword matching (BM25, TF-IDF)
- Hybrid Search: Combine dense + sparse for best results (see the fusion sketch after this list)
- Multi-Query: Generate multiple query variations
- Contextual Compression: Extract only relevant parts
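To make the hybrid approach concrete, here is a minimal sketch of reciprocal rank fusion (RRF), a common way to merge a dense and a sparse result list; the method and its parameters are illustrative, not from any particular library:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// RRF: score(doc) = sum over result lists of 1 / (k + rank), with rank starting at 1
static List<String> fuse(List<String> denseIds, List<String> sparseIds, int k) {
    Map<String, Double> scores = new HashMap<>();
    for (List<String> ranking : List.of(denseIds, sparseIds)) {
        for (int rank = 0; rank < ranking.size(); rank++) {
            scores.merge(ranking.get(rank), 1.0 / (k + rank + 1), Double::sum);
        }
    }
    List<String> fused = new ArrayList<>(scores.keySet());
    fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
    return fused;
}
// Usage: fuse(semanticResults, bm25Results, 60); k = 60 is a common default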
Quick Implementation
Basic RAG Setup
// Load documents from file system
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");
// Create embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
// Ingest documents into the store
EmbeddingStoreIngestor.ingest(documents, embeddingStore);
// Create AI service with RAG capability
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
    .build();
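The Assistant interface above is not part of the framework; it is any interface you define, and AiServices generates the implementation. A minimal sketch:
// Any user-defined interface works; LangChain4j generates the implementation
public interface Assistant {
    String chat(String userMessage);
}

// The retriever injects relevant segments into the prompt automatically
String answer = assistant.chat("What does the manual say about TLS configuration?");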
Document Processing Pipeline
// Split documents into chunks
DocumentSplitter splitter = DocumentSplitters.recursive(
    500, // max segment size (characters)
    100  // overlap (characters)
);
// Create embedding model
EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY")) // avoid hardcoding API keys
    .modelName("text-embedding-ada-002")     // 1536 dimensions, matching the store below
    .build();
// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
    .host("localhost")
    .port(5432)
    .database("postgres")
    .user("postgres")
    .password("password")
    .table("embeddings")
    .dimension(1536) // must match the embedding model's output size
    .build();
// Process and store documents
for (Document document : documents) {
    List<TextSegment> segments = splitter.split(document);
    // Embed each document's segments in one batch call instead of one call per segment
    List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
    embeddingStore.addAll(embeddings, segments);
}
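Once the store is populated, queries are embedded with the same model and searched against it. A minimal sketch, assuming a LangChain4j version where EmbeddingStore exposes search(...); the query text and thresholds are illustrative:
// Embed the query with the same model used at ingestion time
Embedding queryEmbedding = embeddingModel.embed("How do I rotate credentials?").content();
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
    EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(5)
        .minScore(0.7)
        .build());
for (EmbeddingMatch<TextSegment> match : result.matches()) {
    System.out.println(match.score() + " " + match.embedded().text());
}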
Implementation Patterns
Pattern 1: Simple Document Q&A
Create a basic Q&A system over your documents.
public interface DocumentAssistant {
    String answer(String question);
}

DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
    .chatModel(chatModel)
    .contentRetriever(retriever)
    .build();
Pattern 2: Metadata-Filtered Retrieval
Filter results based on document metadata.
// Add metadata during document loading
Document document = Document.from(
    "Content here",
    Metadata.from(Map.of(
        "source", "technical-manual.pdf",
        "category", "technical",
        "date", "2024-01-15")));
// Filter during retrieval
// metadataKey(...) comes from a static import of MetadataFilterBuilder.metadataKey
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(embeddingStore)
    .embeddingModel(embeddingModel)
    .maxResults(5)
    .minScore(0.7)
    .filter(metadataKey("category").isEqualTo("technical"))
    .build();
Pattern 3: Multi-Source Retrieval
Combine results from multiple knowledge sources.
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever documentRetriever = EmbeddingStoreContentRetriever.from(documentStore);
ContentRetriever databaseRetriever = EmbeddingStoreContentRetriever.from(databaseStore);
// Combine results (query is a dev.langchain4j.rag.query.Query, e.g. Query.from("..."))
List<Content> allResults = new ArrayList<>();
allResults.addAll(webRetriever.retrieve(query));
allResults.addAll(documentRetriever.retrieve(query));
allResults.addAll(databaseRetriever.retrieve(query));
// Rerank combined results; `reranker` is a placeholder for your scoring/reranking component
List<Content> rerankedResults = reranker.reorder(query, allResults);
Best Practices
Document Preparation
- Clean and preprocess documents before ingestion
- Remove irrelevant content and formatting artifacts
- Standardize document structure for consistent processing
- Add relevant metadata for filtering and context
Chunking Strategy
- Use 500-1000 tokens per chunk for optimal balance
- Include 10-20% overlap to preserve context at boundaries
- Consider document structure when determining chunk boundaries
- Test different chunk sizes for your specific use case
Retrieval Optimization
- Start with high k values (10-20), then filter and rerank
- Use metadata filtering to improve relevance
- Combine multiple retrieval strategies for better coverage
- Monitor retrieval quality and user feedback
Performance Considerations
- Cache embeddings for frequently accessed content (see the sketch after this list)
- Use batch processing for document ingestion
- Optimize vector store configuration for your scale
- Monitor query performance and system resources
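For the caching point, a minimal sketch of a memoizing wrapper keyed by input text; CachedEmbedder is illustrative, not a library class:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative wrapper: avoids re-embedding text that has been seen before
class CachedEmbedder {
    private final EmbeddingModel delegate;
    private final Map<String, Embedding> cache = new ConcurrentHashMap<>();

    CachedEmbedder(EmbeddingModel delegate) {
        this.delegate = delegate;
    }

    Embedding embed(String text) {
        // Only calls the underlying model on a cache miss
        return cache.computeIfAbsent(text, t -> delegate.embed(t).content());
    }
}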
Common Issues and Solutions
Poor Retrieval Quality
Problem: Retrieved documents don't match user queries.
Solutions:
- Improve document preprocessing and cleaning
- Adjust chunk size and overlap parameters
- Try different embedding models
- Use hybrid search combining semantic and keyword matching
Irrelevant Results
Problem: Retrieved documents contain relevant information but are not specific enough.
Solutions:
- Add metadata filtering for domain-specific constraints
- Implement reranking with cross-encoder models
- Use contextual compression to extract relevant parts
- Fine-tune retrieval parameters (k values, similarity thresholds)
Performance Issues
Problem: Slow response times during retrieval.
Solutions:
- Optimize vector store configuration and indexing
- Implement caching for frequently retrieved content
- Use smaller embedding models for faster inference
- Consider approximate nearest neighbor algorithms
Hallucination Prevention
Problem: AI generates information not present in retrieved documents.
Solutions:
- Improve prompt engineering to emphasize grounding (example after this list)
- Add verification steps to check answer alignment
- Include confidence scoring for responses
- Implement fact-checking mechanisms
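For the prompt-engineering point, a sketch of a grounding-oriented system prompt; the wording and the {{contents}} placeholder (LangChain4j's PromptTemplate syntax) are illustrative:
// Illustrative system prompt that instructs the model to stay grounded
String systemPrompt = """
    Answer ONLY using the context below.
    If the context does not contain the answer, say "I don't know."
    Quote or cite the passages you rely on.

    Context:
    {{contents}}
    """;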
Evaluation Framework
Retrieval Metrics
- Precision@k: Percentage of relevant documents in top-k results
- Recall@k: Percentage of all relevant documents found in top-k results
- Mean Reciprocal Rank (MRR): Mean of the reciprocal rank of the first relevant result across queries
- Normalized Discounted Cumulative Gain (nDCG): Ranking quality metric
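These retrieval metrics are straightforward to compute offline against a labeled query set. A minimal sketch of Precision@k and reciprocal rank; relevantIds stands in for your gold labels:
import java.util.List;
import java.util.Set;

// Precision@k: fraction of the top-k retrieved IDs that are relevant
static double precisionAtK(List<String> retrieved, Set<String> relevantIds, int k) {
    long hits = retrieved.stream().limit(k).filter(relevantIds::contains).count();
    return (double) hits / k;
}

// Reciprocal rank: 1 / rank of the first relevant result, or 0 if none is found
static double reciprocalRank(List<String> retrieved, Set<String> relevantIds) {
    for (int i = 0; i < retrieved.size(); i++) {
        if (relevantIds.contains(retrieved.get(i))) {
            return 1.0 / (i + 1);
        }
    }
    return 0.0;
}
// MRR is the mean of reciprocalRank(...) across all evaluation queries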
Answer Quality Metrics
- Faithfulness: Degree to which answers are grounded in retrieved documents
- Answer Relevance: How well answers address user questions
- Context Recall: Percentage of relevant context used in answers
- Context Precision: Percentage of retrieved context that is relevant
User Experience Metrics
- Response Time: Time from query to answer
- User Satisfaction: Feedback ratings on answer quality
- Task Completion: Rate of successful task completion
- Engagement: User interaction patterns with the system
Resources
Reference Documentation
- Vector Database Comparison - Detailed comparison of vector database options
- Embedding Models Guide - Model selection and optimization
- Retrieval Strategies - Advanced retrieval techniques
- Document Chunking - Chunking strategies and best practices
- LangChain4j RAG Guide - Official implementation patterns
Assets
- assets/vector-store-config.yaml - Configuration templates for different vector stores
- assets/retriever-pipeline.java - Complete RAG pipeline implementation
- assets/evaluation-metrics.java - Evaluation framework code
Constraints and Limitations
- Token Limits: Respect model context window limitations
- API Rate Limits: Manage external API rate limits and costs
- Data Privacy: Ensure compliance with data protection regulations
- Resource Requirements: Consider memory and computational requirements
- Maintenance: Plan for regular updates and system monitoring
Security Considerations
- Secure access to vector databases and embedding services
- Implement proper authentication and authorization
- Validate and sanitize user inputs
- Monitor for abuse and unusual usage patterns
- Regular security audits and penetration testing