Claude Code Plugins

Community-maintained marketplace


ai-llm-rag-engineering

@vasilyu1983/AI-Agents-public

SKILL.md

name ai-llm-rag-engineering
description Operational patterns for RAG systems (recent advances): page-level chunking (0.648 accuracy), hybrid retrieval with cross-encoder reranking, adaptive/multimodal/self-correcting systems, Recall@K/nDCG evaluation, groundedness metrics, and real-time quality tracking. Emphasizes the modern shift to dynamic, intelligent retrieval beyond static RAG.

RAG Engineering – Quick Reference

This skill provides practical, production-grade RAG design patterns incorporating recent advances:

  • Chunking strategies: Page-level chunking (0.648 accuracy, highest in NVIDIA benchmarks)
  • Contextual Retrieval: Anthropic's 2024 technique (67% accuracy improvement with prompt caching)
  • Hybrid retrieval: Lexical (BM25) + vector + cross-encoder reranking
  • Reranking: Cross-encoder (ms-marco-TinyBERT-L-2-v2, 4.3M params, outperforms larger models)
  • RAG evaluation: Recall@K, Precision@K, nDCG, groundedness, verbosity, instruction following
  • Modern paradigm shift: Adaptive, multimodal, self-correcting systems that move beyond static RAG
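As an illustration of the simplest chunking option above, here is a minimal pure-Python sketch of recursive character splitting with overlap. It is not the LangChain implementation; the separator list, the 512-character budget, and the overlap value are illustrative assumptions.

```python
def split_text(text, chunk_size=512, overlap=50,
               separators=("\n\n", "\n", ". ", " ")):
    """Recursively split `text` on the coarsest separator that yields
    pieces no longer than `chunk_size`; hard-cut with overlap as a
    last resort."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if len(part) > chunk_size:
                        # A single piece is still too long: recurse on it.
                        chunks.extend(split_text(part, chunk_size,
                                                 overlap, separators))
                        current = ""
                    else:
                        current = part
            if current:
                chunks.append(current)
            return chunks
    # No separator helped: fixed-size windows with character overlap.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Page-level chunking replaces the separator logic with the document's own page boundaries; the same size/overlap discipline still applies.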

Key Insights:

  • Page-level chunking achieved highest accuracy (0.648) with lowest variance
  • Contextual Retrieval reduces retrieval failures by 67% when combined with reranking
  • Semantic chunking improves recall by up to 9% over simpler methods
  • Hybrid retrieval + reranking drastically improves accuracy
  • The era of static RAG is ending: adaptive, context-aware retrieval is becoming mainstream
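One common way to combine the lexical and vector results mentioned above is Reciprocal Rank Fusion (RRF), sketched below under the assumption that each retriever returns a ranked list of document IDs; the IDs and the conventional `k=60` constant are illustrative.

```python
def rrf_fuse(rankings, k=60):
    """rankings: list of ranked doc-ID lists, best first.
    Returns doc IDs sorted by summed 1/(k + rank) score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc3", "doc1", "doc7"]  # lexical ranking (illustrative)
vector_hits = ["doc1", "doc5", "doc3"]  # dense ranking (illustrative)
fused = rrf_fuse([bm25_hits, vector_hits])
```

In a production pipeline the fused list would then go to the cross-encoder reranker for the final ordering.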

It focuses on doing, not explaining theory.

Scope note: Retrieval algorithm tuning (BM25/HNSW/hybrid, query rewriting) lives in ai-llm-search-retrieval; this skill covers RAG-specific packaging, context injection, and grounded generation.


Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|------|----------------|-----------------|-------------|
| Chunking | Page-level, semantic | RecursiveCharacterTextSplitter (400-512) | 0.648 accuracy, 85-90% recall |
| Contextual Retrieval | Anthropic Claude | Generate chunk context + prompt caching | 67% failure reduction, $1.02/M tokens |
| Hybrid retrieval | LlamaIndex, LangChain, Haystack | BM25 + vector fusion | Significant relevance benefits (modern standard) |
| Reranking | Cross-encoder | ms-marco-TinyBERT-L-2-v2 (4.3M params) | Drastically improves accuracy, <100ms |
| Vector index | FAISS, Pinecone, Qdrant, Weaviate | HNSW, IVF | <10M vectors: HNSW; >10M: IVF/ScaNN |
| Evaluation | RAGAS, TruLens | Recall@K, nDCG, groundedness metrics | Quality validation, A/B testing |
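The two retrieval metrics in the Evaluation row can be computed directly; this is a minimal pure-Python sketch (frameworks like RAGAS and TruLens provide production-grade versions, plus the LLM-judged groundedness metrics).

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs that appear in the top-k retrieved."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(retrieved, relevance, k):
    """relevance: dict of doc_id -> graded relevance (0 = irrelevant).
    DCG discounts each hit by log2 of its 1-based position + 1."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(retrieved[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0
```

Run both over a held-out query set and report per-slice averages; a perfect ranking yields nDCG = 1.0.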

Decision Tree: RAG Architecture Selection

Building RAG system: [Architecture Path]
    ├─ Document type?
    │   ├─ Page-structured? → Page-level chunking (0.648 accuracy, lowest variance)
    │   ├─ Technical docs? → Semantic chunking (9% recall improvement)
    │   └─ Simple content? → RecursiveCharacterTextSplitter (400-512, 85-90% recall)
    │
    ├─ Retrieval accuracy low?
    │   ├─ Multi-entity docs? → Contextual Retrieval (67% failure reduction)
    │   ├─ Noisy results? → Cross-encoder reranking (TinyBERT, <100ms)
    │   └─ Mixed queries? → Hybrid retrieval (BM25 + vector + reranking)
    │
    ├─ Dataset size?
    │   ├─ <100k chunks? → Flat index (exact search)
    │   ├─ 100k-10M? → HNSW (low latency)
    │   └─ >10M? → IVF/ScaNN/DiskANN (scalable)
    │
    └─ Production quality?
        └─ Full pipeline: Page-level + Contextual + Hybrid + Reranking → Optimal accuracy
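For the hybrid-retrieval branches above, the lexical leg can be sketched as a minimal Okapi BM25 scorer. Whitespace tokenization, the default `k1`/`b` values, and the toy corpus are illustrative stand-ins for a real analyzer and inverted index.

```python
import math
from collections import Counter

class BM25:
    """Minimal Okapi BM25 over a small in-memory corpus."""
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [doc.lower().split() for doc in docs]
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        # Document frequency: number of docs containing each term.
        self.df = Counter(t for d in self.docs for t in set(d))

    def score(self, query, idx):
        doc = self.docs[idx]
        tf = Counter(doc)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (self.N - self.df[term] + 0.5)
                           / (self.df[term] + 0.5))
            denom = tf[term] + self.k1 * (1 - self.b
                                          + self.b * len(doc) / self.avgdl)
            s += idf * tf[term] * (self.k1 + 1) / denom
        return s

    def rank(self, query):
        return sorted(range(self.N), key=lambda i: self.score(query, i),
                      reverse=True)

docs = [
    "chunking splits documents into fixed pages",
    "vector indexes use hnsw graphs for ann search",
    "bm25 ranks documents by term frequency and idf",
]
ranker = BM25(docs)
```

In the full pipeline, these lexical scores are fused with dense-retrieval scores before the cross-encoder reranking stage.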

When to Use This Skill

Claude should invoke this skill when the user asks:

  • "Help me design a RAG pipeline."
  • "How should I chunk this document?"
  • "Optimize retrieval for my use case."
  • "My RAG system is hallucinating — fix it."
  • "Choose the right vector database / index type."
  • "Create a RAG evaluation framework."
  • "Debug why retrieval gives irrelevant results."

Related Skills

For adjacent topics, reference these skills:


Detailed Guides

Core RAG Architecture

  • Pipeline Architecture - End-to-end RAG pipeline structure, ingestion, freshness, index hygiene, embedding selection
  • Chunking Strategies - Modern benchmarks (page-level 0.648 accuracy, semantic, RecursiveCharacterTextSplitter 400-512)
  • Index Selection Guide - Vector database configuration, HNSW/IVF/Flat selection, parameter tuning

Advanced Retrieval Techniques

  • Retrieval Patterns - Dense retrieval, hybrid search, query preprocessing, reranking workflow, metadata filtering
  • Contextual Retrieval Guide - Anthropic's 2024 technique (67% failure reduction), prompt caching, implementation
  • Grounding Checklists - Context compression, hallucination control, citation patterns, answerability validation

Production & Evaluation

  • RAG Evaluation Guide - Recall@K, nDCG, groundedness, RAGAS/TruLens, A/B testing, sliced evaluation
  • Advanced RAG Patterns - Graph/multimodal RAG, online evaluation, telemetry, shadow/canary testing, adaptive retrieval
  • RAG Troubleshooting - Failure mode triage, debugging irrelevant results, hallucination fixes

Existing Detailed Patterns


Templates

Chunking & Ingestion

Embedding & Indexing

Retrieval & Reranking

Context Packaging & Grounding

Evaluation

Navigation

Resources

Templates

Data


External Resources

See data/sources.json for:

  • Embedding models (OpenAI, Cohere, Sentence Transformers, Voyage AI, Jina)
  • Vector DBs (FAISS, Pinecone, Qdrant, Weaviate, Milvus, Chroma, pgvector, LanceDB)
  • Hybrid search libraries (Elasticsearch, OpenSearch, Typesense, Meilisearch)
  • Reranking models (Cohere Rerank, Jina Reranker, RankGPT, Flashrank)
  • Evaluation frameworks (RAGAS, TruLens, DeepEval, BEIR)
  • RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
  • Advanced techniques (RAG Fusion, CRAG, Self-RAG, Contextual Retrieval)
  • Production platforms (Vectara, AWS Kendra)

Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt engineering or deployment work.