Claude Code Plugins

Install Skill

  1. Download skill
  2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

  3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Verify the skill by reading through its instructions before using it.

SKILL.md

name: ai-llm
description: Complete LLM development and engineering skill. Covers strategy selection (prompting vs fine-tuning vs RAG), dataset design, PEFT/LoRA fine-tuning, evaluation workflows, vLLM deployment, and production optimization. Modern best practices for building, evaluating, and scaling LLM systems.

LLM Development & Engineering — Complete Reference

Build, evaluate, and deploy LLM systems with modern production standards.

This skill covers the full LLM lifecycle:

  • Development: Strategy selection, dataset design, instruction tuning, PEFT/LoRA fine-tuning
  • Evaluation: Automated testing, LLM-as-judge, metrics, rollout gates
  • Deployment: vLLM 0.12 (V1 architecture; up to 24x throughput over HuggingFace Transformers, per the vLLM project's published benchmarks), FP8/FP4 quantization
  • Operations: Drift detection, retraining triggers, monitoring
  • Safety: Multi-layered defenses, AI-powered guardrails

For detailed patterns: See Resources and Templates sections below.
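
The "rollout gates" item in the Evaluation bullet can be illustrated with a minimal sketch. The metric names and thresholds below are hypothetical placeholders, not values defined by this skill; the only idea shown is that a candidate model is promoted when every tracked metric clears its bound.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One rollout gate: a metric name, a bound, and the direction of 'good'."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def passes(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def evaluate_gates(metrics: dict[str, float], gates: list[Gate]) -> list[str]:
    """Return the names of failed gates; an empty list means 'safe to roll out'.

    A metric that is missing from `metrics` counts as a failure.
    """
    failed = []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is None or not gate.passes(value):
            failed.append(gate.metric)
    return failed


# Hypothetical gates and evaluation results, for illustration only.
GATES = [
    Gate("answer_accuracy", 0.80),
    Gate("hallucination_rate", 0.05, higher_is_better=False),
    Gate("p95_latency_s", 2.0, higher_is_better=False),
]

candidate = {"answer_accuracy": 0.84, "hallucination_rate": 0.07, "p95_latency_s": 1.4}
print(evaluate_gates(candidate, GATES))  # prints ['hallucination_rate']
```

Returning the list of failed gates, rather than a single boolean, keeps the reason for a blocked rollout visible in CI logs.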


Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|------|----------------|-----------------|-------------|
| RAG Pipeline | LlamaIndex, LangChain | Page-level chunking + hybrid retrieval | Dynamic knowledge, 0.648 accuracy |
| Agentic Workflow | LangGraph, AutoGen, CrewAI | ReAct, multi-agent orchestration | Complex tasks, tool use required |
| Prompt Design | Anthropic, OpenAI guides | CoT, few-shot, structured | Task-specific behavior control |
| Evaluation | LangSmith, W&B, RAGAS | Multi-metric (hallucination, bias, cost) | Quality validation, A/B testing |
| Production Deploy | vLLM 0.12, TensorRT-LLM | FP8/FP4 quantization, PagedAttention v2 | High-throughput serving, cost optimization |
| Monitoring | Arize Phoenix, LangFuse | Drift detection, 18-second response | Production LLM systems |

Decision Tree: LLM System Architecture

Building LLM application: [Architecture Selection]
    ├─ Need current knowledge?
    │   ├─ Simple Q&A? → Basic RAG (page-level chunking + hybrid retrieval)
    │   └─ Complex retrieval? → Advanced RAG (reranking + contextual retrieval)
    │
    ├─ Need tool use / actions?
    │   ├─ Single task? → Simple agent (ReAct pattern)
    │   └─ Multi-step workflow? → Multi-agent (LangGraph, CrewAI)
    │
    ├─ Static behavior sufficient?
    │   ├─ Quick MVP? → Prompt engineering (CI/CD integrated)
    │   └─ Production quality? → Fine-tuning (PEFT/LoRA)
    │
    └─ Best results?
        └─ Hybrid (RAG + Fine-tuning + Agents) → Comprehensive solution

See Decision Matrices for detailed selection criteria.
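
The decision tree above can also be read as a small function. The sketch below is one possible encoding, not the skill's own implementation: the `Requirements` fields are hypothetical names, and treating "needs both current knowledge and tool use" as the trigger for the hybrid branch is an assumption, since the tree only says hybrid gives the best results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical project requirements; field names are illustrative."""
    needs_current_knowledge: bool = False
    complex_retrieval: bool = False
    needs_tools: bool = False
    multi_step: bool = False
    production_quality: bool = False


def select_architecture(req: Requirements) -> str:
    """Walk the decision tree top to bottom and return the matching leaf."""
    # Assumption: needing both retrieval and actions maps to the hybrid branch.
    if req.needs_current_knowledge and req.needs_tools:
        return "Hybrid (RAG + fine-tuning + agents)"
    if req.needs_current_knowledge:
        return "Advanced RAG" if req.complex_retrieval else "Basic RAG"
    if req.needs_tools:
        return "Multi-agent" if req.multi_step else "Simple agent (ReAct)"
    # Static behavior is sufficient: choose by quality bar.
    return "Fine-tuning (PEFT/LoRA)" if req.production_quality else "Prompt engineering"


print(select_architecture(Requirements(needs_current_knowledge=True)))  # prints Basic RAG
```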


When to Use This Skill

Claude should invoke this skill when the user asks about:

  • LLM preflight/project checklists, production best practices, or data pipelines
  • Building or deploying RAG, agentic, or prompt-based LLM apps
  • Prompt design, chain-of-thought (CoT), ReAct, or template patterns
  • Troubleshooting LLM hallucination, bias, retrieval issues, or production failures
  • Evaluating LLMs: benchmarks, multi-metric eval, or rollout/monitoring
  • LLMOps: deployment, rollback, scaling, resource optimization
  • Technology stack selection (models, vector DBs, frameworks)
  • Production deployment strategies and operational patterns

Scope Boundaries (Use These Skills for Depth)


Resources (Best Practices & Operational Patterns)

Comprehensive operational guides with checklists, patterns, and decision frameworks:

Core Operational Patterns

  • Project Planning Patterns - Stack selection, FTI pipeline, performance budgeting

    • AI engineering stack selection matrix
    • Feature/Training/Inference (FTI) pipeline blueprint
    • Performance budgeting and goodput gates
    • Progressive complexity (prompt → RAG → fine-tune → hybrid)
  • Production Checklists - Pre-deployment validation and operational checklists

    • LLM lifecycle checklist (modern production standards)
    • Data & training, RAG pipeline, deployment & serving
    • Safety/guardrails, evaluation, agentic systems
    • Reliability & data infrastructure (DDIA-grade)
    • Weekly production tasks
  • Common Design Patterns - Copy-paste ready implementation examples

    • Chain-of-Thought (CoT) prompting
    • ReAct (Reason + Act) pattern
    • RAG pipeline (minimal to advanced)
    • Agentic planning loop
    • Self-reflection and multi-agent collaboration
  • Decision Matrices - Quick reference tables for selection

    • RAG type decision matrix (naive → advanced → modular)
    • Production evaluation table with targets and actions
    • Model selection matrix (GPT-4, Claude, Gemini, self-hosted)
    • Vector database, embedding model, framework selection
    • Deployment strategy matrix
  • Anti-Patterns - Common mistakes and prevention strategies

    • Data leakage, prompt dilution, RAG context overload
    • Agentic runaway, over-engineering, ignoring evaluation
    • Hard-coded prompts, missing observability
    • Detection methods and prevention code examples
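
As an illustration of the ReAct pattern listed under Common Design Patterns, combined with a guard against the "agentic runaway" anti-pattern, here is a minimal sketch. It is not taken from the resource files: the `Action: tool[arg]` / `Final Answer:` text format, the `react_loop` name, and the scripted stand-in for a model are all assumptions made so the example runs without any API.

```python
import re
from typing import Callable, Dict

# Assumed output format: "Action: tool_name[argument]" or "Final Answer: text".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate model reasoning and tool calls until a final answer appears.

    `max_steps` is a hard budget: the loop raises instead of running forever.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, arg = action.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"
    raise RuntimeError(f"no final answer within {max_steps} steps")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: acts once, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute this.\nAction: add[2,3]"
    return "Thought: I have the result.\nFinal Answer: 5"


def add(arg: str) -> str:
    a, b = (int(x) for x in arg.split(","))
    return str(a + b)


print(react_loop("What is 2 + 3?", scripted_llm, {"add": add}))  # prints 5
```

In a real system the `llm` callable would wrap a model client, and the step budget would typically sit alongside cost and wall-clock limits.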

Domain-Specific Patterns

Note: Each resource file includes preflight/validation checklists, copy-paste reference tables, inline templates, anti-patterns, and decision matrices.


Templates (Copy-Paste Ready)

Production templates by use case and technology:

RAG Pipelines

  • Basic RAG - Simple retrieval-augmented generation
  • Advanced RAG - Hybrid retrieval, reranking, contextual embeddings
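
The hybrid retrieval step in the Advanced RAG template needs some way to merge a keyword ranking with a vector ranking. The templates themselves are not reproduced here, so the sketch below uses reciprocal rank fusion (RRF) as one common choice; it is an assumption, not necessarily the method the template uses, and the document IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a BM25 index and a dense vector index.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # prints ['d1', 'd3', 'd2', 'd4']
```

RRF only needs ranks, so the two retrievers' raw scores never have to be normalized against each other; a reranker can then be applied to the fused top results.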

Prompt Engineering

Agentic Workflows

Data Pipelines

Deployment

Evaluation


Shared Utilities (Centralized patterns — extract, don't duplicate)


Related Skills

This skill integrates with complementary Claude Code skills:

Core Dependencies

  • ai-rag - Advanced RAG patterns, chunking strategies, hybrid retrieval, reranking, search optimization, BM25 tuning, vector search, ranking pipelines
  • ai-prompt-engineering - Systematic prompt design, evaluation, testing, and optimization
  • ai-agents - Agent architectures, tool use, multi-agent systems, autonomous workflows

Production & Operations

  • ai-llm - Model training, fine-tuning, dataset creation, instruction tuning
  • ai-llm-inference - Production serving, quantization, batching, GPU optimization
  • ai-mlops - Deployment patterns, monitoring, drift detection, API design, security guardrails, prompt injection defense, privacy protection

External Resources

See data/sources.json for 50+ curated authoritative sources:

  • Official LLM platform docs - OpenAI, Anthropic, Gemini, Mistral, Azure OpenAI, AWS Bedrock
  • Open-source models and frameworks - HuggingFace Transformers, LLaMA, vLLM 0.12 (V1 architecture, PyTorch 2.9), PEFT/LoRA, DeepSpeed
  • RAG frameworks and vector DBs - LlamaIndex, LangChain 1.1+, LangGraph, LangGraph Studio v2, Haystack, Pinecone, Qdrant, Chroma
  • 2025 Agentic frameworks - Anthropic Agent SDK, AutoGen, CrewAI, LangGraph Multi-Agent, Semantic Kernel
  • 2025 RAG innovations - Microsoft GraphRAG (knowledge graphs), Pathway (real-time), hybrid retrieval
  • Prompt engineering - Anthropic Prompt Library, Prompt Engineering Guide, CoT/ReAct patterns
  • Evaluation and monitoring - OpenAI Evals, HELM, Anthropic Evals, LangSmith, W&B, Arize Phoenix
  • Production deployment - LiteLLM, Ollama, RunPod, Together AI, vLLM serving

Usage

For New Projects

  1. Start with Production Checklists - Validate all pre-deployment requirements
  2. Use Decision Matrices - Select technology stack
  3. Reference Project Planning Patterns - Design FTI pipeline
  4. Implement with Common Design Patterns - Copy-paste code examples
  5. Avoid Anti-Patterns - Learn from common mistakes

For Troubleshooting

  1. Check Anti-Patterns - Identify failure modes and mitigations
  2. Use Decision Matrices - Evaluate if architecture fits use case
  3. Reference Common Design Patterns - Verify implementation correctness

For Ongoing Operations

  1. Follow Production Checklists - Weekly operational tasks
  2. Integrate Evaluation Patterns - Continuous quality monitoring
  3. Apply LLMOps Best Practices - Deployment and rollback procedures
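
For the continuous quality monitoring step, drift detection with a retraining trigger could look like the sketch below. It uses the population stability index (PSI) over a single numeric signal, such as a per-request quality score; the choice of PSI, the 0.2 threshold (a common rule of thumb), and the function names are assumptions rather than something this skill prescribes. Both samples must be non-empty.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a baseline sample and a recent sample of one numeric signal."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def histogram(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the max value
            counts[idx] += 1
        eps = 1e-6  # smoothing so empty bins do not produce log(0)
        return [(c + eps) / (len(values) + eps * bins) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(
    expected: Sequence[float], actual: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag drift when PSI exceeds the (assumed) threshold."""
    return population_stability_index(expected, actual) > threshold


# Synthetic data: a uniform baseline and a sample shifted toward high values.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
print(needs_retraining(baseline, baseline))  # prints False
print(needs_retraining(baseline, shifted))   # prints True
```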

Navigation Summary

Quick Decisions: Decision Matrices | Pre-Deployment: Production Checklists | Planning: Project Planning Patterns | Implementation: Common Design Patterns | Troubleshooting: Anti-Patterns

Domain Depth: LLMOps | Evaluation | Prompts | Agents | RAG

Templates: templates/ - Copy-paste ready production code

Sources: data/sources.json - Authoritative documentation links