ai-evaluation-suite
@doctorduke/claude-config
Comprehensive AI/LLM evaluation toolkit for production AI systems. Covers LLM output quality, prompt engineering, RAG evaluation, agent performance, hallucination detection, bias assessment, cost/token optimization, latency metrics, model comparison, and fine-tuning evaluation. Includes BLEU/ROUGE metrics, perplexity, F1 scores, LLM-as-judge patterns, and benchmarks like MMLU and HumanEval.
Install Skill
1. Download skill
2. Enable skills in Claude
   Open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude
   Click "Upload skill" and select the downloaded ZIP file
Note: Verify the skill by reading through its instructions before using it.