Claude Code Plugins

Community-maintained marketplace

Feedback

ai-evaluation-suite

@doctorduke/claude-config
0
0

Comprehensive AI/LLM evaluation toolkit for production AI systems. Covers LLM output quality, prompt engineering, RAG evaluation, agent performance, hallucination detection, bias assessment, cost/token optimization, latency metrics, model comparison, and fine-tuning evaluation. Includes BLEU/ROUGE metrics, perplexity, F1 scores, LLM-as-judge patterns, and benchmarks like MMLU and HumanEval.

Install Skill

Shared

Installs to .agents/skills, used by Codex, Amp, Warp, Cursor, OpenCode, and more.

CodexAmp
Warp
CursorOpenCode
Cline
Gemini CLI
GitHub Copilot
Personal

Available across projects.

$npx skills-installer add @doctorduke/claude-config/ai-evaluation-suite --client shared
Project

Writes to .agents/skills.

$npx skills-installer add @doctorduke/claude-config/ai-evaluation-suite -p --client shared
Note: Review the skill instructions before using it.

SKILL.md

404: Not Found