name	skills-eval
description	Evaluate and improve Claude skill quality through auditing. Triggers: skill audit, quality review, compliance check, improvement suggestions, token usage analysis, skill evaluation, skill assessment, skill optimization, skill standards, skill metrics, skill performance. Use when: reviewing skill quality, preparing skills for production, auditing existing skills, generating improvement recommendations, checking compliance with standards, analyzing token efficiency, benchmarking skill performance. DO NOT use when: creating new skills from scratch - use modular-skills instead. DO NOT use when: writing prose for humans - use writing-clearly-and-concisely. DO NOT use when: need architectural design patterns - use modular-skills. Use this skill BEFORE shipping any skill to production. Check even if unsure.
version	2.0.0
category	skill-management
tags	evaluation, improvement, skills, optimization, quality-assurance, tool-use, performance-metrics
dependencies	modular-skills, performance-optimization
tools	skills-auditor, improvement-suggester, compliance-checker, tool-performance-analyzer, token-usage-tracker
estimated_tokens	1800
usage_patterns	skill-audit, quality-assessment, improvement-planning, skills-inventory, tool-performance-evaluation, dynamic-discovery-optimization, advanced-tool-use-analysis, programmatic-calling-efficiency, context-preservation-quality, token-efficiency-optimization, modular-architecture-validation, integration-testing, compliance-reporting, performance-benchmarking
complexity	advanced

Skills Evaluation and Improvement

Overview

Analyze and improve Claude skills. The tools audit skills against quality standards, measure token usage, and generate improvement recommendations.

Tools

skills-auditor: Scan and analyze skills
improvement-suggester: Generate prioritized fixes
compliance-checker: Validate standards and security
tool-performance-analyzer: Measure tool use patterns
token-usage-tracker: Track context efficiency

What It Is

This skill evaluates and improves existing skills. It runs quality assessments and performance analyses, and it generates improvement plans.

Quick Start

Basic Skill Audit

# Run detailed audit of all skills
python scripts/skills_eval/skills_auditor.py --scan-all --format markdown

# Audit specific skill
python scripts/skills_eval/skills_auditor.py --skill-path path/to/skill/SKILL.md

# Or use Makefile:
make audit-skill PATH=path/to/skill/SKILL.md
make audit-all

Skill Analysis

# Deep analysis of single skill
python scripts/skill_analyzer.py --path path/to/skill/SKILL.md --verbose

# Check token usage
python scripts/token_estimator.py --file path/to/skill/SKILL.md

# Or use Makefile:
make analyze-skill PATH=path/to/skill/SKILL.md
make estimate-tokens PATH=path/to/skill/SKILL.md

Generate Improvements

# Get prioritized improvement suggestions
python scripts/skills_eval/improvement_suggester.py --skill-path path/to/skill/SKILL.md --priority high

# Check standards compliance
python scripts/skills_eval/compliance_checker.py --skill-path path/to/skill/SKILL.md --standard all

# Or use Makefile:
make improve-skill PATH=path/to/skill/SKILL.md
make check-compliance PATH=path/to/skill/SKILL.md

Typical Workflow

Discovery: Run make audit-all to find and audit all skills
Analysis: Use make audit-skill PATH=... for specific skills
Deep Dive: Run make analyze-skill PATH=... for complexity analysis
Improvements: Generate plan with make improve-skill PATH=...
Compliance: Verify standards with make check-compliance PATH=...
Optimization: Check tokens with make estimate-tokens PATH=...
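
The six steps above can be chained in a small driver script. The following is a minimal sketch, assuming the Makefile targets behave as listed in this document; the function names `workflow_commands` and `run_workflow` are hypothetical and not part of the skill's scripts.

```python
import subprocess


def workflow_commands(skill_path: str) -> list[list[str]]:
    """Return the make invocations for the workflow, in the documented order."""
    path_arg = f"PATH={skill_path}"
    return [
        ["make", "audit-all"],                   # Discovery
        ["make", "audit-skill", path_arg],       # Analysis
        ["make", "analyze-skill", path_arg],     # Deep Dive
        ["make", "improve-skill", path_arg],     # Improvements
        ["make", "check-compliance", path_arg],  # Compliance
        ["make", "estimate-tokens", path_arg],   # Optimization
    ]


def run_workflow(skill_path: str, dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for cmd in workflow_commands(skill_path):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the workflow at the first failing step.
            subprocess.run(cmd, check=True)


run_workflow("path/to/skill/SKILL.md")  # dry run: prints the six commands
```

The dry-run default is a design choice for this sketch: it lets you review the command sequence before anything is executed.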

Common Tasks

Quality Assessment

# Detailed evaluation with scoring
./scripts/skills-auditor --scan-all --format table --priority high

# Detailed analysis of specific skill
./scripts/improvement-suggester --skill-path path/to/skill/SKILL.md --priority all --format markdown

Performance Analysis

# Token usage and efficiency
./scripts/token-usage-tracker --skill-path path/to/skill/SKILL.md --context-analysis

# Advanced tool performance metrics
./scripts/tool-performance-analyzer --skill-path path/to/skill/SKILL.md --metrics all

Standards Compliance

# Validate against Claude Skills standards
./scripts/compliance-checker --skill-path path/to/skill/SKILL.md --standard all --format summary

# Auto-fix common issues
./scripts/compliance-checker --skill-path path/to/skill/SKILL.md --auto-fix --severity high

Improvements and Optimization

# Generate prioritized improvement plan
./scripts/improvement-suggester --skill-path path/to/skill/SKILL.md --priority critical,high

# Benchmark performance
./scripts/token-usage-tracker --skill-path path/to/skill/SKILL.md --benchmark optimization-targets

Evaluation Framework

Quality Metrics Overview

The framework evaluates skills across multiple dimensions with weighted scoring:

Primary Categories (100 points total):

Structure Compliance (20 points): YAML frontmatter, progressive disclosure, organization
Content Quality (20 points): Clarity, completeness, examples, user experience
Token Efficiency (15 points): Content density, progressive loading, context optimization
Activation Reliability (15 points): Trigger effectiveness, context indicators, discovery patterns
Tool Integration (10 points): Executable components, API integration, workflow support
Trigger Isolation (10 points): ALL conditional logic in description field, no body duplicates
Enforcement Language (5 points): Appropriate intensity for skill category
Negative Triggers (5 points): Explicit "DO NOT use when" with alternatives named
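
To make the weighting concrete, here is a minimal sketch of how a total could be computed from per-category points. It assumes each category is scored independently up to its maximum; the names `CATEGORY_MAX` and `total_score` are hypothetical and are not taken from the skill's scripts.

```python
# Category maxima copied from the list above (illustrative key names).
CATEGORY_MAX = {
    "structure_compliance": 20,
    "content_quality": 20,
    "token_efficiency": 15,
    "activation_reliability": 15,
    "tool_integration": 10,
    "trigger_isolation": 10,
    "enforcement_language": 5,
    "negative_triggers": 5,
}

# The maxima add up to the stated 100-point total.
assert sum(CATEGORY_MAX.values()) == 100


def total_score(points: dict[str, float]) -> float:
    """Sum per-category points, clamping each to the range [0, category max]."""
    unknown = set(points) - set(CATEGORY_MAX)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return sum(
        min(max(points.get(name, 0), 0), cap)
        for name, cap in CATEGORY_MAX.items()
    )


example = {"structure_compliance": 18, "content_quality": 15, "token_efficiency": 12}
print(total_score(example))  # 45
```

Categories missing from the input count as zero in this sketch, so a partially evaluated skill gets a low total instead of raising an error.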

Scoring System

91-100: Excellent quality, best practices implemented
76-90: Good quality with minor improvement opportunities
51-75: Meets basic requirements with room for enhancement
26-50: Below acceptable standards, needs significant improvement
0-25: Major issues requiring a complete overhaul
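
The bands above translate directly into a lookup. This is a sketch only: the function name `quality_band` and the short labels are hypothetical. The source lists integer ranges, so the handling of fractional scores (for example, 90.5 falls into the 76-90 band) is an assumption.

```python
def quality_band(score: float) -> str:
    """Map a 0-100 score to a short label for the bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 91:
        return "Excellent"
    if score >= 76:
        return "Good"
    if score >= 51:
        return "Meets basic requirements"
    if score >= 26:
        return "Below acceptable standards"
    return "Major issues"


print(quality_band(82))  # Good
```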

Priority Levels

Critical: Security issues, broken functionality, missing required fields
High: Poor structure, incomplete documentation, performance issues
Medium: Missing best practices, optimization opportunities
Low: Minor improvements, formatting issues, enhanced examples
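
One way to use these levels is to order audit findings so that the most urgent ones come first. The sketch below is illustrative: the `Finding` class, `PRIORITY_ORDER`, and `sort_findings` are hypothetical names, and the output format of the actual tools is not assumed.

```python
from dataclasses import dataclass

# Lower number means more urgent; order follows the list above.
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    priority: str  # one of the keys in PRIORITY_ORDER (case-insensitive)
    message: str


def sort_findings(findings: list[Finding]) -> list[Finding]:
    """Return findings ordered from Critical to Low.

    sorted() is stable, so findings with the same priority keep their
    original relative order.
    """
    return sorted(findings, key=lambda f: PRIORITY_ORDER[f.priority.lower()])


findings = [
    Finding("Low", "Inconsistent heading capitalization"),
    Finding("Critical", "Missing required frontmatter field"),
    Finding("High", "Incomplete documentation"),
]
for f in sort_findings(findings):
    print(f"{f.priority}: {f.message}")
```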

Detailed Resources

For implementation details and advanced techniques:

Shared Modules (Cross-Skill Patterns)

Anti-Rationalization Patterns: See anti-rationalization.md for red flags table and bypass patterns
Enforcement Language: See enforcement-language.md for tiered intensity templates
Trigger Patterns: See trigger-patterns.md for description field structure and CSO

Skill-Specific Modules

Trigger Isolation Analysis: See modules/trigger-isolation-analysis.md for evaluating frontmatter compliance
Skill Authoring Best Practices: See modules/skill-authoring-best-practices.md for official Claude guidance
Authoring Checklist: See modules/authoring-checklist.md for quick-reference validation checklist
Implementation Guide: See modules/evaluation-workflows.md for detailed workflows
Quality Metrics: See modules/quality-metrics.md for scoring criteria and evaluation levels
Advanced Tool Use Analysis: See modules/advanced-tool-use-analysis.md for specialized evaluation techniques
Evaluation Framework: See modules/evaluation-framework.md for detailed scoring and quality gates
Integration Patterns: See modules/integration.md for workflow integration with other skills
Troubleshooting: See modules/troubleshooting.md for common issues and solutions
Pressure Testing: See modules/pressure-testing.md for adversarial validation methodology

Tools and Automation

Tools: Executable analysis utilities in scripts/ directory
Automation: Setup and validation scripts in scripts/automation/
