| name | skill-evaluator |
| license | MIT |
| description | Comprehensive evaluation toolkit for analyzing Claude skills across security, quality, utility, and compliance dimensions. This skill should be used when users need to evaluate a skill before installation, review before publishing, or assess overall quality and safety. Performs 5-layer security analysis, validates structure and documentation, checks compliance with skill-creator guidelines, and generates markdown reports with scoring and recommendations. |
Skill Evaluator
Comprehensive evaluation toolkit for analyzing Claude skills before installation or publication.
Purpose
Evaluate Claude skills across four critical dimensions:
- Security - Identify vulnerabilities, injection risks, privilege escalation, and security weaknesses
- Quality - Assess code quality, documentation clarity, structural organization, and functionality
- Utility - Evaluate practical value, usability, scope appropriateness, and effectiveness
- Compliance - Validate adherence to skill-creator guidelines and best practices
Generate detailed markdown reports with scores (0-100), risk assessments, and actionable recommendations.
When to Use This Skill
Use this skill when:
- Evaluating skills before installation - Assess safety and quality of third-party skills
- Pre-publication review - Validate skills before distributing to others
- Security auditing - Check for vulnerabilities and security risks
- Quality assessment - Review code quality and documentation
- Compliance validation - Ensure skills follow skill-creator guidelines
Evaluation Modes
Mode 1: Full Evaluation (Default)
Usage: "Evaluate this skill at [path/to/skill]"
Comprehensive analysis across all four dimensions with detailed scoring and recommendations.
Output: Complete markdown report with overall score, security analysis, quality assessment, utility evaluation, compliance validation, and recommendations.
Mode 2: Security-Focused Quick Check
Usage: "Is this skill safe to install?" or "Check the security of [skill-path]"
Deep security analysis with brief checks on other dimensions.
Output: Security-focused report emphasizing vulnerabilities, risk level, and installation safety.
Mode 3: Pre-Publication Review
Usage: "Review my skill before I publish it" or "Help me improve [skill-path] for publication"
Full evaluation with detailed, actionable improvement guidance for skill authors.
Output: Comprehensive report with prioritized recommendations for improvement.
How to Use
Basic Usage
Provide the skill path (directory or .zip file):
"Evaluate the skill at /path/to/my-skill" "Is /path/to/skill.zip safe to install?"Claude will execute evaluation scripts to analyze the skill:
scripts/evaluate_skill.py- Main orchestratorscripts/security_scanner.py- 5-layer security analysisscripts/quality_checker.py- Quality assessmentscripts/compliance_validator.py- Compliance validationscripts/report_generator.py- Report creation
Receive a markdown report with scores, findings, and recommendations
Understanding the Report
Overall Score (0-100)
Weighted calculation:
- Security: 35% (highest weight due to critical importance)
- Quality: 25%
- Utility: 20%
- Compliance: 20%
Score Ranges:
- 90-100: EXCELLENT - Highly recommended
- 75-89: GOOD - Recommended
- 60-74: FAIR - Use with caution
- 40-59: POOR - Not recommended
- 0-39: CRITICAL - Do not install
Security Analysis
Uses 5-layer defense-in-depth architecture:
- Layer 1: Input Validation & Sanitization - Command injection, path traversal, file validation
- Layer 2: Execution Environment Control - Privilege escalation, sandboxing, environment manipulation
- Layer 3: Output Sanitization - XSS prevention, information disclosure, data exposure
- Layer 4: Privilege Management - Credential handling, weak cryptography, authentication
- Layer 5: Self-Protection - DoS patterns, SSRF, resource exhaustion
Vulnerability Severity:
- CRITICAL: Command injection, arbitrary code execution, privilege escalation
- HIGH: Path traversal, insecure deserialization, SSRF
- MEDIUM: Information disclosure, weak crypto, XSS
- LOW: Minor issues, hardening opportunities
Security Overrides:
- Security score < 50 → ❌ DO NOT INSTALL (automatic)
- Any CRITICAL vulnerability → ❌ DO NOT INSTALL (automatic)
Quality Assessment
Four quality dimensions (25 points each):
- Code Quality - Readability, error handling, modularity, dependencies, best practices
- Documentation - Purpose clarity, usage instructions, resource references, writing quality, completeness
- Structure & Organization - Directory structure, file naming, YAML frontmatter
- Functionality - Practical value, appropriate tool usage, reusability, completeness
Utility Evaluation
Assesses practical value (100 points):
- Problem-solving value (25 pts) - Addresses real needs
- Usability (25 pts) - Clear and easy to use
- Scope (25 pts) - Appropriate complexity and boundaries
- Effectiveness (25 pts) - Works as described
Compliance Validation
Validates against skill-creator guidelines (100 points):
- SKILL.md structure (10 pts)
- YAML frontmatter (20 pts)
- Progressive disclosure (15 pts)
- Scripts/references/assets usage (30 pts total)
- Writing style (10 pts)
- Trigger description (10 pts)
Critical Violations (Auto-Fail):
- Missing SKILL.md
- Missing required YAML fields
- Invalid YAML syntax
Bundled Resources
Scripts (scripts/)
Execute these for evaluation:
evaluate_skill.py- Main orchestrator coordinating all analysessecurity_scanner.py- 5-layer security architecture with pattern detectionquality_checker.py- Code quality, documentation, and structure assessmentcompliance_validator.py- Guideline adherence and compliance checkingreport_generator.py- Markdown report generation from results
References (references/)
Load these for detailed evaluation criteria:
security_patterns.md- Vulnerability pattern database with detection criteria and secure examplesquality_criteria.md- Quality assessment rubrics and scoring guidelinescompliance_checklist.md- skill-creator guideline requirementsevaluation_methodology.md- Evaluation process, scoring formulas, and report structure
Assets (assets/)
report_template.md- Markdown report template with structured sections
Evaluation Workflow
Step 1: Skill Discovery
Accept skill input (directory or .zip), extract if needed, identify SKILL.md and bundled resources.
Step 2: Run Analyses
Execute evaluations: Security Scanner → Quality Checker → Compliance Validator → Utility Evaluator
Step 3: Calculate Scores
Apply weighted formula and override rules:
Overall = (Security × 0.35) + (Quality × 0.25) + (Utility × 0.20) + (Compliance × 0.20)
Step 4: Generate Report
Create markdown report using template with executive summary, detailed analyses, and recommendations.
Step 5: Save Report
Write report to {skill_name}_evaluation_report.md and present to user.
Installation Recommendations
- ✅ HIGHLY RECOMMENDED (90-100) - Excellent quality, safe to install
- ✅ RECOMMENDED (75-89) - Good quality, safe to install
- ⚠️ USE WITH CAUTION (60-74) - Review findings before installing
- ⚠️ NOT RECOMMENDED (40-59) - Major improvements needed
- ❌ DO NOT INSTALL (0-39 or security override) - Critical issues, unsafe
Limitations
Can Assess
- ✅ Static code analysis
- ✅ Pattern-based vulnerability detection
- ✅ Structure and compliance
- ✅ Documentation quality
Cannot Assess
- ❌ Runtime behavior
- ❌ Performance at scale
- ❌ Novel attack vectors
- ❌ Subjective satisfaction
⚠️ Important Disclaimers
READ CAREFULLY BEFORE USING THIS SKILL
No Guarantee of Safety
This evaluation CANNOT determine with certainty that a skill is safe. Like all security analysis tools:
- Cannot prove absence of vulnerabilities - Only detect known patterns; novel or obfuscated attacks may go undetected
- Static analysis limitations - Cannot assess runtime behavior, dynamic code execution, or context-dependent risks
- False negatives possible - Sophisticated malicious code may evade pattern-based detection
- Time-bound assessment - New vulnerabilities may be discovered after evaluation
Use as ONE Input Only
This evaluation should be used as ONE input into your security decision, not the sole determining factor.
You are responsible for:
- Manual code review - Read and understand the skill's code yourself
- Test in isolated environment - Run skills in sandboxed/test environments first
- Organizational policies - Always follow your organization's security policies and approval processes
- Risk assessment - Consider your specific threat model and risk tolerance
- Ongoing monitoring - Continue to monitor skill behavior after installation
Your Responsibility
- YOU are responsible for skills you install - Not the evaluator, not the skill author
- Follow organizational policies - Security policies override any evaluation recommendation
- Trust but verify - Even "HIGHLY RECOMMENDED" skills should be reviewed
- When in doubt, don't install - If unsure about a skill's safety, consult security experts
Limitations of Automated Analysis
This tool performs pattern-based static analysis, which means:
- ✅ Good at: Detecting common vulnerability patterns, structural issues, compliance violations
- ❌ Cannot detect: Zero-day exploits, logic bombs, social engineering, supply chain attacks
- ❌ Cannot assess: Author trustworthiness, long-term maintenance, backdoor triggers
- ❌ Cannot guarantee: Complete security, absence of malicious intent, future safety
Legal Disclaimer
NO WARRANTIES: This evaluation tool is provided "as-is" without warranties of any kind. The authors and contributors assume no liability for damages resulting from use of this tool or skills evaluated by it.
USE AT YOUR OWN RISK: You accept all risks associated with installing and using evaluated skills.
Examples
Example 1: Security Check
User: "Is /downloads/data-analyzer.zip safe?"
Output: Security report with vulnerabilities, risk level, and installation recommendation.
Example 2: Pre-Publication
User: "Review my skill: /my-projects/excel-parser/"
Output: Full evaluation with priority improvements and publication readiness assessment.
Example 3: Full Evaluation
User: "Evaluate /skills/api-connector/"
Output: Complete report with all dimensions, scores, and recommendations.
Best Practices for Skill Authors
Security
- Never use subprocess with shell=True
- Validate and sanitize inputs
- Use Path.resolve() for paths
- Avoid hardcoded credentials
- Implement error handling
Quality
- Write clean, readable code
- Add type hints and docstrings
- Remove TODO placeholders
- Provide comprehensive documentation
Compliance
- Use imperative/infinitive form
- Write clear, specific descriptions
- Follow progressive disclosure
- Organize files correctly
- Use lowercase-with-hyphens naming
Utility
- Solve real problems
- Provide clear instructions
- Include practical examples
- Ensure appropriate scope