---
name: critical-skill-finder
description: Find and evaluate publicly available Claude skills using logically valid metrics. Use when searching for custom skills for a specific purpose. Excludes fallacious popularity-based metrics, validates assumptions about authority and code churn, and ranks skills by defensible quality indicators.
---

# Critical Skill Finder
Find publicly available Claude skills for a specific purpose using rigorous evaluation metrics with validated logical connections to quality.
## When to Use This Skill
Trigger this skill when:
- Searching for custom Claude skills for a specific capability
- Evaluating multiple skills to find the best fit
- Comparing skills across repositories
- Validating claims made by skill authors
## Core Principle: Logical Validity
Every evaluation metric must have a defensible logical connection to skill quality. Metrics are categorized by validity:
- ✅ Valid: Direct logical connection to quality
- ⚠️ Moderate: Plausible connection, requires context
- ⚠️ Conditional: Valid only if assumptions are verified
- ❌ Fallacious: No valid logical connection (excluded)
## Excluded Metrics (Logically Fallacious)
DO NOT use these metrics to assess skill quality:
| Metric | Fallacy | Why It Fails |
|---|---|---|
| Stars | Argumentum ad populum | Popularity ≠ quality; viral mediocrity exists |
| Forks | Argumentum ad populum | Copying ≠ quality |
| Download count | Argumentum ad populum | Usage ≠ effectiveness |
| Mentions in awesome-lists | Argumentum ad populum | Curation criteria unknown |
| Total commits | Effort ≠ outcome | More work ≠ better result |
| Lines of code | Verbosity ≠ clarity | Longer ≠ better |
| Time since creation | Age ≠ quality | Old ≠ mature |
| Number of contributors | Team size ≠ quality | More cooks ≠ better broth |
## Evaluation Framework
### Tier 1: Content Analysis (Most Valid)
Directly examine the SKILL.md artifact. This is the most defensible evaluation method.
| Metric | Validity | How to Assess |
|---|---|---|
| Domain specificity | ✅ Valid (fitness for purpose) | Search for domain-specific terms, patterns, frameworks |
| Framework flexibility | ✅ Valid (fitness for purpose) | Does it assume one tool or support multiple? |
| Instruction depth | ⚠️ Moderate | Assess structure, comprehensiveness, clarity |
| Capability coverage | ⚠️ Moderate | What specific capabilities does it enable? |
| Example quality | ⚠️ Moderate | Are examples specific and actionable? |
**Action:** Fetch and read the raw SKILL.md file. Quote relevant sections as evidence; a minimal fetch-and-scan helper is sketched below.
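The fetch-and-scan step can be scripted. Below is a minimal Python sketch (assuming Python 3.9+); the repository `example-org/skills`, the file path, and the term list are hypothetical placeholders, and term counts are supporting evidence for the qualitative read, not a replacement for it.

```python
import urllib.request

def fetch_skill(repo: str, path: str, branch: str = "main") -> str:
    """Download the raw SKILL.md for direct content analysis."""
    url = f"https://raw.githubusercontent.com/{repo}/{branch}/{path}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def domain_term_counts(content: str, terms: list[str]) -> dict[str, int]:
    """Count domain-specific terms as one piece of evidence for fitness."""
    lowered = content.lower()
    return {t: lowered.count(t.lower()) for t in terms}

# Hypothetical example: a QA-focused skill checked against QA vocabulary.
text = fetch_skill("example-org/skills", "qa-review/SKILL.md")
print(domain_term_counts(text, ["regression", "test plan", "flaky", "coverage"]))
```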
### Tier 2: Iterative Refinement (Moderate - Validate for Churn)
Examine git history for the specific skill file, not the whole repository.
| Metric | Validity | How to Assess |
|---|---|---|
| Commits to skill file | ⚠️ Moderate | Git history for the exact file path |
| Commit message quality | ⚠️ Moderate | Refinement verbs ("fix", "improve") vs. initial additions ("add", "initial") or pure formatting |
| Churn-controlled ratio | ⚠️ Moderate (stronger than raw counts) | (skill commits / total repo commits) adjusted for (skill size / repo size) |
**Validation required:** A high commit count could indicate:
- ✅ Active refinement based on feedback (good)
- ❌ Churn from instability or poor initial design (bad)
- ⚠️ Cosmetic changes (typos, formatting) - a neutral signal
**How to validate:** Read actual commit messages. Compare skill-specific commit ratio to repo-wide activity. High ratio in active repo = focused attention. High ratio in dead repo = unclear signal.
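A keyword heuristic can pre-sort messages before the manual read. This is a sketch with assumed keyword lists, not a validated taxonomy; anything it cannot classify still needs human judgment.

```python
# Keyword lists are heuristic assumptions, not a validated taxonomy.
COSMETIC = ("typo", "format", "whitespace", "lint", "rename")
SUBSTANTIVE = ("fix", "improve", "handle", "support", "edge case")

def classify_commits(messages: list[str]) -> dict[str, int]:
    """Bucket commit messages; cosmetic is checked first so that
    'fix typo' is not miscounted as substantive."""
    counts = {"substantive": 0, "cosmetic": 0, "unclear": 0}
    for msg in messages:
        lowered = msg.lower()
        if any(k in lowered for k in COSMETIC):
            counts["cosmetic"] += 1
        elif any(k in lowered for k in SUBSTANTIVE):
            counts["substantive"] += 1
        else:
            counts["unclear"] += 1
    return counts

print(classify_commits(["fix edge case in parser", "fix typo", "add examples"]))
# -> {'substantive': 1, 'cosmetic': 1, 'unclear': 1}
```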
### Tier 3: Authority Signals (Conditional - Validate Relevance)
Author credentials can indicate quality, but only if:
- Credentials are verifiable
- Expertise is relevant to the skill's domain
| Metric | Validity | How to Assess |
|---|---|---|
| Author's stated background | ⚠️ Conditional | GitHub bio, linked portfolio |
| Author's related projects | ⚠️ Conditional | Other repos in same domain? |
**Validation required:**
- Is the claimed expertise verifiable (public profile, blog, employer)?
- Is the expertise relevant (QA background for QA skill, not general "developer")?
- Beware: Authority in unrelated domain = fallacious appeal to authority
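The first lookup can use GitHub's public users API. A minimal sketch; `octocat` is a placeholder login, and the returned fields are self-reported, so treat them as leads to verify rather than verification itself.

```python
import json
import urllib.request

def author_profile(login: str) -> dict:
    """Fetch the public GitHub profile for an authority check."""
    with urllib.request.urlopen(f"https://api.github.com/users/{login}") as resp:
        return json.load(resp)

# 'octocat' is a placeholder; use the skill author's actual login.
profile = author_profile("octocat")
# These fields are self-reported: follow the blog/company links to verify.
print({k: profile.get(k) for k in ("name", "bio", "blog", "company")})
```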
### Tier 4: Published Outcomes (Rare but Valuable)

Empirical evidence of effectiveness is the strongest kind of signal, but it is rarely available.
| Metric | Validity | How to Assess |
|---|---|---|
| Case studies with metrics | ⚠️ Moderate | Search for skill name + "results" |
| Before/after comparisons | ⚠️ Moderate | Documented usage outcomes |
**Validation required:**
- What was the methodology?
- Is it self-reported (selection bias risk)?
- Are results reproducible or anecdotal?
## Search Strategy

### Step 1: Identify Candidate Skills
Search these sources:

**GitHub:**
- `"claude skill [purpose]"`
- `"SKILL.md [purpose]"`
- Repositories: `anthropics/skills`, `travisvn/awesome-claude-skills`, `obra/superpowers`, `wshobson/agents`, `daymade/claude-code-skills`

**Skill directories:**
- claude-plugins.dev
- skillsmp.com
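Candidate discovery can be partially automated with GitHub's repository search API. A sketch with an illustrative query; unauthenticated requests are rate-limited, and the star counts in the response are deliberately ignored per the framework above.

```python
import json
import urllib.parse
import urllib.request

def search_repos(query: str, per_page: int = 10) -> list[dict]:
    """Return candidate repositories from GitHub's search API."""
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    req = urllib.request.Request(
        f"https://api.github.com/search/repositories?{params}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["items"]

# Illustrative query; stars in the response are ignored per the framework.
for repo in search_repos("claude skill code-review"):
    print(repo["full_name"], repo["html_url"])
```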
### Step 2: Fetch and Analyze Content (Tier 1)
For each candidate:
- Fetch the raw SKILL.md file
- Assess domain specificity for the user's stated purpose
- Evaluate instruction depth and example quality
- Quote relevant sections as evidence
### Step 3: Check Git History (Tier 2)
For promising candidates:
- Check commits to the specific skill file
- Calculate churn-controlled ratio
- Read commit messages to validate refinement vs churn
- Note file creation date vs last modification
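Given a local clone, the commit-ratio half of the churn-controlled metric falls out of `git rev-list`. A minimal sketch (Python 3.10+ for the union type); the clone path and skill path are hypothetical, and the size adjustment from Tier 2 is noted but not implemented.

```python
import subprocess

def commit_count(repo_dir: str, path: str | None = None) -> int:
    """Count commits reachable from HEAD, optionally for one file."""
    cmd = ["git", "-C", repo_dir, "rev-list", "--count", "HEAD"]
    if path:
        cmd += ["--", path]
    return int(subprocess.check_output(cmd, text=True).strip())

def churn_ratio(repo_dir: str, skill_path: str) -> float:
    """Skill-file commits as a share of all repo commits.
    The full Tier 2 metric would also adjust for skill size vs. repo size."""
    return commit_count(repo_dir, skill_path) / commit_count(repo_dir)

# Hypothetical local clone path and skill file.
print(f"{churn_ratio('./skills-repo', 'qa-review/SKILL.md'):.1%}")
```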
### Step 4: Validate Authority (Tier 3)
For top candidates:
- Look up author's GitHub profile
- Check for relevant domain expertise
- Verify credentials are real (not just claimed)
- Note if expertise is relevant or tangential
### Step 5: Search for Outcomes (Tier 4)
For finalists:
- Web search for skill name + outcomes/results/case study
- Check author's blog for usage reports
- Assess methodology of any claimed outcomes
## Output Format
For each skill evaluated:
### [Skill Name]
**Source:** [repo URL]
**File:** [path to SKILL.md]
#### Tier 1: Content Analysis
- Domain specificity: [score + evidence with quotes]
- Framework flexibility: [score + evidence]
- Instruction depth: [assessment]
- Capabilities: [list]
- Example quality: [assessment]
#### Tier 2: Iterative Refinement
- Commits to this file: [count]
- Total repo commits: [count]
- Churn ratio: [percentage]
- Churn validation: [substantive vs cosmetic - cite commit messages]
#### Tier 3: Authority Signals
- Author background: [findings]
- Relevance validation: [is expertise relevant to this skill's domain?]
- Verifiability: [can claims be verified?]
#### Tier 4: Published Outcomes
- [Any found, with methodology assessment]
#### Overall Assessment
- Strengths: [based on valid metrics only]
- Weaknesses: [based on valid metrics only]
- Fitness for stated purpose: [high/medium/low with justification]
## Final Ranking

Rank skills by weighted validity; a scoring sketch follows this list:

1. **Tier 1 (Content)** - Primary factor. Poor content disqualifies a skill regardless of other metrics.
2. **Tier 2 (Refinement)** - Secondary factor, counted only if churn is validated as genuine improvement.
3. **Tier 3 (Authority)** - Tiebreaker, counted only if credentials are validated as relevant.
4. **Tier 4 (Outcomes)** - Strongest evidence when found, but rare.
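One way to make the weighting auditable is to encode it. This sketch is illustrative only: the weights, the 0-1 scores, and the disqualification floor are assumptions to be tuned, not part of the framework itself.

```python
from dataclasses import dataclass

@dataclass
class SkillScores:
    name: str
    content: float            # Tier 1, scored 0-1
    refinement: float         # Tier 2, scored 0-1
    authority: float          # Tier 3, scored 0-1
    outcomes: float           # Tier 4, scored 0-1
    churn_validated: bool = False      # Tier 2 counts only if True
    authority_relevant: bool = False   # Tier 3 counts only if True

CONTENT_FLOOR = 0.4  # illustrative assumption: below this, disqualify

def rank(candidates: list[SkillScores]) -> list[SkillScores]:
    """Validity-weighted ranking; poor content disqualifies outright."""
    def score(s: SkillScores) -> float:
        total = 0.5 * s.content + 0.3 * s.outcomes
        if s.churn_validated:
            total += 0.15 * s.refinement
        if s.authority_relevant:
            total += 0.05 * s.authority
        return total
    return sorted(
        (s for s in candidates if s.content >= CONTENT_FLOOR),
        key=score,
        reverse=True,
    )
```

The priority order from the list above survives in the gating: unvalidated churn and irrelevant credentials contribute nothing, and content below the floor never enters the ranking at all.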
## Anti-Patterns to Avoid

- **Ranking by stars** - Popularity is not quality
- **Trusting commit count without reading commits** - Could be churn
- **Accepting author claims without verification** - Could be exaggerated
- **Assuming comprehensive = good** - Breadth ≠ depth
- **Ignoring domain fit** - The best skill overall ≠ the best skill for the purpose
- **Treating all metrics equally** - Validity varies; weight accordingly
## Example Validation Dialogue
When presenting findings, explicitly state validation:
"The skill has 47 commits to the specific file. I examined the commit messages and found 38 were substantive improvements ('fix edge case in...', 'add support for...') and 9 were cosmetic ('fix typo', 'update formatting'). The 81% substantive rate suggests genuine iteration rather than churn."
"The author claims 10 years of QA experience. Their GitHub bio links to a LinkedIn profile showing employment at [Company] as QA Lead from 2015-2023. This is verifiable and directly relevant to a QA skill."
"The skill has 24,000 stars but only 3 commits. High popularity with minimal iteration suggests viral spread rather than quality refinement. Stars excluded from assessment per logical validity framework."