---
name: critical-skill-finder
description: Find and evaluate publicly available Claude skills using logically valid metrics. Use when searching for custom skills for a specific purpose. Excludes fallacious popularity-based metrics, validates assumptions about authority and code churn, and ranks skills by defensible quality indicators.
---

# Critical Skill Finder
Find publicly available Claude skills for a specific purpose using rigorous evaluation metrics with validated logical connections to quality.
## When to Use This Skill
Trigger this skill when:
- Searching for custom Claude skills for a specific capability
- Evaluating multiple skills to find the best fit
- Comparing skills across repositories
- Validating claims made by skill authors
## Core Principle: Logical Validity
Every evaluation metric must have a defensible logical connection to skill quality. Metrics are categorized by validity:
- ✅ Valid: Direct logical connection to quality
- ⚠️ Moderate: Plausible connection, requires context
- ⚠️ Conditional: Valid only if assumptions are verified
- ❌ Fallacious: No valid logical connection (excluded)
## Excluded Metrics (Logically Fallacious)
DO NOT use these metrics to assess skill quality:
| Metric | Fallacy | Why It Fails |
|---|---|---|
| Stars | Argumentum ad populum | Popularity ≠ quality; viral mediocrity exists |
| Forks | Argumentum ad populum | Copying ≠ quality |
| Download count | Argumentum ad populum | Usage ≠ effectiveness |
| Mentions in awesome-lists | Argumentum ad populum | Curation criteria unknown |
| Total commits | Effort ≠ outcome | More work ≠ better result |
| Lines of code | Verbosity ≠ clarity | Longer ≠ better |
| Time since creation | Age ≠ quality | Old ≠ mature |
| Number of contributors | Team size ≠ quality | More cooks ≠ better broth |
## Evaluation Framework
### Tier 1: Content Analysis (Most Valid)
Directly examine the SKILL.md artifact. This is the most defensible evaluation method.
| Metric | Validity | How to Assess |
|---|---|---|
| Domain specificity | ✅ Valid (fitness for purpose) | Search for domain-specific terms, patterns, frameworks |
| Framework flexibility | ✅ Valid (fitness for purpose) | Does it assume one tool or support multiple? |
| Instruction depth | ⚠️ Moderate | Assess structure, comprehensiveness, clarity |
| Capability coverage | ⚠️ Moderate | What specific capabilities does it enable? |
| Example quality | ⚠️ Moderate | Are examples specific and actionable? |
**Action:** Fetch and read the raw SKILL.md file. Quote relevant sections as evidence; a minimal fetch-and-scan helper is sketched below.
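The fetch-and-scan step can be scripted. Below is a minimal Python sketch (assuming Python 3.9+); the repository `example-org/skills`, the file path, and the term list are hypothetical placeholders, and term counts are supporting evidence for the qualitative read, not a replacement for it.

```python
import urllib.request

def fetch_skill(repo: str, path: str, branch: str = "main") -> str:
    """Download the raw SKILL.md for direct content analysis."""
    url = f"https://raw.githubusercontent.com/{repo}/{branch}/{path}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def domain_term_counts(content: str, terms: list[str]) -> dict[str, int]:
    """Count domain-specific terms as one piece of evidence for fitness."""
    lowered = content.lower()
    return {t: lowered.count(t.lower()) for t in terms}

# Hypothetical example: a QA-focused skill checked against QA vocabulary.
text = fetch_skill("example-org/skills", "qa-review/SKILL.md")
print(domain_term_counts(text, ["regression", "test plan", "flaky", "coverage"]))
```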
### Tier 2: Iterative Refinement (Moderate - Validate for Churn)
Examine git history for the specific skill file, not the whole repository.
| Metric | Validity | How to Assess |
|---|---|---|
| Commits to skill file | ⚠️ Moderate | Git history for the exact file path |
| Commit message quality | ⚠️ Moderate | Refinement verbs ("fix", "improve") vs. initial additions ("add", "initial") or pure formatting |
| Churn-controlled ratio | ⚠️ Moderate (stronger than raw counts) | (skill commits / total repo commits) adjusted for (skill size / repo size) |
**Validation required:** A high commit count could indicate:
- ✅ Active refinement based on feedback (good)
- ❌ Churn from instability or poor initial design (bad)
- ⚠️ Cosmetic changes (typos, formatting) - a neutral signal
**How to validate:** Read actual commit messages. Compare skill-specific commit ratio to repo-wide activity. High ratio in active repo = focused attention. High ratio in dead repo = unclear signal.
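A keyword heuristic can pre-sort messages before the manual read. This is a sketch with assumed keyword lists, not a validated taxonomy; anything it cannot classify still needs human judgment.

```python
# Keyword lists are heuristic assumptions, not a validated taxonomy.
COSMETIC = ("typo", "format", "whitespace", "lint", "rename")
SUBSTANTIVE = ("fix", "improve", "handle", "support", "edge case")

def classify_commits(messages: list[str]) -> dict[str, int]:
    """Bucket commit messages; cosmetic is checked first so that
    'fix typo' is not miscounted as substantive."""
    counts = {"substantive": 0, "cosmetic": 0, "unclear": 0}
    for msg in messages:
        lowered = msg.lower()
        if any(k in lowered for k in COSMETIC):
            counts["cosmetic"] += 1
        elif any(k in lowered for k in SUBSTANTIVE):
            counts["substantive"] += 1
        else:
            counts["unclear"] += 1
    return counts

print(classify_commits(["fix edge case in parser", "fix typo", "add examples"]))
# -> {'substantive': 1, 'cosmetic': 1, 'unclear': 1}
```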
### Tier 3: Authority Signals (Conditional - Validate Relevance)
Author credentials can indicate quality, but only if:
- Credentials are verifiable
- Expertise is relevant to the skill's domain
| Metric | Validity | How to Assess |
|---|---|---|
| Author's stated background | ⚠️ Conditional | GitHub bio, linked portfolio |
| Author's related projects | ⚠️ Conditional | Other repos in same domain? |
**Validation required:**
- Is the claimed expertise verifiable (public profile, blog, employer)?
- Is the expertise relevant (QA background for QA skill, not general "developer")?
- Beware: Authority in unrelated domain = fallacious appeal to authority
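The first lookup can use GitHub's public users API. A minimal sketch; `octocat` is a placeholder login, and the returned fields are self-reported, so treat them as leads to verify rather than verification itself.

```python
import json
import urllib.request

def author_profile(login: str) -> dict:
    """Fetch the public GitHub profile for an authority check."""
    with urllib.request.urlopen(f"https://api.github.com/users/{login}") as resp:
        return json.load(resp)

# 'octocat' is a placeholder; use the skill author's actual login.
profile = author_profile("octocat")
# These fields are self-reported: follow the blog/company links to verify.
print({k: profile.get(k) for k in ("name", "bio", "blog", "company")})
```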
### Tier 4: Published Outcomes (Rare but Valuable)

Empirical evidence of effectiveness is the strongest kind of signal, but it is rarely available.
| Metric | Validity | How to Assess |
|---|---|---|
| Case studies with metrics | ⚠️ Moderate | Search for skill name + "results" |
| Before/after comparisons | ⚠️ Moderate | Documented usage outcomes |
**Validation required:**
- What was the methodology?
- Is it self-reported (selection bias risk)?
- Are results reproducible or anecdotal?
## Search Strategy

### Step 1: Identify Candidate Skills
Search these sources:

**GitHub:**
- `"claude skill [purpose]"`
- `"SKILL.md [purpose]"`
- Repositories: `anthropics/skills`, `travisvn/awesome-claude-skills`, `obra/superpowers`, `wshobson/agents`, `daymade/claude-code-skills`

**Skill directories:**
- claude-plugins.dev
- skillsmp.com
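Candidate discovery can be partially automated with GitHub's repository search API. A sketch with an illustrative query; unauthenticated requests are rate-limited, and the star counts in the response are deliberately ignored per the framework above.

```python
import json
import urllib.parse
import urllib.request

def search_repos(query: str, per_page: int = 10) -> list[dict]:
    """Return candidate repositories from GitHub's search API."""
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    req = urllib.request.Request(
        f"https://api.github.com/search/repositories?{params}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["items"]

# Illustrative query; stars in the response are ignored per the framework.
for repo in search_repos("claude skill code-review"):
    print(repo["full_name"], repo["html_url"])
```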
### Step 2: Fetch and Analyze Content (Tier 1)
For each candidate:
- Fetch the raw SKILL.md file
- Assess domain specificity for the user's stated purpose
- Evaluate instruction depth and example quality
- Quote relevant sections as evidence
### Step 3: Check Git History (Tier 2)
For promising candidates:
- Check commits to the specific skill file
- Calculate churn-controlled ratio
- Read commit messages to validate refinement vs churn
- Note file creation date vs last modification
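Given a local clone, the commit-ratio half of the churn-controlled metric falls out of `git rev-list`. A minimal sketch (Python 3.10+ for the union type); the clone path and skill path are hypothetical, and the size adjustment from Tier 2 is noted but not implemented.

```python
import subprocess

def commit_count(repo_dir: str, path: str | None = None) -> int:
    """Count commits reachable from HEAD, optionally for one file."""
    cmd = ["git", "-C", repo_dir, "rev-list", "--count", "HEAD"]
    if path:
        cmd += ["--", path]
    return int(subprocess.check_output(cmd, text=True).strip())

def churn_ratio(repo_dir: str, skill_path: str) -> float:
    """Skill-file commits as a share of all repo commits.
    The full Tier 2 metric would also adjust for skill size vs. repo size."""
    return commit_count(repo_dir, skill_path) / commit_count(repo_dir)

# Hypothetical local clone path and skill file.
print(f"{churn_ratio('./skills-repo', 'qa-review/SKILL.md'):.1%}")
```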
### Step 4: Validate Authority (Tier 3)
For top candidates:
- Look up author's GitHub profile
- Check for relevant domain expertise
- Verify credentials are real (not just claimed)
- Note if expertise is relevant or tangential
### Step 5: Search for Outcomes (Tier 4)
For finalists:
- Web search for skill name + outcomes/results/case study
- Check author's blog for usage reports
- Assess methodology of any claimed outcomes
## Output Format
For each skill evaluated:
### [Skill Name]
**Source:** [repo URL]
**File:** [path to SKILL.md]
#### Tier 1: Content Analysis
- Domain specificity: [score + evidence with quotes]
- Framework flexibility: [score + evidence]
- Instruction depth: [assessment]
- Capabilities: [list]
- Example quality: [assessment]
#### Tier 2: Iterative Refinement
- Commits to this file: [count]
- Total repo commits: [count]
- Churn ratio: [percentage]
- Churn validation: [substantive vs cosmetic - cite commit messages]
#### Tier 3: Authority Signals
- Author background: [findings]
- Relevance validation: [is expertise relevant to this skill's domain?]
- Verifiability: [can claims be verified?]
#### Tier 4: Published Outcomes
- [Any found, with methodology assessment]
#### Overall Assessment
- Strengths: [based on valid metrics only]
- Weaknesses: [based on valid metrics only]
- Fitness for stated purpose: [high/medium/low with justification]
## Final Ranking

Rank skills by weighted validity; a scoring sketch follows this list:

1. **Tier 1 (Content)** - Primary factor. Poor content disqualifies a skill regardless of other metrics.
2. **Tier 2 (Refinement)** - Secondary factor, counted only if churn is validated as genuine improvement.
3. **Tier 3 (Authority)** - Tiebreaker, counted only if credentials are validated as relevant.
4. **Tier 4 (Outcomes)** - Strongest evidence when found, but rare.
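One way to make the weighting auditable is to encode it. This sketch is illustrative only: the weights, the 0-1 scores, and the disqualification floor are assumptions to be tuned, not part of the framework itself.

```python
from dataclasses import dataclass

@dataclass
class SkillScores:
    name: str
    content: float            # Tier 1, scored 0-1
    refinement: float         # Tier 2, scored 0-1
    authority: float          # Tier 3, scored 0-1
    outcomes: float           # Tier 4, scored 0-1
    churn_validated: bool = False      # Tier 2 counts only if True
    authority_relevant: bool = False   # Tier 3 counts only if True

CONTENT_FLOOR = 0.4  # illustrative assumption: below this, disqualify

def rank(candidates: list[SkillScores]) -> list[SkillScores]:
    """Validity-weighted ranking; poor content disqualifies outright."""
    def score(s: SkillScores) -> float:
        total = 0.5 * s.content + 0.3 * s.outcomes
        if s.churn_validated:
            total += 0.15 * s.refinement
        if s.authority_relevant:
            total += 0.05 * s.authority
        return total
    return sorted(
        (s for s in candidates if s.content >= CONTENT_FLOOR),
        key=score,
        reverse=True,
    )
```

The priority order from the list above survives in the gating: unvalidated churn and irrelevant credentials contribute nothing, and content below the floor never enters the ranking at all.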
## Anti-Patterns to Avoid

- **Ranking by stars** - Popularity is not quality
- **Trusting commit count without reading commits** - Could be churn
- **Accepting author claims without verification** - Could be exaggerated
- **Assuming comprehensive = good** - Breadth ≠ depth
- **Ignoring domain fit** - The best skill overall ≠ the best skill for the purpose
- **Treating all metrics equally** - Validity varies; weight accordingly
## Example Validation Dialogue
When presenting findings, explicitly state validation:
"The skill has 47 commits to the specific file. I examined the commit messages and found 38 were substantive improvements ('fix edge case in...', 'add support for...') and 9 were cosmetic ('fix typo', 'update formatting'). The 81% substantive rate suggests genuine iteration rather than churn."
"The author claims 10 years of QA experience. Their GitHub bio links to a LinkedIn profile showing employment at [Company] as QA Lead from 2015-2023. This is verifiable and directly relevant to a QA skill."
"The skill has 24,000 stars but only 3 commits. High popularity with minimal iteration suggests viral spread rather than quality refinement. Stars excluded from assessment per logical validity framework."