Claude Code Plugins

Community-maintained marketplace

generalization-evaluator

@Cloudhabil/AGI-Server

Cross-domain evaluation to estimate generality and detect blind spots. Use when asked to assess broad capability, compare models across domains, or identify missing skills.

Install Skill

  1. Download the skill.
  2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section.
  3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file.

Note: Please review the skill's instructions and verify its behavior before using it.

SKILL.md

name: generalization-evaluator
description: Cross-domain evaluation to estimate generality and detect blind spots. Use when asked to assess broad capability, compare models across domains, or identify missing skills.

Generalization Evaluator

Use this skill to measure generality across domains and identify weak coverage.

Workflow

  1. Load a task set (use references/task_set.example.json).
  2. Run the task set with a consistent runner.
  3. Score pass/fail per task and summarize by domain.
  4. Rank gaps by impact.
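Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the task fields `domain` and `passed` are assumed here and may not match the schema in references/task_set.example.json.

```python
from collections import defaultdict

# Hypothetical per-task results; the real task-set schema may differ.
results = [
    {"id": "t1", "domain": "math", "passed": True},
    {"id": "t2", "domain": "math", "passed": False},
    {"id": "t3", "domain": "code", "passed": True},
]

def summarize_by_domain(results):
    """Aggregate pass/fail results into a per-domain pass rate."""
    counts = defaultdict(lambda: [0, 0])  # domain -> [passed, total]
    for r in results:
        counts[r["domain"]][1] += 1
        if r["passed"]:
            counts[r["domain"]][0] += 1
    return {domain: passed / total for domain, (passed, total) in counts.items()}

scores = summarize_by_domain(results)
# Rank gaps: lowest-scoring domains first, as candidates for new skills.
gaps = sorted(scores, key=scores.get)
```

Ranking by raw pass rate is the simplest proxy for "impact"; a fuller version might weight domains by how often they appear in real workloads.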

Scripts

  • Run: python scripts/run_eval.py --tasks references/task_set.example.json --runner ollama --model qwen3:latest
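The flags in the Run command suggest a CLI along these lines. This is a hedged sketch of what such a runner's argument parsing might look like; the real scripts/run_eval.py may be structured differently.

```python
import argparse

def build_parser():
    """CLI mirroring the flags shown in the Run command above (assumed, not verified)."""
    parser = argparse.ArgumentParser(description="Run a cross-domain task set.")
    parser.add_argument("--tasks", required=True, help="Path to the task-set JSON file")
    parser.add_argument("--runner", default="ollama", help="Backend used to execute each task")
    parser.add_argument("--model", default="qwen3:latest", help="Model identifier passed to the runner")
    return parser

# Example invocation matching the documented command line.
args = build_parser().parse_args(["--tasks", "references/task_set.example.json"])
```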

Output Expectations

  • Provide a per-domain score table and a short summary of weaknesses.
  • List the top three skill gaps, each with a suggested skill or action to close it.
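A minimal formatter for the expected domain score table might look like this; the column names and sample scores below are invented for illustration only.

```python
def render_table(scores):
    """Render a {domain: pass_rate} dict as a two-column text table, weakest domain first."""
    lines = ["domain | pass rate", "------ | ---------"]
    for domain, rate in sorted(scores.items(), key=lambda kv: kv[1]):
        lines.append(f"{domain} | {rate:.0%}")
    return "\n".join(lines)

# Illustrative scores only, not real evaluation output.
table = render_table({"math": 0.5, "code": 1.0})
```

Sorting weakest-first keeps the table aligned with the gap ranking in the workflow.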