name: generalization-evaluator
description: Cross-domain evaluation to estimate generality and detect blind spots. Use when asked to assess broad capability, compare models across domains, or identify missing skills.

Generalization Evaluator

Name: generalization-evaluator
Author: Cloudhabil

Use this skill to measure how well a model generalizes across domains and to identify the domains where its coverage is weak.
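One way to turn per-domain scores into a rough generality estimate and a list of blind spots is sketched below. It assumes a hypothetical results mapping of domain name to accuracy in [0, 1]. The actual output format of scripts/run_eval.py is not documented here, and the mean/spread summary and the fixed 0.5 threshold are illustrative choices, not this skill's own method.

```python
from statistics import mean, pstdev


def summarize_generality(scores: dict[str, float], threshold: float = 0.5) -> dict:
    """Summarize hypothetical per-domain accuracies.

    scores: mapping of domain name -> accuracy in [0, 1] (assumed format).
    threshold: domains scoring below this are flagged as blind spots
               (illustrative default, not taken from the skill).
    """
    if not scores:
        raise ValueError("scores must contain at least one domain")
    values = list(scores.values())
    return {
        # Average across domains: a crude estimate of generality.
        "mean": mean(values),
        # Population standard deviation: a large value suggests uneven coverage.
        "spread": pstdev(values),
        # The single weakest domain, as a worst-case view.
        "worst_domain": min(scores, key=scores.get),
        # Domains below the threshold, ordered from weakest to strongest.
        "blind_spots": sorted(
            (domain for domain, score in scores.items() if score < threshold),
            key=scores.get,
        ),
    }
```

For example, with scores of 0.82 for math, 0.74 for code, 0.31 for law, and 0.45 for medicine, law and medicine are flagged as blind spots, with law as the worst domain. Reporting spread alongside the mean keeps a model that is strong in a few domains from looking as general as one that is consistently decent everywhere.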

Workflow

Run the evaluator on the example task set, using the Ollama runner with the qwen3:latest model:

    python scripts/run_eval.py --tasks references/task_set.example.json --runner ollama --model qwen3:latest
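To compare several models on the same task set, the same command can be repeated with a different --model value. The minimal Python sketch below builds one invocation per model using only the flags from the Workflow command and collects each run's exit code. It assumes the script is run from the skill's root directory; how the script reports or stores its scores is not covered.

```python
import subprocess
import sys

TASKS = "references/task_set.example.json"


def build_eval_command(model: str, tasks: str = TASKS, runner: str = "ollama") -> list[str]:
    """Build the argv for one run of scripts/run_eval.py, mirroring the Workflow flags."""
    # sys.executable reuses the current interpreter instead of whatever "python" resolves to.
    return [sys.executable, "scripts/run_eval.py",
            "--tasks", tasks, "--runner", runner, "--model", model]


def run_all(models: list[str]) -> dict[str, int]:
    """Run the evaluator once per model and return each run's exit code."""
    results = {}
    for model in models:
        # check=False: a failing model is recorded rather than aborting the whole comparison.
        completed = subprocess.run(build_eval_command(model), check=False)
        results[model] = completed.returncode
    return results
```

A call such as `run_all(["qwen3:latest", "llama3:latest"])` would evaluate both models in turn. Here "llama3:latest" is a hypothetical second model; substitute models that are actually available in your Ollama installation.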