Claude Code Plugins

Community-maintained marketplace

Feedback
8
0

Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name evaluate-model
description Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.
mcp_fallback none
category ml
tier 2

Evaluate Model

Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).

When to Use

  • Comparing different model architectures
  • Assessing performance on test/validation datasets
  • Detecting overfitting or underfitting
  • Reporting model accuracy for papers and documentation

Quick Reference

# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...

Workflow

  1. Load test data: Prepare test/validation dataset
  2. Generate predictions: Run model inference on test set
  3. Select metrics: Choose appropriate metrics (accuracy, precision, recall, F1, AUC, MSE, etc.)
  4. Calculate metrics: Compute performance metrics
  5. Analyze results: Compare to baseline and identify strengths/weaknesses

Output Format

Evaluation report:

  • Task type (classification, regression, etc.)
  • Metrics (accuracy, precision, recall, F1, AUC, etc.)
  • Per-class breakdown (if applicable)
  • Comparison to baseline model
  • Confusion matrix (classification)
  • Error analysis

References

  • See CLAUDE.md > Language Preference (Mojo for ML models)
  • See train-model skill for model training
  • See /notes/review/mojo-ml-patterns.md for Mojo tensor operations