| name | evaluate-model |
| description | Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics. |
| mcp_fallback | none |
| category | ml |
| tier | 2 |
# Evaluate Model
Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).
## When to Use
- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting
- Reporting model accuracy for papers and documentation
## Quick Reference
```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns (accuracy, precision, recall)
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns (MSE, MAE)
        ...
```
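The skeleton above leaves the metric computations elided. Below is a minimal standalone sketch of the classification path, assuming a binary task with class 1 as the positive class and plain `List[Int]` class indices in place of the project's `ExTensor` type (whose API is not shown in this skill); adapt the input handling to your actual tensor layout.

```mojo
# Sketch only: plain List[Int] class indices stand in for ExTensor, and
# class 1 is treated as the positive class for precision/recall.
fn classification_metrics(
    predictions: List[Int],
    ground_truth: List[Int]
) -> Tuple[Float32, Float32, Float32]:
    var correct = 0
    var tp = 0    # predicted 1, actually 1
    var fp = 0    # predicted 1, actually 0
    var fneg = 0  # predicted 0, actually 1 (`fn` is a Mojo keyword)
    for i in range(len(predictions)):
        if predictions[i] == ground_truth[i]:
            correct += 1
        if predictions[i] == 1 and ground_truth[i] == 1:
            tp += 1
        elif predictions[i] == 1 and ground_truth[i] == 0:
            fp += 1
        elif predictions[i] == 0 and ground_truth[i] == 1:
            fneg += 1
    var accuracy = Float32(correct) / Float32(len(predictions))
    # Guard against zero denominators when the positive class never appears
    var precision: Float32 = 0.0
    if tp + fp > 0:
        precision = Float32(tp) / Float32(tp + fp)
    var recall: Float32 = 0.0
    if tp + fneg > 0:
        recall = Float32(tp) / Float32(tp + fneg)
    return (accuracy, precision, recall)
```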
## Workflow
- Load test data: Prepare test/validation dataset
- Generate predictions: Run model inference on test set
- Select metrics: Choose appropriate metrics (accuracy, precision, recall, F1, AUC, MSE, etc.)
- Calculate metrics: Compute the chosen metrics over the full test set (a regression sketch follows this list)
- Analyze results: Compare to baseline and identify strengths/weaknesses
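For the "Calculate metrics" step on a regression task, here is a hedged sketch computing MSE and MAE, again assuming flat `List[Float32]` inputs rather than `ExTensor`:

```mojo
# Sketch only: flat List[Float32] values stand in for ExTensor.
fn regression_metrics(
    predictions: List[Float32],
    ground_truth: List[Float32]
) -> Tuple[Float32, Float32]:
    var sum_sq: Float32 = 0.0
    var sum_abs: Float32 = 0.0
    for i in range(len(predictions)):
        var err = predictions[i] - ground_truth[i]
        sum_sq += err * err  # squared error for MSE
        sum_abs += abs(err)  # absolute error for MAE
    var n = Float32(len(predictions))
    return (sum_sq / n, sum_abs / n)  # (MSE, MAE)
```

MSE penalizes large errors quadratically while MAE weights all errors linearly, so reporting both gives a fuller picture of the error distribution.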
## Output Format
Evaluation report:
- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification; see the sketch after this list)
- Error analysis
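For the confusion matrix item, a small sketch that tallies counts into a row-major flat `List[Int]`; `num_classes` and the flat layout are illustrative choices here, not the project's actual representation:

```mojo
# Sketch only: a row-major flat List[Int] confusion matrix.
# Rows index the true class, columns the predicted class.
fn confusion_matrix(
    predictions: List[Int],
    ground_truth: List[Int],
    num_classes: Int
) -> List[Int]:
    var matrix = List[Int](capacity=num_classes * num_classes)
    for _ in range(num_classes * num_classes):
        matrix.append(0)  # initialize all cells to zero
    for i in range(len(predictions)):
        # cell (true, predicted) counts one more example
        matrix[ground_truth[i] * num_classes + predictions[i]] += 1
    return matrix
```

Row i then holds how often true class i was predicted as each class, which makes the per-class breakdown in the report straightforward to derive.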
## References
- See CLAUDE.md > Language Preference (Mojo for ML models)
- See the `train-model` skill for model training
- See `/notes/review/mojo-ml-patterns.md` for Mojo tensor operations