Claude Code Plugins

Community-maintained marketplace


mcp-code-execution-results-comparison-analyzer

@olaservo/code-execution-with-mcp

Analyzes experimental results comparing code-execution vs direct-MCP approaches for any task. Extracts performance metrics, execution logs, and workspace outputs from zip archives to generate detailed comparison reports. Use when you have a results zip file and need to compare code-execution vs direct-MCP performance, cost, and output quality for any task.

Install Skill

  1. Download skill
  2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
  3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: mcp-code-execution-results-comparison-analyzer
description: Analyzes experimental results comparing code-execution vs direct-MCP approaches for any task. Extracts performance metrics, execution logs, and workspace outputs from zip archives to generate detailed comparison reports. Use when you have a results zip file and need to compare code-execution vs direct-MCP performance, cost, and output quality for any task.

MCP vs Code Execution Results Comparison Analyzer

Instructions

This skill analyzes zip files containing experimental comparison data between code-execution and direct-MCP approaches. It extracts performance metrics, execution logs, and workspace outputs to generate comprehensive comparison reports with actionable recommendations.

Usage

  1. Provide the path to a results zip file
  2. The skill extracts the archive and discovers all files, adapting to varying structures (a minimal discovery sketch follows this list)
  3. It parses metrics and logs, and catalogs workspace outputs
  4. It generates a comparison report and saves data to ./workspace/
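
To make step 2 concrete, here is a minimal discovery sketch in TypeScript. It is not the skill's actual implementation: it assumes the zip has already been extracted to a directory, and the file-name heuristics (JSON files containing "metrics", names containing "log") are illustrative assumptions about the archive layout.

import { readdirSync, statSync } from 'node:fs';
import { join } from 'node:path';

interface DiscoveredFiles {
  metrics: string[];   // per-run metrics JSON files
  logs: string[];      // execution logs
  workspace: string[]; // everything else (task outputs)
}

// Recursively walk the extracted archive and bucket files by role.
function discoverFiles(extractedDir: string): DiscoveredFiles {
  const found: DiscoveredFiles = { metrics: [], logs: [], workspace: [] };
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir)) {
      const fullPath = join(dir, entry);
      if (statSync(fullPath).isDirectory()) {
        walk(fullPath);
      } else if (entry.toLowerCase().includes('metrics') && entry.endsWith('.json')) {
        found.metrics.push(fullPath);
      } else if (entry.toLowerCase().includes('log')) {
        found.logs.push(fullPath);
      } else {
        found.workspace.push(fullPath);
      }
    }
  };
  walk(extractedDir);
  return found;
}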

Features

  • Flexible Discovery: Automatically scans and adapts to any zip structure
  • Failed Run Detection: Identifies failed runs via the FAILED__ prefix on metrics files and separates them from successful runs (see the sketch after this list)
  • Performance Metrics: Duration, cost, token usage, efficiency ratios (calculated from successful runs only)
  • Output Quality Assessment: Compares completeness and accuracy of results
  • Hybrid Analysis: Script aggregates data, model performs semantic comparison
  • Actionable Recommendations: Clear guidance on when to use each approach
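
As a rough illustration of the failed-run detection described above, the sketch below partitions metrics files by the FAILED__ prefix. The prefix convention comes from the skill description; the helper name and types are hypothetical.

import { basename } from 'node:path';

interface RunPartition {
  successful: string[];
  failed: string[];
}

// Separate failed runs (FAILED__ prefix) from successful ones so that
// aggregate statistics can be computed over successful runs only.
function partitionRuns(metricsFiles: string[]): RunPartition {
  const partition: RunPartition = { successful: [], failed: [] };
  for (const file of metricsFiles) {
    if (basename(file).startsWith('FAILED__')) {
      partition.failed.push(file);
    } else {
      partition.successful.push(file);
    }
  }
  return partition;
}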

Comparison Dimensions

The skill compares both approaches across:

  • Performance: Execution time, number of turns, patterns
  • Cost: USD per run, token costs, cache utilization
  • Token Efficiency: Input/output/cache ratios (see the aggregation sketch after this list)
  • Output Quality: Completeness, accuracy, format (by reading workspace files)
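
To illustrate the kind of aggregation behind these dimensions, here is a hedged sketch that averages duration and cost per approach and derives a simple cache-hit ratio. The field names (durationMs, costUsd, inputTokens, and so on) are assumptions about the metrics schema, not the skill's actual format.

interface RunMetrics {
  approach: 'code-execution' | 'direct-mcp';
  durationMs: number;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
}

interface ApproachSummary {
  runs: number;
  avgDurationMs: number;
  avgCostUsd: number;
  cacheHitRatio: number; // cache reads as a share of total input tokens
}

// Summarize one approach's successful runs; failed runs should be
// filtered out before calling this.
function summarize(runs: RunMetrics[]): ApproachSummary {
  const count = Math.max(runs.length, 1); // guard against an empty group
  const totalInput = runs.reduce((sum, r) => sum + r.inputTokens, 0);
  const totalCache = runs.reduce((sum, r) => sum + r.cacheReadTokens, 0);
  return {
    runs: runs.length,
    avgDurationMs: runs.reduce((sum, r) => sum + r.durationMs, 0) / count,
    avgCostUsd: runs.reduce((sum, r) => sum + r.costUsd, 0) / count,
    cacheHitRatio: totalInput > 0 ? totalCache / totalInput : 0,
  };
}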

Examples

import { compareResults } from './.claude/skills/results-comparison-analyzer/implementation';

// Analyze a results zip file
const zipPath = './example_results/task_results-2025-11-20T06-22-50-300Z.zip';
await compareResults(zipPath);

// The function will:
// 1. Extract and parse all metrics and logs
// 2. Catalog workspace outputs with excerpts
// 3. Aggregate statistics by approach
// 4. Save comparison-data.json
// 5. Return structured data for model analysis

Output Files

The skill automatically saves:

  • ./workspace/comparison-data.json - Aggregated metrics, file catalog, and raw data including a failure summary (an illustrative shape is sketched after this list)
  • ./workspace/comparison-analysis-report.md - Comparison report with:
    • Executive summary with key findings
    • Success/failure rates for each approach
    • Performance comparison (successful runs only)
    • Approach analysis (pros/cons)
    • Output quality assessment
    • Recommendations for when to use each approach
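
For orientation, the aggregated file might look roughly like the TypeScript shape below. The keys are illustrative assumptions rather than a guaranteed schema, since the skill adapts to whatever the zip contains.

// An assumed shape for comparison-data.json, expressed as a TypeScript type.
interface ComparisonData {
  approaches: Record<'code-execution' | 'direct-mcp', {
    successfulRuns: number;
    failedRuns: number;
    avgDurationMs: number;
    avgCostUsd: number;
    tokenTotals: { input: number; output: number; cacheRead: number };
  }>;
  fileCatalog: Array<{ path: string; approach: string; excerpt: string }>;
  failureSummary: Array<{ run: string; approach: string; reason?: string }>;
}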

Changelog

  • 2025-11-19: Initial version - flexible discovery and hybrid analysis