Claude Code Plugins

Analyze Excel files and automatically generate custom query skills for conversational data interaction. Also provides standalone conversion of Excel files with multiple sheets to Parquet datasets. Handles files with millions of rows, thousands of columns, formulas, and multiple sheets. Converts data to optimized Parquet format and creates complete skill packages with documentation, query helpers, and formula mappings.

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: excel-analyzer
description: Analyze Excel files and automatically generate custom query skills for conversational data interaction. Also provides standalone conversion of Excel files with multiple sheets to Parquet datasets. Handles files with millions of rows, thousands of columns, formulas, and multiple sheets. Converts data to optimized Parquet format and creates complete skill packages with documentation, query helpers, and formula mappings.

Excel Analyzer Skill

This skill analyzes Excel files and automatically generates custom Claude Agent Skills tailored to specific Excel files, enabling conversational querying of complex Excel data.

What This Skill Does

The Excel Analyzer provides two main capabilities:

Full Analysis & Skill Generation (Workflow 1)

Performs comprehensive analysis of Excel files and generates a complete, ready-to-use query skill that includes:

  1. Schema Extraction: Analyzes all sheets, columns, data types, and relationships
  2. Formula Analysis: Extracts Excel formulas with their locations and dependencies
  3. Data Conversion: Converts Excel to optimized Parquet format for 30x faster queries
  4. Skill Generation: Creates a complete skill with documentation, query helpers, and data dictionaries
  5. Documentation: Generates comprehensive SKILL.md with usage instructions and examples

Standalone Parquet Conversion (Workflow 2)

Quickly converts Excel files with multiple sheets to Parquet datasets:

  1. Multi-Sheet Detection: Automatically detects all sheets in the Excel workbook
  2. Parquet Conversion: Converts each sheet to a separate compressed Parquet file
  3. Organized Storage: Stores files in .claude/skills/{skill-name}/resources/parquet_data/
  4. Ready for Analysis: Output files are immediately usable with Polars, Pandas, or PyArrow
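As a rough sketch, the conversion just described boils down to reading every sheet and writing each one back out with zstd compression. The real implementation lives in scripts/excel_to_parquet.py; the function below is illustrative, not the script's actual API:

# Minimal sketch of the multi-sheet conversion; the shipped script adds
# progress reporting, error handling, and a summary.
import re
from pathlib import Path
import polars as pl

def convert_workbook(xlsx_path: str, skill_name: str) -> None:
    out_dir = Path(".claude/skills") / skill_name / "resources" / "parquet_data"
    out_dir.mkdir(parents=True, exist_ok=True)
    # sheet_id=0 asks Polars to return every sheet as a {name: DataFrame} dict
    sheets = pl.read_excel(xlsx_path, sheet_id=0)
    for name, df in sheets.items():
        safe_name = re.sub(r"[^\w-]+", "_", name)  # sanitize for filenames
        df.write_parquet(out_dir / f"{safe_name}.parquet", compression="zstd")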

When to Use This Skill

Use this skill when you need to:

  • Enable conversational AI querying of complex Excel files
  • Work with large Excel files (100K+ rows, hundreds of columns)
  • Analyze Excel files containing formulas and computed values
  • Create reusable query interfaces for Excel datasets
  • Convert Excel data to more efficient formats (Parquet)
  • Generate data dictionaries and schema documentation automatically
  • Build custom skills for specific business datasets

Environment Setup and Dependency Management

CRITICAL: All generated skills MUST include automatic environment detection and dependency management to avoid polluting the global Python environment.

When generating a new skill, the skill generator MUST add the following capabilities to the generated SKILL.md:

1. Operating System Detection

The generated skill must detect the current OS to use correct commands:

import platform
import sys

os_type = platform.system()  # 'Windows', 'Linux', or 'Darwin' (macOS)
python_executable = sys.executable
python_version = sys.version

2. Virtual Environment Detection

The generated skill must detect and use virtual environments in this priority order:

  1. Check if already running in an activated venv (VIRTUAL_ENV environment variable)
  2. Look for .venv/ directory (modern convention)
  3. Look for venv/ directory
  4. Look for env/ directory
  5. Only fall back to system Python if no venv found (with warning)

Detection logic:

import os
from pathlib import Path

def detect_venv():
    """Detect virtual environment in current directory"""
    # Check if already in activated venv
    if os.environ.get('VIRTUAL_ENV'):
        return Path(os.environ['VIRTUAL_ENV'])

    # Check for common venv directory names
    venv_candidates = ['.venv', 'venv', 'env']
    for venv_name in venv_candidates:
        venv_path = Path(venv_name)
        # Verify it's a valid venv by checking for pyvenv.cfg
        if venv_path.exists() and (venv_path / 'pyvenv.cfg').exists():
            return venv_path.absolute()

    return None  # No venv found, will use system Python

3. Dependency Installation

The generated skill must:

  • Check if required packages are installed in the detected venv
  • Automatically install missing dependencies from requirements.txt
  • NEVER install packages globally unless no venv exists

Installation logic:

import platform
import subprocess
import importlib.util
from pathlib import Path

def check_package_installed(package_name):
    """Check if a package is installed"""
    return importlib.util.find_spec(package_name) is not None

def install_dependencies(requirements_file, venv_path=None):
    """Install missing dependencies in virtual environment"""
    if not Path(requirements_file).exists():
        return

    # Determine pip executable
    if venv_path:
        if platform.system() == 'Windows':
            pip_executable = venv_path / 'Scripts' / 'pip.exe'
        else:
            pip_executable = venv_path / 'bin' / 'pip'
    else:
        pip_executable = 'pip'  # assumes pip is on PATH when no venv is found

    # Install dependencies
    cmd = [str(pip_executable), 'install', '-r', str(requirements_file)]
    if not venv_path:
        # Add --break-system-packages only if using system Python
        cmd.append('--break-system-packages')

    subprocess.run(cmd, check=True)
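Putting the two helpers together, illustrative usage looks like this:

# Illustrative: ensure Polars is available before any query work
venv = detect_venv()
if not check_package_installed('polars'):
    install_dependencies('requirements.txt', venv_path=venv)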

4. Complete Setup Function

Every generated skill MUST include this setup function at the beginning of SKILL.md:

def setup_environment():
    """
    Detect environment and ensure all dependencies are installed.
    This function MUST be called before any skill operations.
    """
    import platform
    import sys
    import os
    from pathlib import Path
    import subprocess
    import importlib.util

    # Detect OS
    os_type = platform.system()
    print(f"Detected OS: {os_type}")
    print(f"Python version: {sys.version}")
    print(f"Python executable: {sys.executable}")

    # Detect virtual environment
    venv_path = None
    if os.environ.get('VIRTUAL_ENV'):
        venv_path = Path(os.environ['VIRTUAL_ENV'])
        print(f"Already in activated venv: {venv_path}")
    else:
        # Look for venv in current directory
        for venv_name in ['.venv', 'venv', 'env']:
            candidate = Path(venv_name)
            if candidate.exists() and (candidate / 'pyvenv.cfg').exists():
                venv_path = candidate.absolute()
                print(f"Found virtual environment: {venv_path}")
                break

    if not venv_path:
        print("WARNING: No virtual environment detected. Will use system Python.")
        print("Consider creating a venv: python -m venv .venv")

    # Check and install dependencies
    skill_dir = Path(__file__).parent if '__file__' in globals() else Path.cwd()
    requirements_file = skill_dir / 'requirements.txt'

    if requirements_file.exists():
        print(f"Checking dependencies from: {requirements_file}")

        # Determine how to invoke pip
        if venv_path:
            if os_type == 'Windows':
                pip_cmd = [str(venv_path / 'Scripts' / 'pip.exe')]
            else:
                pip_cmd = [str(venv_path / 'bin' / 'pip')]
        else:
            # `python -m pip` is more reliable than rewriting the executable path
            pip_cmd = [sys.executable, '-m', 'pip']

        # Install missing dependencies
        try:
            cmd = pip_cmd + ['install', '-r', str(requirements_file), '-q']
            if not venv_path:
                cmd.append('--break-system-packages')

            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode == 0:
                print("✓ All dependencies installed successfully")
            else:
                print(f"Warning: Dependency installation had issues: {result.stderr}")
        except Exception as e:
            print(f"Error installing dependencies: {e}")

    return {
        'os': os_type,
        'python_version': sys.version,
        'python_executable': sys.executable,
        'venv_path': str(venv_path) if venv_path else None,
        'venv_active': bool(venv_path),
        'skill_dir': str(skill_dir)
    }

5. Usage in Generated Skills

Every generated skill's SKILL.md must include instructions to call setup_environment() BEFORE any operations:

# At the start of any generated skill usage:
from skill_setup import setup_environment

# MUST run this first
env_info = setup_environment()

# Now safe to proceed with skill operations
from query_helper import QueryHelper
helper = QueryHelper('resources/parquet_data')
# ... rest of the code

The skill generator MUST create a skill_setup.py file in every generated skill directory containing the complete setup_environment() function.

Prerequisites

Required Python packages (will be auto-installed if missing):

pip install openpyxl polars pyarrow pandas --break-system-packages

Workflow Overview

Step 1: Analyze Excel File
    ↓
    analyzer.py → Extracts schema, formulas, dependencies, samples
    ↓
Step 2: Generate Query Skill
    ↓
    skill_generator.py → Creates custom skill package with environment setup
    ↓
Step 3: Use Generated Skill
    ↓
    Auto-detects venv, installs dependencies, queries data conversationally

Step-by-Step Usage

Workflow Option 1: Full Analysis + Skill Generation (Recommended)

For comprehensive analysis and automatic skill generation with documentation, query helpers, and formula mapping.

Step 1A: Analyze the Excel File

python scripts/analyzer.py \
  --file /path/to/your/data.xlsx \
  --output-dir ./analysis_output

What this does:

  • Loads the Excel file and analyzes all sheets
  • Extracts comprehensive schema (columns, types, samples)
  • Identifies and documents all formulas
  • Builds formula dependency graphs
  • Extracts stratified sample data (50 rows per sheet)
  • Converts all sheets to Parquet format
  • Saves analysis results as JSON files

Output files created:

analysis_output/
├── analysis_results.json    # Complete analysis
├── data_dictionary.json     # Schema and column metadata
├── formula_map.json         # All formulas with locations
├── dependencies.json        # Formula dependency graph
├── sample_data.json         # Representative data samples
└── resources/
    └── parquet_data/         # Converted Parquet files
        ├── Sheet1.parquet
        └── Sheet2.parquet
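The JSON outputs can be inspected directly; for example (the exact keys depend on the analyzer's output schema):

import json
from pathlib import Path

out = Path('analysis_output')
dictionary = json.loads((out / 'data_dictionary.json').read_text())
formulas = json.loads((out / 'formula_map.json').read_text())
print(f"Data dictionary entries: {len(dictionary)}")
print(f"Formula map entries: {len(formulas)}")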

Step 1B: Generate the Query Skill

# Option 1: Auto-detect .claude/skills directory (recommended)
python scripts/skill_generator.py \
  --analysis-dir ./analysis_output

# Option 2: Specify custom output directory
python scripts/skill_generator.py \
  --analysis-dir ./analysis_output \
  --output-dir ./custom_skills_directory

Default Behavior: If --output-dir is not specified, the skill will automatically be created in .claude/skills/ in the current working directory. This ensures all generated skills are organized alongside the excel-analyzer skill.
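Conceptually, the default resolution works like this (a sketch; see scripts/skill_generator.py for the actual logic):

from pathlib import Path

def resolve_output_dir(output_dir: str | None) -> Path:
    # Fall back to .claude/skills in the current working directory when
    # --output-dir is omitted (illustrative helper, not the script's API)
    return Path(output_dir) if output_dir else Path.cwd() / '.claude' / 'skills'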

What this does:

  • Reads the analysis results
  • Generates a complete skill package
  • Creates comprehensive SKILL.md documentation
  • Generates query_helper.py with utility functions
  • Creates formula_functions.py for Excel formula equivalents
  • Creates skill_setup.py with environment detection and dependency management
  • Copies all data files and Parquet data
  • Includes requirements.txt for dependencies

Generated skill structure:

.claude/skills/{filename}-query/
├── SKILL.md                  # Complete documentation with setup instructions
├── skill_setup.py            # Environment detection and dependency management
├── data_dictionary.json      # Schema reference
├── formula_map.json          # Formula documentation
├── dependencies.json         # Formula dependency graph
├── sample_data.json          # Sample data
├── query_helper.py           # Query utility class
├── formula_functions.py      # Excel formula equivalents
├── requirements.txt          # Python dependencies
└── resources/
    └── parquet_data/         # Converted Parquet files
        ├── Sheet1.parquet
        └── Sheet2.parquet

Note: By default, skills are created in .claude/skills/ directory in your current working directory. This keeps all skills organized in one location alongside the excel-analyzer skill. The skill name is automatically derived from the Excel filename with -query suffix (e.g., sales_2024.xlsx becomes sales_2024-query).
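The naming rule is simple enough to sketch inline:

from pathlib import Path

# sales_2024.xlsx -> sales_2024-query
skill_name = Path('sales_2024.xlsx').stem + '-query'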

Step 1C: Use the Generated Skill

The generated skill can be used immediately with Claude Agent SDK and will automatically handle environment setup:

# Example: Query the data (environment setup happens automatically)
from skill_setup import setup_environment
from query_helper import QueryHelper
import polars as pl

# Setup environment (detects venv, installs dependencies)
env_info = setup_environment()

# Initialize
helper = QueryHelper('resources/parquet_data')

# Load a sheet
df = helper.load_sheet('Sheet1', lazy=True)

# Query with filters
result = helper.query_by_filter(
    'Sheet1',
    [pl.col('age') > 25, pl.col('status') == 'active'],
    columns=['name', 'age', 'revenue'],
    limit=100
)

print(result)

# Aggregate data
aggregated = helper.aggregate_by_group(
    'Sheet1',
    group_by=['category'],
    aggregations={'revenue': 'sum', 'orders': 'count'},
    limit=50
)

print(aggregated)

Workflow Option 2: Standalone Parquet Conversion (Quick & Simple)

For users who just need to convert Excel files with multiple sheets to Parquet datasets without the full analysis and skill generation. This is ideal when you:

  • Already have your own query tools/scripts
  • Just need efficient data storage format
  • Want to quickly convert Excel to Parquet for use with Polars/Pandas
  • Don't need formula analysis or auto-generated documentation

Step 2A: Convert Excel to Parquet

# Basic usage (auto-detects skill name from filename)
python scripts/excel_to_parquet.py --file /path/to/your/data.xlsx

# Specify custom skill name
python scripts/excel_to_parquet.py --file /path/to/sales_data.xlsx --skill-name sales-2024

What this does:

  • Detects all sheets in the Excel file
  • Converts each sheet to a separate Parquet file with zstd compression
  • Stores files in .claude/skills/{skill-name}/resources/parquet_data/
  • Sanitizes sheet names for valid filenames
  • Provides detailed conversion summary with row/column counts and file sizes

Output structure:

.claude/skills/{skill-name}/
└── resources/
    └── parquet_data/
        ├── Sheet1.parquet
        ├── Sheet2.parquet
        ├── Sales_Data.parquet
        └── Monthly_Report.parquet

Example output:

🚀 Starting Excel to Parquet conversion...
📄 File: sales_data.xlsx
📁 Skill: sales-data
💾 Output: .claude/skills/sales-data/resources/parquet_data

✅ Created directory structure: .claude/skills/sales-data/resources/parquet_data

📊 Loading workbook...
✅ Found 3 sheet(s): Summary, Q1_Sales, Q2_Sales

🔄 Converting sheets to Parquet...
   [1/3] Summary... ✅ (150 rows × 8 cols, 0.05 MB)
   [2/3] Q1_Sales... ✅ (5000 rows × 15 cols, 0.45 MB)
   [3/3] Q2_Sales... ✅ (5200 rows × 15 cols, 0.47 MB)

======================================================================
📊 CONVERSION SUMMARY
======================================================================
✅ Successful: 3/3

📦 Generated Parquet files:
   • Summary.parquet (150 rows, 0.05 MB)
   • Q1_Sales.parquet (5000 rows, 0.45 MB)
   • Q2_Sales.parquet (5200 rows, 0.47 MB)

📏 Total size: 0.97 MB

📁 Location: .claude/skills/sales-data/resources/parquet_data
======================================================================

🎉 Conversion complete!

Step 2B: Use the Parquet Files

Once converted, you can use the Parquet files directly with Polars, Pandas, or PyArrow:

import polars as pl

# Load a specific sheet
df = pl.read_parquet('.claude/skills/sales-data/resources/parquet_data/Q1_Sales.parquet')

# Query the data
result = df.filter(pl.col('revenue') > 10000).select(['customer', 'revenue', 'date'])
print(result)

# Aggregate
summary = df.group_by('region').agg([
    pl.col('revenue').sum().alias('total_revenue'),
    pl.col('orders').count().alias('order_count')
])
print(summary)

With Pandas:

import pandas as pd

# Load a specific sheet
df = pd.read_parquet('.claude/skills/sales-data/resources/parquet_data/Q1_Sales.parquet')

# Query the data
result = df[df['revenue'] > 10000][['customer', 'revenue', 'date']]
print(result)
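With PyArrow, which also supports reading only the columns you need:

import pyarrow.parquet as pq

# Load only selected columns (columnar pruning keeps memory usage low)
table = pq.read_table(
    '.claude/skills/sales-data/resources/parquet_data/Q1_Sales.parquet',
    columns=['customer', 'revenue']
)
print(table.num_rows)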

Comparison: Workflow 1 vs Workflow 2

Feature               Workflow 1 (Full Analysis)    Workflow 2 (Parquet Only)
Parquet Conversion    ✅ Yes                        ✅ Yes
Schema Analysis       ✅ Yes                        ❌ No
Formula Extraction    ✅ Yes                        ❌ No
Auto-generated Skill  ✅ Yes                        ❌ No
Query Helper Classes  ✅ Yes                        ❌ No
Documentation         ✅ Complete                   ❌ No
Dependencies Setup    ✅ Automatic                  ⚠️ Manual
Use Case              Full-featured skills          Quick data conversion
Best For              Production, reusable skills   One-off conversions

Supported Features

Excel File Characteristics

  • Size: Up to 100MB files (tested with 500K rows × 10K columns)
  • Formats: .xlsx, .xlsm
  • Sheets: Multiple sheets per file
  • Formulas: Native Excel formulas (SUM, AVERAGE, IF, VLOOKUP, etc.)
  • Data Types: Text, numeric, datetime, boolean, mixed
  • Special Features: Named ranges, formatted tables, computed columns

Analysis Capabilities

  • Schema Detection: Automatic column type inference
  • Formula Extraction: Identifies all formula cells and their expressions
  • Dependency Graphs: Maps formula dependencies across cells and sheets
  • Data Profiling: Sample values, null counts, unique values
  • Performance: Handles large files with streaming processing

Generated Query Features

  • Lazy Loading: Memory-efficient data loading
  • Streaming Queries: Process large datasets without memory overflow
  • Filter Operations: Complex multi-condition filtering
  • Aggregations: Group by, sum, mean, count, min, max
  • Pagination: Automatic result limiting and pagination
  • Formula Support: Pre-computed values and formula documentation
  • Environment Management: Automatic venv detection and dependency installation

Technical Details

Performance Characteristics

File Size    Rows   Columns  Analysis Time  Query Time
Small        <10K   <100     1-2 min        <1 sec
Medium       100K   <500     5-10 min       <10 sec
Large        500K   <2K      20-30 min      1-5 min
Very Large   1M+    5-10K    1-2 hours      5-15 min

Memory Management

  • Uses Polars streaming mode to handle datasets larger than available RAM
  • Lazy loading prevents loading entire datasets into memory
  • Parquet columnar format enables efficient column selection
  • Read-only mode for initial analysis reduces memory footprint
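For example, a lazy scan touches only the columns and rows a query needs (a sketch assuming the generated Parquet layout):

import polars as pl

# Lazy scan: nothing is read until .collect(); predicate and projection
# pushdown keep memory usage bounded even for very large files
lf = pl.scan_parquet('resources/parquet_data/Sheet1.parquet')
result = (
    lf.filter(pl.col('status') == 'active')
      .group_by('category')
      .agg(pl.col('revenue').sum().alias('total_revenue'))
      .collect()  # Polars also offers a streaming engine for larger-than-RAM data
)
print(result)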

Formula Handling

Supported Formula Functions:

  • Arithmetic: SUM, AVERAGE, COUNT, MIN, MAX
  • Logical: IF, AND, OR, NOT
  • Lookup: VLOOKUP, HLOOKUP, INDEX, MATCH
  • Conditional: SUMIF, COUNTIF, AVERAGEIF
  • Text: CONCATENATE, LEFT, RIGHT, MID, LEN

Formula Conversion: Excel formulas are documented in formula_map.json and can be converted to Polars expressions. Complex formulas may require manual review and adjustment.
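For instance, an Excel =SUMIF(B:B, ">1000", C:C) maps naturally onto a filter-then-sum in Polars (column names here are hypothetical):

import polars as pl

df = pl.DataFrame({'B': [500, 1500, 2000], 'C': [1.0, 2.0, 3.0]})
# Equivalent of =SUMIF(B:B, ">1000", C:C)
sumif = df.filter(pl.col('B') > 1000).select(pl.col('C').sum()).item()
print(sumif)  # 5.0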

Data Type Inference

The analyzer automatically infers data types from samples:

  • Numeric: Integers and floats (including percentages, currencies)
  • Text: Strings and categorical data
  • Datetime: Dates, times, and timestamps
  • Boolean: True/False values
  • Mixed: Columns with multiple data types
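In simplified form, per-column inference from sampled values looks like this (the analyzer's actual heuristics are more involved):

from datetime import datetime

def infer_type(samples: list) -> str:
    """Simplified type inference from sampled cell values."""
    seen = set()
    for value in samples:
        if value is None:
            continue
        if isinstance(value, bool):
            seen.add('boolean')
        elif isinstance(value, (int, float)):
            seen.add('numeric')
        elif isinstance(value, datetime):
            seen.add('datetime')
        else:
            seen.add('text')
    if not seen:
        return 'text'
    return seen.pop() if len(seen) == 1 else 'mixed'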

Examples

Example 1: Sales Data Analysis

# Analyze sales data
python scripts/analyzer.py \
  --file sales_2024.xlsx \
  --output-dir ./sales_analysis

# Generate query skill (auto-saves to .claude/skills/)
python scripts/skill_generator.py \
  --analysis-dir ./sales_analysis

# Now you can query: "Show me total revenue by region for Q1"
# The skill will be available at: .claude/skills/sales_2024-query/
# Environment setup happens automatically!

Example 2: Customer Database

# Analyze customer database
python scripts/analyzer.py \
  --file customer_data.xlsx \
  --output-dir ./customer_analysis

# Generate skill (auto-saves to .claude/skills/)
python scripts/skill_generator.py \
  --analysis-dir ./customer_analysis

# Query: "Find all active customers over age 30 with purchases > $1000"
# The skill will be available at: .claude/skills/customer_data-query/

Example 3: Financial Reports

# Analyze financial report with formulas
python scripts/analyzer.py \
  --file financial_report.xlsx \
  --output-dir ./finance_analysis

# Generate skill (auto-saves to .claude/skills/)
python scripts/skill_generator.py \
  --analysis-dir ./finance_analysis

# Query: "Compare revenue vs expenses across all departments"
# The skill will be available at: .claude/skills/financial_report-query/

Troubleshooting

Common Issues

Q: "File not found" error A: Ensure the Excel file path is absolute or relative to current directory

Q: "Memory error during analysis" A: Excel file may be too large. Try analyzing on a machine with more RAM or reduce file size

Q: "Formula parsing error" A: Some complex or custom formulas may not parse correctly. Check formula_map.json for details

Q: "Parquet conversion failed" A: Install required packages: pip install polars pyarrow --break-system-packages

Q: Generated skill doesn't load A: Ensure all JSON files are valid. Check for Unicode or encoding issues in Excel data

Q: "Virtual environment not detected" A: The skill will fall back to system Python with a warning. Consider creating a venv: python -m venv .venv

Q: "Dependency installation failed" A: Check that pip is available in your venv. Try manually: pip install -r requirements.txt

Performance Optimization

For large files:

  1. Run analysis on a machine with sufficient RAM (8GB+ recommended)
  2. Use SSD storage for faster I/O
  3. Close other applications during analysis
  4. Consider splitting very large files into multiple smaller files

Best Practices

  1. Start Small: Test with a smaller sample of your data first
  2. Clean Data: Remove unnecessary sheets or columns before analysis
  3. Document Formulas: Complex formulas should be documented in Excel comments
  4. Validate Results: Compare generated skill queries against Excel calculations
  5. Version Control: Keep analysis outputs and generated skills in version control
  6. Iterate: Refine generated skills based on usage patterns
  7. Use Virtual Environments: Always work within a venv to avoid dependency conflicts

Limitations

  • Maximum tested file size: 100MB (500K rows × 10K columns)
  • Formula conversion may require manual review for complex cases
  • Custom VBA macros are not analyzed or converted
  • Pivot tables are not directly supported (analyze source data instead)
  • External data connections are not followed

Additional Resources

For detailed information on:

  • Formula conversion patterns: See references/formula_conversion.md
  • Query patterns and examples: See references/query_patterns.md
  • Performance tuning: See references/performance_guide.md
  • Advanced usage: See references/advanced_usage.md

Skill Outputs

After running both scripts, you will have:

  1. Analysis Directory: Contains all analysis results and Parquet data
  2. Generated Skill: A complete, ready-to-use Claude Agent Skill with environment management
  3. Documentation: Comprehensive SKILL.md in the generated skill
  4. Query Tools: Helper classes and functions for data querying
  5. Data Dictionary: Complete schema and metadata documentation
  6. Environment Setup: Automatic venv detection and dependency installation

Integration with Claude Agent SDK

The generated skills work seamlessly with Claude Agent SDK:

from claude_agent_sdk import Agent

# Initialize agent with generated skill
agent = Agent(skills_dir="./.claude/skills")

# Query naturally (environment setup happens automatically)
response = agent.query(
    "Show me the top 10 customers by revenue in 2024"
)

print(response)