SKILL.md

---
name: pytorch-model-cli
description: Guidance for creating standalone CLI tools that perform neural network inference by extracting PyTorch model weights and reimplementing inference in C/C++. This skill applies when tasks involve converting PyTorch models to standalone executables, extracting model weights to portable formats (JSON), implementing neural network forward passes in C/C++, or creating CLI tools that load images and run inference without Python dependencies.
---

PyTorch Model to CLI Tool Conversion

This skill provides guidance for tasks that require converting PyTorch models into standalone command-line tools, typically implemented in C/C++ so the resulting binary is portable and has no dependency on the Python runtime.

Task Recognition

This skill applies when the task involves:

  • Converting a PyTorch model to a standalone executable
  • Extracting model weights to a portable format (JSON, binary)
  • Implementing neural network inference in C/C++
  • Creating CLI tools that perform image classification or prediction
  • Building inference tools using libraries like cJSON and lodepng

Recommended Approach

Phase 1: Environment Analysis

Before writing any code, thoroughly analyze the available resources:

  1. Identify the model architecture

    • Read the model definition file (e.g., model.py) completely
    • Document all layer types, dimensions, and activation functions
    • Note any default parameters (hidden dimensions, number of classes)
  2. Examine available libraries

    • Check for image loading libraries (lodepng, stb_image)
    • Check for JSON parsing libraries (cJSON, nlohmann/json)
    • Identify compilation requirements (headers, source files)
  3. Understand input requirements

    • Determine expected image dimensions (e.g., 28x28 for MNIST)
    • Identify color format (grayscale, RGB, RGBA)
    • Document normalization requirements (divide by 255, mean/std normalization)
  4. Verify preprocessing pipeline

    • If training code is available, examine data transformations
    • Match inference preprocessing exactly to training preprocessing
    • Common transformations: resize, grayscale conversion, normalization
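
For example, if the training transforms are torchvision's ToTensor() followed by Normalize(mean, std), the C/C++ preprocessing has to reproduce both steps for every pixel. A minimal sketch, with an illustrative function name and the mean/std constants commonly used for MNIST (substitute whatever the training script actually applied, or stop after the divide by 255 if training only used ToTensor()):

    // Reproduce torchvision's ToTensor() + Normalize(mean, std) for one pixel.
    float preprocess_pixel(unsigned char raw) {
        const float mean = 0.1307f, stddev = 0.3081f;   // MNIST-style constants; adjust to the training code
        float scaled = raw / 255.0f;                    // ToTensor(): bytes [0, 255] -> floats [0, 1]
        return (scaled - mean) / stddev;                // Normalize(): (x - mean) / std
    }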

Phase 2: Weight Extraction

Extract model weights from PyTorch format to a portable format:

  1. Load the model checkpoint

    import torch
    import json
    
    # Load state dict
    state_dict = torch.load('model.pth', map_location='cpu')
    
  2. Convert tensors to lists

    weights = {}
    for key, tensor in state_dict.items():
        weights[key] = tensor.numpy().tolist()
    
  3. Save to JSON

    with open('weights.json', 'w') as f:
        json.dump(weights, f)
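    # Sanity check (see step 4 below): print every layer's name and shape so
    # missing weights or unexpected dimensions are obvious at a glance
    for key, tensor in state_dict.items():
        print(key, list(tensor.shape))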
    
  4. Verify extraction

    • Check that all expected layer weights are present
    • Verify dimensions match the model architecture
    • For a model with layers fc1, fc2, fc3: expect fc1.weight, fc1.bias, etc.

Phase 3: Reference Implementation

Before implementing in C/C++, create a reference output:

  1. Run inference in PyTorch

    model.eval()
    with torch.no_grad():
        output = model(input_tensor)
        prediction = output.argmax().item()
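    # Record the prediction (the file name is just an example) so the
    # C/C++ port can be compared against this reference later
    with open('reference.txt', 'w') as f:
        f.write(str(prediction))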
    
  2. Save reference outputs

    • Store intermediate layer outputs for debugging
    • Record the final prediction for verification
    • This allows validating the C/C++ implementation

Phase 4: C/C++ Implementation

Implement the inference logic in C/C++:

  1. Image loading and preprocessing (sketched after this list)

    • Load image using the available library (lodepng for PNG)
    • Handle color channel conversion (RGBA to grayscale if needed)
    • Apply normalization (typically divide by 255.0)
    • Flatten to 1D array in correct order (row-major)
  2. Weight loading

    • Parse JSON file containing weights
    • Store weights in appropriate data structures
    • Verify dimensions during loading
  3. Forward pass implementation (sketched after this list)

    • Implement matrix-vector multiplication for linear layers
    • Implement activation functions (ReLU, softmax, etc.)
    • Process layers in correct order
  4. Output handling

    • Find argmax for classification tasks
    • Write prediction to output file
    • Ensure only the prediction goes to stdout (not progress or debug output)
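
A sketch of step 1, assuming lodepng is compiled with its default C++ wrapper (lodepng::decode; the C API lodepng_decode32_file works the same way) and a model that takes a flattened, row-major grayscale image scaled to [0, 1]. The function name, the size check, and the plain RGB average are illustrative; any mean/std normalization from training would follow the divide by 255:

    #include <cstdio>
    #include <string>
    #include <vector>
    #include "lodepng.h"

    // Load a PNG, convert RGBA to grayscale, and normalize to [0, 1].
    // Returns an empty vector on failure and reports the error on stderr.
    std::vector<float> load_grayscale(const std::string& path, unsigned expected_w, unsigned expected_h) {
        std::vector<unsigned char> rgba;   // decoded as 8-bit RGBA, row-major
        unsigned w = 0, h = 0;
        unsigned err = lodepng::decode(rgba, w, h, path);
        if (err) {
            std::fprintf(stderr, "PNG decode failed: %s\n", lodepng_error_text(err));
            return {};
        }
        if (w != expected_w || h != expected_h) {
            std::fprintf(stderr, "unexpected image size %ux%u\n", w, h);
            return {};
        }
        std::vector<float> pixels(w * h);
        for (unsigned i = 0; i < w * h; ++i) {
            // Simple R/G/B average; use a luminance formula or a single channel
            // if that is what the training pipeline did.
            float gray = (rgba[4 * i] + rgba[4 * i + 1] + rgba[4 * i + 2]) / 3.0f;
            pixels[i] = gray / 255.0f;     // 255.0f, not integer 255
        }
        return pixels;
    }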

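A sketch of steps 3 and 4 for a small fully connected classifier, assuming the weights have already been parsed from the JSON (e.g. with cJSON) into flat, row-major float vectors. PyTorch's nn.Linear stores its weight as [out_features, in_features], which is exactly the layout the inner loop below indexes; the fc1/fc2/fc3 layer layout and the predict() helper are illustrative:

    #include <cstddef>
    #include <vector>

    // y = W * x + b, with W stored row-major as [out_features][in_features]
    // (the same layout as a flattened PyTorch nn.Linear weight tensor).
    std::vector<float> linear(const std::vector<float>& x,
                              const std::vector<float>& weight,
                              const std::vector<float>& bias) {
        size_t out_dim = bias.size();
        size_t in_dim = x.size();
        std::vector<float> y(out_dim);
        for (size_t i = 0; i < out_dim; ++i) {
            float acc = bias[i];
            for (size_t j = 0; j < in_dim; ++j)
                acc += weight[i * in_dim + j] * x[j];
            y[i] = acc;
        }
        return y;
    }

    void relu(std::vector<float>& v) {
        for (float& x : v)
            if (x < 0.0f) x = 0.0f;
    }

    // argmax is enough for classification; applying softmax first does not change the winner.
    int argmax(const std::vector<float>& v) {
        size_t best = 0;
        for (size_t i = 1; i < v.size(); ++i)
            if (v[i] > v[best]) best = i;
        return static_cast<int>(best);
    }

    // Forward pass for a three-layer MLP (fc1/fc2/fc3 as in the example above).
    int predict(const std::vector<float>& pixels,
                const std::vector<float>& w1, const std::vector<float>& b1,
                const std::vector<float>& w2, const std::vector<float>& b2,
                const std::vector<float>& w3, const std::vector<float>& b3) {
        std::vector<float> h = linear(pixels, w1, b1);
        relu(h);
        h = linear(h, w2, b2);
        relu(h);
        h = linear(h, w3, b3);   // final layer: raw logits, no activation before argmax
        return argmax(h);
    }

Dropout layers are no-ops at inference time; any other layer types that appear in model.py need their own implementations, applied in the same order as the model's forward method.
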
Phase 5: Compilation and Testing

  1. Compile with appropriate flags

    g++ -o cli_tool main.cpp lodepng.cpp cJSON.c -std=c++11 -lm
    
    • Double-check flag syntax (avoid concatenation errors like -std=c++11-lm)
  2. Test against reference

    • Run the CLI tool on the same input used for reference
    • Compare output to PyTorch reference
    • Debug any discrepancies by checking intermediate values
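
If the prediction disagrees with the reference, printing a few intermediate activations to stderr (never stdout) shows which layer diverges first. A minimal sketch; the helper name is illustrative:

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Dump the first few values of an intermediate vector to stderr so they can be
    // compared against the values saved from the PyTorch reference run.
    void debug_dump(const char* label, const std::vector<float>& v, size_t n = 5) {
        std::fprintf(stderr, "%s:", label);
        for (size_t i = 0; i < v.size() && i < n; ++i)
            std::fprintf(stderr, " %.6f", v[i]);
        std::fprintf(stderr, "\n");
    }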

Verification Strategies

Before Implementation

  • Model architecture fully documented
  • All layer dimensions verified
  • Preprocessing requirements identified
  • Reference output generated from PyTorch

After Weight Extraction

  • All expected keys present in JSON
  • Weight dimensions match architecture
  • Bias terms included for all layers

After C/C++ Implementation

  • Compilation succeeds without warnings
  • Predicted class matches the PyTorch reference exactly
  • CLI tool handles missing files gracefully
  • Only prediction output goes to stdout

Final Validation

  • All test cases pass
  • Memory properly managed (no leaks)
  • Error messages go to stderr, not stdout

Common Pitfalls

Weight Extraction

  • Forgetting to use map_location='cpu' when loading on CPU-only systems
  • Missing bias terms - ensure both weights and biases are extracted
  • Incorrect tensor ordering - PyTorch stores nn.Linear weights as [out_features, in_features]; keep that layout consistent between the JSON export and the C/C++ indexing

Preprocessing Mismatches

  • Wrong normalization - training might use mean/std normalization, not just /255
  • Color channel issues - PNG might be RGBA while model expects grayscale
  • Dimension ordering - ensure row-major vs column-major consistency

C/C++ Implementation

  • Matrix multiplication order - nn.Linear computes y = x * W^T + b with W stored as [out_features, in_features], so for one input vector output[i] = bias[i] + sum_j weight[i][j] * input[j]
  • Activation function placement - apply the activation after each hidden linear layer; the final layer's logits typically go straight to argmax or softmax
  • Integer vs float division - use 255.0, not 255, for normalization

Compilation Issues

  • Flag concatenation - ensure spaces between compiler flags
  • Missing libraries - include all required source files (lodepng.cpp, cJSON.c)
  • Header dependencies - verify all headers are in include path

Output Handling

  • Verbose library output - suppress or redirect debug/progress output
  • Newline handling - ensure consistent line endings in output files
  • Buffering issues - flush stdout before program exit

Efficiency Guidelines

  • Avoid repeatedly checking package managers; identify available tools first
  • Create reference outputs early to catch implementation bugs quickly
  • Review complete code before compilation attempts
  • Minimize status-only updates; batch related operations
  • Test with multiple inputs when possible, not just the provided test case