SKILL.md

---
name: pytorch-model-cli
description: Guidance for creating standalone CLI tools that perform neural network inference by extracting PyTorch model weights and reimplementing inference in C/C++. This skill applies when tasks involve converting PyTorch models to standalone executables, extracting model weights to portable formats (JSON), implementing neural network forward passes in C/C++, or creating CLI tools that load images and run inference without Python dependencies.
---

PyTorch Model to CLI Tool Conversion

This skill provides guidance for tasks that require converting PyTorch models into standalone command-line tools, typically implemented in C/C++ so the resulting binary is portable and has no dependency on the Python runtime.

Task Recognition

This skill applies when the task involves:

  • Converting a PyTorch model to a standalone executable
  • Extracting model weights to a portable format (JSON, binary)
  • Implementing neural network inference in C/C++
  • Creating CLI tools that perform image classification or prediction
  • Building inference tools using libraries like cJSON and lodepng

Recommended Approach

Phase 1: Environment Analysis

Before writing any code, thoroughly analyze the available resources:

  1. Identify the model architecture

    • Read the model definition file (e.g., model.py) completely
    • Document all layer types, dimensions, and activation functions
    • Note any default parameters (hidden dimensions, number of classes)
  2. Examine available libraries

    • Check for image loading libraries (lodepng, stb_image)
    • Check for JSON parsing libraries (cJSON, nlohmann/json)
    • Identify compilation requirements (headers, source files)
  3. Understand input requirements

    • Determine expected image dimensions (e.g., 28x28 for MNIST)
    • Identify color format (grayscale, RGB, RGBA)
    • Document normalization requirements (divide by 255, mean/std normalization)
  4. Verify preprocessing pipeline

    • If training code is available, examine data transformations
    • Match inference preprocessing exactly to training preprocessing
    • Common transformations: resize, grayscale conversion, normalization
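
For example, if the training transforms are torchvision's ToTensor() followed by Normalize(mean, std), the C/C++ preprocessing has to reproduce both steps for every pixel. A minimal sketch, with an illustrative function name and the mean/std constants commonly used for MNIST (substitute whatever the training script actually applied, or stop after the divide by 255 if training only used ToTensor()):

    // Reproduce torchvision's ToTensor() + Normalize(mean, std) for one pixel.
    float preprocess_pixel(unsigned char raw) {
        const float mean = 0.1307f, stddev = 0.3081f;   // MNIST-style constants; adjust to the training code
        float scaled = raw / 255.0f;                    // ToTensor(): bytes [0, 255] -> floats [0, 1]
        return (scaled - mean) / stddev;                // Normalize(): (x - mean) / std
    }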

Phase 2: Weight Extraction

Extract model weights from PyTorch format to a portable format:

  1. Load the model checkpoint

    import torch
    import json
    
    # Load state dict
    state_dict = torch.load('model.pth', map_location='cpu')
    
  2. Convert tensors to lists

    weights = {}
    for key, tensor in state_dict.items():
        weights[key] = tensor.numpy().tolist()
    
  3. Save to JSON

    with open('weights.json', 'w') as f:
        json.dump(weights, f)
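    # Sanity check (see step 4 below): print every layer's name and shape so
    # missing weights or unexpected dimensions are obvious at a glance
    for key, tensor in state_dict.items():
        print(key, list(tensor.shape))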
    
  4. Verify extraction

    • Check that all expected layer weights are present
    • Verify dimensions match the model architecture
    • For a model with layers fc1, fc2, fc3: expect fc1.weight, fc1.bias, etc.

Phase 3: Reference Implementation

Before implementing in C/C++, create a reference output:

  1. Run inference in PyTorch

    model.eval()
    with torch.no_grad():
        output = model(input_tensor)
        prediction = output.argmax().item()
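    # Record the prediction (the file name is just an example) so the
    # C/C++ port can be compared against this reference later
    with open('reference.txt', 'w') as f:
        f.write(str(prediction))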
    
  2. Save reference outputs

    • Store intermediate layer outputs for debugging
    • Record the final prediction for verification
    • This allows validating the C/C++ implementation

Phase 4: C/C++ Implementation

Implement the inference logic in C/C++:

  1. Image loading and preprocessing (sketched after this list)

    • Load image using the available library (lodepng for PNG)
    • Handle color channel conversion (RGBA to grayscale if needed)
    • Apply normalization (typically divide by 255.0)
    • Flatten to 1D array in correct order (row-major)
  2. Weight loading

    • Parse JSON file containing weights
    • Store weights in appropriate data structures
    • Verify dimensions during loading
  3. Forward pass implementation (sketched after this list)

    • Implement matrix-vector multiplication for linear layers
    • Implement activation functions (ReLU, softmax, etc.)
    • Process layers in correct order
  4. Output handling

    • Find argmax for classification tasks
    • Write prediction to output file
    • Ensure only the prediction goes to stdout (not progress or debug output)
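
A sketch of step 1, assuming lodepng is compiled with its default C++ wrapper (lodepng::decode; the C API lodepng_decode32_file works the same way) and a model that takes a flattened, row-major grayscale image scaled to [0, 1]. The function name, the size check, and the plain RGB average are illustrative; any mean/std normalization from training would follow the divide by 255:

    #include <cstdio>
    #include <string>
    #include <vector>
    #include "lodepng.h"

    // Load a PNG, convert RGBA to grayscale, and normalize to [0, 1].
    // Returns an empty vector on failure and reports the error on stderr.
    std::vector<float> load_grayscale(const std::string& path, unsigned expected_w, unsigned expected_h) {
        std::vector<unsigned char> rgba;   // decoded as 8-bit RGBA, row-major
        unsigned w = 0, h = 0;
        unsigned err = lodepng::decode(rgba, w, h, path);
        if (err) {
            std::fprintf(stderr, "PNG decode failed: %s\n", lodepng_error_text(err));
            return {};
        }
        if (w != expected_w || h != expected_h) {
            std::fprintf(stderr, "unexpected image size %ux%u\n", w, h);
            return {};
        }
        std::vector<float> pixels(w * h);
        for (unsigned i = 0; i < w * h; ++i) {
            // Simple R/G/B average; use a luminance formula or a single channel
            // if that is what the training pipeline did.
            float gray = (rgba[4 * i] + rgba[4 * i + 1] + rgba[4 * i + 2]) / 3.0f;
            pixels[i] = gray / 255.0f;     // 255.0f, not integer 255
        }
        return pixels;
    }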

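A sketch of steps 3 and 4 for a small fully connected classifier, assuming the weights have already been parsed from the JSON (e.g. with cJSON) into flat, row-major float vectors. PyTorch's nn.Linear stores its weight as [out_features, in_features], which is exactly the layout the inner loop below indexes; the fc1/fc2/fc3 layer layout and the predict() helper are illustrative:

    #include <cstddef>
    #include <vector>

    // y = W * x + b, with W stored row-major as [out_features][in_features]
    // (the same layout as a flattened PyTorch nn.Linear weight tensor).
    std::vector<float> linear(const std::vector<float>& x,
                              const std::vector<float>& weight,
                              const std::vector<float>& bias) {
        size_t out_dim = bias.size();
        size_t in_dim = x.size();
        std::vector<float> y(out_dim);
        for (size_t i = 0; i < out_dim; ++i) {
            float acc = bias[i];
            for (size_t j = 0; j < in_dim; ++j)
                acc += weight[i * in_dim + j] * x[j];
            y[i] = acc;
        }
        return y;
    }

    void relu(std::vector<float>& v) {
        for (float& x : v)
            if (x < 0.0f) x = 0.0f;
    }

    // argmax is enough for classification; applying softmax first does not change the winner.
    int argmax(const std::vector<float>& v) {
        size_t best = 0;
        for (size_t i = 1; i < v.size(); ++i)
            if (v[i] > v[best]) best = i;
        return static_cast<int>(best);
    }

    // Forward pass for a three-layer MLP (fc1/fc2/fc3 as in the example above).
    int predict(const std::vector<float>& pixels,
                const std::vector<float>& w1, const std::vector<float>& b1,
                const std::vector<float>& w2, const std::vector<float>& b2,
                const std::vector<float>& w3, const std::vector<float>& b3) {
        std::vector<float> h = linear(pixels, w1, b1);
        relu(h);
        h = linear(h, w2, b2);
        relu(h);
        h = linear(h, w3, b3);   // final layer: raw logits, no activation before argmax
        return argmax(h);
    }

Dropout layers are no-ops at inference time; any other layer types that appear in model.py need their own implementations, applied in the same order as the model's forward method.
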
Phase 5: Compilation and Testing

  1. Compile with appropriate flags

    g++ -o cli_tool main.cpp lodepng.cpp cJSON.c -std=c++11 -lm
    
    • Double-check flag syntax (avoid concatenation errors like -std=c++11-lm)
  2. Test against reference

    • Run the CLI tool on the same input used for reference
    • Compare output to PyTorch reference
    • Debug any discrepancies by checking intermediate values
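
If the prediction disagrees with the reference, printing a few intermediate activations to stderr (never stdout) shows which layer diverges first. A minimal sketch; the helper name is illustrative:

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Dump the first few values of an intermediate vector to stderr so they can be
    // compared against the values saved from the PyTorch reference run.
    void debug_dump(const char* label, const std::vector<float>& v, size_t n = 5) {
        std::fprintf(stderr, "%s:", label);
        for (size_t i = 0; i < v.size() && i < n; ++i)
            std::fprintf(stderr, " %.6f", v[i]);
        std::fprintf(stderr, "\n");
    }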

Verification Strategies

Before Implementation

  • Model architecture fully documented
  • All layer dimensions verified
  • Preprocessing requirements identified
  • Reference output generated from PyTorch

After Weight Extraction

  • All expected keys present in JSON
  • Weight dimensions match architecture
  • Bias terms included for all layers

After C/C++ Implementation

  • Compilation succeeds without warnings
  • Predicted class matches the PyTorch reference exactly
  • CLI tool handles missing files gracefully
  • Only prediction output goes to stdout

Final Validation

  • All test cases pass
  • Memory properly managed (no leaks)
  • Error messages go to stderr, not stdout

Common Pitfalls

Weight Extraction

  • Forgetting to use map_location='cpu' when loading on CPU-only systems
  • Missing bias terms - ensure both weights and biases are extracted
  • Incorrect tensor ordering - PyTorch stores nn.Linear weights as [out_features, in_features]; keep that layout consistent between the JSON export and the C/C++ indexing

Preprocessing Mismatches

  • Wrong normalization - training might use mean/std normalization, not just /255
  • Color channel issues - PNG might be RGBA while model expects grayscale
  • Dimension ordering - ensure row-major vs column-major consistency

C/C++ Implementation

  • Matrix multiplication order - nn.Linear computes y = x * W^T + b with W stored as [out_features, in_features], so for one input vector output[i] = bias[i] + sum_j weight[i][j] * input[j]
  • Activation function placement - apply the activation after each hidden linear layer; the final layer's logits typically go straight to argmax or softmax
  • Integer vs float division - use 255.0, not 255, for normalization

Compilation Issues

  • Flag concatenation - ensure spaces between compiler flags
  • Missing libraries - include all required source files (lodepng.cpp, cJSON.c)
  • Header dependencies - verify all headers are in include path

Output Handling

  • Verbose library output - suppress or redirect debug/progress output
  • Newline handling - ensure consistent line endings in output files
  • Buffering issues - flush stdout before program exit

Efficiency Guidelines

  • Avoid repeatedly checking package managers; identify available tools first
  • Create reference outputs early to catch implementation bugs quickly
  • Review complete code before compilation attempts
  • Minimize status-only updates; batch related operations
  • Test with multiple inputs when possible, not just the provided test case