| name | pytorch-model-cli |
| description | Guidance for creating standalone CLI tools that perform neural network inference by extracting PyTorch model weights and reimplementing inference in C/C++. This skill applies when tasks involve converting PyTorch models to standalone executables, extracting model weights to portable formats (JSON), implementing neural network forward passes in C/C++, or creating CLI tools that load images and run inference without Python dependencies. |
PyTorch Model to CLI Tool Conversion
This skill provides guidance for tasks that require converting PyTorch models into standalone command-line tools, typically implemented in C/C++ for portability and independence from the Python runtime.
Task Recognition
This skill applies when the task involves:
- Converting a PyTorch model to a standalone executable
- Extracting model weights to a portable format (JSON, binary)
- Implementing neural network inference in C/C++
- Creating CLI tools that perform image classification or prediction
- Building inference tools using libraries like cJSON and lodepng
Recommended Approach
Phase 1: Environment Analysis
Before writing any code, thoroughly analyze the available resources:
Identify the model architecture
- Read the model definition file (e.g., `model.py`) completely
- Document all layer types, dimensions, and activation functions
- Note any default parameters (hidden dimensions, number of classes)
Examine available libraries
- Check for image loading libraries (lodepng, stb_image)
- Check for JSON parsing libraries (cJSON, nlohmann/json)
- Identify compilation requirements (headers, source files)
Understand input requirements
- Determine expected image dimensions (e.g., 28x28 for MNIST)
- Identify color format (grayscale, RGB, RGBA)
- Document normalization requirements (divide by 255, mean/std normalization)
Verify preprocessing pipeline
- If training code is available, examine data transformations
- Match inference preprocessing exactly to training preprocessing
- Common transformations: resize, grayscale conversion, normalization
Phase 2: Weight Extraction
Extract the model weights from the PyTorch checkpoint into a portable format:
Load the model checkpoint
```python
import torch
import json

# Load the state dict on CPU
state_dict = torch.load('model.pth', map_location='cpu')
```

Convert tensors to lists

```python
weights = {}
for key, tensor in state_dict.items():
    weights[key] = tensor.numpy().tolist()
```

Save to JSON

```python
with open('weights.json', 'w') as f:
    json.dump(weights, f)
```

Verify extraction
- Check that all expected layer weights are present
- Verify dimensions match the model architecture
- For a model with layers `fc1`, `fc2`, `fc3`: expect `fc1.weight`, `fc1.bias`, etc.
Phase 3: Reference Implementation
Before implementing in C/C++, create a reference output:
Run inference in PyTorch
```python
model.eval()
with torch.no_grad():
    output = model(input_tensor)
    prediction = output.argmax().item()
```

Save reference outputs
- Store intermediate layer outputs for debugging
- Record the final prediction for verification
- This allows validating the C/C++ implementation
Phase 4: C/C++ Implementation
Implement the inference logic in C/C++:
Image loading and preprocessing
- Load image using the available library (lodepng for PNG)
- Handle color channel conversion (RGBA to grayscale if needed)
- Apply normalization (typically divide by 255.0)
- Flatten to a 1D array in the correct order (row-major); see the sketch below
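A minimal loading sketch, assuming lodepng's C++ `lodepng::decode` interface (which yields 8-bit RGBA by default) and a model that takes a single grayscale channel scaled by 255.0; the Rec. 601 luminance coefficients are an assumption and should be matched to the training pipeline's grayscale conversion:

```cpp
#include <cstdio>
#include <string>
#include <vector>
#include "lodepng.h"

// Load a PNG and return a flat, row-major vector of floats in [0, 1].
// Assumes the model expects a single grayscale channel.
std::vector<float> load_image(const std::string& path,
                              unsigned expected_w, unsigned expected_h) {
    std::vector<unsigned char> rgba;   // lodepng decodes to 8-bit RGBA by default
    unsigned w = 0, h = 0;
    unsigned err = lodepng::decode(rgba, w, h, path);
    if (err) {
        std::fprintf(stderr, "decode error %u: %s\n", err, lodepng_error_text(err));
        return {};
    }
    if (w != expected_w || h != expected_h) {
        std::fprintf(stderr, "unexpected size %ux%u\n", w, h);
        return {};
    }
    std::vector<float> pixels(static_cast<size_t>(w) * h);
    for (size_t i = 0; i < pixels.size(); ++i) {
        float r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
        // Rec. 601 luminance (assumption); verify against the training conversion.
        pixels[i] = (0.299f * r + 0.587f * g + 0.114f * b) / 255.0f;  // 255.0f, not 255
    }
    return pixels;
}
```

If training converted images with PIL's `convert('L')`, these coefficients line up with its ITU-R 601-2 transform, but verify rather than assume.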
Weight loading
- Parse JSON file containing weights
- Store weights in appropriate data structures
- Verify dimensions during loading; see the sketch below
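A sketch of the parsing side with cJSON, assuming the JSON layout produced in Phase 2 (each `fcN.weight` stored as a nested array of rows, each `fcN.bias` as a flat array); the helper names are illustrative:

```cpp
#include <cstdio>
#include <string>
#include <vector>
#include "cJSON.h"

// Read a flat JSON array of numbers (e.g., a bias vector).
static std::vector<float> read_vector(const cJSON* arr) {
    std::vector<float> out;
    const cJSON* item = nullptr;
    cJSON_ArrayForEach(item, arr) {
        out.push_back(static_cast<float>(item->valuedouble));
    }
    return out;
}

// Read a nested JSON array (array of rows) as a weight matrix.
static std::vector<std::vector<float>> read_matrix(const cJSON* arr) {
    std::vector<std::vector<float>> out;
    const cJSON* row = nullptr;
    cJSON_ArrayForEach(row, arr) {
        out.push_back(read_vector(row));
    }
    return out;
}

// Usage sketch: json_text holds the contents of weights.json (file I/O omitted).
// cJSON* root = cJSON_Parse(json_text.c_str());
// auto fc1_w = read_matrix(cJSON_GetObjectItemCaseSensitive(root, "fc1.weight"));
// auto fc1_b = read_vector(cJSON_GetObjectItemCaseSensitive(root, "fc1.bias"));
// ...check fc1_w.size() == fc1_b.size() before running inference...
// cJSON_Delete(root);
```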
Forward pass implementation
- Implement matrix-vector multiplication for linear layers
- Implement activation functions (ReLU, softmax, etc.)
- Process layers in the correct order; see the sketch below
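A sketch of the core math, following PyTorch's `nn.Linear` convention (`weight` has shape `(out_features, in_features)`, so each output element is the dot product of one weight row with the input); function names are illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Linear layer in PyTorch's nn.Linear convention: weight is
// [out_features][in_features], so y[i] = dot(weight[i], x) + bias[i].
std::vector<float> linear(const std::vector<std::vector<float>>& weight,
                          const std::vector<float>& bias,
                          const std::vector<float>& x) {
    std::vector<float> y(weight.size());
    for (size_t i = 0; i < weight.size(); ++i) {
        float acc = bias[i];
        for (size_t j = 0; j < x.size(); ++j) acc += weight[i][j] * x[j];
        y[i] = acc;
    }
    return y;
}

void relu_inplace(std::vector<float>& v) {
    for (float& x : v) x = std::max(0.0f, x);
}

// Numerically stable softmax: subtract the max before exponentiating.
void softmax_inplace(std::vector<float>& v) {
    float m = *std::max_element(v.begin(), v.end());
    float sum = 0.0f;
    for (float& x : v) { x = std::exp(x - m); sum += x; }
    for (float& x : v) x /= sum;
}
```

For a three-layer MLP the pass is `linear -> relu -> linear -> relu -> linear`. Softmax can be skipped when only the argmax is needed, since it preserves the ordering of the logits.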
Output handling
- Find argmax for classification tasks
- Write prediction to output file
- Ensure only the prediction goes to stdout (not progress or debug info); see the sketch below
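A short sketch of that output discipline (the function name is illustrative):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Index of the largest logit.
int argmax(const std::vector<float>& v) {
    return static_cast<int>(std::max_element(v.begin(), v.end()) - v.begin());
}

// Print only the prediction on stdout; route everything else to stderr.
void emit_prediction(const std::vector<float>& logits, const char* out_path) {
    int pred = argmax(logits);
    std::printf("%d\n", pred);
    std::fflush(stdout);  // avoid losing buffered output on exit
    FILE* f = std::fopen(out_path, "w");
    if (!f) {
        std::fprintf(stderr, "cannot open %s\n", out_path);  // error goes to stderr
        return;
    }
    std::fprintf(f, "%d\n", pred);
    std::fclose(f);
}
```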
Phase 5: Compilation and Testing
Compile with appropriate flags
```bash
g++ -o cli_tool main.cpp lodepng.cpp cJSON.c -std=c++11 -lm
```

- Double-check flag syntax (avoid concatenation errors like `-std=c++11-lm`)
Test against reference
- Run the CLI tool on the same input used for reference
- Compare output to PyTorch reference
- Debug any discrepancies by checking intermediate values; see the helper sketch below
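One way to inspect intermediate values, a small illustrative helper that writes only to stderr so the prediction on stdout stays clean:

```cpp
#include <cstdio>
#include <vector>

// Print the first few values of an intermediate activation to stderr so they
// can be compared against the saved PyTorch reference.
void dump_layer(const char* name, const std::vector<float>& v, size_t n = 5) {
    std::fprintf(stderr, "%s:", name);
    for (size_t i = 0; i < n && i < v.size(); ++i)
        std::fprintf(stderr, " %.6f", v[i]);
    std::fprintf(stderr, "\n");
}
```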
Verification Strategies
Before Implementation
- Model architecture fully documented
- All layer dimensions verified
- Preprocessing requirements identified
- Reference output generated from PyTorch
After Weight Extraction
- All expected keys present in JSON
- Weight dimensions match architecture
- Bias terms included for all layers
After C/C++ Implementation
- Compilation succeeds without warnings
- Output matches PyTorch reference exactly
- CLI tool handles missing files gracefully
- Only prediction output goes to stdout
Final Validation
- All test cases pass
- Memory properly managed (no leaks)
- Error messages go to stderr, not stdout
Common Pitfalls
Weight Extraction
- Forgetting to use `map_location='cpu'` when loading on CPU-only systems
- Missing bias terms - ensure both weights and biases are extracted
- Incorrect tensor ordering - PyTorch tensors are row-major (C-contiguous) by default; make sure the C/C++ side indexes with the same layout
Preprocessing Mismatches
- Wrong normalization - training might use mean/std normalization, not just /255
- Color channel issues - PNG might be RGBA while model expects grayscale
- Dimension ordering - ensure row-major vs column-major consistency
C/C++ Implementation
- Matrix multiplication order - PyTorch's `nn.Linear` stores `weight` with shape `(out_features, in_features)`, so the output is weights × input + bias; verify your loop indices match
- Activation function placement - apply each activation after its linear layer, before the next layer
- Integer vs float division - use 255.0, not 255, for normalization
Compilation Issues
- Flag concatenation - ensure spaces between compiler flags
- Missing libraries - include all required source files (lodepng.cpp, cJSON.c)
- Header dependencies - verify all headers are in include path
Output Handling
- Verbose library output - suppress or redirect debug/progress output
- Newline handling - ensure consistent line endings in output files
- Buffering issues - flush stdout before program exit
Efficiency Guidelines
- Avoid repeatedly checking package managers; identify available tools first
- Create reference outputs early to catch implementation bugs quickly
- Review complete code before compilation attempts
- Minimize status-only updates; batch related operations
- Test with multiple inputs when possible, not just the provided test case