name: numpy-interop
description: Protocols for cross-library data exchange including DLPack, buffer interfaces, and __array_ufunc__ for overriding NumPy functions. Triggers: DLPack, interoperability, __array_interface__, __array_ufunc__, buffer protocol.
Overview
NumPy Interoperability encompasses the protocols that allow different numerical libraries (PyTorch, TensorFlow, SciPy) to exchange data without redundant copies. It focuses on modern standards like DLPack and the implementation of custom behavior for non-NumPy objects using the __array_ufunc__ protocol.
When to Use
- Passing data from a GPU-based library (like PyTorch) back to the CPU for NumPy analysis.
- Implementing custom array-like objects that should work seamlessly with np.sin() or np.add().
- Interfacing with low-level C/C++ extensions using memory pointers and typestrings.
- Avoiding memory copies when moving large tensors between frameworks.
Decision Tree
- Exchanging data with a modern tensor library (PyTorch)?
  - Use np.from_dlpack().
- Creating a custom class that needs to handle NumPy operations?
  - Implement the __array_ufunc__ method.
- Accessing raw memory pointers for a C extension?
  - Inspect the __array_interface__ attribute.
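The decision tree above can be approximated in code by checking which protocol attributes an object exposes. This is only a sketch: the helper name supported_protocols is hypothetical and is not part of NumPy.

```python
import numpy as np


def supported_protocols(obj):
    """Return the names of the interop protocols that obj appears to expose.

    Hypothetical helper for illustration; NumPy ships no function by this name.
    """
    found = set()
    # DLPack producers expose both __dlpack__ and __dlpack_device__.
    if hasattr(obj, "__dlpack__") and hasattr(obj, "__dlpack_device__"):
        found.add("dlpack")
    # A class opts in to ufunc overriding by defining __array_ufunc__.
    if getattr(type(obj), "__array_ufunc__", None) is not None:
        found.add("array_ufunc")
    # __array_interface__ carries the raw pointer, shape, and typestr.
    if hasattr(obj, "__array_interface__"):
        found.add("array_interface")
    return found


array_protocols = supported_protocols(np.arange(3))
list_protocols = supported_protocols([1, 2, 3])
```

A plain Python list exposes none of these attributes, which is why it has to be converted (and copied) with np.asarray() rather than shared.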
Workflows
Converting PyTorch Tensors to NumPy
- Ensure the PyTorch tensor is on the CPU: tensor.cpu().
- Convert using np.from_dlpack(tensor).
- The resulting NumPy array is a view of the tensor's memory.
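The steps above might look like the following minimal sketch. The function name tensor_to_numpy is hypothetical, and the fallback to a NumPy array as the DLPack producer when PyTorch is not installed is an assumption made only so the snippet runs anywhere.

```python
import numpy as np

try:
    import torch
except ImportError:  # assumption: fall back to a NumPy producer for the demo
    torch = None


def tensor_to_numpy(tensor):
    """Return a NumPy array that shares memory with a DLPack-capable tensor."""
    # Step 1: move to host memory if the object offers .cpu() (PyTorch does).
    if hasattr(tensor, "cpu"):
        tensor = tensor.cpu()
    # Step 2: import through DLPack; no data is copied.
    return np.from_dlpack(tensor)


if torch is not None:
    source = torch.arange(6, dtype=torch.float32)
else:
    source = np.arange(6, dtype=np.float32)

view = tensor_to_numpy(source)

# Step 3: the result is a view, so a write to the source shows up in it.
source[0] = 42.0
first_value = float(view[0])
```

Because the array is a view, the source tensor should be kept alive and not resized while the NumPy array is in use.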
Overriding NumPy Ops for Subclasses
- Implement the __array_ufunc__ method in a custom class.
- Define how standard operations (like np.sin) should behave for your object.
- Call np.sin(my_obj) and observe that your custom implementation is executed.
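As a rough illustration of this workflow, the sketch below defines a hypothetical Tagged wrapper (not a NumPy class) that unwraps its data, delegates to the real ufunc, and records which ufuncs were applied. It assumes only plain calls to single-output ufuncs need handling and declines everything else.

```python
import numpy as np


class Tagged:
    """Hypothetical array-like wrapper that records the ufuncs applied to it."""

    def __init__(self, values, history=()):
        self.values = np.asarray(values, dtype=float)
        self.history = tuple(history)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls of single-output ufuncs are handled in this sketch;
        # reductions, out= arguments, and multi-output ufuncs are declined.
        if method != "__call__" or ufunc.nout != 1 or kwargs.get("out") is not None:
            return NotImplemented
        raw_inputs = []
        history = ()
        for item in inputs:
            if isinstance(item, Tagged):
                raw_inputs.append(item.values)  # unwrap to a real ndarray
                history += item.history
            else:
                raw_inputs.append(item)
        result = ufunc(*raw_inputs, **kwargs)
        return Tagged(result, history + (ufunc.__name__,))


my_obj = Tagged([0.0, np.pi / 2])
after_sin = np.sin(my_obj)          # dispatches to Tagged.__array_ufunc__
after_add = np.add(after_sin, 1.0)  # history accumulates across calls
```

Returning NotImplemented for unsupported cases lets NumPy raise a TypeError instead of silently producing a wrong result.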
Low-Level Memory Access via Interface
- Inspect an object's __array_interface__ attribute.
- Retrieve the 'data' pointer and 'typestr'.
- Use these to wrap the memory in a third-party C/C++ extension for high-performance processing.
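The steps above can be sketched in pure Python, with ctypes standing in for the C/C++ extension (an assumption made for the sake of a runnable example; a real extension would receive the integer address and interpret it on the C side).

```python
import ctypes

import numpy as np

arr = np.arange(4, dtype=np.float64)
iface = arr.__array_interface__

address, read_only = iface["data"]  # (integer pointer, read-only flag)
typestr = iface["typestr"]          # byte order + kind + item size, e.g. "<f8"
shape = iface["shape"]
strides = iface["strides"]          # None means the data is C-contiguous

# This sketch only handles contiguous, writeable, 8-byte floats.
if strides is not None or read_only or typestr[1:] != "f8":
    raise ValueError("unsupported layout for this sketch")

# Wrap the same memory as a C double array, as a C extension might.
c_view = (ctypes.c_double * shape[0]).from_address(address)
c_view[0] = 99.0  # the write goes straight into arr's buffer

first_value = float(arr[0])
```

The pointer is only valid while arr is alive, so the consumer must hold a reference to the original object for as long as it uses the address.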
Non-Obvious Insights
- CPU Limitation: NumPy currently only supports DLPack for CPU-resident data; GPU-resident objects must be moved to host memory before conversion.
- Writeability Requirement: NumPy's DLPack export has historically been limited to writeable arrays; depending on the NumPy version, exporting a read-only array may fail, in which case a writeable copy (arr.copy()) can be exported instead.
- Legacy Interfaces: __array_struct__ is considered legacy; developers should prioritize the buffer protocol or DLPack for new library integrations.
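The first two points can be checked defensively before a conversion. The sketch below is illustrative: the helper names is_cpu_resident and try_dlpack_import are hypothetical, and the set of exceptions caught for a read-only export is an assumption, since the behavior differs between NumPy versions.

```python
import numpy as np

KDL_CPU = 1  # DLPack device-type code for host (CPU) memory


def is_cpu_resident(obj):
    """Report whether a DLPack producer says its data lives in host memory."""
    device_type, _device_id = obj.__dlpack_device__()
    return device_type == KDL_CPU


def try_dlpack_import(obj):
    """Return np.from_dlpack(obj), or None if the producer refuses to export."""
    try:
        return np.from_dlpack(obj)
    except (BufferError, TypeError):  # assumed failure modes; version dependent
        return None


writeable = np.arange(3)
frozen = np.arange(3)
frozen.flags.writeable = False

on_cpu = is_cpu_resident(writeable)
shared = try_dlpack_import(writeable)
maybe_shared = try_dlpack_import(frozen)
if maybe_shared is None:
    # Fall back to exporting a writeable copy, at the cost of one copy.
    maybe_shared = np.from_dlpack(frozen.copy())
```

The copy fallback gives up zero-copy sharing, so it is worth applying only when the direct export is refused.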
Evidence
- "DLPack is yet another protocol to convert foreign objects to NumPy arrays in a language and device agnostic manner." Source
- "As long as foreign objects implement the __array_ufunc__ or __array_function__ protocols, it is possible to operate on them without the need for explicit conversion." Source
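The second quote also mentions __array_function__, which covers NumPy functions that are not ufuncs. A minimal sketch, assuming a hypothetical ConstantVector class that answers np.sum and np.mean without ever materializing an array:

```python
import numpy as np


class ConstantVector:
    """Hypothetical foreign object: `length` copies of a single value."""

    def __init__(self, length, value):
        self.length = length
        self.value = value

    def __array_function__(self, func, types, args, kwargs):
        # Handle only the functions this sketch knows how to answer cheaply.
        if func is np.sum:
            return self.length * self.value
        if func is np.mean:
            return float(self.value)
        return NotImplemented  # NumPy raises TypeError for anything else


vec = ConstantVector(1000, 2.5)
total = np.sum(vec)     # answered by ConstantVector, no array is built
average = np.mean(vec)
```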
Scripts
- scripts/numpy-interop_tool.py: Example of __array_ufunc__ implementation and DLPack usage.
- scripts/numpy-interop_tool.js: Simulated memory pointer inspection.
Dependencies
- numpy (Python)