name: numpy-indexing
description: Advanced indexing techniques including slicing, fancy indexing, and boolean masks, along with memory implications of views vs. copies. Triggers: indexing, slicing, fancy indexing, boolean mask, np.where, np.ix_.
Overview
Indexing in NumPy ranges from basic slicing (zero-copy) to advanced "fancy" indexing (always creates a copy). Understanding the distinction is vital for memory management and avoiding unintended side effects in data analysis.
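A minimal sketch of the view/copy distinction described above. The array contents and variable names are illustrative; np.shares_memory is used only to make the difference visible.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a view: it shares memory with arr.
view = arr[2:5]
view[0] = 99             # writes through to arr[2]

# Advanced (fancy) indexing returns a copy: it owns its data.
picked = arr[[2, 3, 4]]
picked[0] = -1           # arr is unaffected

print(arr[2])                         # 99
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, picked))  # False
```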
When to Use
- Extracting sub-regions of arrays for processing.
- Filtering data based on complex conditional logic (boolean masking).
- Selecting arbitrary elements using coordinate lists.
- Managing memory when dealing with large datasets that have small regions of interest.
Decision Tree
- Do you need a view or a copy?
  - View: Use basic slicing (arr[0:5]).
  - Copy: Use advanced indexing (arr[[0, 1, 2]]) or .copy().
- Are you filtering by value?
  - Use a boolean mask: arr[arr > threshold].
- Selecting a grid of values across axes?
  - Use np.ix_ to construct the selection mesh.
Workflows
Filtering Data with Boolean Masks
- Apply a comparison operator (e.g., x > 0) to an array to create a boolean mask.
- Pass the mask into the array's indexing brackets: x[mask].
- Operate on the resulting array (note that this is a copy, not a view).
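The steps above can be sketched as follows. The sample values are made up for illustration. The last line shows a related point worth keeping in mind: reading through a mask produces a copy, but assigning through a mask writes into the original array.

```python
import numpy as np

x = np.array([-3, 1, -1, 4, 0, 7])

mask = x > 0             # boolean array with the same shape as x
positives = x[mask]      # copy holding only the selected elements

positives *= 10          # changes the copy only; x is untouched

# Assignment through a mask is different: it modifies x in place.
x[x < 0] = 0

print(positives)         # [10 40 70]
print(x)                 # [0 1 0 4 0 7]
```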
Memory-Efficient Sub-array Extraction
- Slice a small portion from a large ndarray.
- Call .copy() on the slice to create a new independent array.
- Delete the original large array (and any remaining views of it) so that its memory can be freed.
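A minimal sketch of this workflow, assuming a hypothetical 2000x2000 float64 array as a stand-in for the large dataset. The .base attribute is used to check which array owns its data.

```python
import numpy as np

big = np.zeros((2000, 2000))      # about 32 MB of float64; stand-in for a large dataset
big[:10, :10] = 1.0

roi_view = big[:10, :10]          # view: keeps the whole buffer alive
roi = big[:10, :10].copy()        # independent 10x10 array

view_uses_big = roi_view.base is big   # True: the view refers back to big
roi_owns_data = roi.base is None       # True: the copy owns its own buffer

# Drop every reference to the large buffer; only roi remains.
del roi_view, big

print(view_uses_big, roi_owns_data)    # True True
print(roi.nbytes)                      # 800
```

Note that the memory is only released once no view of the original array remains, which is why roi_view is deleted together with big.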
Cross-Axis Selection with np.ix_
- Define row indices and column indices as separate lists.
- Pass them into np.ix_ to construct the appropriate broadcasting meshes.
- Apply the resulting objects to the array to select a sub-grid of values.
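The steps above can be sketched as follows, using an illustrative 4x5 array. For contrast, the example also shows what happens when index lists are passed directly without np.ix_: they are paired element-wise rather than combined into a grid.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)
rows = [0, 2]
cols = [1, 3, 4]

ix = np.ix_(rows, cols)      # tuple of index arrays with shapes (2, 1) and (1, 3)
sub = a[ix]                  # 2x3 sub-grid: every row combined with every column

# Without np.ix_, equal-length lists are paired element-wise:
# this selects a[0, 1] and a[2, 3] only.
paired = a[[0, 2], [1, 3]]

print(sub)                   # [[ 1  3  4]
                             #  [11 13 14]]
print(paired)                # [ 1 13]
```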
Non-Obvious Insights
- Memory Retention Risks: A small view of a large array keeps the entire base array's buffer alive for as long as the view exists; copy small slices of massive data when the original is no longer needed.
- Copy vs. View Rule: Basic slicing always returns a view; advanced indexing (using non-tuple sequences or arrays) always returns a copy.
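- Adjacent Indexing: When basic and advanced indexing are mixed, the result shape depends on whether the advanced indices are adjacent in the index tuple. If they are adjacent, the broadcast index dimensions replace the indexed axes in place; if a slice separates them, the broadcast dimensions come first in the result.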
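The adjacency rule is easiest to see from result shapes. A short sketch with an illustrative 10x20x30 array:

```python
import numpy as np

x = np.zeros((10, 20, 30))
idx = [0, 1, 2]

adjacent = x[:, idx, idx]     # advanced indices side by side
separated = x[idx, :, idx]    # advanced indices split by a slice

print(adjacent.shape)         # (10, 3): index dimension stays where the indexed axes were
print(separated.shape)        # (3, 20): index dimension moves to the front
```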
Evidence
- "All arrays generated by basic slicing are always views of the original array." Source
- "Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view)." Source
Scripts
- scripts/numpy-indexing_tool.py: Demonstrates boolean masking and sub-array extraction.
- scripts/numpy-indexing_tool.js: Simulated coordinate selection logic.
Dependencies
- numpy (Python)