name: numpy-random
description: Modern random number generation using the Generator API, focusing on statistical properties, parallel streams, and reproducibility. Triggers: random, rng, default_rng, SeedSequence, probability distributions, shuffle.
Overview
NumPy's random module shifted from the legacy RandomState to the modern Generator API. This new approach provides better statistical properties, faster algorithms, and a robust system for parallel random number generation using SeedSequence.
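As a minimal sketch of the Generator workflow described above (the seed value and the sampled distributions are illustrative choices, not taken from the original text):

```python
import numpy as np

# Create a Generator; the seed 12345 is an arbitrary illustrative value.
rng = np.random.default_rng(12345)

# Draw samples by calling methods on the Generator instance.
normals = rng.normal(loc=0.0, scale=1.0, size=5)  # 5 standard-normal floats
rolls = rng.integers(low=1, high=7, size=10)      # 10 dice rolls, high is exclusive
picks = rng.choice(["a", "b", "c"], size=4)       # sampling with replacement

# A Generator rebuilt from the same seed repeats the same draws
# (within one NumPy version).
rng_again = np.random.default_rng(12345)
assert np.array_equal(normals, rng_again.normal(loc=0.0, scale=1.0, size=5))
```

Keeping the Generator in a variable and passing it to the functions that need randomness avoids the hidden global state that np.random.seed() relies on.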
When to Use
- Stochastic simulations requiring high-quality random bits.
- Shuffling datasets for machine learning training.
- Generating independent random streams for parallel computing workers.
- Creating reproducible experiments across different runs.
Decision Tree
- Starting a new project?
  - Use np.random.default_rng(). Do not use np.random.seed().
- Need independent streams for multiple CPUs?
  - Use SeedSequence.spawn() to create children.
- Shuffling in-place?
  - Use rng.shuffle(arr). For a shuffled copy, use rng.permutation(arr). rng.permuted(arr) also returns a copy, but with its default axis=None it shuffles across all elements rather than keeping rows intact.
Workflows
Parallel Random Stream Generation
- Initialize a SeedSequence with a high-quality entropy source.
- Use the .spawn(n) method to create independent seed sequences for workers.
- Instantiate a new Generator for each worker using its specific child sequence.
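The steps above can be sketched as follows. The worker_mean function and the worker count are hypothetical, and the workers run sequentially here for simplicity; in a real job each child sequence would be handed to a separate process or task.

```python
import numpy as np

def worker_mean(child_seed, n_samples=1000):
    """Hypothetical worker: builds its own Generator from a child SeedSequence."""
    rng = np.random.default_rng(child_seed)
    return rng.standard_normal(n_samples).mean()

# Root sequence; with no argument, SeedSequence pulls fresh entropy from the OS.
root = np.random.SeedSequence()
print("root entropy (log this to reproduce):", root.entropy)

# One child SeedSequence per worker.
n_workers = 4
children = root.spawn(n_workers)

# Each worker gets its own child and therefore its own independent stream.
results = [worker_mean(child) for child in children]

# Rebuilding the root from the logged entropy and spawning again
# reproduces the same per-worker streams.
replay = np.random.SeedSequence(root.entropy).spawn(n_workers)
assert results == [worker_mean(child) for child in replay]
```

Spawning children from one root, rather than seeding workers with seed + worker_id, is the design choice here: each child carries a distinct spawn key, so streams do not depend on the arithmetic relationship between hand-picked seeds.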
Reproducible Simulation Setup
- Obtain a 128-bit seed (e.g., using secrets.randbits(128)).
- Initialize the generator: rng = np.random.default_rng(seed).
- Log the seed to allow exact reproduction of the stochastic results in future runs.
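A minimal sketch of this setup, assuming the standard logging module is how the seed gets recorded (the logger name "simulation" is illustrative):

```python
import logging
import secrets

import numpy as np

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("simulation")  # illustrative logger name

# Step 1: obtain a fresh 128-bit seed.
seed = secrets.randbits(128)

# Step 3 (done early so the seed is never lost): record it.
log.info("RNG seed: %d", seed)

# Step 2: initialize the generator from the seed.
rng = np.random.default_rng(seed)
first_run = rng.random(3)

# A later run that reads the logged seed reproduces the same draws,
# provided it uses the same NumPy version.
rerun = np.random.default_rng(seed).random(3)
assert np.array_equal(first_run, rerun)
```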
In-Place Array Shuffling
- Create a Generator instance.
- Pass an existing array to rng.shuffle(arr) to modify it in-place.
- Specify the axis parameter to choose which dimension is rearranged (the default, axis=0, reorders rows).
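The sketch below illustrates the in-place behavior and contrasts it with the copy-returning methods. The array contents and the seed are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary illustrative seed

table = np.arange(12).reshape(3, 4)
original = table.copy()

# In-place: axis=0 (the default) reorders whole rows; each row stays intact.
rng.shuffle(table, axis=0)
assert sorted(map(tuple, table.tolist())) == sorted(map(tuple, original.tolist()))

# In-place: axis=1 reorders whole columns instead.
rng.shuffle(table, axis=1)

# Copy-returning alternatives leave the input untouched.
row_shuffled_copy = rng.permutation(original, axis=0)  # rows move as units
independent_copy = rng.permuted(original, axis=1)      # each row shuffled on its own
assert np.array_equal(original, np.arange(12).reshape(3, 4))
```

Note that shuffle returns None, so writing arr = rng.shuffle(arr) discards the array.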
Non-Obvious Insights
- Legacy Discouragement: RandomState is essentially in maintenance mode; Generator is faster and has better statistical properties.
- Small Seed Limitation: Seeding with small integers (0-100) limits the reachable state space; SeedSequence mixes the entropy it is given into a well-distributed starting state and, when called with no argument, draws fresh entropy from the OS.
- Bitstream Instability: Even with the same seed, Generator output is not guaranteed to be identical across different NumPy versions, because the sampling algorithms may be improved.
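Because output can differ between NumPy versions, one hedged approach is to store the NumPy version and the bit generator name next to the seed. The record layout below is an assumption for illustration, not a NumPy convention.

```python
import json

import numpy as np

seed = 20240501  # illustrative; in practice this would be the logged 128-bit seed
rng = np.random.default_rng(seed)

# Hypothetical run record: everything needed to rebuild the same stream later.
run_record = {
    "seed": seed,
    "numpy_version": np.__version__,
    "bit_generator": type(rng.bit_generator).__name__,
}
print(json.dumps(run_record))

# Naming the bit generator explicitly guards against a future change of the
# default chosen by default_rng; it does not freeze the sampling algorithms
# of the Generator methods themselves.
pinned = np.random.Generator(np.random.PCG64(seed))
```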
Evidence
- "In general, users will create a Generator instance with default_rng and call the various methods on it to obtain samples." Source
- "SeedSequence mixes sources of entropy in a reproducible way to set the initial state for independent and very probably non-overlapping BitGenerators." Source
Scripts
- scripts/numpy-random_tool.py: Implements parallel seed spawning and reproducible RNG.
- scripts/numpy-random_tool.js: Basic random sampling logic.
Dependencies
- numpy (Python)