Performance

.. currentmodule:: numpy.random

Recommendation

The recommended generator for general use is PCG64 or its upgraded variant PCG64DXSM for heavily-parallel use cases. They are statistically high quality, full-featured, and fast on most platforms, but somewhat slow when compiled for 32-bit processes. See :ref:`upgrading-pcg64` for details on when heavy parallelism would indicate using PCG64DXSM.
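
As a minimal sketch of the recommendation above, the following builds one PCG64-backed generator for general use and a set of PCG64DXSM-backed generators for parallel workers. The seed value and the worker count are arbitrary assumptions, and deriving child seeds with ``SeedSequence.spawn`` is one possible approach, not the only one.

```python
import numpy as np
from numpy.random import Generator, PCG64, PCG64DXSM, SeedSequence

# General use: a single generator backed by PCG64 (seed chosen arbitrarily).
rng = Generator(PCG64(12345))
sample = rng.standard_normal(5)

# Heavily parallel use: derive one child seed per worker and give each
# worker its own PCG64DXSM-backed generator. n_workers is illustrative.
n_workers = 4
child_seeds = SeedSequence(12345).spawn(n_workers)
worker_rngs = [Generator(PCG64DXSM(seed)) for seed in child_seeds]

# Each row holds the first three uniforms drawn by one worker.
worker_draws = np.stack([r.random(3) for r in worker_rngs])
```

Each worker then draws only from its own generator, so no state is shared between workers.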

Philox is fairly slow, but its statistical properties have very high quality, and it is easy to get an assuredly-independent stream by using unique keys. If that is the style you wish to use for parallel streams, or you are porting from another system that uses that style, then Philox is your choice.
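
The key-based style described above can be sketched as follows. The base key and the rule "worker ``i`` uses ``base_key + i``" are illustrative assumptions; any scheme works as long as each stream receives a unique key.

```python
from numpy.random import Generator, Philox

# Arbitrary example key; Philox accepts a Python int in [0, 2**128).
base_key = 2**96 + 2**33 + 2**17 + 2**9

# One generator per stream, each with its own unique key.
streams = [Generator(Philox(key=base_key + i)) for i in range(3)]
draws = [g.integers(0, 2**32, size=4) for g in streams]

# Recreating a generator from the same key replays the same stream.
replay = Generator(Philox(key=base_key)).integers(0, 2**32, size=4)
```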

SFC64 is statistically high quality and very fast. However, it lacks jumpability. If you are not using that capability and want lots of speed, even on 32-bit processes, this is your choice.
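
A short sketch of what the missing capability looks like in practice: PCG64 exposes a ``jumped`` method that returns a bit generator whose state has been moved far ahead, while SFC64 has no such method. The seed is an arbitrary assumption.

```python
from numpy.random import Generator, PCG64, SFC64

# PCG64 supports jumping: jumped() returns a new bit generator whose
# state is advanced far ahead of the original.
pcg = PCG64(2024)
pcg_far = pcg.jumped()

# SFC64 does not provide a jumped() method.
sfc_can_jump = hasattr(SFC64(2024), "jumped")

# If jumping is not needed, SFC64 is used like any other bit generator.
rng = Generator(SFC64(2024))
values = rng.random(4)
```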

MT19937 `fails some statistical tests`_ and is not especially fast compared to modern PRNGs. For these reasons, we mostly do not recommend using it on its own, only through the legacy RandomState for reproducing old results. That said, it has a very long history as a default in many systems.

.. _fails some statistical tests: https://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf
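
The two ways of using MT19937 mentioned above can be sketched as follows. The seed is an arbitrary assumption. Note that ``MT19937(12345)`` is seeded through ``SeedSequence``, so it is not expected to reproduce the stream of the legacy ``RandomState(12345)``.

```python
from numpy.random import Generator, MT19937, RandomState

# Legacy path: two RandomState instances with the same seed yield the
# same stream, which is what reproducing old results relies on.
legacy_a = RandomState(12345)
legacy_b = RandomState(12345)
same_stream = bool(
    (legacy_a.standard_normal(5) == legacy_b.standard_normal(5)).all()
)

# MT19937 on its own, driving the newer Generator interface.
mt_rng = Generator(MT19937(12345))
mt_values = mt_rng.standard_normal(5)
```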

Timings

The timings below are the time in ns to produce 1 random value from a specific distribution. The original MT19937 generator is much slower since it requires two 32-bit values to equal the 64-bit output of the faster generators.

Integer performance has a similar ordering.

The pattern is similar for other, more complex distributions. The normal performance of the legacy RandomState generator is much lower than the others since it uses the Box-Muller transform rather than the Ziggurat method. The performance gap for Exponentials is also large due to the cost of computing the log function to invert the CDF. The column labeled MT19937 uses the same 32-bit generator as RandomState but produces random variates using Generator.

.. csv-table::
   :header: ,MT19937,PCG64,PCG64DXSM,Philox,SFC64,RandomState
   :widths: 14,14,14,14,14,14,14

   32-bit Unsigned Ints,3.3,1.9,2.0,3.3,1.8,3.1
   64-bit Unsigned Ints,5.6,3.2,2.9,4.9,2.5,5.5
   Uniforms,5.9,3.1,2.9,5.0,2.6,6.0
   Normals,13.9,10.8,10.5,12.0,8.3,56.8
   Exponentials,9.1,6.0,5.8,8.1,5.4,63.9
   Gammas,37.2,30.8,28.9,34.0,27.5,77.0
   Binomials,21.3,17.4,17.6,19.3,15.6,21.4
   Laplaces,73.2,72.3,76.1,73.0,72.3,82.5
   Poissons,111.7,103.4,100.5,109.4,90.7,115.2
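
Timings of this kind can be approximated with a small script such as the one below. It is a rough sketch, not the benchmark used to produce the table: the sample size, the number of repeats, and the helper name ``ns_per_value`` are assumptions, and results will vary with hardware and build.

```python
import timeit

from numpy.random import Generator, MT19937, PCG64, RandomState

SIZE = 100_000   # values drawn per call (assumed, not the original setting)
REPEATS = 5      # keep the best of several runs to reduce noise


def ns_per_value(draw, size=SIZE, repeats=REPEATS):
    """Return the best observed time in ns to produce one value."""
    best = min(timeit.repeat(lambda: draw(size), number=1, repeat=repeats))
    return best / size * 1e9


# Normals drawn through Generator (two bit generators) and through the
# legacy RandomState, mirroring three columns of the table.
candidates = {
    "PCG64": Generator(PCG64()).standard_normal,
    "MT19937": Generator(MT19937()).standard_normal,
    "RandomState": RandomState(MT19937()).standard_normal,
}
timings = {name: ns_per_value(draw) for name, draw in candidates.items()}

for name, ns in timings.items():
    print(f"{name:12s} {ns:7.1f} ns per normal")
```

Taking the minimum over repeats is a common way to reduce the influence of other activity on the machine; a mean or median would be an equally defensible choice.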

The next table presents the performance in percentage relative to values generated by the legacy generator, RandomState(MT19937()). The overall performance was computed using a geometric mean.

.. csv-table::
   :header: ,MT19937,PCG64,PCG64DXSM,Philox,SFC64
   :widths: 14,14,14,14,14,14

   32-bit Unsigned Ints,96,162,160,96,175
   64-bit Unsigned Ints,97,171,188,113,218
   Uniforms,102,192,206,121,233
   Normals,409,526,541,471,684
   Exponentials,701,1071,1101,784,1179
   Gammas,207,250,266,227,281
   Binomials,100,123,122,111,138
   Laplaces,113,114,108,113,114
   Poissons,103,111,115,105,127
   Overall,159,219,225,174,251
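
The arithmetic behind the table can be sketched as follows, using the PCG64 column as input. The helper ``geometric_mean`` is written here for illustration only. Because the published timings are rounded, recomputing individual cells from them may differ from the table by a point or two.

```python
import math

# PCG64 column of the relative-performance table (RandomState = 100),
# in row order from 32-bit Unsigned Ints to Poissons.
pcg64_relative = [162, 171, 192, 526, 1071, 250, 123, 114, 111]


def geometric_mean(values):
    """Geometric mean computed as the exponential of the mean log."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# One cell recomputed from the raw timings: Normals take 56.8 ns with
# RandomState and 10.8 ns with PCG64.
normals_relative = 100 * 56.8 / 10.8

# The "Overall" entry for the PCG64 column.
overall = geometric_mean(pcg64_relative)
print(round(normals_relative), round(overall))
```

A geometric mean is used rather than an arithmetic mean so that one very large ratio, such as the Exponentials row, does not dominate the summary.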

.. note::

   All timings were taken using Linux on an AMD Ryzen 9 3900X processor.

Performance on different operating systems

Performance differs across platforms because compilers and hardware capabilities (e.g., register width) differ. The default bit generator has been chosen to perform well on 64-bit platforms. Performance on 32-bit operating systems is very different.

The values reported are normalized relative to the speed of MT19937 in each table. A value of 100 indicates that the performance matches that of MT19937. Higher values indicate improved performance. These values cannot be compared across tables.

64-bit Linux


=====================   =========  =======  ===========  ========  =======
Distribution            MT19937    PCG64    PCG64DXSM    Philox    SFC64
=====================   =========  =======  ===========  ========  =======
32-bit Unsigned Ints          100      168         166        100      182
64-bit Unsigned Ints          100      176         193        116      224
Uniforms                      100      188         202        118      228
Normals                       100      128         132        115      167
Exponentials                  100      152         157        111      168
Overall                       100      161         168        112      192
=====================   =========  =======  ===========  ========  =======


64-bit Windows

The relative performance on 64-bit Linux and 64-bit Windows is broadly similar with the notable exception of the Philox generator.

=====================   =========  =======  ===========  ========  =======
Distribution            MT19937    PCG64    PCG64DXSM    Philox    SFC64
=====================   =========  =======  ===========  ========  =======
32-bit Unsigned Ints          100      155         131         29      150
64-bit Unsigned Ints          100      157         143         25      154
Uniforms                      100      151         144         24      155
Normals                       100      129         128         37      150
Exponentials                  100      150         145         28      159
Overall                       100      148         138         28      154
=====================   =========  =======  ===========  ========  =======

32-bit Windows


The performance of 64-bit generators on 32-bit Windows is much lower than on 64-bit
operating systems due to register width. MT19937, the generator that has been
in NumPy since 2005, operates on 32-bit integers.

=====================   =========  =======  ===========  ========  =======
Distribution            MT19937    PCG64    PCG64DXSM    Philox    SFC64
=====================   =========  =======  ===========  ========  =======
32-bit Unsigned Ints          100       24           34        14       57
64-bit Unsigned Ints          100       21           32        14       74
Uniforms                      100       21           34        16       73
Normals                       100       36           57        28      101
Exponentials                  100       28           44        20       88
**Overall**                   100       25           39        18       77
=====================   =========  =======  ===========  ========  =======
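
Since the ranking changes in 32-bit processes, a program may want to check the word size of the running interpreter before choosing a bit generator. The sketch below uses the pointer size reported by ``struct``. The selection policy (SFC64 in 32-bit processes, PCG64 otherwise) is a hypothetical example based on the recommendations in this document, not a rule enforced by NumPy.

```python
import struct

from numpy.random import Generator, PCG64, SFC64

# Pointer size in bits for the running Python process (32 or 64).
pointer_bits = struct.calcsize("P") * 8

# Hypothetical policy: prefer SFC64 in 32-bit processes, PCG64 otherwise.
bit_generator_cls = SFC64 if pointer_bits == 32 else PCG64

# The seed is arbitrary.
rng = Generator(bit_generator_cls(12345))
values = rng.random(3)
```

Note that the choice of bit generator changes the stream produced for a given seed, so results are not reproducible across platforms that select different generators.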


.. note::

   Linux timings used Ubuntu 20.04 and GCC 9.3.0.  Windows timings were made on
   Windows 10 using Microsoft C/C++ Optimizing Compiler Version 19 (Visual
   Studio 2019). All timings were produced on an AMD Ryzen 9 3900X processor.