torch/utils/benchmark/README.md
NOTE: These components are currently work in progress.
This class is modeled on the timeit.Timer API, but with PyTorch specific
facilities which make it more suitable for benchmarking kernels. These fall
into two broad categories:
Timer will invoke torch.cuda.synchronize() if applicable, control the
number of torch threads, add a warmup, and warn if a measurement appears
suspect or downright unreliable.
Timer, while modeled after the timeit analog, uses a slightly different
API from timeit.Timer.
The constructor accepts additional metadata and timing methods return
a Measurement class rather than a float. This Measurement class is
serializable and allows many examples to be grouped and interpreted.
(See Compare for more details.)
Timer implements the blocked_autorange function which is a
mixture of timeit.Timer.repeat and timeit.Timer.autorange. This function
selects an appropriate number and runs for a roughly fixed amount of time
(like autorange), but is less wasteful than autorange which discards
~75% of measurements. It runs many times, similar to repeat, and returns
a Measurement containing all of the run results.
Compare takes a list of Measurements in its constructor, and displays them
as a formatted table for easier analysis. Identical measurements will be
merged, which allows Compare to process replicate measurements. Several
convenience methods are also provided to truncate displayed values based on
the number of significant figures and color code measurements to highlight
performance differences. Grouping and layout is based on metadata passed to
Timer:
label: This is a top level description. (e.g. add, or multiply) one
table will be generated per unique label.
sub_label: This is the label for a given configuration. Multiple statements
may be logically equivalent differ in implementation. Assigning separate
sub_labels will result in a row per sub_label. If a sublabel is not provided,
stmt is used instead. Statistics (such as computing the fastest
implementation) use all sub_labels.
description: This describes the inputs. For instance, stmt=torch.add(x, y)
can be run over several values of x and y. Each pair should be given its
own description, which allows them to appear in separate columns.
Statistics do not mix values of different descriptions, since comparing the
run time of drastically different inputs is generally not meaningful.
env: An optional description of the torch environment. (e.g. master or
my_branch). Like sub_labels, statistics are calculated across envs. (Since
comparing a branch to master or a stable release is a common use case.)
However Compare will visually group rows which are run with the same env.
num_threads: By default, Timer will run in single-threaded mode. If
Measurements with different numbers of threads are given to Compare, they
will be grouped into separate blocks of rows.
The Fuzzer class is designed to allow very flexible and repeatable
construction of a wide variety of Tensors while automating away some
of the tedium that comes with creating good benchmark inputs. The two
APIs of interest are the constructor and Fuzzer.take(self, n: int).
At construction, a Fuzzer is a spec for the kind of Tensors that
should be created. It takes a list of FuzzedParameters, a list of
FuzzedTensors, and an integer with which to seed the Fuzzer.
The reason for distinguishing between parameters and Tensors is that the shapes
and data of Tensors is often linked (e.g. shapes must be identical or
broadcastable, indices must fall within a given range, etc.) As a result we
must first materialize values for each parameter, and then use them to
construct Tensors in a second pass. As a concrete reference, the following
will create Tensors x and y, where x is a 2D Tensor and y is
broadcastable to the shape of x:
fuzzer = Fuzzer(
parameters=[
FuzzedParameter("k0", 16, 16 * 1024, "loguniform"),
FuzzedParameter("k1", 16, 16 * 1024, "loguniform"),
],
tensors=[
FuzzedTensor(
name="x", size=("k0", "k1"), probability_contiguous=0.75
),
FuzzedTensor(
name="y", size=("k0", 1), probability_contiguous=0.75
),
],
seed=0,
)
Calling fuzzer.take(n) will create a generator with n elements which
yields randomly generated Tensors satisfying the above definition, as well
as some metadata about the parameters and Tensors. Critically, calling
.take(...) multiple times will produce generators which select the same
parameters, allowing repeat measurements and different environments to
conduct the same trial. FuzzedParameter and FuzzedTensor support a
fairly involved set of behaviors to reflect the rich character of Tensor
operations and representations. (For instance, note the
probability_contiguous argument which signals that some fraction of the
time non-contiguous Tensors should be created.) The best way to understand
Fuzzer, however, is probably to experiment with examples.fuzzer.
python -m examples.simple_timeit
python -m examples.compare
python -m examples.fuzzer
python -m examples.end_to_end