tools/experimental/trt-engine-explorer/notebooks/tutorial.ipynb
Use this notebook to learn how to use trex to explore the structure and characteristics of a TensorRT Engine plan.
Starting with TensorRT 8.2, engine-plan graph and profiling data can be exported to JSON files. trex loads these files and queries their content using a simple API that wraps the Pandas API.
Using trtexec it is easy to create the plan graph and profiling JSON files. Three flags are required:
--exportProfile=$profile_json
--exportLayerInfo=$graph_json
--profilingVerbosity=detailed
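For reference, a full trtexec invocation combining these flags might look like the following (file names here are placeholders; adapt them to your model and paths):

```shell
# Hypothetical file names -- substitute your own ONNX model and output paths.
trtexec --onnx=model.onnx --saveEngine=model.onnx.engine \
        --exportProfile=model.onnx.engine.profile.json \
        --exportLayerInfo=model.onnx.engine.graph.json \
        --profilingVerbosity=detailed
```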
A utility Python script, utils/process_engine.py, can be used to create the JSON files. The script executes trtexec, so it can only be invoked in an environment where trtexec is installed and available on $PATH.
# !python3 ../utils/process_engine.py ../tests/inputs/mobilenet.qat.onnx ../tests/inputs best
%matplotlib inline
import matplotlib.pyplot as plt
import os
import pandas as pd
import trex
import trex.notebook
import trex.plotting
import trex.graphing
import trex.df_preprocessing
# Configure a wider output (for the wide graphs)
trex.notebook.set_wide_display()
# Choose an engine file to load. This notebook assumes that you've saved the engine to the following paths.
engine_name = "../tests/inputs/mobilenet.qat.onnx.engine"
engine_name = "../tests/inputs/mobilenet_v2_residuals.qat.onnx.engine"
Create an EnginePlan instance and start exploring.
assert engine_name is not None
plan = trex.EnginePlan(f'{engine_name}.graph.json', f'{engine_name}.profile.json', f'{engine_name}.profile.metadata.json')
It is helpful to look at a high-level summary of the engine plan before diving into the details.
EnginePlan's name is derived from the graph file name, but you can also set the name explicitly at any time. This is useful when there are multiple plans with the same file name.
print(f"Summary for {plan.name}:\n")
plan.summary()
An EnginePlan is an object that wraps a Pandas DataFrame data-structure. Most of the examples below utilize this dataframe (df) for querying, slicing and rendering information about the EnginePlan.
The dataframe captures the information from the engine plan graph and profiling JSON files. If both JSON files are available, the latency data of each layer is added as three new columns: latency.time (the total latency of the layer, summed across all measurement iterations), latency.avg_time (the average latency of the layer), and latency.pct_time (the latency of the layer as a proportion of the overall engine latency).
When the dataframe is constructed, several new columns that summarize footprint information are added: total_io_size_bytes, weights_size, total_footprint_bytes.
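As a toy sketch (not the trex implementation) of how the derived latency columns relate, assuming 10 measurement iterations and made-up timings:

```python
import pandas as pd

# Toy layer timings (ms); "latency.time" is summed across all iterations.
df = pd.DataFrame({
    "Name": ["conv1", "conv2", "fc"],
    "latency.time": [6.0, 3.0, 1.0],
})
iterations = 10

# Average per-iteration latency of each layer.
df["latency.avg_time"] = df["latency.time"] / iterations
# Each layer's share (%) of the overall engine latency.
df["latency.pct_time"] = 100 * df["latency.time"] / df["latency.time"].sum()
print(df)
```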
A few of the column names are changed from the original JSON file to make them clearer.
Accessing the dataframe is straightforward:
df = plan.df
You can print the names of the columns in the dataframe.
available_cols = df.columns
print(f"These are the column names in the plan:\n{available_cols}")
A dataframe can be rendered as a table. The columns are from various layers so the dataframe is very sparse.
Use the column controls to sort or filter layers.
An interesting view sorts the layers by latency.pct_time.
The dtale toolbar makes it easy to open the table in a new tab (useful for large tables) and to export the data to CSV and HTML.
trex.notebook.display_df(plan.df)
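The same sorted view can be produced with plain Pandas outside the interactive table; a minimal sketch with toy data (column names follow the plan dataframe):

```python
import pandas as pd

# Toy stand-in for plan.df with the latency column used above.
df = pd.DataFrame({
    "Name": ["conv1", "conv2", "fc"],
    "latency.pct_time": [30.0, 60.0, 10.0],
})
# Sort so the most expensive layers appear first.
sorted_df = df.sort_values("latency.pct_time", ascending=False)
print(sorted_df)
```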
When rendering engine plan dataframes we usually want to reduce the visual clutter and render only the important columns.
The function clean_for_display does exactly that.
The column order is changed to bring important columns to the front.
Columns Inputs and Outputs are reformatted to reduce verbosity.
Finally, a few columns are dropped and NaNs are replaced with zeros.
df = trex.df_preprocessing.clean_for_display(plan.df)
print(f"These are the column names in the plan:\n{df.columns}")
trex.notebook.display_df(df)
This example shows how to create a bar diagram of the count of each layer type.
trex provides a utility wrapper around Pandas' API, but you can freely use the Pandas API to extract data from the plan dataframe.
layer_types = trex.group_count(plan.df, 'type')
# Simple DF print
print(layer_types)
# dtale DF display
trex.notebook.display_df(layer_types)
trex provides wrappers around Plotly's plotting API. plotly_bar2 is the main utility for creating bar charts.
trex.plotting.plotly_bar2(
df=layer_types,
title='Layer Count By Type',
values_col='count',
names_col='type',
orientation='v',
color='type',
colormap=trex.colors.layer_colormap,
show_axis_ticks=(True, True));
Pandas' powerful API can be used directly on the plan dataframe. For example, we can easily query for the three layers that consume the most time:
top3 = plan.df.nlargest(3, 'latency.pct_time')
trex.notebook.display_df(top3)
The chart below provides a quick view of the layer latencies. The values of df[values_col] set the bar heights and the values of df[names_col] provide the bar names. In this case, the latency of each layer is plotted against the name of the layer. The colors of the bars are determined by color and colormap, if provided.
For example, via the lookup colormap[df['type']] the bar colors are determined by the layer type; layer_colormap is a trex dictionary that maps layer types to preset colors.
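The color lookup itself is plain Pandas: each layer's type is mapped through a dictionary of colors. A minimal sketch with a toy colormap standing in for trex.colors.layer_colormap:

```python
import pandas as pd

# Toy colormap; trex ships its own layer_colormap with preset colors.
layer_colormap = {"Convolution": "steelblue", "Pooling": "seagreen"}

df = pd.DataFrame({"type": ["Convolution", "Pooling", "Convolution"]})
# colormap[df['type']]: look up each bar's color from its layer type.
bar_colors = df["type"].map(layer_colormap)
print(bar_colors.tolist())
```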
trex.plotting.plotly_bar2(
df=plan.df,
title="% Latency Budget Per Layer",
values_col="latency.pct_time",
names_col="Name",
color='type',
use_slider=False,
colormap=trex.colors.layer_colormap);
plotly_hist is a wrapper around Plotly's histogram chart. It takes arguments similar to plotly_bar2's, but fewer of them. plotly_hist plots the histogram of df[values_col].
Here's a look at how the layer latencies are distributed:
trex.plotting.plotly_hist(
df=plan.df,
title="Layer Latency Distribution",
values_col="latency.pct_time",
xaxis_title="Latency (%)",
color='type',
colormap=trex.colors.layer_colormap);
Pandas' aggregations and reductions can be used to extract interesting information.
Here we group the layer latencies by layer type. The data can be displayed as a chart or as a summary table, like the one below.
time_pct_by_type = plan.df.groupby(["type"])[["latency.pct_time", "latency.avg_time"]].sum().reset_index()
trex.notebook.display_df(time_pct_by_type)
trex.plotting.plotly_bar2(
df=time_pct_by_type,
title="% Latency Budget Per Layer Type",
values_col="latency.pct_time",
names_col="type",
orientation='h',
color='type',
colormap=trex.colors.layer_colormap);
Treemaps provide a different view of the profiling data.
In this example we use a Plotly Express Treemap directly, without any wrappers.
import plotly.express as px
fig = px.treemap(
plan.df,
path=['type', 'Name'],
values='latency.pct_time',
title='Treemap Of Layer Latencies (Size & Color Indicate Latency)',
color='latency.pct_time')
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
This is another view of how layer latencies interact with layer data size.
fig = px.treemap(
plan.df,
path=['type', 'Name'],
values='latency.pct_time',
title='Treemap Of Layer Latencies (Size Indicates Latency. Color Indicates Activations Size)',
color='total_io_size_bytes')
fig.update_traces(root_color="white")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
trex.plotting.plotly_bar2(
plan.df,
"Weights Sizes Per Layer",
"weights_size", "Name",
color='type',
colormap=trex.colors.layer_colormap)
trex.plotting.plotly_bar2(
plan.df,
"Activations Sizes Per Layer",
"total_io_size_bytes",
"Name",
color='type',
colormap=trex.colors.layer_colormap)
trex.plotting.plotly_hist(
plan.df,
"Layer Activations Sizes Distribution",
"total_io_size_bytes",
"Size (bytes)",
color='type',
colormap=trex.colors.layer_colormap)
plan.df["total_io_size_bytes"].describe()
trex also provides a wrapper for Plotly's pie charts. Several charts can be plotted in a grid.
The precision_colormap colors the pie slices by the value of df['precision'].
charts = []
layer_precisions = trex.group_count(plan.df, 'precision')
charts.append((layer_precisions, 'Layer Count By Precision', 'count', 'precision'))
layers_time_pct_by_precision = trex.group_sum_attr(plan.df, grouping_attr='precision', reduced_attr='latency.pct_time')
display(layers_time_pct_by_precision)
charts.append((layers_time_pct_by_precision, '% Latency Budget By Precision', 'latency.pct_time', 'precision'))
trex.plotting.plotly_pie2("Precision Statistics", charts, colormap=trex.colors.precision_colormap);
trex.plotting.plotly_bar2(
plan.df,
"% Latency Budget Per Layer<BR>(bar color indicates precision)",
"latency.pct_time",
"Name",
color='precision',
colormap=trex.colors.precision_colormap);
It is very helpful to draw the graph of the engine plan.
A formatter can be used to configure the colors of nodes. trex provides layer_type_formatter which paints graph nodes by their layer type, and precision_formatter which paints graph nodes according to their precision.
to_dot converts an EnginePlan to a dot file, which can be rendered to SVG or PNG.
SVG files render faster than PNG, are searchable, and remain sharp and crisp at all resolutions. Because the graphs are large, it is recommended to view the rendered graph file in a separate browser window.
formatter = trex.graphing.layer_type_formatter if True else trex.graphing.precision_formatter
graph = trex.graphing.to_dot(plan, formatter)
svg_name = trex.graphing.render_dot(graph, engine_name, 'svg')
PNG files can be rendered inside the notebook, but the graphs are usually very large and the resolution suffers.
png_name = trex.graphing.render_dot(graph, engine_name, 'png')
from IPython.display import Image
display(Image(filename=png_name))
Sometimes it is interesting to look at all layers of a certain type. You can use Pandas' API (e.g. query) to slice the dataframe by layer type:
convs1 = plan.df.query("type == 'Convolution'")
convs2 = df[df.type == 'Convolution']
However, trex provides a get_layers_by_type API which performs layer-type-specific preprocessing that is often useful. In the case of convolutions, it adds derived attribute columns such as attr.arithmetic_intensity, attr.compute_efficiency and attr.memory_efficiency, which the charts below use.
convs = plan.get_layers_by_type('Convolution')
trex.notebook.display_df(convs)
trex.plotting.plotly_bar2(
convs,
"Convolution Latency Per Layer (%)<BR>(bar color indicates precision)",
"latency.pct_time", "Name",
color='precision',
colormap=trex.colors.precision_colormap)
trex.plotting.plotly_bar2(
convs,
"Convolution Data Sizes<BR>(bar color indicates latency)",
"total_io_size_bytes",
"Name",
color='latency.pct_time');
Arithmetic intensity (AI) is a measure of the amount of compute expended per byte of data. Layers with higher AI are generally more efficient, because moving data is much slower than computing an operation: for each unit of data fetched from memory we want to perform many computations, since the GPU can compute much faster than it can fetch data from memory. For more on AI, see https://en.wikipedia.org/wiki/Roofline_model#Arithmetic_intensity.
This is a simplistic model which assumes that the data is read only once and written out once. In practice, when computing convolution and GEMM operations, memory tiles are read several times, usually from fast shared memory or the L1/L2 caches.
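Under that one-read/one-write assumption, AI is just operations divided by bytes moved. A toy calculation for a hypothetical convolution layer (all numbers made up):

```python
# Hypothetical per-layer numbers.
macs = 2_000_000          # multiply-accumulate operations in the layer
input_bytes = 150_000     # activation bytes read
weight_bytes = 5_000      # weight bytes read
output_bytes = 145_000    # activation bytes written

# Simplistic model: every byte crosses memory exactly once.
bytes_moved = input_bytes + weight_bytes + output_bytes
arithmetic_intensity = macs / bytes_moved  # operations per byte
print(f"AI = {arithmetic_intensity:.2f} ops/byte")
```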
trex.plotting.plotly_bar2(
convs,
"Convolution Arithmetic Intensity<BR>(bar color indicates activations size)",
"attr.arithmetic_intensity",
"Name",
color='total_io_size_bytes')
trex.plotting.plotly_bar2(
convs,
"Convolution Arithmetic Intensity<BR>(bar color indicates latency)",
"attr.arithmetic_intensity",
"Name",
color='latency.pct_time');
Another simplistic model measures compute and memory efficiency. These indicators are calculated by dividing the number of operations (or memory bytes) by the layer's execution time.
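As a toy sketch of those two indicators (hypothetical numbers; trex derives attr.compute_efficiency and attr.memory_efficiency from the plan's attributes and measured latencies):

```python
# Hypothetical per-layer measurements.
ops = 2_000_000        # compute operations performed by the layer
bytes_moved = 300_000  # bytes read and written (one-time read/write model)
latency_ms = 0.05      # measured execution time in milliseconds

# Efficiency indicators: work done per millisecond of execution.
compute_efficiency = ops / latency_ms         # operations per ms
memory_efficiency = bytes_moved / latency_ms  # bytes per ms
print(compute_efficiency, memory_efficiency)
```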
# Memory accesses per ms (assuming one time read/write penalty)
trex.plotting.plotly_bar2(
convs,
"Convolution Memory Efficiency<BR>(bar color indicates latency)",
"attr.memory_efficiency",
"Name",
color='latency.pct_time')
# Compute operations per ms (assuming one time read/write penalty)
trex.plotting.plotly_bar2(
convs,
"Convolution Compute Efficiency<BR>(bar color indicates latency)",
"attr.compute_efficiency",
"Name",
color='latency.pct_time');
convs = plan.get_layers_by_type('Convolution')
charts = []
convs_count_by_type = trex.group_count(convs, 'subtype')
charts.append((convs_count_by_type, 'Count', 'count', 'subtype'))
convs_time_pct_by_type = trex.group_sum_attr(convs, grouping_attr='subtype', reduced_attr='latency.pct_time')
charts.append((convs_time_pct_by_type, '% Latency Budget', 'latency.pct_time', 'subtype'))
trex.plotting.plotly_pie2("Convolutions Statistics (Subtype)", charts)
charts = []
convs_count_by_group_size = trex.group_count(convs, 'attr.groups')
charts.append((convs_count_by_group_size, 'Count', 'count', 'attr.groups'))
convs_time_pct_by_grp_size = trex.group_sum_attr(convs, grouping_attr='attr.groups', reduced_attr='latency.pct_time')
charts.append((convs_time_pct_by_grp_size, '% Latency Budget', 'latency.pct_time', 'attr.groups'))
trex.plotting.plotly_pie2("Convolutions Statistics (Number of Groups)", charts)
charts = []
convs_count_by_precision = trex.group_count(convs, 'precision')
charts.append((convs_count_by_precision, 'Count', 'count', 'precision'))
convs_time_pct_by_precision = trex.group_sum_attr(convs, grouping_attr='precision', reduced_attr='latency.pct_time')
charts.append((convs_time_pct_by_precision, '% Latency Budget', 'latency.pct_time', 'precision'))
trex.plotting.plotly_pie2("Convolutions Statistics (Precision)", charts, colormap=trex.colors.precision_colormap);