.. include:: rolesAndUtils.rst

Internally, Pixar relies on USD for low-latency access, preview, and introspection of production-scale 3D datasets throughout our film-making pipeline. We want USD to work, and perform, as well for you as it does for us. Here are a few tips that can make a huge difference.

.. note::

   Starting with release 24.11, Pixar is publishing performance metrics
   measured using several different assets. See :ref:`performance_metrics` for
   details.


.. _maxperf_optimized_allocator:

Use an allocator optimized for multithreading
##############################################

Most of the tips here are about how to construct and structure your USD data files, but this one is about how you build the software, and it can make a 2x difference in USD's performance.

Loading a composition of hundreds to thousands of files, and efficiently extracting the data needed to render a scene, requires a significant amount of computation and data-structure building. Like many modern systems, USD multithreads its computations and graph-building as extensively as possible, and we try to ensure that USD scales as close to linearly as possible as more compute cores become available to it.

Unfortunately, the ``malloc`` memory allocators that still ship with some
popular compilers and C runtime libraries do not scale well in multithreaded
algorithms that must allocate memory in many threads concurrently. Our primary
experience is on Linux with gcc and glibc, whose allocator is based on
`ptmalloc <https://github.com/emeryberger/Malloc-Implementations/tree/master/allocators/ptmalloc/ptmalloc3>`_,
and neither holds up very well under high thread counts. At Pixar, we have
been successfully and stably using `jemalloc <http://jemalloc.net/>`_, which
has the following virtues:


* Consistently outperforms glibc by a factor of **2x** for USD stage loading
  and imaging through Hydra on a 16-core Intel workstation
* Does an outstanding job of returning freed memory to the kernel

The `Advanced Build Configuration <https://github.com/PixarAnimationStudios/OpenUSD/blob/release/BUILDING.md>`_
document contains instructions for building USD linked against a
third-party malloc package such as jemalloc. If you are using USD inside a
third-party application as an embedded plugin, you can force the application
to use a different allocator on Linux by setting the :envvar:`LD_PRELOAD`
environment variable. For example:

.. code-block:: console

   env LD_PRELOAD=/path/to/jemalloc.so thirdPartyApplication

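
If you use :envvar:`LD_PRELOAD`, it can be worth confirming that the preloaded
allocator was actually mapped into the process. The snippet below is a minimal,
Linux-only sketch that scans ``/proc/self/maps`` for a shared object whose file
name contains ``jemalloc``. The helper name ``mapped_allocators`` and the
substring match are assumptions for illustration; they are not part of USD.

.. code-block:: python

   import os


   def mapped_allocators(maps_path="/proc/self/maps", markers=("jemalloc",)):
       """Return file names of mapped shared objects that look like a
       third-party allocator (hypothetical helper; Linux-only)."""
       found = set()
       if not os.path.exists(maps_path):
           return found  # not Linux, or /proc is unavailable
       with open(maps_path) as maps:
           for line in maps:
               # Columns: address perms offset dev inode [pathname]
               fields = line.split(None, 5)
               if len(fields) < 6:
                   continue  # anonymous mapping with no backing file
               name = os.path.basename(fields[5].strip())
               if any(marker in name for marker in markers):
                   found.add(name)
       return found


   if __name__ == "__main__":
       libs = mapped_allocators()
       print("allocator libraries mapped:", sorted(libs) or "none found")

Saving this as a script and running it under the same
``env LD_PRELOAD=/path/to/jemalloc.so`` prefix should list the jemalloc
library; if an application embeds a Python interpreter, the function can also
be called from there to check that process directly.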

Use binary ".usd" files for geometry and shading caches
#######################################################

:filename:`.usda` text files can be a good choice for small files
that primarily reference/subLayer other files together, such as the top-level
"asset interface" file that defines a :ref:`Model <glossary:Model>`, provides
its :ref:`AssetInfo <glossary:AssetInfo>`, declares its
:ref:`VariantSets <glossary:VariantSet>`, and contains a
:ref:`payload arc <glossary:Payload>` to the asset's contents. But for files
that contain more than a few small definitions or overrides, the binary
:ref:`"usdc" format <glossary:Crate File Format>` will open faster and
consume substantially less memory while held open (a
:ref:`UsdStage <glossary:Stage>` keeps open all the layers that participate
in a composition). You should not need to exert any extra effort to get this
behavior, since creating a new layer or stage with a filename
:filename:`someFile.usd` will, by default, create a usdc file.

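
To see this default in action, the following sketch (assuming the ``pxr``
Python bindings are importable) creates stages with ``.usd``, ``.usda``, and
``.usdc`` extensions and reads the first eight bytes of each saved file: crate
files begin with the magic string ``PXR-USDC``, while text layers begin with
``#usda``. The helper names ``file_magic`` and ``write_sample_layers`` are
hypothetical.

.. code-block:: python

   import os

   from pxr import Usd, UsdGeom


   def file_magic(path, n=8):
       """Return the first n bytes of a file: enough to tell usdc from usda."""
       with open(path, "rb") as f:
           return f.read(n)


   def write_sample_layers(directory):
       """Create a tiny stage per extension and report each file's header."""
       headers = {}
       for name in ("someFile.usd", "someFile.usda", "someFile.usdc"):
           path = os.path.join(directory, name)
           stage = Usd.Stage.CreateNew(path)
           UsdGeom.Xform.Define(stage, "/Root")
           stage.GetRootLayer().Save()
           headers[name] = file_magic(path)
       return headers


   if __name__ == "__main__":
       import tempfile
       with tempfile.TemporaryDirectory() as tmp:
           for name, magic in write_sample_layers(tmp).items():
               print(name, magic)

With default settings, both ``someFile.usd`` and ``someFile.usdc`` should
report ``b'PXR-USDC'``.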
As much as your pipeline allows, you should also prefer binary USD files over Alembic caches for performance. Alembic is a ground-breaking and outstanding interchange format for geometry caches, and can perform very well in an Alembic-based pipeline. However, while USD is committed to supporting Alembic as an input to composed USD scenes, Alembic-backed layers cannot perform as well as native USD files, for two reasons:

#. To provide lockless data access in a highly multithreaded client, you
   must configure an Alembic Ogawa archive to open the file redundantly, once
   per thread. Each opened file retains a file descriptor, and file
   descriptors on Unix systems are a finite resource, especially in
   applications that still use the *select()* system call. We commonly
   construct USD scenes that reference thousands to tens of thousands of
   files, which would require hundreds of thousands of file descriptors to
   make Ogawa competitive with usdc for multithreaded data access, whereas
   usdc, by default, need not retain any file descriptors for the files it
   keeps open. Our :doc:`Alembic File Format plugin <plugins_alembic>` does
   not explicitly configure the archives it opens, so each archive consumes
   a single file descriptor, but all threads contend to read data from it.

#. Although the Alembic and USD schemas and datatypes are mostly equivalent,
   they require translation at the C++ data structure level, so there will
   always be an extra level of copying/translation required when serving
   Alembic-backed data to USD.


You should not need to replace all your Alembic exporters with USD exporters to
achieve this, because the USD File Format plugin system makes it easy to
convert between supported file formats. To convert a file from Alembic to USD,
you need only use :ref:`usdcat <toolset:usdcat>`:

.. code-block:: console

   usdcat -o snowman.usd snowman.abc

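
The same kind of conversion can be scripted. The sketch below assumes the
``pxr`` bindings and, for ``.abc`` input, that the Alembic file format plugin
was built. It opens the source with ``Sdf.Layer.FindOrOpen``, which dispatches
to whichever file format plugin handles the extension, and writes it with
``Export``, which picks the output format from the target's extension. The
helper name ``convert_layer`` is hypothetical.

.. code-block:: python

   from pxr import Sdf


   def convert_layer(src_path, dst_path):
       """Open src_path through its file format plugin and write it back out
       as dst_path (hypothetical helper)."""
       layer = Sdf.Layer.FindOrOpen(src_path)
       if layer is None:
           raise RuntimeError("could not open %s" % src_path)
       # Export() chooses the output format from dst_path's extension, so a
       # .usd target is written as a binary crate file by default.
       if not layer.Export(dst_path):
           raise RuntimeError("could not write %s" % dst_path)
       return dst_path

For example, ``convert_layer("snowman.abc", "snowman.usd")`` mirrors the
``usdcat`` command above.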

Package assets with payloads
############################

When dealing with very large scenes, many important pipeline tasks can be
accomplished without knowing about or processing all of the geometry and shading
on many (or all!) of the assets in the scene. These tasks can therefore be
accomplished much more quickly if we can get a view of the scene that does not
populate those aspects of the referenced assets. USD provides a composition arc
called a :ref:`Payload <glossary:Payload>` that is essentially a
"deferred reference". It allows us to structure scenes so that we can open a
:ref:`Stage <glossary:Stage>` "unloaded", meaning that USD will populate the
stage only "up to" the payload arcs. One effective way to make use of this is to
publish each of your "model assets" such that the file that gets referenced into
assemblies and shots is a very lightweight description of the model's
"interface" (e.g. its :ref:`AssetInfo <glossary:AssetInfo>`,
:ref:`VariantSets <glossary:VariantSet>`, and rest bounding box), plus
a payload arc to a separate file that pulls in the complete geometric and
shading description of the asset.

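
A minimal sketch of this packaging, assuming the ``pxr`` Python bindings,
follows. The file names (``<name>.usd`` for the interface,
``<name>_payload.usd`` for the contents) and the helper name ``publish_model``
are hypothetical; your pipeline's conventions will differ.

.. code-block:: python

   import os

   from pxr import Gf, Kind, Usd, UsdGeom, Vt


   def publish_model(directory, name):
       """Write a heavy contents file and a lightweight interface file that
       reaches it through a payload arc. Returns the interface file path."""
       contents_path = os.path.join(directory, name + "_payload.usd")
       interface_path = os.path.join(directory, name + ".usd")

       # Contents: the geometry (and, in practice, shading) lives only here.
       contents = Usd.Stage.CreateNew(contents_path)
       root = UsdGeom.Xform.Define(contents, "/" + name)
       contents.SetDefaultPrim(root.GetPrim())
       mesh = UsdGeom.Mesh.Define(contents, "/%s/Geom" % name)
       mesh.CreatePointsAttr([Gf.Vec3f(0, 0, 0), Gf.Vec3f(1, 0, 0),
                              Gf.Vec3f(0, 1, 0)])
       mesh.CreateFaceVertexCountsAttr([3])
       mesh.CreateFaceVertexIndicesAttr([0, 1, 2])
       contents.GetRootLayer().Save()

       # Interface: kind, assetInfo, and a rest bounding box, plus the payload.
       interface = Usd.Stage.CreateNew(interface_path)
       prim = UsdGeom.Xform.Define(interface, "/" + name).GetPrim()
       interface.SetDefaultPrim(prim)
       model = Usd.ModelAPI(prim)
       model.SetKind(Kind.Tokens.component)
       model.SetAssetName(name)
       UsdGeom.ModelAPI.Apply(prim).SetExtentsHint(
           Vt.Vec3fArray([Gf.Vec3f(0, 0, 0), Gf.Vec3f(1, 1, 0)]))
       # No target prim is given, so the payload targets the contents file's
       # defaultPrim.
       prim.GetPayloads().AddPayload("./" + os.path.basename(contents_path))
       interface.GetRootLayer().Save()
       return interface_path

Shots and assemblies would then reference ``<name>.usd``; only the interface
file is read until someone asks for the payload to be loaded.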
When a scene constructed from references to assets built this way is opened
unloaded, you get a summary view of the scene that contains its
:ref:`Model Hierarchy <glossary:Model Hierarchy>`. That view is sufficient for
some entire tasks, and for the rest it provides all the information necessary
to load just the model instances those tasks require. Fully loading a large
scene can take seconds or minutes, but its Model Hierarchy view typically
opens in under a second, or at most a few seconds.

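
The following sketch, again assuming the ``pxr`` Python bindings, opens a
scene with ``Usd.Stage.LoadNone``, walks only the model hierarchy (which is
populated even though no payloads are loaded), and then loads the payloads of
just the named models. ``open_and_load_selected`` and its ``wanted`` parameter
are hypothetical.

.. code-block:: python

   from pxr import Usd


   def open_and_load_selected(scene_path, wanted):
       """Open scene_path unloaded, then load only the models named in wanted.

       Returns the stage and the paths of the models that were loaded."""
       stage = Usd.Stage.Open(scene_path, Usd.Stage.LoadNone)

       # Unloaded payload prims fail the default "is loaded" predicate, so
       # traverse with an explicit predicate that prunes non-model subtrees.
       model_predicate = Usd.PrimIsActive & Usd.PrimIsModel
       targets = [prim.GetPath() for prim in stage.Traverse(model_predicate)
                  if prim.GetName() in wanted]

       # Load after the traversal finishes, since loading recomposes the stage.
       for path in targets:
           stage.Load(path)  # loads payloads at and below this prim
       return stage, targets

Because the traversal never leaves the model hierarchy, its cost tracks the
number of models rather than the amount of geometry in the scene.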
The USD distribution includes an `example Python script <https://github.com/PixarAnimationStudios/OpenUSD/blob/release/extras/usd/examples/usdMakeFileVariantModelAsset/usdMakeFileVariantModelAsset.py>`_
that demonstrates one simple kind of asset packaging using payloads.


What makes a USD scene heavy/expensive?
#######################################

It is common knowledge that the more geometry and the more complex shading you
have in a scene, the more expensive it will be to render, regardless of how you
feed the data to the renderer. While we do not have a mathematical formula for
structuring data in USD for minimal latency and the lowest memory footprint in
getting data to a renderer, we do have some guidelines that we try to keep in
mind:

Prefer crate files.
   As described above, putting big data in :filename:`.usda` files increases
   both the latency of opening a :ref:`Stage <glossary:Stage>` and its memory
   footprint.

Monitor Layer count.
   The cost and weight of opening a scene scales with the number of files
   that must be opened. Of course, much of the workflow power of USD comes
   from its ability to maintain references to assets, and this caution is not
   meant to inspire anyone to flatten out their sets or environments into a
   single file. Rather, it is to encourage careful consideration of common
   published asset structure. In our own experience, it is all too easy to
   solve an asset authoring / organization problem by throwing new layers at
   it, and the number of published layers per asset can quickly rise from
   three or four to ten or more, an increase that is multiplied by every
   (uniquely) referenced asset in a scene. By the time an asset is ready to
   be published, the workflow issues that benefited from many layers are
   typically no longer relevant, so when possible, try to collapse layers as
   part of a publishing step. Currently, the only provided flattening tools
   are :ref:`usdcat --flatten <toolset:usdcat>` and
   :ref:`usdstitch <toolset:usdstitch>`, each of which is limited in its
   handling of composition arcs. We hope to provide more flattening options
   in the future. You can count and examine the layers that are required to
   produce the current view of a stage using
   :usdcpp:`UsdStage::GetUsedLayers`.

Minimize Prim count.
   The cost and weight of opening a scene also scales with the ultimate number
   of :ref:`prims <glossary:Prim>` populated on the resultant
   :ref:`UsdStage <glossary:Stage>`, but much, much less so with the number of
   :ref:`properties <glossary:Property>` contained in the scene. This is
   because prims can introduce new
   :ref:`composition arcs <glossary:Composition Arcs>` into the scene, so
   each prim must be uniquely :ref:`indexed <glossary:Index>`, and the
   results cached for later property evaluation. This leads to the following
   guidelines:

   Leverage transformable gprims.
      Take advantage of the fact that geometric primitives in USD are
      transformable. Do not create a parent :usdcpp:`Xform <UsdGeomXform>`
      for a :usdcpp:`Mesh <UsdGeomMesh>` for the sole purpose of transforming
      the Mesh, since you can transform the Mesh directly using its
      :usdcpp:`Xformable <UsdGeomXformable>` properties. The
      :doc:`Alembic file format plugin <plugins_alembic>` collapses
      "xform + shape" combinations into a single "transformable gprim" in
      UsdGeom. Leveraging this feature of the UsdGeom schemas typically cuts
      the prim count of scenes by 40% to 50%, depending on the scene's
      branching structure.

   Use Instancing at higher granularities.
      Even if your renderer is only able to instance individual gprims, there
      is a compelling advantage to expressing instancing at a higher
      granularity and simply letting your renderer process the USD prototype
      redundantly for each instance to get at the instanceable gprims. While
      gprim-level instancing is expressible in USD, it provides zero
      prim-count reduction, and it actually adds somewhat to the cost of
      opening a stage, since it introduces at least one instancing composition
      arc on each "leaf" prim. That overhead is not large enough to discourage
      you from instancing at a fine grain, but it should encourage you to add
      another level of instancing on top of it, such as at the asset level.

   Prefer Property namespaces for organization.
      Properties in USD can be organized into "namespaces", similarly to
      "compound properties" in Alembic. A property namespace is simply a
      reserved separator character, ":", within property names, together with
      API for creating and enumerating properties by (multiple) levels of
      namespacing. User properties, primvars, and other "property schemas"
      leverage property namespaces effectively.

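
The *Monitor Layer count* guideline above can be checked, and a publish-time
flatten scripted, from Python. This sketch assumes the ``pxr`` bindings:
``stage.GetUsedLayers()`` wraps :usdcpp:`UsdStage::GetUsedLayers`, and
``stage.Export()`` writes the composed stage out as a single flattened layer,
so, like ``usdcat --flatten``, it keeps only the current view (unselected
variants and the reference/sublayer structure are baked away). The helper names
``used_layer_ids`` and ``flatten_for_publish`` are hypothetical.

.. code-block:: python

   from pxr import Usd


   def used_layer_ids(stage):
       """Identifiers of every layer contributing to the stage's current view."""
       return [layer.identifier for layer in stage.GetUsedLayers()]


   def flatten_for_publish(stage, out_path):
       """Write the composed stage to out_path as one flattened layer."""
       if not stage.Export(out_path):
           raise RuntimeError("could not export %s" % out_path)
       return out_path

For an asset whose root layer sublayers three part files,
``used_layer_ids`` should list those four files (plus the stage's anonymous
session layer), while a stage opened on the flattened output uses a single file.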
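
To make the *Leverage transformable gprims* guideline concrete, the sketch
below (assuming the ``pxr`` bindings; ``author_meshes`` and ``prim_count`` are
hypothetical helpers) authors the same offset meshes twice, once with a parent
``Xform`` per mesh and once with the transform placed directly on each
``Mesh``, and counts the prims a default traversal visits.

.. code-block:: python

   from pxr import Gf, Usd, UsdGeom


   def author_meshes(stage, count, parent_xforms):
       """Author count translated meshes under /World.

       parent_xforms=True wraps each mesh in an Xform that exists only to
       carry the transform (the pattern to avoid)."""
       UsdGeom.Xform.Define(stage, "/World")
       for i in range(count):
           offset = Gf.Vec3d(i, 0, 0)
           if parent_xforms:
               xform = UsdGeom.Xform.Define(stage, "/World/Xf%d" % i)
               xform.AddTranslateOp().Set(offset)
               UsdGeom.Mesh.Define(stage, "/World/Xf%d/Mesh" % i)
           else:
               # UsdGeomMesh is Xformable, so it can carry its own xformOps.
               mesh = UsdGeom.Mesh.Define(stage, "/World/Mesh%d" % i)
               mesh.AddTranslateOp().Set(offset)


   def prim_count(stage):
       """Number of prims visited by a default stage traversal."""
       return sum(1 for _ in stage.Traverse())


   if __name__ == "__main__":
       for wrapped in (True, False):
           stage = Usd.Stage.CreateInMemory()
           author_meshes(stage, 100, parent_xforms=wrapped)
           label = "parent xforms" if wrapped else "transformable gprims"
           print(label, prim_count(stage))

With 100 meshes this reports 201 prims for the wrapped layout versus 101 when
the meshes carry their own transforms; real scenes land between that extreme
and no reduction, depending on their branching structure.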
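
The prim-count payoff of the *Use Instancing at higher granularities*
guideline can be measured the same way. This sketch assumes the ``pxr``
bindings and a hypothetical in-file asset: a root class prim
``/_class_Tree`` holding several leaf meshes, internally referenced by every
tree. Marking each tree ``instanceable`` at the asset level lets all of them
share one prototype, so the leaves are populated once rather than once per
tree. ``build_forest`` and ``populated_prims`` are hypothetical names.

.. code-block:: python

   from pxr import Usd, UsdGeom


   def build_forest(stage, trees, leaves, instanceable):
       """Reference a hypothetical class-prim asset into /World `trees` times."""
       stage.CreateClassPrim("/_class_Tree")
       for leaf in range(leaves):
           UsdGeom.Mesh.Define(stage, "/_class_Tree/Leaf%d" % leaf)
       UsdGeom.Xform.Define(stage, "/World")
       for t in range(trees):
           tree = UsdGeom.Xform.Define(stage, "/World/Tree%d" % t).GetPrim()
           tree.GetReferences().AddInternalReference("/_class_Tree")
           # Asset-level instancing: a single flag on the asset's root prim.
           tree.SetInstanceable(instanceable)


   def populated_prims(stage):
       """Prims in the default traversal plus every prim in each prototype."""
       count = sum(1 for _ in stage.Traverse())
       for prototype in stage.GetPrototypes():
           count += sum(1 for _ in Usd.PrimRange(prototype,
                                                  Usd.PrimAllPrimsPredicate))
       return count

With 10 trees of 5 leaves each, ``populated_prims`` should report 61 without
instancing and 17 with it (``/World``, 10 instance prims, and one 6-prim
prototype).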
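
Finally, a sketch of the *Prefer Property namespaces for organization*
guideline, assuming the ``pxr`` bindings: a few hypothetical ``user:``
attributes are authored on one prim and then enumerated by namespace with
``UsdPrim.GetPropertiesInNamespace``, instead of being spread across extra
organizational prims. ``tag_prim`` and ``describe_namespace`` are hypothetical
helpers.

.. code-block:: python

   from pxr import Sdf, Usd


   def tag_prim(prim):
       """Author hypothetical pipeline metadata as namespaced attributes."""
       str_type = Sdf.ValueTypeNames.String
       prim.CreateAttribute("user:pipeline:department", str_type).Set("layout")
       prim.CreateAttribute("user:pipeline:shot", str_type).Set("a001")
       prim.CreateAttribute("user:notes", str_type).Set("hero prop")


   def describe_namespace(prim, namespace):
       """Map each property's base name to its full namespace prefix."""
       return {prop.GetBaseName(): prop.GetNamespace()
               for prop in prim.GetPropertiesInNamespace(namespace)}


   if __name__ == "__main__":
       stage = Usd.Stage.CreateInMemory()
       prop_prim = stage.DefinePrim("/Prop")
       tag_prim(prop_prim)
       print(describe_namespace(prop_prim, "user:pipeline"))

Querying ``"user"`` instead of ``"user:pipeline"`` also returns the nested
``user:pipeline:*`` properties, so one prim can hold several levels of
organization without adding to the prim count.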