crates/optix/src/introduction.md
NVIDIA OptiX 7 is intended for ray tracing applications that use NVIDIA® CUDA® technology, such as:
NVIDIA OptiX 7 also includes support for motion blur and multi-level transforms, features required by ray-tracing applications designed for production-quality rendering.
OptiX uses a shorthand to describe some common program components and data structures that are worth memorizing as they crop up a lot.

- RG - Ray generation - This is the entry point into the OptiX programming model and is generally responsible for creating and tracing rays.
- IS - Intersection - Run to provide intersections with custom, user-defined primitives (as opposed to built-in triangles).
- AH - Any-hit - Run during ray traversal for each potential intersection. Reports to OptiX whether the intersection should be considered valid and whether to stop traversal.
- CH - Closest-hit - Run only for the closest hit found during ray traversal. Can inspect and interpolate properties of the intersected primitive.
- MS - Miss - Run whenever a ray exits the scene without hitting anything.
- EX - Exception - Run whenever an exception condition is found.
- DC - Direct callable - Can be called manually from another program. May not itself continue ray traversal (i.e. may not call optixTrace).
- CC - Continuation callable - Can be called manually from another program and may continue ray traversal.

In this document and in the names of API elements, the “host” is the processor that begins execution of an application. The “device” is the GPU with which the host interacts. A “build” is the creation of an acceleration structure on the device as initiated by the host.
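
The program-type shorthand listed above can be kept at hand in host code as a small lookup. The following is a minimal sketch in plain Rust; `ProgramKind` and its methods are hypothetical names used for illustration and are not part of the optix crate.

```rust
/// The eight OptiX program types. Hypothetical helper type, not an optix crate API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProgramKind {
    RayGeneration,
    Intersection,
    AnyHit,
    ClosestHit,
    Miss,
    Exception,
    DirectCallable,
    ContinuationCallable,
}

impl ProgramKind {
    /// Every program type, in the order used by the glossary.
    pub const ALL: [ProgramKind; 8] = [
        ProgramKind::RayGeneration,
        ProgramKind::Intersection,
        ProgramKind::AnyHit,
        ProgramKind::ClosestHit,
        ProgramKind::Miss,
        ProgramKind::Exception,
        ProgramKind::DirectCallable,
        ProgramKind::ContinuationCallable,
    ];

    /// Two-letter shorthand used throughout the OptiX documentation.
    pub fn shorthand(self) -> &'static str {
        match self {
            ProgramKind::RayGeneration => "RG",
            ProgramKind::Intersection => "IS",
            ProgramKind::AnyHit => "AH",
            ProgramKind::ClosestHit => "CH",
            ProgramKind::Miss => "MS",
            ProgramKind::Exception => "EX",
            ProgramKind::DirectCallable => "DC",
            ProgramKind::ContinuationCallable => "CC",
        }
    }

    /// Look a program type up by its shorthand; returns `None` for unknown strings.
    pub fn from_shorthand(s: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|kind| kind.shorthand() == s)
    }
}
```

For example, `ProgramKind::from_shorthand("AH")` yields `Some(ProgramKind::AnyHit)`.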
The NVIDIA OptiX 7 API is a CUDA-centric API that is invoked by a CUDA-based application. The API is designed to be stateless, multi-threaded and asynchronous, providing explicit control over performance-sensitive operations like memory management and shader compilation.
It supports a lightweight representation for scenes that can represent instancing, vertex- and transform-based motion blur, with built-in triangles, built-in swept curves, and user-defined primitives. The API also includes highly-tuned kernels and neural networks for machine-learning-based denoising.
An NVIDIA OptiX 7 context controls a single GPU. The context does not hold bulk CPU allocations, but like CUDA, may allocate resources on the device necessary to invoke the launch. It can hold a small number of handle objects that are used to manage expensive host-based state. These handle objects are automatically released when the context is destroyed. Handle objects, where they do exist, consume a small amount of host memory (typically less than 100 kilobytes) and are independent of the size of the GPU resources being used. For exceptions to this rule, see “Program pipeline creation”.
The application invokes the creation of acceleration structures (called builds), compilation, and host-device memory transfers. All API functions employ CUDA streams and invoke GPU functions asynchronously, where applicable. If more than one stream is used, the application must ensure that required dependencies are satisfied by using CUDA events to avoid race conditions on the GPU.
Applications can specify multi-GPU capabilities with a few different recipes. Multi-GPU features such as efficient load balancing or the sharing of GPU memory via NVLINK must be handled by the application developer.
For efficiency and coherence, the NVIDIA OptiX 7 runtime, unlike CUDA kernels, allows the execution of one task, such as a single ray, to be moved at any point in time to a different lane, warp or streaming multiprocessor (SM). (See section “Kernel Focus” in the CUDA Toolkit Documentation.) Consequently, applications cannot use shared memory, synchronization, barriers, or other SM-thread-specific programming constructs in their programs supplied to OptiX.
The NVIDIA OptiX 7 programming model provides an API that future-proofs applications: as new NVIDIA hardware features are released, existing programs can use them. For example, software-based ray tracing algorithms can be mapped to hardware when support is added or mapped to software when the underlying algorithms or hardware support such changes.
In NVIDIA OptiX 7, a program is a block of executable code on the GPU that represents a particular shading operation. This is called a shader in DXR and Vulkan. For consistency with prior versions of NVIDIA OptiX, the term program is used in the current documentation. This term also serves as a reminder that these blocks of executable code are programmable components in the system that can do more than shading. See “Program input”.
NVIDIA OptiX 7 implements a single-ray programming model with ray generation, any-hit, closest-hit, miss and intersection programs.
The ray tracing pipeline provided by NVIDIA OptiX 7 is implemented by eight types of programs:

- Ray generation (RG): The entry point into the ray tracing pipeline, invoked by the system in parallel for each pixel, sample, or other user-defined work assignment. See “Ray generation launches”.
- Intersection (IS): Implements a ray-primitive intersection test, invoked during traversal. See “Traversing the scene graph” and “Ray information”.
- Any-hit (AH): Called when a traced ray finds a new, potentially closest, intersection point, such as for shadow computation. See “Ray information”.
- Closest-hit (CH): Called when a traced ray finds the closest intersection point, such as for material shading. See “Constructing a path tracer”.
- Miss (MS): Called when a traced ray misses all scene geometry. See “Constructing a path tracer”.
- Exception (EX): Exception handler, invoked for conditions such as stack overflow and other errors. See “Exceptions”.
- Direct callables (DC): Similar to a regular CUDA function call, direct callables are called immediately. See “Callables”.
- Continuation callables (CC): Unlike direct callables, continuation callables are executed by the scheduler. See “Callables”.

The ray-tracing “pipeline” is based on the interconnected calling structure of the eight programs and their relationship to the search through the geometric data in the scene, called a traversal. Figure 2.1 is a diagram of these relationships:
![Figure 2.1 - OptiX Programs][optix_programs]
The shader binding table connects geometric data to programs and their
parameters. A record is a component of the shader binding table that is selected
during execution by using offsets specified when acceleration structures are
created and at runtime. A record contains two data regions, header and data.
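
To make the offset-based selection concrete: the OptiX Programming Guide describes the hit-group record index as the instance offset, plus the geometry's SBT index multiplied by the stride passed to optixTrace, plus the offset passed to optixTrace. The sketch below restates that arithmetic in plain Rust. The struct and function names are illustrative assumptions, not optix crate APIs.

```rust
/// Offsets that contribute to selecting a hit-group record.
/// Field names are illustrative and do not come from the optix crate.
#[derive(Debug, Clone, Copy)]
pub struct SbtOffsets {
    /// Offset assigned to the instance when the instance acceleration structure is built.
    pub instance_offset: u32,
    /// Index of the geometry's SBT entry within its geometry acceleration structure.
    pub gas_index: u32,
    /// Stride passed to optixTrace at runtime (often the number of ray types).
    pub trace_stride: u32,
    /// Offset passed to optixTrace at runtime (often the ray type).
    pub trace_offset: u32,
}

/// Index of the hit-group record selected for an intersection.
pub fn hitgroup_record_index(o: SbtOffsets) -> u32 {
    o.instance_offset + o.gas_index * o.trace_stride + o.trace_offset
}

/// Device address of the selected record, given the base address of the
/// hit-group record array and the stride between records in bytes.
pub fn hitgroup_record_address(base: u64, stride_in_bytes: u64, index: u32) -> u64 {
    base + stride_in_bytes * u64::from(index)
}
```

With two ray types (stride 2), an instance offset of 4, the second geometry in the acceleration structure (index 1) and ray type 1, the selected index is 4 + 1 * 2 + 1 = 7.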
SBT record packing is handled automatically by using the SbtRecord generic struct:

```rust
use cust::prelude as cu;
use optix::prelude as ox;

#[derive(Copy, Clone, Default, cu::DeviceCopy)]
struct HitgroupSbtData {
    object_id: u32,
}

type HitgroupRecord = ox::SbtRecord<HitgroupSbtData>;

let rec_hitgroup: Vec<_> = (0..num_objects)
    .map(|i| {
        let object_type = 0;
        let rec = HitgroupRecord::pack(
            HitgroupSbtData { object_id: i },
            &pg_hitgroup[object_type],
        )
        .expect("failed to pack hitgroup record");
        rec
    })
    .collect();
```

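
For intuition about what packing produces, the sketch below models the two regions of a record in plain Rust. It assumes the constants from the OptiX C headers (a 32-byte opaque header, `OPTIX_SBT_RECORD_HEADER_SIZE`, and 16-byte record alignment, `OPTIX_SBT_RECORD_ALIGNMENT`). `RecordSketch` is a hypothetical stand-in, not the actual definition of `SbtRecord`.

```rust
use std::mem::size_of;

/// Size in bytes of the opaque header region (OPTIX_SBT_RECORD_HEADER_SIZE in the C headers).
pub const HEADER_SIZE: usize = 32;

/// Hypothetical model of an SBT record: an opaque header followed by user data.
/// The 16-byte alignment mirrors OPTIX_SBT_RECORD_ALIGNMENT.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
pub struct RecordSketch<T: Copy> {
    /// Filled in by OptiX when a record is packed for a program group; zeroed here.
    pub header: [u8; HEADER_SIZE],
    /// Application-defined data that device programs can read.
    pub data: T,
}

impl<T: Copy> RecordSketch<T> {
    /// Build a record with a zeroed header around the given data.
    pub fn new(data: T) -> Self {
        RecordSketch { header: [0; HEADER_SIZE], data }
    }

    /// Distance in bytes between consecutive records in a tightly packed array.
    pub fn stride() -> usize {
        size_of::<Self>()
    }
}
```

Because of the alignment requirement, a record carrying a single `u32` occupies 48 bytes rather than 36: the 4 bytes of data are padded up to the next multiple of 16.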
The ray payload is used to pass data between optixTrace and the programs
invoked during ray traversal. Payload values are passed to and returned from
optixTrace, and follow a copy-in/copy-out semantic. There is a limited number
of payload values, but one or more of these values can also be a pointer to
stack-based local memory, or application-managed global memory.
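
In the OptiX C API the payload values are 32-bit unsigned integers, so carrying a 64-bit pointer takes two of them. The following sketch shows that split and its inverse in plain Rust. The function names are hypothetical and the code only illustrates the bit manipulation, not a device-side API.

```rust
/// Split a 64-bit pointer value into two 32-bit payload slots: (high half, low half).
pub fn pack_pointer(ptr: u64) -> (u32, u32) {
    // The `as u32` casts truncate to the low 32 bits, which is the intent here.
    ((ptr >> 32) as u32, ptr as u32)
}

/// Reassemble a 64-bit pointer value from two 32-bit payload slots.
pub fn unpack_pointer(high: u32, low: u32) -> u64 {
    (u64::from(high) << 32) | u64::from(low)
}
```

A program that receives the two slots would call `unpack_pointer` and then read or write the per-ray data through the resulting address.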
Attributes are used to pass data from intersection programs to the any-hit and closest-hit programs. Triangle intersection provides two predefined attributes for the barycentric coordinates (U,V). User-defined intersections can define a limited number of other attributes that are specific to those primitives.
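
As an illustration of how the barycentric attributes are typically consumed, the sketch below interpolates a per-vertex quantity across a triangle. It assumes the usual convention in which the three vertices are weighted by (1 - U - V), U and V respectively. The function name is hypothetical and the code is plain Rust rather than device code.

```rust
/// Interpolate a per-vertex attribute (for example a position or normal) at
/// barycentric coordinates (u, v), weighting vertex `a` by 1 - u - v,
/// vertex `b` by u and vertex `c` by v.
pub fn interpolate(u: f32, v: f32, a: [f32; 3], b: [f32; 3], c: [f32; 3]) -> [f32; 3] {
    let w = 1.0 - u - v;
    [
        w * a[0] + u * b[0] + v * c[0],
        w * a[1] + u * b[1] + v * c[1],
        w * a[2] + u * b[2] + v * c[2],
    ]
}
```

At (U, V) = (0, 0) the result is the first vertex, at (1, 0) the second, and at (0, 1) the third.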
NVIDIA OptiX 7 represents GPU information with a pointer to GPU memory. References to the term “buffer” in this document refer to this GPU memory pointer and the associated memory contents. Unlike NVIDIA OptiX 6, the allocation and transfer of buffers is explicitly controlled by user code.
NVIDIA OptiX 7 acceleration structures are opaque data structures built on the device. Typically, they are based on the bounding volume hierarchy model, but implementations and the data layout of these structures may vary from one GPU architecture to another.
NVIDIA OptiX 7 provides two basic types of acceleration structures:
To determine the intersection of geometric data by a ray, NVIDIA OptiX 7 searches a graph of nodes composed of acceleration structures and transformations. This search is called a traversal; the nodes in the graph are called traversable objects or traversables.
The following types of traversable objects exist:
For transformation traversables, the corresponding transformation applies to all descendant child traversables (the sub graph spanned by the child of the transformation traversable). The transformation traversables should only be used in case of motion as applying transformations to geometry is order dependent and motion transformations are time dependent. Static transformations are available as they cannot be merged with any motion transformation due to time-dependency, but should be merged with instance transformations (if desired as the child of an instance) or any other static transformation (i.e., there should be at most one static transformation following a motion transformation). For example, Figure 2.2 combines both types:
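
The order dependence mentioned above can be seen with two small affine transforms. The sketch below is plain Rust and assumes 3x4 row-major matrices, the layout OptiX uses for instance transforms. The type alias and function names are hypothetical.

```rust
/// A 3x4 row-major affine transform: a 3x3 linear part plus a translation column.
pub type Affine = [[f32; 4]; 3];

/// Compose two transforms so that the result applies `child` first, then `parent`.
pub fn compose(parent: &Affine, child: &Affine) -> Affine {
    let mut out = [[0.0f32; 4]; 3];
    for r in 0..3 {
        for c in 0..4 {
            let mut sum = 0.0;
            for k in 0..3 {
                sum += parent[r][k] * child[k][c];
            }
            // The implicit fourth row of `child` is (0, 0, 0, 1), so the parent's
            // translation only contributes to the last column.
            if c == 3 {
                sum += parent[r][3];
            }
            out[r][c] = sum;
        }
    }
    out
}

/// Apply a transform to a point.
pub fn transform_point(m: &Affine, p: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for r in 0..3 {
        out[r] = m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3];
    }
    out
}
```

Composing a translation by (1, 0, 0) over a uniform scale by 2 maps the point (1, 1, 1) to (3, 2, 2), while the opposite order maps it to (4, 2, 2). This is why the position of a transform in the graph matters.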
![Figure 2.2 - Traversables graph][traversables_graph]
OptiX uses handles as references to traversable objects. These traversable
handles are 64-bit opaque values that are generated from device memory pointers
for the graph nodes. The handles identify the connectivity of these objects.
All calls to optixTrace begin at a traversable handle.
A functional ray tracing system is implemented by combining four components as described in the following steps:
Ray tracing work can be interleaved with other CUDA work to generate data, move data to and from the device, and move data to other graphics APIs. It is the application's responsibility to coordinate all work on the GPU. NVIDIA OptiX 7 does not synchronize with any other work.
All OptiX functions return a return code, which is converted to a Rust
Result<T, OptixError>. You can also set a logging callback with
DeviceContext::set_log_callback
to have OptiX report additional information.
Functions that compile also return a String containing additional messages
for warnings and errors.
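
The conversion from a raw return code to a `Result` can be pictured roughly as follows. This is a simplified sketch that only assumes the success code is 0 (`OPTIX_SUCCESS` in the C headers). `SketchError` and `check` are hypothetical names; the real `OptixError` type in the crate is richer than this.

```rust
use std::fmt;

/// Hypothetical error type that carries the raw, non-zero return code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchError {
    pub code: i32,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "OptiX call failed with return code {}", self.code)
    }
}

impl std::error::Error for SketchError {}

/// Map a raw return code to a `Result`, treating 0 as success.
pub fn check(code: i32) -> Result<(), SketchError> {
    if code == 0 {
        Ok(())
    } else {
        Err(SketchError { code })
    }
}
```

Wrapping every call this way lets callers propagate failures with the `?` operator instead of inspecting integer codes by hand.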
Given the same input, the same output should be generated. GPU state is not held by NVIDIA OptiX 7 internally.
In NVIDIA OptiX 7 functions, a CUDA Stream is associated
with the CUDA Context used to create the
DeviceContext. Some API functions take a
Stream as an argument. These functions incur work on
the device and require that the CUDA Context associated
with the DeviceContext is the current context when
they are called. Applications can expect the
CUDA Context to remain the same after invoking NVIDIA
OptiX 7 functions.
Work performed on the device is issued on an application-supplied CUDA Stream using asynchronous CUDA methods. The host function blocks execution until all work has been issued on the stream, but does not do any synchronization or blocking on the stream itself.
You must call optix::init() in order to load the function symbols from
the OptiX library in the driver before calling any other functions.