Overview of OptiX

NVIDIA OptiX 7 is intended for ray tracing applications that use NVIDIA® CUDA® technology, such as:

Film and television visual effects
Computer-aided design for engineering and manufacturing
Light maps generated by path tracing
High-performance computing
LIDAR simulation

NVIDIA OptiX 7 also includes support for motion blur and multi-level transforms, features required by ray-tracing applications designed for production-quality rendering.

Terms used in this documentation

OptiX uses a shorthand to describe some common program components and data structures that are worth memorizing as they crop up a lot.

Program Types

RG - Ray generation - This is the entry point into the OptiX programming model and is generally responsible for creating and tracing rays.
IS - Intersection - Run to provide intersections with custom, user-defined primitives (as opposed to built-in triangles).
AH - Any-hit - Run during ray traversal for each potential intersection. Reports to OptiX whether the intersection should be considered valid and whether to stop traversal.
CH - Closest-hit - Run only for the closest hit found during ray traversal. Can inspect and interpolate properties of the intersected primitive.
MS - Miss - Run whenever a ray exits the scene without hitting anything.
EX - Exception - Run whenever an exception condition is found.
DC - Direct callable - Can be called manually from another program. May not itself continue ray traversal (i.e. may not call optixTrace).
CC - Continuation callable - Can be called manually from another program and may continue ray traversal.
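
The shorthand above can be captured in a small lookup type. The following is a minimal sketch and not part of the optix crate: `ProgramType`, `abbreviation` and `callable_may_continue_traversal` are hypothetical names introduced here for illustration, and the last method only encodes what the list above states about the two callable types.

```rust
/// The eight program types listed above (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProgramType {
    RayGeneration,
    Intersection,
    AnyHit,
    ClosestHit,
    Miss,
    Exception,
    DirectCallable,
    ContinuationCallable,
}

impl ProgramType {
    /// Every program type, in the order used by the list above.
    pub const ALL: [ProgramType; 8] = [
        ProgramType::RayGeneration,
        ProgramType::Intersection,
        ProgramType::AnyHit,
        ProgramType::ClosestHit,
        ProgramType::Miss,
        ProgramType::Exception,
        ProgramType::DirectCallable,
        ProgramType::ContinuationCallable,
    ];

    /// Two-letter shorthand used throughout the documentation.
    pub fn abbreviation(self) -> &'static str {
        match self {
            ProgramType::RayGeneration => "RG",
            ProgramType::Intersection => "IS",
            ProgramType::AnyHit => "AH",
            ProgramType::ClosestHit => "CH",
            ProgramType::Miss => "MS",
            ProgramType::Exception => "EX",
            ProgramType::DirectCallable => "DC",
            ProgramType::ContinuationCallable => "CC",
        }
    }

    /// For the two callable types, whether the program may continue ray
    /// traversal (call optixTrace). Returns `None` for the other types,
    /// because the list above does not state it for them.
    pub fn callable_may_continue_traversal(self) -> Option<bool> {
        match self {
            ProgramType::DirectCallable => Some(false),
            ProgramType::ContinuationCallable => Some(true),
            _ => None,
        }
    }
}
```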

Acceleration structures

geometry-AS/GAS/BLAS - Geometry/Bottom-level acceleration structure. An acceleration structure built over geometric primitives such as triangles or curves.
instance-AS/IAS/TLAS - Instance/Top-level acceleration structure. An acceleration structure built over other acceleration structures and/or transform nodes in order to compose more complex scenes and implement instancing and rigid transformations.

In this document and in the names of API elements, the “host” is the processor that begins execution of an application. The “device” is the GPU with which the host interacts. A “build” is the creation of an acceleration structure on the device as initiated by the host.

Overview

The NVIDIA OptiX 7 API is a CUDA-centric API that is invoked by a CUDA-based application. The API is designed to be stateless, multi-threaded and asynchronous, providing explicit control over performance-sensitive operations like memory management and shader compilation.

It supports a lightweight representation for scenes that can represent instancing, vertex- and transform-based motion blur, with built-in triangles, built-in swept curves, and user-defined primitives. The API also includes highly-tuned kernels and neural networks for machine-learning-based denoising.

An NVIDIA OptiX 7 context controls a single GPU. The context does not hold bulk CPU allocations, but like CUDA, may allocate resources on the device necessary to invoke the launch. It can hold a small number of handle objects that are used to manage expensive host-based state. These handle objects are automatically released when the context is destroyed. Handle objects, where they do exist, consume a small amount of host memory (typically less than 100 kilobytes) and are independent of the size of the GPU resources being used. For exceptions to this rule, see “Program pipeline creation”.

The application invokes the creation of acceleration structures (called builds), compilation, and host-device memory transfers. All API functions employ CUDA streams and invoke GPU functions asynchronously, where applicable. If more than one stream is used, the application must ensure that required dependencies are satisfied by using CUDA events to avoid race conditions on the GPU.

Applications can specify multi-GPU capabilities with a few different recipes. Multi-GPU features such as efficient load balancing or the sharing of GPU memory via NVLINK must be handled by the application developer.

For efficiency and coherence, the NVIDIA OptiX 7 runtime, unlike CUDA kernels, allows the execution of one task, such as a single ray, to be moved at any point in time to a different lane, warp or streaming multiprocessor (SM). (See section “Kernel Focus” in the CUDA Toolkit Documentation.) Consequently, applications cannot use shared memory, synchronization, barriers, or other SM-thread-specific programming constructs in the programs they supply to OptiX.

The NVIDIA OptiX 7 programming model provides an API that future-proofs applications: as new NVIDIA hardware features are released, existing programs can use them. For example, software-based ray tracing algorithms can be mapped to hardware when support is added or mapped to software when the underlying algorithms or hardware support such changes.

Basic concepts and definitions

Program

In NVIDIA OptiX 7, a program is a block of executable code on the GPU that represents a particular shading operation. This is called a shader in DXR and Vulkan. For consistency with prior versions of NVIDIA OptiX, the term program is used in the current documentation. This term also serves as a reminder that these blocks of executable code are programmable components in the system that can do more than shading. See “Program input”.

Program and Data Model

NVIDIA OptiX 7 implements a single-ray programming model with ray generation, any-hit, closest-hit, miss and intersection programs.

The ray tracing pipeline provided by NVIDIA OptiX 7 is implemented by eight types of programs:

Ray generation (RG)

The entry point into the ray tracing pipeline, invoked by the system in parallel for each pixel, sample, or other user-defined work assignment. See “Ray generation launches”.

Intersection (IS)

Implements a ray-primitive intersection test, invoked during traversal. See “Traversing the scene graph” and “Ray information”.

Any-hit (AH)

Called when a traced ray finds a new, potentially closest, intersection point, such as for shadow computation. See “Ray information”.

Closest-hit (CH)

Called when a traced ray finds the closest intersection point, such as for material shading. See “Constructing a path tracer”.

Miss (MS)

Called when a traced ray misses all scene geometry. See “Constructing a path tracer”.

Exception (EX)

Exception handler, invoked for conditions such as stack overflow and other errors. See “Exceptions”.

Direct callables (DC)

Similar to a regular CUDA function call, direct callables are called immediately. See “Callables”.

Continuation callables (CC)

Unlike direct callables, continuation callables are executed by the scheduler. See “Callables”.

The ray-tracing “pipeline” is based on the interconnected calling structure of the eight programs and their relationship to the search through the geometric data in the scene, called a traversal. Figure 2.1 is a diagram of these relationships:

![Figure 2.1 - OptiX Programs][optix_programs]

Shader Binding Table

The shader binding table connects geometric data to programs and their parameters. A record is a component of the shader binding table that is selected during execution by using offsets specified when acceleration structures are created and at runtime. A record contains two data regions, header and data. SBT record packing is handled automatically by using the SbtRecord generic struct:

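// Illustrative fragment: `num_objects` and `pg_hitgroup` (the hit-group
// program groups) are assumed to be defined by the surrounding application.
use cust::prelude as cu;
use optix::prelude as ox;

// Per-record user data stored after the SBT record header.
#[derive(Copy, Clone, Default, cu::DeviceCopy)]
struct HitgroupSbtData {
    object_id: u32,
}

type HitgroupRecord = ox::SbtRecord<HitgroupSbtData>;

// Pack one record per object, pairing its data with a program group.
let rec_hitgroup: Vec<_> = (0..num_objects)
    .map(|i| {
        let object_type = 0;
        HitgroupRecord::pack(
            HitgroupSbtData { object_id: i },
            &pg_hitgroup[object_type],
        )
        .expect("failed to pack hitgroup record")
    })
    .collect();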
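
To make the header and data regions concrete, the following is a minimal, standard-library-only sketch of what such a record could look like in memory. It assumes the 32-byte header size and 16-byte record alignment defined in the OptiX headers (`OPTIX_SBT_RECORD_HEADER_SIZE` and `OPTIX_SBT_RECORD_ALIGNMENT`). `SketchRecord` and `record_stride` are hypothetical names and are not the crate's `SbtRecord` implementation.

```rust
use std::mem::size_of;

/// Size in bytes of the opaque header region of each record
/// (assumed to match OPTIX_SBT_RECORD_HEADER_SIZE).
pub const SBT_RECORD_HEADER_SIZE: usize = 32;

/// Hypothetical stand-in for a packed SBT record: a header region
/// followed by user data, aligned to 16 bytes
/// (assumed to match OPTIX_SBT_RECORD_ALIGNMENT).
#[repr(C, align(16))]
#[derive(Clone, Copy)]
pub struct SketchRecord<T: Copy> {
    pub header: [u8; SBT_RECORD_HEADER_SIZE],
    pub data: T,
}

/// Same user data as in the example above, without the device-copy derive.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct HitgroupSbtData {
    pub object_id: u32,
}

/// Distance in bytes between consecutive records in a tightly packed table.
pub fn record_stride<T: Copy>() -> usize {
    size_of::<SketchRecord<T>>()
}
```

With a 4-byte payload, the 36 bytes of header plus data are padded up to the next multiple of 16, so `record_stride::<HitgroupSbtData>()` is 48 under these assumptions.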

Ray payload

The ray payload is used to pass data between optixTrace and the programs invoked during ray traversal. Payload values are passed to and returned from optixTrace, and follow a copy-in/copy-out semantic. There is a limited number of payload values, but one or more of these values can also be a pointer to stack-based local memory, or application-managed global memory.
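
A common way to pass a pointer through the payload is to split it across two payload values. The sketch below assumes 32-bit payload values and 64-bit addresses. `pack_pointer` and `unpack_pointer` are hypothetical helper names, shown here as plain host-side Rust to illustrate the bit manipulation only.

```rust
/// Split a 64-bit address into two 32-bit payload values (high, low).
pub fn pack_pointer(address: u64) -> (u32, u32) {
    // The `as u32` cast on the full address keeps only the low 32 bits.
    ((address >> 32) as u32, address as u32)
}

/// Rebuild the 64-bit address from the two payload values.
pub fn unpack_pointer(high: u32, low: u32) -> u64 {
    (u64::from(high) << 32) | u64::from(low)
}
```

The program that calls optixTrace would pack the address of its per-ray data, and the programs invoked during traversal would unpack the same two values to reach that data.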

Primitive attributes

Attributes are used to pass data from intersection programs to the any-hit and closest-hit programs. Triangle intersection provides two predefined attributes for the barycentric coordinates (U,V). User-defined intersections can define a limited number of other attributes that are specific to those primitives.
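
The barycentric coordinates are typically used to interpolate per-vertex data at the hit point. The following sketch assumes the usual convention in which the first vertex has weight 1 - U - V, the second has weight U and the third has weight V. `interpolate` is a hypothetical helper, written as plain Rust rather than device code.

```rust
/// Interpolate a per-vertex attribute (for example a normal or a color)
/// at the hit point described by barycentric coordinates (u, v).
pub fn interpolate(u: f32, v: f32, a0: [f32; 3], a1: [f32; 3], a2: [f32; 3]) -> [f32; 3] {
    // Weight of the first vertex; the three weights sum to one.
    let w = 1.0 - u - v;
    [
        w * a0[0] + u * a1[0] + v * a2[0],
        w * a0[1] + u * a1[1] + v * a2[1],
        w * a0[2] + u * a1[2] + v * a2[2],
    ]
}
```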

Buffer

NVIDIA OptiX 7 represents GPU information with a pointer to GPU memory. References to the term “buffer” in this document refer to this GPU memory pointer and the associated memory contents. Unlike NVIDIA OptiX 6, the allocation and transfer of buffers is explicitly controlled by user code.

Acceleration Structures

NVIDIA OptiX 7 acceleration structures are opaque data structures built on the device. Typically, they are based on the bounding volume hierarchy model, but implementations and the data layout of these structures may vary from one GPU architecture to another.

NVIDIA OptiX 7 provides two basic types of acceleration structures:

Geometry acceleration structures - Built over primitives (triangles, curves, or user-defined primitives).
Instance acceleration structures - Built over other objects such as acceleration structures (either type) or motion transform nodes. Allow for instancing with a per-instance static transform.

Traversing the Scene Graph

To determine the intersection of geometric data by a ray, NVIDIA OptiX 7 searches a graph of nodes composed of acceleration structures and transformations. This search is called a traversal; the nodes in the graph are called traversable objects or traversables.

The following types of traversable objects exist:

An instance acceleration structure
A geometry acceleration structure (as the root of a graph with a single geometry acceleration structure; see “Traversal of a single geometry acceleration structure”)
Static transform
Matrix motion transform
Scaling, rotation, translation (SRT) motion transform

For transformation traversables, the corresponding transformation applies to all descendant child traversables (the sub graph spanned by the child of the transformation traversable). The transformation traversables should only be used in case of motion as applying transformations to geometry is order dependent and motion transformations are time dependent. Static transformations are available as they cannot be merged with any motion transformation due to time-dependency, but should be merged with instance transformations (if desired as the child of an instance) or any other static transformation (i.e., there should be at most one static transformation following a motion transformation). For example, Figure 2.2 combines both types:

![Figure 2.2 - Traversables graph][traversables_graph]
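
Merging a static transformation into an instance transformation, as recommended above, amounts to multiplying the two matrices once on the host. The sketch below assumes 3x4 row-major affine matrices. `Affine`, `compose` and `transform_point` are hypothetical names, and the code only illustrates the arithmetic and its order dependence.

```rust
/// A 3x4 row-major affine transform: a 3x3 linear part plus a translation column.
pub type Affine = [[f32; 4]; 3];

/// Compose two transforms so that the result applies `inner` first, then `outer`.
pub fn compose(outer: &Affine, inner: &Affine) -> Affine {
    let mut out = [[0.0f32; 4]; 3];
    for r in 0..3 {
        for c in 0..4 {
            let mut sum = 0.0;
            for k in 0..3 {
                sum += outer[r][k] * inner[k][c];
            }
            // The implicit fourth row is (0, 0, 0, 1), so only the
            // translation column picks up the outer translation.
            if c == 3 {
                sum += outer[r][3];
            }
            out[r][c] = sum;
        }
    }
    out
}

/// Apply an affine transform to a point.
pub fn transform_point(m: &Affine, p: [f32; 3]) -> [f32; 3] {
    [
        m[0][0] * p[0] + m[0][1] * p[1] + m[0][2] * p[2] + m[0][3],
        m[1][0] * p[0] + m[1][1] * p[1] + m[1][2] * p[2] + m[1][3],
        m[2][0] * p[0] + m[2][1] * p[1] + m[2][2] * p[2] + m[2][3],
    ]
}
```

Composing a translation with a scale gives a different result than composing the scale with the translation, which is why the order of transformation traversables in the graph matters.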

OptiX uses handles as references to traversable objects. These traversable handles are 64-bit opaque values that are generated from device memory pointers for the graph nodes. The handles identify the connectivity of these objects. All calls to optixTrace begin at a traversable handle.

Ray tracing with NVIDIA OptiX 7

A functional ray tracing system is implemented by combining four components as described in the following steps:

Create one or more acceleration structures over one or many geometry meshes and instances of these meshes in the scene. See “Acceleration structures”.
Create a pipeline of programs that contains all programs that will be invoked during a ray tracing launch. See “Program pipeline creation”.
Create a shader binding table that includes references to these programs and their parameters and choose a data layout that matches the implicit shader binding table record selection of the instances and geometries in the acceleration structures. See “Shader binding table”.
Launch a device-side kernel that will invoke a ray generation program with a multitude of threads calling optixTrace to begin traversal and the execution of the other programs. See “Ray generation launches”. Device-side functionality is described in “Device-side functions”.
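
The implicit record selection mentioned in the third step can be sketched as simple index arithmetic. This assumes the hit-group selection formula from the OptiX Programming Guide: the instance's SBT offset, plus the geometry's SBT index within its geometry acceleration structure multiplied by the stride passed to optixTrace, plus the offset passed to optixTrace. All names below are hypothetical and the code runs on the host purely for illustration.

```rust
/// SBT parameters supplied at runtime by the call to optixTrace (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct TraceSbtParams {
    /// Offset into the records of one geometry, often the ray type.
    pub offset: u32,
    /// Number of records per geometry, often the number of ray types.
    pub stride: u32,
}

/// Index of the hit-group record selected for an intersection.
pub fn hitgroup_sbt_index(instance_offset: u32, gas_sbt_index: u32, trace: TraceSbtParams) -> u32 {
    instance_offset + gas_sbt_index * trace.stride + trace.offset
}

/// Device address of the selected record, given the table base address
/// and the record stride in bytes.
pub fn hitgroup_record_address(base: u64, stride_in_bytes: u64, index: u32) -> u64 {
    base + u64::from(index) * stride_in_bytes
}
```

For example, with two ray types (stride 2), an instance whose records start at index 4, the second geometry in its acceleration structure (index 1) and the second ray type (offset 1), the selected record index is 4 + 1 * 2 + 1 = 7. The data layout chosen for the shader binding table has to place the matching record at that index.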

Ray tracing work can be interleaved with other CUDA work to generate data, move data to and from the device, and move data to other graphics APIs. It is the application's responsibility to coordinate all work on the GPU. NVIDIA OptiX 7 does not synchronize with any other work.

Implementation Principles

Error Handling

All OptiX functions return a return code, which is converted to a Rust Result<T, OptixError>. You can also set a logging callback with DeviceContext::set_log_callback to have OptiX report additional information.

Functions that compile also return a String containing additional messages for warnings and errors.
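
The conversion from a return code to a `Result` can be illustrated with a minimal, self-contained sketch. It assumes only that a return code of zero means success. `SketchError`, `to_result` and `checked_step` are hypothetical names and do not reproduce the crate's `OptixError` type.

```rust
use std::fmt;

/// Hypothetical error type wrapping a non-zero return code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchError {
    pub code: u32,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "OptiX call failed with return code {}", self.code)
    }
}

impl std::error::Error for SketchError {}

/// Convert a raw return code into a `Result` (zero is assumed to mean success).
pub fn to_result(code: u32) -> Result<(), SketchError> {
    if code == 0 {
        Ok(())
    } else {
        Err(SketchError { code })
    }
}

/// Example of propagating a failure to the caller with the `?` operator.
pub fn checked_step(code: u32) -> Result<&'static str, SketchError> {
    to_result(code)?;
    Ok("step completed")
}
```

Because every call returns a `Result`, application code can chain OptiX calls with `?` and handle failures in one place.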

Stateless Model

Given the same input, the same output should be generated. GPU state is not held by NVIDIA OptiX 7 internally.

In NVIDIA OptiX 7 functions, a CUDA Stream is associated with the CUDA Context used to create the DeviceContext. Some API functions take a Stream as an argument. These functions incur work on the device and require that the CUDA Context associated with the DeviceContext is the current context when they are called. Applications can expect the CUDA Context to remain the same after invoking NVIDIA OptiX 7 functions.

Asynchronous Execution

Work performed on the device is issued on an application-supplied CUDA Stream using asynchronous CUDA methods. The host function blocks execution until all work has been issued on the stream, but does not do any synchronization or blocking on the stream itself.

Function Table initialization

You must call optix::init() in order to load the function symbols from the OptiX library in the driver before calling any other functions.