docs/cuda_plugin_ep/QUICK_START.md
To build ONNX Runtime with the CUDA Plugin Execution Provider instead of the statically linked CUDA EP, use the --cmake_extra_defines "onnxruntime_BUILD_CUDA_EP_AS_PLUGIN=ON" flag with the build script.
Example command to build the CUDA Plugin EP on Windows:
build.bat --cmake_generator "Visual Studio 17 2022" --config Release --build_wheel ^
--parallel --nvcc_threads 1 --build_shared_lib ^
--use_cuda --cuda_version "12.8" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8" ^
--cudnn_home "D:\path\to\cudnn-installation-root" ^
--use_vcpkg --use_binskim_compliant_compile_flags ^
--cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=native" ^
--cmake_extra_defines "onnxruntime_BUILD_CUDA_EP_AS_PLUGIN=ON"
When the build completes, it produces libonnxruntime_providers_cuda_plugin.so on Linux (onnxruntime_providers_cuda_plugin.dll on Windows) in the build output directory, alongside libonnxruntime.so (onnxruntime.dll on Windows).
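Because the file name differs between Linux and Windows, a script that loads the plugin may prefer to derive the name instead of hard-coding it. The following is a minimal sketch, assuming only the two file names that appear in this guide; the helper `plugin_library_name` is hypothetical and not part of ONNX Runtime.

```python
import platform


def plugin_library_name(system=None):
    """Return the CUDA plugin EP file name for a platform.

    `system` follows platform.system() ("Linux", "Windows", ...).
    Hypothetical helper for illustration; not an ONNX Runtime API.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows builds drop the "lib" prefix and use the .dll suffix.
        return "onnxruntime_providers_cuda_plugin.dll"
    # Assumption: every non-Windows build uses the Linux-style name.
    return "libonnxruntime_providers_cuda_plugin.so"
```

The result is only a file name; it still has to be joined with the build output directory (or the installed wheel directory) before being passed to the registration call.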
The plugin EP is registered under the name CudaPluginExecutionProvider and uses the EP Plugin API (RegisterExecutionProviderLibrary / GetEpDevices / SessionOptionsAppendExecutionProvider_V2). It is not a drop-in replacement for the in-tree CUDAExecutionProvider: you must register the plugin library, enumerate its devices, and add them to the session.
Use Env::RegisterExecutionProviderLibrary to load the plugin, Env::GetEpDevices to discover the CUDA devices it exposes, and SessionOptions::AppendExecutionProvider_V2 to add the selected device to the session.
#include <string>
#include <vector>

#include "onnxruntime_cxx_api.h"

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "PluginTest");

  // 1. Register the plugin library.
  env.RegisterExecutionProviderLibrary("CudaPluginExecutionProvider",
                                       ORT_TSTR("libonnxruntime_providers_cuda_plugin.so"));

  // 2. Enumerate available EP devices and pick the CUDA plugin device.
  auto ep_devices = env.GetEpDevices();
  std::vector<Ort::ConstEpDevice> plugin_devices;
  for (const auto& dev : ep_devices) {
    if (std::string(dev.EpName()) == "CudaPluginExecutionProvider") {
      plugin_devices.push_back(dev);
      break;  // use the first CUDA plugin device
    }
  }

  // 3. Add the plugin device to session options.
  Ort::SessionOptions session_options;
  session_options.AppendExecutionProvider_V2(env, plugin_devices, {});

  // ORT_TSTR keeps the model path portable (wide string on Windows).
  Ort::Session session(env, ORT_TSTR("model.onnx"), session_options);
  return 0;
}
Use onnxruntime.register_execution_provider_library to load the plugin, onnxruntime.get_ep_devices to discover devices, and SessionOptions.add_provider_for_devices to add the selected device.
Device-based approach (recommended):
import onnxruntime as ort

# 1. Register the plugin library.
ort.register_execution_provider_library(
    "CudaPluginExecutionProvider",
    "libonnxruntime_providers_cuda_plugin.so",
)

# 2. Enumerate devices and pick the CUDA plugin device.
devices = ort.get_ep_devices()
plugin_device = next(d for d in devices if d.ep_name == "CudaPluginExecutionProvider")

# 3. Create session with the plugin device.
sess_options = ort.SessionOptions()
sess_options.add_provider_for_devices([plugin_device], {})
sess = ort.InferenceSession("model.onnx", sess_options=sess_options)
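If no plugin device is present, a bare `next(...)` raises `StopIteration`, which says little about what went wrong. The sketch below shows one way to make device selection fail with a clearer message. `pick_plugin_device` is a hypothetical helper, not an ONNX Runtime function, and it assumes only that each device object exposes `ep_name`, as in the example above. The stand-in objects are there so the sketch runs without a GPU.

```python
from types import SimpleNamespace

PLUGIN_EP_NAME = "CudaPluginExecutionProvider"


def pick_plugin_device(devices, ep_name=PLUGIN_EP_NAME):
    """Return the first device whose ep_name matches, or raise a clear error."""
    devices = list(devices)
    for device in devices:
        if device.ep_name == ep_name:
            return device
    # Report which EPs were seen so a missing registration is easy to spot.
    seen = sorted({d.ep_name for d in devices})
    raise RuntimeError(f"No device for {ep_name!r}; EPs found: {seen}")


# Stand-in objects for illustration; in real use, pass the result of
# ort.get_ep_devices() instead.
fake_devices = [
    SimpleNamespace(ep_name="CPUExecutionProvider"),
    SimpleNamespace(ep_name=PLUGIN_EP_NAME),
]
chosen = pick_plugin_device(fake_devices)
```

In real use, `pick_plugin_device(ort.get_ep_devices())` would take the place of the `next(...)` expression.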
Provider-name approach:
You can also pass CudaPluginExecutionProvider by name in the providers list
(the plugin library must already be registered):
import onnxruntime as ort

ort.register_execution_provider_library(
    "CudaPluginExecutionProvider",
    "libonnxruntime_providers_cuda_plugin.so",
)

sess = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("CudaPluginExecutionProvider", {"device_id": "0"}),
        "CPUExecutionProvider",
    ],
)
The focused validation script for the CUDA Plugin EP is onnxruntime/test/python/transformers/test_cuda_plugin_ep.py.
Before running it, build ONNX Runtime with onnxruntime_BUILD_CUDA_EP_AS_PLUGIN=ON. test_cuda_plugin_ep.py uses PyTorch for CPU-side reference computations, so CPU-only PyTorch is sufficient.
Example dependency install:
python -m pip install numpy onnx
python -m pip install torch --index-url https://download.pytorch.org/whl/cpu
The test helper tries to auto-detect the plugin library from the installed wheel or a local build tree. If you have multiple builds or want to be explicit, set ORT_CUDA_PLUGIN_PATH to the plugin library produced by your build.
Linux example:
export ORT_CUDA_PLUGIN_PATH=/path/to/build/Release/libonnxruntime_providers_cuda_plugin.so
Windows example:
set ORT_CUDA_PLUGIN_PATH=E:\path\to\build\Release\Release\onnxruntime_providers_cuda_plugin.dll
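Your own scripts can honor the same environment variable. The sketch below reads ORT_CUDA_PLUGIN_PATH and falls back to a caller-supplied default. It does not reproduce the test helper's auto-detection logic, which is not described here. `resolve_plugin_path` and its parameters are hypothetical.

```python
import os
from pathlib import Path


def resolve_plugin_path(default=None, environ=None):
    """Return the plugin path from ORT_CUDA_PLUGIN_PATH, else `default`.

    Raises FileNotFoundError when no path is available or the file is missing.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("ORT_CUDA_PLUGIN_PATH") or default
    if raw is None:
        raise FileNotFoundError(
            "ORT_CUDA_PLUGIN_PATH is unset and no default was given"
        )
    path = Path(raw)
    if not path.is_file():
        raise FileNotFoundError(f"CUDA plugin library not found: {path}")
    return path
```

The returned path can be converted with `str(...)` and passed as the second argument to `ort.register_execution_provider_library`.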
Run the script from its own directory rather than from the repository root, where the onnxruntime/ source directory would shadow the installed onnxruntime Python package.
cd onnxruntime/test/python/transformers
python test_cuda_plugin_ep.py
On Windows:
cd /d onnxruntime\test\python\transformers
python test_cuda_plugin_ep.py
The script validates plugin registration, device enumeration, provider options, operator coverage, and that key nodes are actually assigned to CudaPluginExecutionProvider.
You can generate a parity report comparing the kernels available in the plugin EP versus the statically linked CUDA EP.
# Check runtime registry parity:
python tools/ci_build/cuda_plugin_parity_report.py --runtime --plugin-ep-lib build/Linux/RelWithDebInfo/libonnxruntime_providers_cuda_plugin.so