python/Utilities/README.md
Common utilities for CUDA Python samples using the cuda.core API.
This module provides reusable utility functions for CUDA samples to reduce code duplication. Samples import from cuda_samples_utils.py using simple path-based imports (no package structure needed).
Install from the Python samples directory:
cd /path/to/cuda-samples/Python
pip install -r requirements.txt
This installs a common CUDA 13 stack (see python/requirements.txt):
- cuda-python (>=13.0.0)
- cuda-core (>=1.0.0)
- cupy-cuda13x (>=14.0.0)
- numpy (>=2.3.2)

Import utilities using path-based import:
import sys
from pathlib import Path
# Add Utilities directory to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "Utilities"))
from cuda_samples_utils import verify_array_result
# Use the utility
if verify_array_result(result, expected):
    print("Success!")
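To see why the import snippet above climbs three `.parent` levels, the sketch below resolves the path for a hypothetical sample location. The absolute path is an illustrative assumption modeled on the vectorAdd sample layout; `PurePosixPath` is used only so the result is the same on every platform.

```python
from pathlib import PurePosixPath

# Hypothetical location of a sample script (assumed for illustration only).
sample_file = PurePosixPath(
    "/path/to/cuda-samples/Python/1_GettingStarted/vectorAdd/vectorAdd.py"
)

# first .parent  -> .../1_GettingStarted/vectorAdd
# second .parent -> .../1_GettingStarted
# third .parent  -> .../Python
utilities_dir = sample_file.parent.parent.parent / "Utilities"

print(utilities_dir)  # /path/to/cuda-samples/Python/Utilities
```

A sample stored at a different depth would need a different number of `.parent` hops.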
verify_array_result(result, expected, rtol=1e-5, atol=1e-8, verbose=True)

Verify computed results match expected values. The helper detects whether both
arguments are NumPy arrays or both are CuPy arrays and uses the matching
library's allclose (no unnecessary cross-device transfers).
Parameters:
- result: NumPy or CuPy array with computed results
- expected: NumPy or CuPy array with expected values (same kind as result)
- rtol: Relative tolerance (default: 1e-5)
- atol: Absolute tolerance (default: 1e-8)
- verbose: Print test result (default: True)

Returns:
True if results match within tolerance, False otherwise

Example:
expected = a + b
if verify_array_result(c, expected):
    print("Computation correct!")
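As a rough guide to what `rtol` and `atol` mean, `allclose` in NumPy accepts an element when `|result - expected| <= atol + rtol * |expected|`. The pure-Python sketch below applies that rule to plain lists. The name `within_tolerance` is hypothetical and this is not the helper's implementation, only an illustration of the tolerance check under the default values listed above.

```python
def within_tolerance(result, expected, rtol=1e-5, atol=1e-8):
    """Return True if every element satisfies the allclose-style bound.

    Hypothetical illustration: works on plain Python sequences of equal
    length and does not handle broadcasting, NaN, or infinities.
    """
    if len(result) != len(expected):
        return False
    return all(
        abs(r - e) <= atol + rtol * abs(e)
        for r, e in zip(result, expected)
    )


# Difference of about 1e-5 is inside the bound 1e-8 + 1e-5 * 2.00001.
print(within_tolerance([1.0, 2.0], [1.0, 2.00001]))  # True

# Difference of about 1e-3 is far outside the same bound.
print(within_tolerance([1.0, 2.0], [1.0, 2.001]))  # False
```

Because the relative term scales with `expected`, large values tolerate larger absolute differences, while `atol` governs comparisons near zero.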
check_cuda_requirements()

Check if required CUDA packages are available.
Returns:
True if requirements are met, False otherwise

Example:
if not check_cuda_requirements():
    sys.exit(1)
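One way such a check could be written is to probe for importable modules without actually loading them. The sketch below is an assumption about the general technique, not the code in cuda_samples_utils.py. The function name `missing_modules` is hypothetical, and the module names are inferred from the package list above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be located.

    Hypothetical helper: uses importlib.util.find_spec, which finds a
    top-level module without importing it. For dotted names the parent
    package is imported first, so a missing parent raises ImportError.
    """
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


# Module names assumed from the requirements list (illustrative only).
required = ["cuda.core", "cupy", "numpy"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

A probe like this only confirms that the modules can be found. It does not verify package versions or that a CUDA-capable device is present.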
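These utilities focus on common operations that are not part of the cuda.core API, such as verifying results (verify_array_result) and checking that the required CUDA packages are available (check_cuda_requirements).
For CUDA operations like device initialization, kernel compilation, and grid size calculations, samples should use the cuda.core API directly to demonstrate the proper usage patterns.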
See ../1_GettingStarted/vectorAdd/vectorAdd.py for a complete example:
import sys
from pathlib import Path
# Import utility
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "Utilities"))
from cuda_samples_utils import verify_array_result
import cupy as cp
from cuda.core import Device, Program, ProgramOptions, LaunchConfig, launch
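# Use cuda.core directly for device and kernel operations
# (kernel_source, stream, a, b, c, expected, and num_elements are assumed to be defined earlier in the sample)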
device = Device(0)
device.set_current()
program_options = ProgramOptions(std="c++17", arch=f"sm_{device.arch}")
program = Program(kernel_source, code_type="c++", options=program_options)
module = program.compile("cubin", name_expressions=("kernel_name",))
kernel = module.get_kernel("kernel_name")
# Calculate grid size inline
threads_per_block = 256
blocks_per_grid = (num_elements + threads_per_block - 1) // threads_per_block
# Launch kernel - pass cupy arrays directly
config = LaunchConfig(grid=blocks_per_grid, block=threads_per_block)
launch(stream, config, kernel, a, b, c, cp.int32(num_elements))
# Verify results using utility
verify_array_result(c, expected)
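The grid-size expression in the example above is integer ceiling division: it rounds up so that every element gets a thread even when `num_elements` is not a multiple of the block size. The sketch below wraps the same formula in a hypothetical helper, `blocks_for`, purely to show the arithmetic.

```python
def blocks_for(num_elements, threads_per_block=256):
    """Ceiling division: the smallest block count covering all elements."""
    return (num_elements + threads_per_block - 1) // threads_per_block


for n in (1, 256, 257, 1000):
    blocks = blocks_for(n)
    # Threads launched beyond num_elements have no element to process.
    surplus = blocks * 256 - n
    print(f"{n} elements -> {blocks} block(s), {surplus} surplus thread(s)")
```

Since the last block can contain surplus threads, a kernel launched this way typically compares its global index against the element count before touching memory, which is presumably why `num_elements` is passed as a kernel argument in the example.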