CUDA Python Utilities

Common utilities for CUDA Python samples using the cuda.core API.

Overview

This module provides reusable utility functions for CUDA samples to reduce code duplication. Samples import from cuda_samples_utils.py using simple path-based imports (no package structure needed).

Installation Requirements

Install from the Python samples directory:

bash

cd /path/to/cuda-samples/Python
pip install -r requirements.txt

This installs a common CUDA 13 stack (see python/requirements.txt):

cuda-python (>=13.0.0)
cuda-core (>=1.0.0)
cupy-cuda13x (>=14.0.0)
numpy (>=2.3.2)

How to Use in Samples

Import utilities using path-based import:

python

import sys
from pathlib import Path

# Add Utilities directory to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "Utilities"))
from cuda_samples_utils import verify_array_result

# Use the utility
if verify_array_result(result, expected):
    print("Success!")

Available Functions

Result Verification

`verify_array_result(result, expected, rtol=1e-5, atol=1e-8, verbose=True)`

Verify computed results match expected values. The helper detects whether both arguments are NumPy arrays or both are CuPy arrays and uses the matching library's allclose (no unnecessary cross-device transfers).

Parameters:

result: NumPy or CuPy array with computed results
expected: NumPy or CuPy array with expected values (same kind as result)
rtol: Relative tolerance (default: 1e-5)
atol: Absolute tolerance (default: 1e-8)
verbose: Print test result (default: True)

Returns:

True if results match within tolerance, False otherwise

Example:

python

expected = a + b
if verify_array_result(c, expected):
    print("Computation correct!")

Package Check

`check_cuda_requirements()`

Check if required CUDA packages are available.

Returns:

True if requirements are met, False otherwise

Example:

python

if not check_cuda_requirements():
    sys.exit(1)

Design Philosophy

These utilities focus on common operations that are not part of cuda.core API:

Result verification for NumPy or CuPy arrays
Package requirements checking

For CUDA operations like device initialization, kernel compilation, and grid size calculations, samples should use cuda.core API directly to demonstrate the proper usage patterns.

Complete Example

See ../1_GettingStarted/vectorAdd/vectorAdd.py for a complete example:

python

import sys
from pathlib import Path

# Import utility
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "Utilities"))
from cuda_samples_utils import verify_array_result

import cupy as cp
from cuda.core import Device, Program, ProgramOptions, LaunchConfig, launch

# Use cuda.core directly for device and kernel operations
device = Device(0)
device.set_current()

program_options = ProgramOptions(std="c++17", arch=f"sm_{device.arch}")
program = Program(kernel_source, code_type="c++", options=program_options)
module = program.compile("cubin", name_expressions=("kernel_name",))
kernel = module.get_kernel("kernel_name")

# Calculate grid size inline
threads_per_block = 256
blocks_per_grid = (num_elements + threads_per_block - 1) // threads_per_block

# Launch kernel - pass cupy arrays directly
config = LaunchConfig(grid=blocks_per_grid, block=threads_per_block)
launch(stream, config, kernel, a, b, c, cp.int32(num_elements))

# Verify results using utility
verify_array_result(c, expected)

Benefits

Code Reuse: Write common functionality once
Consistency: All samples use the same patterns
Maintainability: Bug fixes benefit all samples
Transparency: Samples show cuda.core API usage directly
Simplicity: No complex package structure needed