Run your first GPU kernel: add two vectors element-wise on the GPU using the cuda.core API with runtime compilation.
This sample uses cuda.core for device management, programs, and launches. It relies on two libraries:

- `cuda.core`: Pythonic access to CUDA runtime and compilation
- `cupy`: GPU array library for Python

`cuda.core` APIs:

- `Device`: Initialize and manage a CUDA device
- `Program`: Create a program from kernel source code
- `ProgramOptions`: Set compilation options (C++ standard, architecture)
- `LaunchConfig`: Configure kernel launch parameters
- `launch`: Execute a kernel on a specified stream

Import stable symbols from the top-level package (`cuda.core`), not `cuda.core.experimental`. See the cuda.core documentation.
CuPy APIs:

- `cp.random.rand()`: Generate random arrays on the GPU
- `cp.empty()`: Allocate uninitialized GPU arrays
- `cp.allclose()`: Verify results within a tolerance

`cuda_samples_utils` APIs:

- `verify_array_result()`: Verify computation results

Requirements:

- … cuda-python 13.x)
- `cuda-python` (>=13.0.0)
- `cuda-core` (>=1.0.0)
- `cupy-cuda13x` (>=14.0.0)

Install the required packages from `requirements.txt`:
```bash
cd /path/to/cuda-samples/python/1_GettingStarted/vectorAdd
pip install -r requirements.txt
```
To run the sample:

```bash
cd /path/to/cuda-samples/python/1_GettingStarted/vectorAdd
python vectorAdd.py

# Custom vector size
python vectorAdd.py --elements 1000000

# Use a specific GPU
python vectorAdd.py --device 1

# Skip verification for benchmarking
python vectorAdd.py --no-verify
```
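The command-line options above could be wired up with `argparse` along these lines. This is a hypothetical sketch, not the parser in `vectorAdd.py`: the default of 50000 elements is inferred from the sample's expected output, and the default device 0 is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented vectorAdd.py flags."""
    parser = argparse.ArgumentParser(description="Vector addition using the cuda.core API")
    parser.add_argument("--elements", type=int, default=50000,
                        help="number of vector elements (default inferred from sample output)")
    parser.add_argument("--device", type=int, default=0,
                        help="CUDA device ordinal (default 0 is an assumption)")
    parser.add_argument("--no-verify", action="store_true",
                        help="skip result verification, e.g. for benchmarking")
    return parser
```

For example, `build_parser().parse_args(["--elements", "1000000"]).elements` is `1000000`, and `--no-verify` is exposed as `args.no_verify`.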
Expected output:

```
[Vector addition using CUDA Core API]
Device: <Your GPU Name>
Compute Capability: sm_<XX>
Compiling kernel 'vectorAdd<float>'...
Kernel compiled successfully
[Vector addition of 50000 elements]
CUDA kernel launch with 196 blocks of 256 threads
Verifying result...
Test PASSED
Done
```
Note: Device name and compute capability will vary based on your GPU.
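The launch line in the output follows from integer ceiling division: with 256 threads per block, 50000 elements need ceil(50000 / 256) = 196 blocks. That launches 196 × 256 = 50176 threads, so 176 threads have no element to process, which is why the kernel needs a bounds check. A small sketch of the arithmetic (the helper name `launch_dims` is hypothetical):

```python
def launch_dims(num_elements: int, threads_per_block: int = 256) -> tuple[int, int]:
    """Return (blocks, threads_per_block) so that blocks * threads >= num_elements."""
    # Integer ceiling division: round up so the last partial block is still launched
    blocks = (num_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


blocks, threads = launch_dims(50000)
print(f"CUDA kernel launch with {blocks} blocks of {threads} threads")
# prints "CUDA kernel launch with 196 blocks of 256 threads"
idle = blocks * threads - 50000  # threads past the end of the vectors (176 here)
```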
Files:

- `vectorAdd.py`: Python implementation using the cuda.core API
- `README.md`: This file
- `requirements.txt`: Sample dependencies
- `../../Utilities/cuda_samples_utils.py`: Common utilities (imported by this sample)