This sample demonstrates the `cuda.core` memory management model: a
`MemoryResource` owns a pool of memory and hands out `Buffer` objects that
can be passed to kernels, copied between resources with
`Buffer.copy_to()`, and viewed as NumPy or CuPy arrays through DLPack. The
script exercises three common resources side by side:

- `DeviceMemoryResource` - device-local GPU memory. Every `Device`
  exposes a default pool via `Device.memory_resource`, and applications
  can create additional pools explicitly.
- `PinnedMemoryResource` - page-locked host memory, used here as the
  input and output staging buffers around a GPU kernel (the canonical
  pinned H2D / compute / pinned D2H pattern).
- `ManagedMemoryResource` - unified memory that the driver migrates
  between host and device on demand; once the GPU work has been
  synchronized, host views see the GPU's writes without an explicit copy.

The same `scale_and_bias` kernel runs on each resource, and every result is
verified on the host.
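A minimal sketch of the `Buffer` lifecycle described above, using the `cuda.core` calls this README lists (`Device.memory_resource`, `MemoryResource.allocate(nbytes, stream=...)`, `Buffer.copy_to(dst, stream=...)`, `Buffer.close(stream)`). The import fallback, the `Device()` / `set_current()` / `create_stream()` setup, and the helper names `load_device_class` and `buffer_roundtrip` are assumptions for illustration, not code taken from `memoryResources.py`. The function returns `None` when `cuda.core` or a GPU is unavailable.

```python
import importlib


def load_device_class():
    """Return cuda.core's Device class, or None if it cannot be imported.

    Assumption: cuda-core >= 1.0 exports Device from ``cuda.core``; older
    releases exported it from ``cuda.core.experimental``.
    """
    for module_name in ("cuda.core", "cuda.core.experimental"):
        try:
            return importlib.import_module(module_name).Device
        except Exception:  # not installed, not importable, or no Device there
            continue
    return None


def buffer_roundtrip(nbytes=1 << 20):
    """Allocate two buffers from the device's default pool, copy, then free.

    Hypothetical helper. Returns True on success, or None when cuda.core
    or a CUDA-capable GPU is unavailable.
    """
    Device = load_device_class()
    if Device is None:
        return None
    try:
        dev = Device()        # default device
        dev.set_current()
    except Exception:         # no driver or no CUDA-capable GPU
        return None

    stream = dev.create_stream()
    mr = dev.memory_resource                   # default pool for this device
    src = mr.allocate(nbytes, stream=stream)   # stream-ordered allocation
    dst = mr.allocate(nbytes, stream=stream)
    src.copy_to(dst, stream=stream)            # async copy, ordered on `stream`
    # Deallocation is also stream-ordered: each buffer is released only
    # after the work already queued on `stream` (including the copy).
    src.close(stream)
    dst.close(stream)
    stream.sync()                              # wait before reporting success
    return True
```

Passing the same `stream` to allocation, copy, and `close()` keeps every step ordered without host-side synchronization until the final `stream.sync()`.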
- `DeviceMemoryResource`, `PinnedMemoryResource`, and
  `ManagedMemoryResource`
- Allocating `Buffer` objects from a resource with a bound stream
- Copying between resources with `Buffer.copy_to()`
- Zero-copy NumPy and CuPy views of a `Buffer` via DLPack
- Stream-ordered deallocation with `close(stream)` semantics

- `cuda.core` - Pythonic access to the CUDA runtime, programs, and memory resources
- `cupy` - GPU array views of device buffers
- `numpy` - host array views of pinned and managed buffers

`cuda.core`:

- `Device.memory_resource` - default memory pool attached to a device
- `DeviceMemoryResource`, `PinnedMemoryResource`, `ManagedMemoryResource` - allocate buffers of the corresponding memory kind
- `MemoryResource.allocate(nbytes, stream=...)` - returns a `Buffer`
- `Buffer.copy_to(dst_buffer, stream=...)` - async, stream-ordered copy
- `Buffer.close(stream)` - stream-ordered deallocation
- `Buffer` supports `__dlpack__` for zero-copy views
- `cp.from_dlpack()` / `np.from_dlpack()` - zero-copy array view of a `Buffer`

`cuda_samples_utils`:

- `print_gpu_info()` - print device name and compute capability

… (`cuda-python` 13.x)

- `cuda-python` (>=13.0.0)
- `cuda-core` (>=1.0.0)
- `cupy-cuda13x` (>=14.0.0)

The `ManagedMemoryResource` demo in this sample exercises concurrent host
access to managed allocations while the GPU is active, which requires the
device property `concurrent_managed_access=True`. This is supported only on
Linux with Pascal or newer GPUs. On Windows (WDDM, MCDM, or TCC) the property
is `False`, so the sample exits early with a waive message and exit code
2. The `DeviceMemoryResource` and `PinnedMemoryResource` demos in this
sample would still work on Windows on their own, but to keep the sample
self-contained the entire script waives when concurrent managed access is
unavailable.
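The waive rule above can be sketched as follows. The exit code 2 comes from this README; the helper names, the waive message text, and the assumption that `cuda.core`'s `Device.properties` exposes `concurrent_managed_access` are illustrative, not taken from the sample's source. For simplicity the sketch also treats "property could not be read" (for example, no GPU) as a reason to waive.

```python
import importlib

WAIVE_EXIT_CODE = 2  # exit code this README says the sample uses when waiving


def query_concurrent_managed_access(device_id=0):
    """Read the device property through cuda.core; False if it cannot be read.

    Assumption: Device.properties exposes ``concurrent_managed_access``.
    """
    try:
        core = importlib.import_module("cuda.core")
        dev = core.Device(device_id)
        return bool(dev.properties.concurrent_managed_access)
    except Exception:  # cuda.core missing, no GPU, or property unavailable
        return False


def exit_code_for(concurrent_managed_access):
    """Return WAIVE_EXIT_CODE if the script should waive, else None to run."""
    if not concurrent_managed_access:
        # Hypothetical message; the real sample's wording may differ.
        print("Waived: concurrent managed access is not supported on this device")
        return WAIVE_EXIT_CODE
    return None
```

In a script's entry point this would be used as `code = exit_code_for(query_concurrent_managed_access(device_id))`, followed by `sys.exit(code)` when `code` is not `None`.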
Install the required packages from `requirements.txt`:

```bash
cd /path/to/cuda-samples/python/2_CoreConcepts/memoryResources
pip install -r requirements.txt
```
The `requirements.txt` installs:

- `cuda-python` (>=13.0.0)
- `cuda-core` (>=1.0.0)
- `cupy-cuda13x` (>=14.0.0)

```bash
cd cuda-samples/python/2_CoreConcepts/memoryResources
python memoryResources.py

# Larger buffer size
python memoryResources.py --elements 1048576

# Use a specific GPU
python memoryResources.py --device 1
```
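The `--elements` and `--device` options shown above could be parsed with `argparse` roughly as below. This is a hypothetical sketch: only the two flag names come from this README; the default values, help strings, and validation are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the sample's two command-line options (defaults are placeholders)."""
    parser = argparse.ArgumentParser(
        description="cuda.core memory resource demos (hypothetical CLI sketch)")
    parser.add_argument("--elements", type=int, default=1 << 16,
                        help="number of elements per buffer (assumed default)")
    parser.add_argument("--device", type=int, default=0,
                        help="CUDA device ordinal to run on")
    args = parser.parse_args(argv)
    # Reject sizes and ordinals that cannot be valid before touching the GPU.
    if args.elements <= 0:
        parser.error("--elements must be a positive integer")
    if args.device < 0:
        parser.error("--device must be non-negative")
    return args
```

Passing `argv` explicitly (for example `parse_args(["--elements", "1048576"])`) makes the parser easy to exercise without a command line.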
```
Device: <Your GPU Name>
Compute Capability: <X.Y>
[1] DeviceMemoryResource + PinnedMemoryResource (staging)
Pinned staging, device kernel, and copy_to verified
[2] ManagedMemoryResource (unified memory)
GPU writes observed directly through the host-visible mapping
[3] Explicit DeviceMemoryResource
Explicit DeviceMemoryResource allocation verified
All memory resource demos passed.
```

Note: Device name and compute capability will vary based on your GPU.
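Each "verified" line above corresponds to a host-side check of the kernel output. A minimal NumPy sketch of such a check follows; the formula `y = scale * x + bias` is an assumption inferred from the kernel name `scale_and_bias`, and the helper names and tolerances are hypothetical. For pinned and managed buffers, `result` would be a host view obtained with `np.from_dlpack()`; device-only buffers would first need copying to host memory (for example into a pinned buffer with `Buffer.copy_to()`).

```python
import numpy as np


def scale_and_bias_reference(x, scale, bias):
    """Host reference; assumes the kernel computes y = scale * x + bias."""
    return scale * np.asarray(x) + bias


def verify(result, x, scale, bias, rtol=1e-5, atol=1e-6):
    """Compare a host view of the kernel output against the reference."""
    expected = scale_and_bias_reference(x, scale, bias)
    return bool(np.allclose(np.asarray(result), expected, rtol=rtol, atol=atol))
```

Using `np.allclose` with a small relative tolerance absorbs rounding differences between GPU and host floating-point arithmetic.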
- `memoryResources.py` - Python implementation using `cuda.core` memory resources
- `README.md` - This file
- `requirements.txt` - Sample dependencies
- `../../Utilities/cuda_samples_utils.py` - Common utilities (imported by this sample)

- `cuda.core` memory API
- `cuda.core` example: `memory_ops.py`
- `cuda.core` example: `memory_pool_resources.py`