python/4_DistributedComputing/ipcMemoryPool/README.md
This sample demonstrates how to share GPU memory between Python
processes using CUDA Inter-Process Communication (IPC) and
cuda.core's IPC-enabled memory pools.
By default, each process has its own CUDA virtual address space and
cannot see allocations made by another process. With an IPC-enabled
DeviceMemoryResource, the parent allocates once, and the child
process maps that same physical GPU memory into its own address space
so both read and write the same bytes. The sample performs a
round-trip test:
1. The parent creates an IPC-enabled DeviceMemoryResource and allocates
   a Buffer.
2. The parent sends the Buffer to a child process through a
   multiprocessing.Queue. cuda.core's pickle reducers re-create the
   memory resource and map the buffer in the child.

- DeviceMemoryResource with ipc_enabled=True
- Passing Buffer objects across process boundaries via mp.Queue
- multiprocessing must use the "spawn" start method with CUDA

- cuda.core - IPC-enabled memory resources and buffer reducers
- cupy - zero-copy views over the shared device memory via DLPack
- multiprocessing - standard library process management

cuda.core

- DeviceMemoryResource(device, options=DeviceMemoryResourceOptions(ipc_enabled=True)) - create an IPC-enabled memory pool
- DeviceMemoryResourceOptions(max_size=..., ipc_enabled=True) - configure the underlying pool
- mr.allocate(nbytes) - allocate a Buffer from the IPC pool
- Buffer.is_mapped - True when the buffer is usable in the current process
- Device.properties.memory_pools_supported - runtime feature check
- Device.properties.handle_type_posix_file_descriptor_supported - runtime feature check

cuda_samples_utils

- print_gpu_info() - print device name and compute capability

- cuda-python 13.x)
- cuda-python (>=13.0.0)
- cuda-core (>=1.0.0)
- cupy-cuda13x (>=14.0.0)

Install the required packages from requirements.txt:
```bash
cd /path/to/cuda-samples/python/4_DistributedComputing/ipcMemoryPool
pip install -r requirements.txt
```
The requirements.txt installs:
- cuda-python (>=13.0.0)
- cuda-core (>=1.0.0)
- cupy-cuda13x (>=14.0.0)

```bash
cd cuda-samples/python/4_DistributedComputing/ipcMemoryPool
python ipcMemoryPool.py

# Larger shared buffer
python ipcMemoryPool.py --elements 65536

# Use a specific GPU
python ipcMemoryPool.py --device 1
```
On platforms or devices that do not support CUDA IPC, the sample prints a diagnostic and exits cleanly with status 0.
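
The clean-exit behavior described above can be sketched as a small gate on the two runtime feature checks this README lists. This is a minimal sketch, not the sample's actual code: the helper names ipc_pools_supported and require_ipc_support are hypothetical, the diagnostic text is made up, and the demo uses stand-in objects so it runs without a GPU. With cuda.core installed, the assumption is that Device(...).properties would be passed in instead.

```python
import sys
from types import SimpleNamespace


def ipc_pools_supported(props) -> bool:
    """Return True only if both feature flags listed in this README are set."""
    return bool(
        getattr(props, "memory_pools_supported", False)
        and getattr(props, "handle_type_posix_file_descriptor_supported", False)
    )


def require_ipc_support(props) -> None:
    """Print a diagnostic and exit with status 0 when IPC pools are unavailable."""
    if not ipc_pools_supported(props):
        print("CUDA IPC memory pools are not supported on this device; skipping.")
        sys.exit(0)


if __name__ == "__main__":
    # Stand-in property objects (assumption): a real run would use the
    # properties object of the selected cuda.core Device instead.
    supported = SimpleNamespace(
        memory_pools_supported=True,
        handle_type_posix_file_descriptor_supported=True,
    )
    unsupported = SimpleNamespace(
        memory_pools_supported=True,
        handle_type_posix_file_descriptor_supported=False,
    )
    print("supported device passes the gate:", ipc_pools_supported(supported))
    print("unsupported device passes the gate:", ipc_pools_supported(unsupported))
```

Exiting with status 0 rather than an error code lets automated runs treat an unsupported platform as a skip instead of a failure.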
```
Device: <Your GPU Name>
Compute Capability: <X.Y>
Created IPC-enabled DeviceMemoryResource (is_ipc_enabled=True)
Parent wrote pattern (first 5 values): [100. 101. 102. 103. 104.]
Parent sent buffer to child pid=<pid>; waiting...
[child pid=<pid>] received buffer: is_mapped=True, size=4096
Parent sees child's pattern (first 5 values): [-0. -1. -2. -3. -4.]
IPC round-trip: OK
```
Note: Device name, compute capability, and child PID will vary based on your system.
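
The round trip in the output above can be mimicked on the CPU to show why pickling a buffer can transfer a handle rather than the data. The following is only an analogy under stated assumptions: SharedBlock is a hypothetical class built on the standard library's multiprocessing.shared_memory, not a cuda.core type, and pickle is called directly in one process in place of a spawned child and an mp.Queue. The write patterns mirror the ones shown in the expected output.

```python
import pickle
from multiprocessing import shared_memory


class SharedBlock:
    """Hypothetical CPU stand-in for an IPC-capable buffer (not a cuda.core class)."""

    def __init__(self, nbytes, name=None):
        self.nbytes = nbytes
        self.owner = name is None  # the creator owns the block and frees it later
        self.shm = shared_memory.SharedMemory(name=name, create=self.owner, size=nbytes)

    def __reduce__(self):
        # Pickle only a small handle (size and name), never the payload bytes.
        return (SharedBlock, (self.nbytes, self.shm.name))

    def floats(self):
        # float32 view over the shared bytes; no copy is made.
        return self.shm.buf[: self.nbytes].cast("f")

    def close(self):
        self.shm.close()
        if self.owner:
            self.shm.unlink()


def round_trip(n=5):
    parent = SharedBlock(n * 4)  # n float32 values
    try:
        # "Parent" writes its pattern.
        with parent.floats() as view:
            for i in range(n):
                view[i] = 100.0 + i

        # A queue would pickle on put() and unpickle on get(); the reducer
        # re-attaches to the same block by name on the receiving side.
        child = pickle.loads(pickle.dumps(parent))
        try:
            with child.floats() as view:
                received = list(view)
                for i in range(n):
                    view[i] = -float(i)  # "child" overwrites in place
        finally:
            child.close()

        # "Parent" reads the child's pattern through its own mapping.
        with parent.floats() as view:
            seen = list(view)
    finally:
        parent.close()
    return received, seen


if __name__ == "__main__":
    received, seen = round_trip()
    print("child received:", received)
    print("parent sees:", seen)
```

Because only the name and size travel through pickle, both sides operate on the same bytes, which is the same property the sample verifies for GPU memory.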
- ipcMemoryPool.py - Python implementation using cuda.core IPC memory pools
- README.md - This file
- requirements.txt - Sample dependencies
- ../../Utilities/cuda_samples_utils.py - Common utilities (imported by this sample)

- cuda.core memory API
- cuda.core IPC tests: test_memory_ipc.py