python/2_CoreConcepts/processCheckpoint/README.md
This sample demonstrates how to use the CUDA process checkpoint API
via cuda.core.checkpoint.Process to suspend, capture, and restore the
CUDA state of a running Linux process.
CUDA process checkpointing is the driver-level primitive that powers
CRIU + cuda-checkpoint integration.
The sample:
lock → checkpoint → restore → unlock.

The sample prints the CUDA process state after each step so the full state machine is visible:
          lock()          checkpoint()                restore()           unlock()
running ---------> locked ------------> checkpointed -----------> locked ---------> running
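
The diagram above can be written down as a small transition table. The following is a pure-Python model for illustration only; `TRANSITIONS`, `next_state`, and `walk` are hypothetical names and are not part of cuda.core. It also omits the "failed" state, because the diagram shows no edges for it.

```python
# Hypothetical model of the state diagram; this is NOT the cuda.core API.
TRANSITIONS = {
    ("running", "lock"): "locked",
    ("locked", "checkpoint"): "checkpointed",
    ("checkpointed", "restore"): "locked",
    ("locked", "unlock"): "running",
}


def next_state(state: str, action: str) -> str:
    """Return the state reached by `action`, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action}() while {state}") from None


def walk(actions, state="running"):
    """Apply actions in order and return every state visited, including the start."""
    visited = [state]
    for action in actions:
        state = next_state(state, action)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(" -> ".join(walk(["lock", "checkpoint", "restore", "unlock"])))
```

In this model, skipping a step (for example calling checkpoint while still running) raises a `ValueError`. How the real driver reports such misuse is not covered here.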
cuda.core.checkpoint.Process for the current process by PID and observing its .state transitions.
lock → checkpoint → restore → unlock cycle with a lock timeout.
restore() leaves the process in the locked state; you must still call unlock() to return to running.

cuda.core
checkpoint.Process wrapper.
cuda.bindings
cuMemcpyDtoH.

cuda.core.checkpoint
checkpoint.Process(pid) - create a handle to a CUDA process by PID. Accepts os.getpid() for the self-checkpoint case shown here.
Process.state - one of "running", "locked", "checkpointed", or "failed".
Process.lock(timeout_ms=…) - block further CUDA API calls on the process; completes already-submitted work. Always pass a non-zero timeout to avoid deadlocks.
Process.checkpoint() - copy device memory to host-side driver allocations and release GPU resources. Process state becomes checkpointed.
Process.restore(gpu_mapping=None) - re-acquire GPU resources and copy memory back to device. Leaves the process in the locked state.
Process.unlock() - return the process to running.
Process.restore_thread_id - thread ID that restore() must be called from in the target process (not used in the self-checkpoint case here).

cuda.core
Device.set_current() / Device.memory_resource.allocate(...) / Stream, LaunchConfig, Program, launch - standard device, compile, and launch primitives used to produce the buffer contents.

cuda.bindings.driver
cuMemcpyDtoH(host_ptr, device_handle, nbytes) - synchronous D2H copy into a pageable host buffer.

cuda-core >= 1.0.0.

Install the required packages from requirements.txt:
cd /path/to/cuda-samples/python/2_CoreConcepts/processCheckpoint
pip install -r requirements.txt
python processCheckpoint.py
python processCheckpoint.py --buffer-mib 512
python processCheckpoint.py --device 1
--device CUDA device ID (default: 0)
--buffer-mib GPU buffer size in MiB (default: 16)
--lock-timeout-ms Timeout passed to Process.lock in ms (default: 5000)
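
The three options above map directly onto `argparse`. This is a minimal sketch of how such a command line could be declared, not the sample's actual parsing code. The `build_parser` name and the choice of `int` for every option are assumptions; only the flag names, defaults, and help text come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options with their documented defaults."""
    parser = argparse.ArgumentParser(description="Process checkpoint sample (sketch)")
    parser.add_argument("--device", type=int, default=0,
                        help="CUDA device ID")
    parser.add_argument("--buffer-mib", type=int, default=16,
                        help="GPU buffer size in MiB")
    parser.add_argument("--lock-timeout-ms", type=int, default=5000,
                        help="Timeout passed to Process.lock in ms")
    return parser


if __name__ == "__main__":
    # An explicit argument list keeps the example deterministic;
    # call parse_args() with no arguments to read sys.argv instead.
    args = build_parser().parse_args(["--buffer-mib", "512"])
    nbytes = args.buffer_mib * 1024 * 1024  # MiB to bytes
    print(args.device, nbytes, args.lock_timeout_ms)
```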
On an RTX 4090 with a 16 MiB buffer:
[Process Checkpoint Sample using CUDA Core API]
PID: 748330
Device: NVIDIA GeForce RTX 4090
Compute Capability: sm_89
Buffer size: 16 MiB
Lock timeout: 5000 ms
Compiling kernel ...
Writing deterministic pattern to GPU buffer ...
Buffer hash (before): b045f7975dc23352
Running checkpoint lifecycle on self ...
step          duration (ms)   state after
--------------------------------------------------
initial                   -   running
lock                  0.578   locked
checkpoint          268.369   checkpointed
restore             235.024   locked
unlock                1.648   running
--------------------------------------------------
total               505.618
Buffer hash (before): b045f7975dc23352
Buffer hash (after): b045f7975dc23352
PASS: GPU buffer contents survived checkpoint/restore.
Done
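
The PASS line above rests on comparing a hash of the buffer contents taken before the lifecycle with one taken afterwards. The README does not say which pattern or hash function the sample uses, so the sketch below makes its own choices: an arbitrary byte pattern and an 8-byte BLAKE2b digest, picked only because it prints as 16 hex characters like the hashes in the output. `make_pattern` and `buffer_hash` are hypothetical helpers, and the "after" buffer here is a plain host copy standing in for the data read back from the GPU.

```python
import hashlib


def make_pattern(nbytes: int) -> bytes:
    """Deterministic byte pattern: byte i holds (i * 31 + 7) mod 256 (arbitrary choice)."""
    return bytes((i * 31 + 7) & 0xFF for i in range(nbytes))


def buffer_hash(data: bytes) -> str:
    """Return a 64-bit BLAKE2b digest of `data` as 16 hex characters."""
    return hashlib.blake2b(data, digest_size=8).hexdigest()


if __name__ == "__main__":
    before = make_pattern(1 << 16)
    # In the real sample this would be a device-to-host copy made after
    # restore/unlock; a host-side copy stands in for it here.
    after = bytes(before)

    print(f"Buffer hash (before): {buffer_hash(before)}")
    print(f"Buffer hash (after):  {buffer_hash(after)}")
    print("PASS" if buffer_hash(before) == buffer_hash(after) else "FAIL")
```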
What to look for:
running → locked → checkpointed → locked → running. Note that restore() leaves the process in locked, not running.
--buffer-mib visibly increases the checkpoint time.
lock and unlock steps are cheap by comparison (0.578 ms and 1.648 ms in the run above, against hundreds of milliseconds for checkpoint and restore) - they just flip the process state.

Exact timings vary with GPU model, driver version, system load, and the size of the device memory footprint being captured.
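
Per-step timings like those in the table can be collected by wrapping each call in `time.perf_counter()`. The sketch below is duck-typed: `timed_lifecycle` accepts any object with the `lock(timeout_ms=...)`, `checkpoint()`, `restore()`, and `unlock()` methods and the `.state` attribute listed in this README, so with the real library one would presumably pass `cuda.core.checkpoint.Process(os.getpid())`. `timed_lifecycle` and `FakeProcess` are hypothetical names; `FakeProcess` exists only so the example runs without a GPU.

```python
import time


def timed_lifecycle(proc, lock_timeout_ms=5000):
    """Run lock -> checkpoint -> restore -> unlock on `proc`, timing each step.

    Returns a list of (step, duration_ms, state_after) tuples.
    """
    steps = [
        ("lock", lambda: proc.lock(timeout_ms=lock_timeout_ms)),
        ("checkpoint", proc.checkpoint),
        ("restore", proc.restore),
        ("unlock", proc.unlock),
    ]
    rows = []
    for name, call in steps:
        start = time.perf_counter()
        call()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        rows.append((name, elapsed_ms, str(proc.state)))
    return rows


class FakeProcess:
    """Hypothetical stand-in that only tracks state; not part of cuda.core."""

    def __init__(self):
        self.state = "running"

    def lock(self, timeout_ms):
        self.state = "locked"

    def checkpoint(self):
        self.state = "checkpointed"

    def restore(self, gpu_mapping=None):
        self.state = "locked"

    def unlock(self):
        self.state = "running"


if __name__ == "__main__":
    for step, ms, state in timed_lifecycle(FakeProcess()):
        print(f"{step:<12}{ms:>10.3f}   {state}")
```

This sketch does no error handling. A real script would likely want to make sure a failure partway through does not leave the process locked.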
processCheckpoint.py - Python implementation using cuda.core.checkpoint
README.md - This file
requirements.txt - Sample dependencies

NVIDIA/cuda-checkpoint
r570-features.c, r580-migration-api.c).