python/2_CoreConcepts/prefixSum/README.md
Demonstrates parallel prefix sum (scan) algorithms using cuda.compute with cuda.core stream management.
output[i] = [init_value] + input[0] + input[1] + ... + input[i]output[i] = init_value + input[0] + input[1] + ... + input[i-1]Stream.from_externalcuda-python (13.0.0+)cuda-core (>=1.0.0)cuda-cccl (1.0.0+)cupy-cuda13x (14.0.0+)numpy (>=2.3.2)# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Run sample
python prefixSum.py
| Scan Type | Formula | First Element |
|---|---|---|
| Inclusive | output[i] = [init_value] + Σ input[0..i] | [init_value] + input[0] |
| Exclusive | output[i] = init_value + Σ input[0..i-1] | init_value (typically 0, the identity for sum) |
This sample demonstrates proper stream usage across libraries:
# Create stream with cuda.core
stream = device.create_stream()
# Wrap for CuPy compatibility (cuda.core Stream implements the __cuda_stream__ protocol)
cp_stream = cp.cuda.Stream.from_external(stream)
# Use with CuPy operations
with cp_stream:
d_input = cp.asarray(data)
d_output = cp.empty_like(d_input)
# Pass to cuda.compute
inclusive_scan(
d_in=d_input,
d_out=d_output,
op=OpKind.PLUS,
init_value=None,
num_items=len(d_input),
stream=stream,
)