cpp/4_CUDA_Libraries/cubDeviceSegmentedScan/README.md
This sample demonstrates cub::DeviceSegmentedScan. A segmented scan computes an independent scan over each of many contiguous segments in a single device-wide call. Two operations are shown: ExclusiveSegmentedSum across three independent segments, and InclusiveSegmentedScan with a custom binary operator (running maximum via cuda::maximum<>).
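To make the semantics of the two operations concrete, here is a minimal host-side reference sketch in plain C++. It is not the sample's code and does not call CUB. The function names `exclusive_segmented_sum_ref` and `inclusive_segmented_max_ref` are hypothetical, and describing segments with an offsets array of `num_segments + 1` entries is an assumption made for illustration. Each segment is scanned independently, so the running value resets at every segment boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not part of CUB or the sample.
// Assumption: offsets holds num_segments + 1 non-decreasing entries, and
// segment s covers the half-open range [offsets[s], offsets[s + 1]).

// Exclusive sum per segment: out[i] is the sum of the elements that precede
// position i within the same segment (0 for the first element of a segment).
std::vector<int> exclusive_segmented_sum_ref(const std::vector<int>& in,
                                             const std::vector<std::size_t>& offsets)
{
    assert(!offsets.empty() && offsets.back() <= in.size());
    std::vector<int> out(in.size(), 0);
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s) {
        int running = 0;  // reset at every segment boundary
        for (std::size_t i = offsets[s]; i < offsets[s + 1]; ++i) {
            out[i] = running;
            running += in[i];
        }
    }
    return out;
}

// Inclusive running maximum per segment: out[i] is the largest element seen
// so far in the segment, including position i itself.
std::vector<int> inclusive_segmented_max_ref(const std::vector<int>& in,
                                             const std::vector<std::size_t>& offsets)
{
    assert(!offsets.empty() && offsets.back() <= in.size());
    std::vector<int> out(in.size(), 0);
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s) {
        const std::size_t begin = offsets[s];
        for (std::size_t i = begin; i < offsets[s + 1]; ++i) {
            out[i] = (i == begin) ? in[i] : std::max(out[i - 1], in[i]);
        }
    }
    return out;
}
```

For example, with offsets `{0, 3, 5, 8}` (three segments), the exclusive sum of `{1, 2, 3, 4, 5, 6, 7, 8}` is `{0, 1, 3, 0, 4, 0, 6, 13}`, and the inclusive running maximum of `{3, 1, 4, 1, 5, 9, 2, 6}` is `{3, 3, 4, 1, 5, 9, 9, 9}`. A reference like this can be used to check device results on small inputs.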
CUB Device Algorithms, Segmented Scan, Prefix Sum
SM 7.0 SM 7.5 SM 8.0 SM 8.6 SM 8.9 SM 9.0 SM 10.0 SM 11.0 SM 12.0
Linux, Windows
x86_64, aarch64
cub::DeviceSegmentedScan::ExclusiveSegmentedSum, cub::DeviceSegmentedScan::InclusiveSegmentedScan
cuda::maximum
cudaDeviceSynchronize, cudaGetDeviceProperties
CCCL 3.3+. Fetched automatically via CPM at configure time (pinned to v3.3.3). Override with -DCCCL_SOURCE_DIR=/path/to/cccl to use a local checkout.
Download and install the CUDA Toolkit for your platform. Make sure the dependencies listed in the Dependencies section above are installed.