cpp/4_CUDA_Libraries/libcuxxMdspan/README.md
This sample demonstrates two mdspan-centric features in CCCL. The first is DLPack <-> cuda::std::mdspan bridging via cuda::to_device_mdspan / cuda::to_dlpack_tensor; DLPack is the tensor-interchange protocol used by PyTorch, JAX, CuPy, and others. The second is cuda::shared_memory_mdspan, which provides multi-dimensional views of shared-memory tiles with address-space-safe accessors. The sample builds a small matrix, wraps it in a DLTensor, converts it to a device_mdspan, scales it row-wise, and transposes it through a shared_memory_mdspan tile. The output mdspan is then converted back to DLPack and its metadata is printed.
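The data flow described above can be sketched on the host in plain C++, without any CCCL or DLPack headers. This is only an illustration of the logic, not the sample's code: `TensorMeta` is a hypothetical stand-in that mirrors a few DLTensor fields (data pointer, ndim, shape, strides in elements), `View2D` is a hypothetical strided 2D view playing the role of an mdspan, the per-row scale factors are an assumed input, and the small stack array `tile` stands in for the shared-memory tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a subset of DLTensor's fields (not the real struct).
struct TensorMeta {
    float* data;
    int32_t ndim;
    std::vector<int64_t> shape;
    std::vector<int64_t> strides;  // in elements, as DLPack specifies
};

// Compact row-major strides (in elements) for a given shape.
std::vector<int64_t> row_major_strides(const std::vector<int64_t>& shape) {
    std::vector<int64_t> strides(shape.size(), 1);
    for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i) {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    return strides;
}

// Hypothetical non-owning strided 2D view, playing the role of an mdspan.
struct View2D {
    float* data;
    int64_t rows, cols, stride0, stride1;
    float& operator()(int64_t r, int64_t c) const {
        return data[r * stride0 + c * stride1];
    }
};

// Build a 2D view from the tensor metadata (the "DLTensor -> mdspan" step).
View2D as_view(const TensorMeta& t) {
    assert(t.ndim == 2 && t.shape.size() == 2 && t.strides.size() == 2);
    return View2D{t.data, t.shape[0], t.shape[1], t.strides[0], t.strides[1]};
}

// Multiply every element of row r by row_scale[r] (assumed scaling rule).
void scale_rows(View2D m, const std::vector<float>& row_scale) {
    assert(static_cast<int64_t>(row_scale.size()) == m.rows);
    for (int64_t r = 0; r < m.rows; ++r) {
        for (int64_t c = 0; c < m.cols; ++c) {
            m(r, c) *= row_scale[r];
        }
    }
}

constexpr int64_t kTile = 2;  // illustrative tile edge, not the sample's value

// Transpose `in` into `out` one tile at a time; `tile` stands in for the
// shared-memory staging buffer a GPU kernel would use.
void transpose_tiled(View2D in, View2D out) {
    assert(out.rows == in.cols && out.cols == in.rows);
    for (int64_t r0 = 0; r0 < in.rows; r0 += kTile) {
        for (int64_t c0 = 0; c0 < in.cols; c0 += kTile) {
            float tile[kTile][kTile];
            const int64_t h = std::min(kTile, in.rows - r0);
            const int64_t w = std::min(kTile, in.cols - c0);
            for (int64_t r = 0; r < h; ++r)      // load tile from input
                for (int64_t c = 0; c < w; ++c)
                    tile[r][c] = in(r0 + r, c0 + c);
            for (int64_t c = 0; c < w; ++c)      // store tile transposed
                for (int64_t r = 0; r < h; ++r)
                    out(c0 + c, r0 + r) = tile[r][c];
        }
    }
}
```

To try it, fill a `std::vector<float>` with a 2x3 matrix, describe it with `TensorMeta{buf.data(), 2, {2, 3}, row_major_strides({2, 3})}`, call `scale_rows(as_view(meta), {1.0f, 2.0f})`, and then `transpose_tiled` into a 3x2 view over a second buffer. In the actual sample, the view and tile roles are filled by the CCCL types listed under the CCCL APIs below.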
CCCL 3.3, libcu++ mdspan, DLPack Interoperability, Shared Memory Views
SM 7.0 SM 7.5 SM 8.0 SM 8.6 SM 8.9 SM 9.0 SM 10.0 SM 11.0 SM 12.0
Linux, Windows
x86_64, aarch64
cuda::to_device_mdspan, cuda::to_dlpack_tensor, cuda::device_mdspan, cuda::shared_memory_mdspan, cuda::std::mdspan
cudaMalloc, cudaFree, cudaMemcpy, cudaMemset, cudaDeviceSynchronize, cudaGetDeviceProperties
CCCL 3.3+, DLPack 1.2+. Both are fetched automatically via CPM at configure time (pinned to v3.3.3 and v1.3, respectively). To use local checkouts instead, pass -DCCCL_SOURCE_DIR=/path/to/cccl and -DDLPACK_SOURCE_DIR=/path/to/dlpack when configuring.
Download and install the CUDA Toolkit for your platform. Make sure the dependencies listed in the Dependencies section above are installed.
CCCL 3.3 release notes, cuda::to_device_mdspan header, cuda::shared_memory_mdspan docs, DLPack specification