examples/python/CuTeDSL/notebooks/composed_layout.ipynb
A Composed Layout is a powerful abstraction in CuTe that enables complex data transformations through the composition of layouts and transformations. It provides a flexible way to manipulate memory layouts and coordinate systems.
A Composed Layout consists of three key components:
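Inner Layout/Transformation (inner): applied last, to the sum of the offset and the outer layout's result.
Offset (offset): added to the result of the outer layout before the inner step.
Outer Layout (outer): applied first, directly to the input coordinate.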
The mathematical composition of these components is defined as:
$ R(c) := (\mathrm{inner} \circ \mathrm{offset} \circ \mathrm{outer})(c) := \mathrm{inner}(\mathrm{offset} + \mathrm{outer}(c)) $
Where c is a coordinate in the domain of outer, and the composition is evaluated from right to left.
To create a Composed Layout in Python, use the make_composed_layout function:
layout = cute.make_composed_layout(inner, offset, outer)
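To make the evaluation order concrete, the formula can be modeled in plain Python. This is only a sketch of the math, not the CuTe API: the helper `compose` and the toy `outer` and `inner` functions are hypothetical names chosen for illustration, and coordinates are plain integers.

```python
# Plain-Python model of R(c) = inner(offset + outer(c)); this is not CuTe code.
def compose(inner, offset, outer):
    """Return a callable that evaluates outer, adds offset, then applies inner."""
    def composed(c):
        return inner(offset + outer(c))
    return composed


# Hypothetical components, chosen only for illustration.
def outer(c):
    return 2 * c  # a stride-2 mapping: coordinate -> index


def inner(i):
    return i ^ 1  # a toy transformation: flip the lowest bit


R = compose(inner, 4, outer)

# For c = 3: outer(3) = 6, then 4 + 6 = 10, then inner(10) = 11.
print([R(c) for c in range(4)])  # [5, 7, 9, 11]
```

Swapping in a different `inner` changes only the last step of the chain, which is the property the composed form is meant to expose.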
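This example demonstrates how to create a Composed Layout with a custom transformation function. We'll create a simple transformation that takes a 2D coordinate (x, y) and returns (x, y + 1).
The example shows how to define the transformation, build a composed layout from it with offset (1, 0) and an identity outer layout of shape (8, 4), and evaluate the layout at a coordinate.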
import cutlass
import cutlass.cute as cute
from cutlass.cute.runtime import from_dlpack, make_ptr


@cute.jit
def customized_layout():
    def inner(c):
        x, y = c
        return x, y + 1

    layout = cute.make_composed_layout(
        inner, (1, 0), cute.make_identity_layout(shape=(8, 4))
    )
    print(layout)
    cute.printf(layout(0))


customized_layout()
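The same mapping can be traced by hand with a plain-Python model. This is a sketch under stated assumptions rather than the DSL's implementation: it assumes the identity layout turns a linear index into an (x, y) coordinate with the first mode varying fastest, and that the offset is added component-wise. The names `SHAPE`, `OFFSET`, and `layout` are local to this sketch.

```python
SHAPE = (8, 4)
OFFSET = (1, 0)


def outer(i):
    """Model of an identity layout over SHAPE: linear index -> (x, y).

    Assumption of this sketch: the first mode varies fastest.
    """
    return (i % SHAPE[0], i // SHAPE[0])


def inner(c):
    """The custom transformation from the example: shift y by one."""
    x, y = c
    return x, y + 1


def layout(i):
    """inner(offset + outer(i)), with the offset added component-wise."""
    ox, oy = outer(i)
    return inner((OFFSET[0] + ox, OFFSET[1] + oy))


for i in (0, 1, 8, 31):
    print(i, "->", layout(i))
# 0 -> (1, 1)
# 1 -> (2, 1)
# 8 -> (1, 2)
# 31 -> (8, 4)
```

Under these assumptions, every result is the identity coordinate shifted by (1, 1): one unit from the offset in x and one from the inner transformation in y.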
Gather and Scatter operations are fundamental data access patterns in parallel computing and GPU programming. In CuTe, we can implement these operations elegantly using Composed Layout.
A gather operation collects elements from a source array using an index array (also called an indirection array). It's defined as:
output[i] = source[index[i]]
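As a reference point, the definition above can be written in plain Python with lists. The values of `source` and `index` are made up for illustration.

```python
source = [10, 20, 30, 40, 50]
index = [4, 0, 3, 3]  # indices may repeat and appear in any order

# Gather: output[i] = source[index[i]]
output = [source[j] for j in index]

print(output)  # [50, 10, 40, 40]
```

The output has the length of `index`, not of `source`, and a source element may be read more than once.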
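The example below implements a gather with a Composed Layout built from three inputs: an offset tensor (offset_tensor), a data pointer (data_ptr), and a shape (shape).
The inner transformation function reads from the offset tensor: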
def inner(c):
    return offset_tensor[c]  # Returns the gather index
The composed layout maps input coordinates through the offset tensor:
gather_layout = cute.make_composed_layout(inner, 0, cute.make_layout(shape))
This creates an indirect access pattern where:
i → offset_tensor[i] → data_ptr[offset_tensor[i]]
Notably, layout operations such as slice and partition can still be applied to the outer layout.
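The access chain can be modeled in plain Python to show that the indirection lives entirely in the inner step. This is a sketch, not CuTe code: `offset_table` and `data` stand in for `offset_tensor` and the memory behind `data_ptr`, `make_gather_layout` is a hypothetical helper, and the stride-2 outer mapping is only a loose analogue of transforming the outer layout.

```python
offset_table = [7, 2, 5, 0]   # stands in for offset_tensor
data = list(range(100, 108))  # stands in for the memory behind data_ptr


def inner(c):
    return offset_table[c]  # the gather index for coordinate c


def make_gather_layout(outer):
    """Model of inner(offset + outer(i)) with offset = 0, as in the text."""
    return lambda i: inner(0 + outer(i))


# Outer layout visits every coordinate: i -> offset_table[i].
full = make_gather_layout(lambda i: i)
gathered = [data[full(i)] for i in range(4)]

# Changing only the outer mapping (every second coordinate) keeps the
# same indirection through inner.
strided = make_gather_layout(lambda i: 2 * i)
gathered_strided = [data[strided(i)] for i in range(2)]

print(gathered)          # [107, 102, 105, 100]
print(gathered_strided)  # [107, 105]
```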
The example code prints pairs of numbers i -> j, where i is the output index and j is the gathered source index from offset_tensor.
This demonstrates how the composed layout transforms coordinates for indirect memory access.
Note: Scatter operations (writing to indirect locations) can be implemented similarly by reversing the data flow direction.
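For comparison with the gather definition, a scatter can be sketched in plain Python by moving the indirection to the write side. The values are made up for illustration, and the sequential loop is an assumption of this sketch: with duplicate indices, a parallel implementation would have conflicting writes to the same location.

```python
source = [1.5, 2.5, 3.5]
index = [4, 0, 2]     # destination position for each source element
output = [0.0] * 5    # destination buffer, pre-filled with zeros

# Scatter: output[index[i]] = source[i]
for i, j in enumerate(index):
    output[j] = source[i]

print(output)  # [2.5, 0.0, 3.5, 0.0, 1.5]
```

Positions that no index points to keep their initial value, so the destination usually needs to be initialized before scattering.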
import torch


@cute.jit
def gather_tensor(
    offset_tensor: cute.Tensor, data_ptr: cute.Pointer, shape: cute.Shape
):
    def inner(c):
        return offset_tensor[c]

    gather_layout = cute.make_composed_layout(inner, 0, cute.make_layout(shape))
    for i in cutlass.range_constexpr(cute.size(shape)):
        cute.printf("%d -> %d", i, gather_layout(i))

    # TODO: support in future
    # gather_tensor = cute.make_tensor(data_ptr, gather_layout)
    # cute.printf(gather_tensor[0])


shape = (16,)
offset_tensor = torch.randint(0, 256, shape, dtype=torch.int32)
data_tensor = torch.arange(0, 256, dtype=torch.int32)

gather_tensor(
    from_dlpack(offset_tensor),
    make_ptr(cutlass.Int32, data_tensor.data_ptr(), cute.AddressSpace.generic),
    shape,
)