third_party/xla/docs/errors/error_2003.md
Category: Compile Time: Mosaic Unproven Memory Access Alignment
This error occurs when the compiler analyzes a memory access operation (such as
vector.load, vector.store, tpu.load, or tpu.store) and cannot
statically prove that the dynamic index used for a specific dimension is a
multiple of the required tiling size.
Sample error messages:
INTERNAL: Mosaic failed to compile TPU kernel: cannot statically prove that index in dimension 1 is a multiple of 128
at location: ...
The MLIR operation involved:
%14372 = "vector.load"(%14371, %93, %14363) : (memref<4x256xf32, #tpu.memory_space<vmem>>, index, index) -> vector<1x32xf32>
XLA backends: TPU
When your kernel loads or stores a vector, the memory address
(calculated from the base pointer plus the dynamic index) must align with the
vector's tiling size on the hardware. For example, if a dimension is tiled
by 128 elements, the dynamic index used to access it must be 0, 128, 256,
etc. Note that many operations (like vector loads and stores) have no such
requirements for static indices.
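The divisibility rule described above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not Mosaic code; the helper name is_aligned and the range of candidate indices are assumptions made for the example, and the tile size of 128 is taken from the text.

```python
def is_aligned(index: int, tile: int) -> bool:
    """Return True when `index` is a whole multiple of `tile`."""
    return index % tile == 0


TILE = 128  # tiling size from the example in the text

# Candidate start indices in steps of 32; only multiples of 128 qualify.
aligned = [i for i in range(0, 513, 32) if is_aligned(i, TILE)]
print(aligned)  # [0, 128, 256, 384, 512]
```

The check here runs on concrete values. The compiler has to establish the same property without knowing the runtime value of the index, which is why it reasons about how the index was computed.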
The compiler enforces this requirement using static analysis. It traces the history of the index variable back through the arithmetic operations that produced it (e.g., multiplications, additions). If the compiler cannot guarantee (at compile time) that the resulting value will always be divisible by the tiling size, it raises this error.
The compiler treats "proven misalignment" and "unknown alignment" identically.
So if you use an index that is mathematically guaranteed to be misaligned (e.g.,
i * 128 + 32), the compiler will raise the same error.
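A minimal sketch of this kind of analysis, assuming a toy expression language with only constants, unknown runtime values, addition, and multiplication. The classes Const, Dyn, Add, and Mul and the functions known_divisor and provably_multiple are hypothetical names invented for this illustration; Mosaic's actual implementation is not shown in this document and may differ substantially.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class Const:
    value: int  # compile-time constant


@dataclass(frozen=True)
class Dyn:
    name: str  # runtime value with no known structure


@dataclass(frozen=True)
class Add:
    lhs: object
    rhs: object


@dataclass(frozen=True)
class Mul:
    lhs: object
    rhs: object


def known_divisor(expr) -> int:
    """Return d such that `expr` is provably a multiple of d (0 means always 0)."""
    if isinstance(expr, Const):
        return abs(expr.value)
    if isinstance(expr, Dyn):
        return 1  # every integer is a multiple of 1; nothing more is known
    if isinstance(expr, Add):
        return gcd(known_divisor(expr.lhs), known_divisor(expr.rhs))
    if isinstance(expr, Mul):
        return known_divisor(expr.lhs) * known_divisor(expr.rhs)
    raise TypeError(f"unsupported expression: {expr!r}")


def provably_multiple(expr, tile: int) -> bool:
    """True only when the toy analysis can guarantee divisibility by `tile`."""
    return known_divisor(expr) % tile == 0


i, j = Dyn("i"), Dyn("j")
aligned = Mul(i, Const(128))                         # i * 128
misaligned = Add(Mul(i, Const(128)), Const(32))      # i * 128 + 32
unknown = Add(Mul(i, Const(64)), Mul(j, Const(64)))  # i * 64 + j * 64

print(provably_multiple(aligned, 128))     # True
print(provably_multiple(misaligned, 128))  # False: never a multiple of 128
print(provably_multiple(unknown, 128))     # False: aligned only when i + j is even
```

The last two cases return the same answer for different reasons: one index can never be aligned, the other might be aligned at run time but cannot be proven so. A conservative analysis of this style reports both as "not proven".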
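In short, this error can occur both when the alignment of a dynamic index is unknown to the compiler and when the index is provably misaligned.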
To resolve this error, you have the following options:
If you know your index is valid but the compiler cannot prove it, use the
tpu.assume_multiple operation. This acts as a promise to the compiler that a
value is divisible by a specific factor.
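The effect of such a promise on a divisibility analysis can be modeled in plain Python. This is a toy model under stated assumptions, not the real operation: AbstractIndex, const, provable, and the Python function assume_multiple below are hypothetical stand-ins written for this example, and math.lcm requires Python 3.9 or later.

```python
from dataclasses import dataclass
from math import gcd, lcm


@dataclass(frozen=True)
class AbstractIndex:
    """What a toy analysis knows about an index: it is a multiple of `divisor`."""
    divisor: int = 1  # 1 means "nothing is known"

    def __add__(self, other: "AbstractIndex") -> "AbstractIndex":
        # A sum is only guaranteed to be a multiple of the common divisor.
        return AbstractIndex(gcd(self.divisor, other.divisor))

    def __mul__(self, other: "AbstractIndex") -> "AbstractIndex":
        return AbstractIndex(self.divisor * other.divisor)


def const(value: int) -> AbstractIndex:
    """A compile-time constant is a multiple of its own magnitude."""
    return AbstractIndex(abs(value))


def assume_multiple(index: AbstractIndex, factor: int) -> AbstractIndex:
    """Hypothetical stand-in for the promise: trust the caller, record the factor."""
    return AbstractIndex(lcm(index.divisor, factor))


def provable(index: AbstractIndex, tile: int) -> bool:
    return index.divisor % tile == 0


offset = AbstractIndex()                      # opaque runtime value
promised = assume_multiple(offset, 128)       # caller promises a multiple of 128

print(provable(offset, 128))                  # False
print(provable(promised, 128))                # True
print(provable(promised + const(256), 128))   # True
print(provable(promised + const(32), 128))    # False
```

Nothing in this model verifies the promise; it is simply taken on trust. For that reason the assertion should be made only when the divisibility genuinely holds for every value the index can take.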
In scenarios where the misalignment is intentional, instead of loading a small, unaligned vector segment:
These methods consume more memory but often simplify kernel logic and eliminate the need for manual alignment assertions.