packages/chip/docs/toolchain/iree-eliza-npu.md
# elizanpu IREE backend specification

elizanpu is the MLIR dialect that lowers StableHLO / linalg / ExecuTorch
graphs into the e1 NPU descriptor-ring runtime defined in
docs/spec-db/e1-npu-runtime-contract.json.
It replaces the Python "lowering smoke" at
compiler/runtime/e1_npu_lowering.py.
That file is now demoted to test oracle status: the parity test at
compiler/iree-eliza-npu/tests/test_descriptor_parity.py
re-encodes 290 descriptors through both the Python oracle and the C runtime
to guarantee identical output. Production codegen ships through this
dialect, not through Python.
| Op | Hardware basis | Compile-time check | Lower from |
|---|---|---|---|
| elizanpu.acquire_ring | DESC_BASE/DESC_STATUS programming | none | iree_hal.command_buffer.begin |
| elizanpu.tile_dma | descriptor word0 stream_to_scratch[8] + word1 source addr | scratch_offset/byte_count 32-bit aligned, sum <= 64 | linalg input tensor transfer |
| elizanpu.submit_descriptor | submit_descriptors MMIO sequence | writeback_request == false, opcode [0, 15], scratch bounds | end of dispatch region |
| elizanpu.gemm_s8 | GEMM_S8 = 8 + GEMM_CFG/GEMM_BASE/GEMM_STRIDE | M<=3, N<=3, K<=7, scratch fit | linalg.matmul (int8) after tiling |
| elizanpu.dot4_s8 | DOT4_S8 = 4 | pure | packed INT8 dot in attention tiles |
| elizanpu.dot8_s4 | DOT8_S4 = 7 | pure | packed INT4 dot |
| elizanpu.dot16_s2 | DOT16_S2 (scalar contract only) | pure | INT2 BitNet (BLOCKED on RTL tensor path) |
| elizanpu.dot4_fp8_e4m3 | DOT4_FP8_E4M3 (scalar contract only) | pure | FP8 E4M3 LLM (BLOCKED on RTL tensor path) |
| elizanpu.sparse_sdot4_s4_2_4 | SDOT4_S4_2_4 | pure | 2:4 structured sparse INT4 |
| elizanpu.vrelu | RELU4_S8 / VRELU_S8 | pure | elementwise ReLU in INT8 epilogue |
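
The compile-time checks in the table can be read as simple predicates over op attributes. The following is a minimal Python sketch of three of them, not the dialect's actual verifier: the function names and the `VerifyError` type are hypothetical, "32-bit aligned" is assumed to mean "a multiple of 4 bytes", and the "scratch fit" rule for gemm_s8 is left out because the table does not spell it out.

```python
SCRATCH_BYTES = 64   # scratch budget, from the tile_dma row
MAX_OPCODE = 15      # opcode range [0, 15], from the submit_descriptor row
GEMM_LIMITS = {"M": 3, "N": 3, "K": 7}  # upper bounds, from the gemm_s8 row


class VerifyError(ValueError):
    """Hypothetical error type: an op violates a compile-time constraint."""


def check_scratch(scratch_offset: int, byte_count: int) -> None:
    """tile_dma rule: both values 32-bit aligned and offset + count <= 64."""
    if scratch_offset < 0 or byte_count < 0:
        raise VerifyError("scratch_offset and byte_count must be non-negative")
    # Assumption: "32-bit aligned" means divisible by 4 bytes.
    if scratch_offset % 4 or byte_count % 4:
        raise VerifyError("scratch_offset and byte_count must be 32-bit aligned")
    if scratch_offset + byte_count > SCRATCH_BYTES:
        raise VerifyError("scratch window exceeds the 64-byte budget")


def check_submit(opcode: int, writeback_request: bool,
                 scratch_offset: int, byte_count: int) -> None:
    """submit_descriptor rule: no writeback, opcode in range, scratch bounds."""
    if writeback_request:
        raise VerifyError("writeback_request must be false")
    if not 0 <= opcode <= MAX_OPCODE:
        raise VerifyError("opcode must lie in [0, 15]")
    check_scratch(scratch_offset, byte_count)


def check_gemm_s8(m: int, n: int, k: int) -> None:
    """gemm_s8 rule: only the stated upper bounds on M, N, K are checked."""
    for name, value in (("M", m), ("N", n), ("K", k)):
        if value > GEMM_LIMITS[name]:
            raise VerifyError(f"{name}={value} exceeds limit {GEMM_LIMITS[name]}")


# A tile that passes every check in this sketch.
check_submit(opcode=8, writeback_request=False, scratch_offset=0, byte_count=16)
check_gemm_s8(3, 3, 7)
```

Raising on the first violation mirrors the fail-closed wording used elsewhere in this document: an op that cannot be proven legal is rejected rather than silently adjusted.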
```sh
iree-compile \
  --iree-hal-target-backends=elizanpu \
  --iree-input-type=stablehlo \
  --iree-elizanpu-default-precision=int8 \
  model.mlir -o model.vmfb
```
Internally:
1. convert-linalg-to-elizanpu: decompose matmul / conv / attention into
   tile-shaped gemm_s8 + tile DMA, plus CPU fallback for unsupported ops
   (softmax, layer-norm, FP16 matmul).
2. elizanpu-assign-scratch: concrete scratch_offset / byte_count
   attribute assignment per dispatch region. Fails closed when the 64-byte
   budget is exceeded.
3. elizanpu-legalize-ring: 8-entry descriptor-ring fragmentation.
   Fails closed if any region submits more than 8 in-flight descriptors.
4. elizanpu-emit-descriptor-table: final flatbuffer + linker symbol
   pointing at eliza_npu_runtime_submit_descriptor_table.

For dialect-level FileCheck testing (no IREE in tree):
```sh
make iree-build STAGE=standalone
# equivalently:
cmake -G Ninja -S compiler/iree-eliza-npu -B build/elizanpu-standalone \
  -DELIZANPU_BUILD_STANDALONE=ON \
  -DMLIR_DIR=$LLVM_STAGE2/lib/cmake/mlir \
  -DLLVM_DIR=$LLVM_STAGE2/lib/cmake/llvm
ninja -C build/elizanpu-standalone elizanpu-opt
```
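
To make the fail-closed behavior of the elizanpu-assign-scratch and elizanpu-legalize-ring passes more concrete, here is a small Python sketch of the two ideas. It is only an illustration under stated assumptions: the function names are hypothetical, the bump-allocation strategy and the round-up-to-4-bytes policy are assumptions of this sketch rather than documented behavior of the passes, and only the 64-byte budget and the 8-entry ring come from the pass descriptions.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

SCRATCH_BYTES = 64  # scratch budget stated for elizanpu-assign-scratch
RING_ENTRIES = 8    # ring depth stated for elizanpu-legalize-ring


class ScratchOverflow(ValueError):
    """Hypothetical error: a dispatch region needs more than 64 scratch bytes."""


def assign_scratch(byte_counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Bump-allocate (scratch_offset, byte_count) pairs; fail closed on overflow.

    Assumption: each request is rounded up to a multiple of 4 bytes so that
    offsets and counts stay 32-bit aligned.
    """
    assignments = []
    cursor = 0
    for count in byte_counts:
        aligned = (count + 3) & ~3  # round up to the next multiple of 4
        if cursor + aligned > SCRATCH_BYTES:
            raise ScratchOverflow(
                f"need {cursor + aligned} bytes, budget is {SCRATCH_BYTES}")
        assignments.append((cursor, aligned))
        cursor += aligned
    return assignments


def fragment_for_ring(descriptors: Sequence[T],
                      ring_entries: int = RING_ENTRIES) -> List[List[T]]:
    """Split a descriptor stream into batches of at most ring_entries items."""
    return [list(descriptors[i:i + ring_entries])
            for i in range(0, len(descriptors), ring_entries)]


print(assign_scratch([16, 6, 32]))      # [(0, 16), (16, 8), (24, 32)]
batches = fragment_for_ring(list(range(19)))
print([len(batch) for batch in batches])  # [8, 8, 3]
```

In this sketch an oversized scratch request raises instead of spilling, and no batch ever holds more than 8 descriptors, which is the property the ring legalization step is described as enforcing.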
scripts/build_iree_eliza_npu.sh:

1. Checks out IREE at the revision pinned in
   compiler/iree-eliza-npu/iree-pin.json
   under external/iree/.
2. Places compiler/iree-eliza-npu into the IREE tree at
   compiler/plugins/target/elizanpu.
3. Configures the build with -DIREE_TARGET_BACKEND_ELIZANPU=ON,
   pointing MLIR/LLVM at build/llvm-stage2.

The IREE-emitted code calls into the C ABI declared in
compiler/iree-eliza-npu/runtime/eliza_npu_runtime.h.
The ABI mirrors the Python oracle's submit_descriptors and
pack_stream_descriptor_word0 byte-for-byte; the parity test ensures any
divergence is caught at PR review time.
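
The shape of such a parity check can be sketched in a few lines of Python. This is not the contents of test_descriptor_parity.py: the two-word little-endian descriptor layout, both encoder functions, and the sample descriptor values are hypothetical stand-ins, and the second encoder is plain Python where the real test drives the C runtime.

```python
import struct
from typing import Callable, Iterable, Optional, Tuple

Descriptor = Tuple[int, int]          # hypothetical (word0, word1) pair
Encoder = Callable[[Descriptor], bytes]


def first_divergence(descriptors: Iterable[Descriptor],
                     oracle: Encoder,
                     runtime: Encoder) -> Optional[int]:
    """Return the index of the first descriptor whose encodings differ."""
    for index, desc in enumerate(descriptors):
        if oracle(desc) != runtime(desc):
            return index
    return None


def encode_oracle(desc: Descriptor) -> bytes:
    # Stand-in for the Python oracle: two little-endian 32-bit words (assumed).
    return struct.pack("<II", *desc)


def encode_runtime(desc: Descriptor) -> bytes:
    # Stand-in for the C runtime path, written independently of encode_oracle.
    word0, word1 = desc
    return word0.to_bytes(4, "little") + word1.to_bytes(4, "little")


# 290 synthetic descriptors, matching the count used by the parity test.
descriptors = [(i & 0xF, 0x1000 + 4 * i) for i in range(290)]
assert first_divergence(descriptors, encode_oracle, encode_runtime) is None
```

Reporting the index of the first mismatch, rather than a bare pass/fail, makes it easier to see which descriptor exposed the divergence.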
- LLVM pin: compiler/llvm-build/llvm-pin.json.
- IREE pin: compiler/iree-eliza-npu/iree-pin.json.
- Base image pin: packages/chip/Dockerfile UBUNTU_DIGEST.
- docs/evidence/compiler/iree-backend-evidence.yaml
  fails closed unless every artifact in the gate file is present.