packages/chip/docs/architecture-optimization/software-ci.md
Firmware boot claims require OpenSBI/U-Boot or equivalent source, build logs, device-tree handoff, boot transcript, and failure-mode evidence.
Android BSP claims require external AOSP tree logs, vendorimage output, checkvintf, SELinux neverallow/build logs, CTS/VTS intake, and virtual-device or target smoke transcripts.
Benchmark claims require real tool execution, calibrated metadata, model artifacts, power/thermal context, parsed metrics, unsupported op count, and CPU fallback percentage. Dry-run reports stay blocked.
CI gates must preserve fail-closed behavior: missing tools, missing external trees, and missing hardware evidence produce blocked status instead of inferred pass status.
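Here is a minimal Python sketch of the fail-closed rule applied to a benchmark claim. The field names, the `BenchmarkReport` shape, and `gate_benchmark` are hypothetical. They only mirror the requirement list above and are not an existing gate in this repo. Any missing or simulated input resolves to BLOCKED. PASS is returned only when every input is present.

```python
from dataclasses import dataclass, field
from enum import Enum


class GateStatus(Enum):
    PASS = "pass"
    BLOCKED = "blocked"


# Hypothetical field names, one per benchmark-claim requirement above.
REQUIRED_BENCHMARK_FIELDS = (
    "tool_execution_log",
    "calibration_metadata",
    "model_artifacts",
    "power_thermal_context",
    "parsed_metrics",
    "unsupported_op_count",
    "cpu_fallback_percent",
)


@dataclass
class BenchmarkReport:
    fields: dict = field(default_factory=dict)
    dry_run: bool = False
    tool_available: bool = True


def gate_benchmark(report: BenchmarkReport) -> tuple[GateStatus, list[str]]:
    """Fail closed: any missing or simulated input yields BLOCKED, never PASS."""
    reasons = []
    if not report.tool_available:
        reasons.append("tool not available")
    if report.dry_run:
        reasons.append("dry-run report")
    for name in REQUIRED_BENCHMARK_FIELDS:
        # `is None` so a legitimate 0 (e.g. zero unsupported ops) still counts.
        if report.fields.get(name) is None:
            reasons.append(f"missing: {name}")
    return (GateStatus.BLOCKED if reasons else GateStatus.PASS), reasons
```

The gate collects every reason rather than stopping at the first one, so a blocked report names all of its gaps at once.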
Compiler claims require the stack defined in `docs/toolchain/llvm-trunk-pin.md` and `docs/toolchain/autofdo-propeller-bolt.md`. The full stack is:

+ LLVM trunk (pinned SHA, RVA23U64 baseline)
+ RVV 1.0 intrinsics + ThinLTO
+ AutoFDO (`-fprofile-sample-use=...`)
+ Propeller (`lld --symbol-ordering-file=... --no-keep-text-section-prefix`)
+ BOLT (`llvm-bolt --reorder-blocks=ext-tsp --reorder-functions=hfsort+ --split-functions --split-all-cold`)
+ Machine Function Splitter (`-fsplit-machine-functions`, in-tree)
+ CFI defaults:
  + `-fcf-protection=full` (Zicfilp / Zicfiss)
  + `-fstack-clash-protection`
  + `-fstack-protector-strong`
  + `-fsanitize=shadow-call-stack`
Spectre/SLS mitigations under Linux 6.19+ cost 5-10% in tight loops on RISC-V. As a result, the compiler stack's 12-18% raw uplift narrows to a 5-10% net win for security-on builds. Plan for that cost; do not disable the mitigations.
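One way a CI gate could confirm that a recorded compile command keeps the CFI defaults listed above is sketched below in Python. `check_hardening` and the deny-list of weakening flags are assumptions for illustration, not an existing script. A real gate would also need to expand response files and handle last-flag-wins overrides, such as a later `-fcf-protection=branch`.

```python
import shlex

# The CFI defaults from the compiler stack above.
REQUIRED_FLAGS = (
    "-fcf-protection=full",
    "-fstack-clash-protection",
    "-fstack-protector-strong",
    "-fsanitize=shadow-call-stack",
)

# Assumed deny-list: flags that would switch those defaults back off.
FORBIDDEN_FLAGS = (
    "-fcf-protection=none",
    "-fno-stack-clash-protection",
    "-fno-stack-protector",
    "-fno-sanitize=shadow-call-stack",
)


def check_hardening(command_line: str) -> list[str]:
    """Return the problems found; an empty list means the defaults are intact."""
    args = shlex.split(command_line)
    problems = [f"missing {flag}" for flag in REQUIRED_FLAGS if flag not in args]
    problems += [f"forbidden {flag}" for flag in FORBIDDEN_FLAGS if flag in args]
    return problems
```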
Compiler evidence files:

- `docs/evidence/compiler/llvm-build-evidence.yaml`
- `docs/evidence/compiler/iree-backend-evidence.yaml`
- `docs/evidence/compiler/executorch-evidence.yaml`
- `docs/evidence/compiler/autofdo-evidence.yaml`
- `docs/evidence/compiler/baseline-profile-evidence.yaml`
- `docs/evidence/compiler/quantization-evidence.yaml`
- `docs/evidence/compiler/rva23-compliance.yaml`
- `docs/evidence/compiler/aosp-branch-pin.yaml`

The MLIR/IREE `elizanpu` dialect at `compiler/iree-eliza-npu/` is the only production NPU codegen path. The Python "lowering smoke" at `compiler/runtime/e1_npu_lowering.py` is the test oracle, not the codegen path.
ExecuTorch is the PyTorch entry point. LiteRT (TFLite) is the second entry point, via the NNAPI / AIDL HAL. Both lower through the `elizanpu` IREE backend.
Five quantization formats target the `elizanpu` dialect, with the calibration toolkit at `compiler/quantization/`:

- PTQ INT8
- AWQ INT4, with GPTQ INT4 as the fallback
- FP8 E4M3
- 2:4 structured-sparse INT4
- INT2 BitNet
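To illustrate the constraint behind the 2:4 structured-sparse format, here is a minimal Python sketch of 2:4 magnitude pruning on plain floats. In every group of four consecutive weights, at most two stay nonzero. This is an illustration under stated assumptions, not the `compiler/quantization/` toolkit's algorithm. It skips INT4 quantization, calibration, and the choice of grouping axis entirely.

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in every group of four."""
    if len(weights) % 4:
        raise ValueError("length must be a multiple of 4")
    out = list(weights)
    for start in range(0, len(out), 4):
        # Indices of the two smallest |w| in this group; sorted() is stable,
        # so ties drop the lower index first.
        drop = sorted(range(start, start + 4), key=lambda i: abs(out[i]))[:2]
        for i in drop:
            out[i] = 0.0
    return out


def is_2_of_4_sparse(weights: list[float]) -> bool:
    """True if every group of four has at most two nonzero entries."""
    return all(
        sum(1 for w in weights[s:s + 4] if w != 0) <= 2
        for s in range(0, len(weights), 4)
    )


w = [0.1, -0.9, 0.5, 0.05, 3.0, -2.0, 0.0, 1.0]
print(prune_2_of_4(w))  # [0.0, -0.9, 0.5, 0.0, 3.0, -2.0, 0.0, 0.0]
```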
Pinned inputs:

- `compiler/llvm-build/llvm-pin.json`
- `compiler/iree-eliza-npu/iree-pin.json`
- `compiler/executorch-eliza/executorch-pin.json`
- `compiler/aosp/manifest.xml` (BLOCKED until Google's RVA23 Tier 1 branch is stable)
- `UBUNTU_DIGEST` in `Dockerfile`
- `compiler/llvm-build/Dockerfile` (derives from the main container and adds lld, ccache, lit, and the libxml2/libzstd dev headers)

`docs/toolchain/riscv64-cross-host.md`); the canonical compiler environment is the Linux container built from this repo's `Dockerfile`.

The compiler stack pins must agree with every downstream consumer. The table below is the single registry. Any new consumer of a pinned SHA is added here and must reference the same value as the source-of-truth file. When a SHA is refreshed, its row and every consumer it lists must be updated in lockstep, in the same commit.
| Pin | Source of truth | Current value (as of 2026-05-19) | Downstream consumers |
|---|---|---|---|
| LLVM trunk | compiler/llvm-build/llvm-pin.json | de3ee84346d6dcf77ac20fe5c8acc95594886cbc | benchmarks/cpu/spec/manifest.json::compiler_target.compiler_pin_commit_sha, benchmarks/cpu/coremark/manifest.json::build_target.primary_compiler.pin_commit_sha, docs/evidence/compiler/llvm-build-evidence.yaml::pinned_sha, scripts/build_llvm_riscv.sh (40-char hex validator) |
| IREE | compiler/iree-eliza-npu/iree-pin.json | d9a3dd15a552cdded3bda4fcfa65f1341d2b5f92 | docs/evidence/compiler/iree-backend-evidence.yaml::pinned_sha, scripts/build_iree_eliza_npu.sh (40-char hex validator), compiler/iree-eliza-npu/CMakeLists.txt (target backend selection) |
| ExecuTorch | compiler/executorch-eliza/executorch-pin.json | 5eb84927cb9380f2a56d1f39f28d799dd7573254 | docs/evidence/compiler/executorch-evidence.yaml::pinned_sha, compiler/executorch-eliza/backend/__init__.py |
| AOSP RISC-V platform | compiler/aosp/manifest.xml | BLOCKED_AOSP_RISCV_BRANCH_SHA_PENDING_UPSTREAM_TIER1 | docs/evidence/compiler/aosp-branch-pin.yaml, scripts/check_rva23_compliance.py::rva23.aosp_branch_pin |
| NPU C ABI hash | compiler/iree-eliza-npu/runtime/eliza_npu_runtime.h | sha256:75fef5a82295a5584dae44cb9d6ac145d2d2d6c90f1c3765fc70c2452ed5c6a5 (recomputed by scripts/check_compiler_versions.py) | compiler/runtime/e1_npu_runtime.py (Python oracle), rtl/npu/e1_npu.sv (AXI-Lite decode), compiler/iree-eliza-npu/tests/test_runtime_mmio_parity.py (drift sentinel) |
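The lockstep rule can be checked mechanically. Below is a minimal Python sketch, assuming each consumer's pinned value has already been extracted into a dict. The file parsing is omitted, and `check_pin` is a hypothetical helper, not `scripts/check_compiler_versions.py`. The check fails on any consumer disagreement and reports a `BLOCKED_*` source value as blocked rather than passing. Otherwise it requires a 40-character lowercase hex SHA.

```python
import re

# Assumption: pins are lowercase 40-hex git SHAs, as in the table above.
SHA_RE = re.compile(r"[0-9a-f]{40}")


def check_pin(source_value: str, consumers: dict[str, str]) -> tuple[str, list[str]]:
    """Return ("pass" | "blocked" | "fail", problems) for one registry row."""
    # Every consumer must carry exactly the source-of-truth value.
    problems = [
        f"{name} has {value!r}, expected {source_value!r}"
        for name, value in consumers.items()
        if value != source_value
    ]
    if problems:
        return "fail", problems
    # A BLOCKED_* placeholder (e.g. the AOSP pin) never becomes a pass.
    if source_value.startswith("BLOCKED_"):
        return "blocked", []
    if not SHA_RE.fullmatch(source_value):
        return "fail", [f"{source_value!r} is not a 40-char hex SHA"]
    return "pass", []
```

Checking consumer agreement first means a BLOCKED row still fails if some consumer quietly carries a different value.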
Cross-domain integration tests:

- `compiler/iree-eliza-npu/tests/test_descriptor_parity.py`: 1280-case parity test (16 opcodes × 4 offsets × 4 byte counts × 2 owner flags, with bounds skip) covering the descriptor word-0 packing between the Python oracle and the C runtime.
- `compiler/iree-eliza-npu/tests/test_runtime_mmio_parity.py`: 136 cases covering register addresses, opcode values, `DESC_STATUS` bits, `DESC_FLAG` bits, and constants across the Python oracle, the C header, and the SystemVerilog RTL decode.
- `compiler/quantization/tests/test_awq_int4_mlp_e2e.py`: 5 cases wiring the AWQ INT4 calibrator to a 2-layer MLP that fits the bounded `GEMM_S4` prototype window (M, N ≤ 3; K ≤ 7; 64-byte scratchpad).
- `compiler/autofdo-harness/coremark_roundtrip.sh`: AutoFDO end-to-end capture and reapply on CoreMark, BLOCKED until the LLVM stage-2 toolchain is built. Produces `build/reports/compiler/coremark-autofdo/coremark-autofdo-delta.json`.
- `benchmarks/compiler/autovec/kernels.{c,json}`: 30 RVV autovectorization kernels. `scripts/run_rvv_autovec_suite.py --stock-clang=/usr/bin/clang` writes `build/reports/compiler/autovec-trunk-vs-stock.{json,md}` with the geomean delta between the trunk pin and the apt-installed clang.

The e1 NPU MMIO contract has three encodings that must agree:
- `compiler/runtime/e1_npu_runtime.py::E1NpuRuntime`: the Python oracle. Its class constants are the canonical byte addresses (`OP_A = 0x10020000`, `RESULT_HI = 0x10020018`, ...) and the canonical bit layouts (`DESC_FLAG_VALID_OWNER = 1<<31`, `DESC_STATUS_WRITEBACK_UNSUPPORTED = 1<<7`, ...).
- `compiler/iree-eliza-npu/runtime/eliza_npu_runtime.h`: the C ABI. `test_runtime_mmio_parity.py` matches every register offset declared as `ELIZA_NPU_REG_*` and every opcode declared as `ELIZA_NPU_OP_*` against the Python oracle.
- `rtl/npu/e1_npu.sv`: the AXI-Lite address decode. The SV `case` statements use word indices (`6'h00` through `6'h2C`). Each Python byte address maps to the SV word index `byte_offset // 4`, where `byte_offset` is the address's offset within the NPU register block. The parity test enforces this mapping with a regex check.

A drift between any pair fails CI immediately at the parity-test layer, before any LLVM build or IREE lowering is attempted. This makes the parity test the cheapest possible contract-drift detector, because it requires no compilation.
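Below is a minimal Python sketch of the three-way check, using only the two register addresses quoted above. It rests on three assumptions:

- The C header defines `ELIZA_NPU_REG_*` as hex byte offsets.
- The register block base equals `OP_A` (`0x10020000`, decoded as `6'h00`).
- SV case labels look like `6'h06:`.

The regexes and `parity_problems` are illustrative and are not the actual `test_runtime_mmio_parity.py`.

```python
import re

# Byte addresses quoted in the contract above; the other registers are elided.
PY_CONSTANTS = {
    "OP_A": 0x10020000,
    "RESULT_HI": 0x10020018,
}
MMIO_BASE = 0x10020000  # assumption: OP_A sits at the block base (SV word 6'h00)

# Assumed source shapes: "#define ELIZA_NPU_REG_X 0x18" and "6'h06: ..." labels.
C_DEFINE_RE = re.compile(r"#define\s+ELIZA_NPU_REG_(\w+)\s+(0x[0-9A-Fa-f]+)")
SV_LABEL_RE = re.compile(r"6'h([0-9A-Fa-f]{1,2})\s*:")


def c_offsets(header_text: str) -> dict[str, int]:
    """Register name -> byte offset, as declared in the C header."""
    return {name: int(value, 16) for name, value in C_DEFINE_RE.findall(header_text)}


def sv_word_indices(sv_text: str) -> set[int]:
    """All word indices that appear as case labels in the SV decode."""
    return {int(h, 16) for h in SV_LABEL_RE.findall(sv_text)}


def parity_problems(py: dict[str, int], header_text: str, sv_text: str) -> list[str]:
    c_regs = c_offsets(header_text)
    sv_words = sv_word_indices(sv_text)
    problems = []
    for name, addr in py.items():
        offset = addr - MMIO_BASE
        if c_regs.get(name) != offset:
            problems.append(f"C header disagrees on {name}")
        # SV decodes word indices: byte_offset // 4.
        if offset // 4 not in sv_words:
            problems.append(f"SV decode has no word index {offset // 4:#04x} for {name}")
    return problems
```

Any non-empty result fails the build before compilation starts, which matches the "cheapest detector" role described above.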