Eliza E1 2028 SOTA 14A Integration Shortlist

packages/chip/research/00_integration_shortlist.md

Date: 2026-05-19

Status: triage_complete_all_high_confidence_rows_landed

Claim boundary: this is a research-triage planning document. Each row maps to implementation work tracked under the same packet's 03_implementation/ plan and the existing repo evidence gates. Nothing here promotes any silicon, boot, MLPerf, or PD claim.

Implementation snapshot (2026-05-19, both waves complete)

Every High-confidence shortlist row is now on develop. Each item carries an opt-in make target that fails closed when its evidence is missing.

| Item | Status | Artifact | Validator |
|---|---|---|---|
| A-1, A-2, A-4 | landed | docs/arch/npu.md + docs/spec-db/e1-npu-runtime-contract.json + docs/spec-db/npu-2028-roadmap.yaml (MX, group INT4, sparse-tile spec) | make npu-runtime-contract-check |
| A-3 (BitNet ternary) | landed | RTL: rtl/npu/e1_npu.sv (dot16_ternary_mode_q, lane decode 00/01/10, reserved 11 rejected); cocotb: 5 cases in verify/cocotb/test_e1_npu.py | make cocotb-npu |
| A-5 (DMA writeback spec) | landed | docs/arch/npu.md writeback semantics + descriptor_word_template + command_buffer_image + docs/spec-db/npu-2028-roadmap.yaml L1 TODO | make npu-roadmap-check |
| A-8 (perf counters) | landed | rtl/npu/e1_npu.sv PERF_STALL_CYCLES/PERF_SCRATCH_BYTES/PERF_THERMAL_THROTTLE + contract + check script | make npu-runtime-contract-check |
| B-1..B-5 | landed | compiler/runtime/{e1_npu_stablehlo,e1_npu_partitioner,e1_executorch_delegate,e1_litert_delegate}.py + e1_litert_delegate.h + CommandBuffer on e1_npu_runtime.py + 46 new pytest cases | make typecheck + pytest compiler/runtime |
| C-1..C-7 | landed | docs/spec-db/cpu-2028-target.yaml | make cpu-2028-target-check |
| D-1..D-9 | landed | docs/spec-db/memory-2028-target.yaml | make memory-2028-target-check |
| E-1..E-8 | landed | docs/spec-db/security-2028-target.yaml | make security-2028-target-check |
| F-1..F-5 | landed | docs/sw/{opensbi,u-boot,buildroot,linux}/README.md + sw/aosp-device/device/eliza/eliza_ai_soc/ skeleton + Makefile aosp-build-{preflight,riscv64} | make aosp-bsp-check |
| G-1 | landed | verify/formal/e1_*.sby + verify/formal/bpu/{ras,ftq}.sby; bitwuzla as second SBY engine alongside z3 | make formal |
| G-2 (cocotb-coverage) | landed | verify/cocotb/coverage_helpers.py + cover-points in 5 testbenches + scripts/check_cocotb_coverage.py + Makefile cocotb-coverage | make cocotb-coverage |
| G-3 (reset/CDC props) | landed | verify/properties/reset_properties.sv + cdc_properties.sv referenced from all .sby | make formal |
| G-4 (AXI-Lite props) | landed | verify/properties/axi_lite_protocol.sv + verify/formal/e1_axi_lite_{interconnect,dram}.sby + *_bind.sv | make formal |
| G-5 (Accelergy/Timeloop) | landed | benchmarks/sim/run_npu_timeloop.py + benchmarks/sim/configs/e1_npu_timeloop_arch.yaml + merge with SCALE-Sim | make benchmark-sim-metrics (BLOCKED if Timeloop not installed) |
| G-6 (Hypothesis) | landed | benchmarks/parsers/tests/test_parsers_hypothesis.py + scripts/test_check_cocotb_coverage_hypothesis.py | included in pytest run |
| G-7 (MLPerf Power schema) | landed | docs/benchmarks/report-schema.yaml energy_joules_per_inference field + threaded through benchmarks/run_benchmarks.py | make benchmark-parser-test |
| H-1..H-4 | landed | pd/openlane/config.sky130.json (DESIGN_REPAIR_MAX_SLEW_PCT=5, MAX_CAP_PCT=5, explicit FP_PDN_* topology, PSM + IR-drop enabled) + pd/signoff/run-manifest.schema.json (psm_ir_drop_report, pdn_topology, 8 tool-digest fields) + scripts/{check_pd_signoff.py,record_tool_digests.sh} | make pd-signoff-manifest-check |
| H-5 | landed | scripts/check_pd_utilization.py + pd/signoff/util_threshold.yaml | make pd-util-check |
| I-1..I-6 | landed | docs/spec-db/process-14a-effects.yaml (variant_requirements, library_variant_binding, reliability_derate_sources, sram_vmin_ecc_repair_plan, thermal_capture_phases, packaging_default) | make process-14a-effects-check |
| J-1..J-6 | landed | package/{display,pmic,usb-pd,charger,sensors,audio}/ + docs/board/{power-tree,pdn-budget,antenna-plan,thermal-stack}.md + board/kicad/e1-phone/ skeleton | make board-package-evidence-check |
| L-tier | tracked | all 03_implementation/* Low-confidence items remain deferred to v1/v2 | n/a |
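
The fail-closed behaviour described above (a target that refuses to pass when its evidence is missing) can be sketched as follows. This is a minimal illustration, not the repo's check scripts: `GateStatus`, `evidence_gate`, and `exit_code` are hypothetical names, and the rule that an empty evidence list is also BLOCKED is an assumption.

```python
from enum import Enum
from pathlib import Path


class GateStatus(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    BLOCKED = "BLOCKED"


def evidence_gate(required_paths, validate):
    """Return (status, missing_paths) for one gate.

    BLOCKED when no evidence is declared or any evidence file is absent;
    otherwise PASS or FAIL depending on validate(path) for every file.
    """
    required = [Path(p) for p in required_paths]
    missing = [str(p) for p in required if not p.is_file()]
    if not required or missing:
        return GateStatus.BLOCKED, missing
    ok = all(validate(p) for p in required)
    return (GateStatus.PASS if ok else GateStatus.FAIL), []


def exit_code(status):
    # Fail closed: only an explicit PASS maps to a zero exit code.
    return 0 if status is GateStatus.PASS else 1
```

Under this sketch a missing artifact never degrades to a silent pass: the caller sees BLOCKED together with the list of absent paths, and `make` would see a non-zero exit code.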

Locally-verified gates (chip side)

After this wave, the following gates were run locally with no claim movement. All pass except make pd-util-check, which is BLOCKED (fail-closed):

```text
make lint                              PASS
make typecheck                         PASS
make docs-check                        PASS
make cpu-2028-target-check             PASS
make memory-2028-target-check          PASS
make security-2028-target-check        PASS
make npu-2028-target-check             PASS
make npu-runtime-contract-check        PASS
make npu-roadmap-check                 PASS
make process-14a-effects-check         PASS
make pd-util-check                     BLOCKED (no util JSON in any local OpenLane run; fail-closed)
make platform-contract-check           PASS
make project-plan-check                PASS
make prototype-status-dashboard-check  PASS
```

What stays BLOCKED (external dependencies, by design)

  1. Cuttlefish RV64 / AOSP boot transcript: Linux host + ~600 GB AOSP checkout + ART RV64 toolchain. Recipe lives in sw/aosp-device/build-aosp-riscv64.sh and Makefile aosp-build-{preflight,riscv64}.
  2. OpenSBI + U-Boot + Linux qemu-virt smoke: RV cross-toolchain + upstream trees. Recipes live in docs/sw/{opensbi,u-boot,buildroot,linux}/README.md.
  3. OpenLane silicon-class signoff: OpenLane 2 Docker + Volare PDK on disk. make openlane runs locally, takes hours.
  4. AOSP HAL evidence transcripts: docs/evidence/android/*_smoke.log carries status=FAIL placeholders until real device boot transcripts are captured.
  5. MLPerf Mobile / MLPerf Power closed loop: L5/L6 evidence, requires fabricated silicon + Joulescope/Monsoon. Cannot exist pre-silicon.
  6. Foundry PDK selection: selected_process_option stays blocked_until_foundry_pdk_and_library_selection_from_shortlist; the shortlist covers TSMC N2P / A14, Samsung SF2, Intel 14A, Rapidus N2.

No chip claim has been promoted past its existing fail-closed status.

Goal

Build a chip that runs 2028 SOTA mobile AI models at the highest possible performance per watt in a 14A-class mobile process. Every shortlist item is screened against three filters:

  1. Useful? Does it move a numeric target in docs/spec-db/npu-2028-target.yaml, cpu-2028-target.yaml, process-14a-effects.yaml, or docs/architecture-optimization/?
  2. Tractable in silicon? Is the change a small RTL delta, a spec-db contract update, or a verification harness extension we can execute now, versus a multi-year microarch program that only makes sense after Phase B?
  3. Performance-per-watt benefit? Does the item reduce energy per inference, per token, per pixel, or per cycle for the modeled workload mix (LLM decode, transformer prefill, vision encoder, attention, KV-cache, framebuffer blits, camera pipeline)?

Items below carry a useful / tractable / benefit tuple plus the experiment list, the sub-agent owner, and the canonical files the work lands in.
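
As a rough illustration of the three-filter screen, the tuple can be modelled as a small record whose "implement now" verdict is the conjunction of the filters. The class and field names are hypothetical, and the boolean values for A-3 and A-6 are one reading of their table rows rather than data taken from the spec-db.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Screening:
    item_id: str
    useful: bool         # filter 1: moves a numeric spec-db target
    tractable_now: bool  # filter 2: small RTL delta, spec-db update, or harness extension
    perf_per_watt: bool  # filter 3: lowers energy per inference/token/pixel/cycle

    def implement_now(self) -> bool:
        # An item is taken up in this pass only if it clears all three filters.
        return self.useful and self.tractable_now and self.perf_per_watt


# Illustrative rows: A-3 is small RTL + cocotb, A-6 is planned L3 RTL.
rows = [
    Screening("A-3", useful=True, tractable_now=True, perf_per_watt=True),
    Screening("A-6", useful=True, tractable_now=False, perf_per_watt=True),
]
now = [r.item_id for r in rows if r.implement_now()]
deferred = [r.item_id for r in rows if not r.implement_now()]
```

An item that is useful and beneficial but not yet tractable, such as A-6 here, ends up in the deferred list instead of being dropped.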

Shortlist by subsystem

A. NPU datapath, opcodes, and tile architecture

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| A-1 | OCP Microscaling (MXFP8/MXFP6/MXFP4/MXINT8) block-scale operand fetch | Y | spec now, RTL L2 | Energy: MX block scale + low-precision FP → 2-3× perf/W vs INT8 GEMM on transformer prefill; aligns with Blackwell + Trillium | ocp_mx_spec, mx_formats_paper, microxcaling_repo, ptq_mx_paper |
| A-2 | Group-scaled INT4 weights (W4A16) GEMM_S4_GS{32,64,128} | Y | spec + small RTL | LLM decode dominant precision; cuts weight BW 4×, KV decode energy ~50% | gptq_paper, awq_paper, omniquant_paper, hqq_repo |
| A-3 | BitNet ternary mode on DOT16_S2 (sign-flip + sum, no multiply) | Y | small RTL + cocotb | Halves activation MAC energy; only viable INT2 path with deployed weights (BitNet b1.58, MediaTek NPU990) | bitnet_b1_58_paper, bitnet_a4_8_paper, bitnet_2b4t_hf |
| A-4 | Tile-level 2:4 sparse INT4 GEMM (lifted from scalar SDOT4_S4_2_4) | Y | medium RTL | Trainium2 demonstrates 4× sparse-INT8 ratio; same pattern for INT4 doubles effective TOPS on pruned LLMs | sparsegpt_paper, wanda_paper, maskllm_paper, trainium2_aws_docs |
| A-5 | DMA writeback path (descriptor engine) | Y | medium RTL | Binding L1 phase gate; closes dma_trace_bytes_written / perf_counter_dma_bytes_written; no NPU can scale past scratchpad without it | nvdla, mtia_v2_isca25, current docs/arch/npu.md |
| A-6 | FlashAttention-2-style streaming-softmax attention engine, INT8/FP8 KV | Y | RTL L3 (planned) | Eliminates O(N²) attention materialisation; mandatory for 2028 LLM-class context; KV BW dominates power on decode | flashattention2_paper, flashattention3_paper, fusemax_paper, spatten_paper |
| A-7 | Paged-KV block-table load path | Y | spec + small RTL | Concurrent-context serving (npu-2028-target concurrent_contexts_min: 8) requires page indirection; also enables MLA + GQA + speculative decoding | vllm_paged_attention, streamingllm_paper, h2o_paper, kivi_paper, deepseek_v2_mla |
| A-8 | Expanded perf counters (cycles, stall, SRAM BW, DMA BW, thermal throttle) | Y | small RTL | Required for power-per-counter attribution at L1/L2; closes the basic_performance_counters gap in npu-2028-target.yaml | accelergy_repo, timeloop_paper, MTIA papers |

Verdict: A-1..A-3 + A-5 + A-8 implementable now (spec + RTL + cocotb). A-4 + A-6 + A-7 land later but get spec-db gates and integration plans now.
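
To make the A-3 "sign-flip + sum, no multiply" idea concrete, here is a small software reference model of a 16-lane ternary dot product. The lane encoding (00 = 0, 01 = +1, 10 = -1) and the packing of lane i into bits [2i+1:2i] are assumptions made for illustration; the document only states that codes 00/01/10 are decoded and 11 is reserved and rejected. `dot16_ternary` is a hypothetical helper, not the RTL or the cocotb model.

```python
LANES = 16


def dot16_ternary(packed_weights, activations):
    """Ternary dot product over 16 lanes using only add/subtract.

    packed_weights: int holding 16 two-bit codes, lane i in bits [2i+1:2i].
    Assumed encoding: 0b00 -> 0, 0b01 -> +1, 0b10 -> -1, 0b11 -> reserved.
    """
    if len(activations) != LANES:
        raise ValueError("expected 16 activations")
    acc = 0
    for lane in range(LANES):
        code = (packed_weights >> (2 * lane)) & 0b11
        if code == 0b11:
            # Reserved code: reject rather than guess a meaning.
            raise ValueError(f"reserved ternary code 0b11 in lane {lane}")
        if code == 0b01:
            acc += activations[lane]  # weight +1: pass the activation through
        elif code == 0b10:
            acc -= activations[lane]  # weight -1: sign-flip, still no multiply
        # code 0b00: weight 0 contributes nothing
    return acc
```

For example, with lane 0 set to +1 and lane 1 set to -1 (`packed_weights = 0b1001`) and activations 1..16, the model accumulates 1 - 2 and returns -1.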

B. NPU compiler & runtime

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| B-1 | StableHLO entry-IR canonicalisation for e1_npu_lowering.py | Y | now | Connects E1 to JAX / PyTorch export / LiteRT / IREE; replaces ad-hoc schemas | openxla_stablehlo, iree_repo, liteRT_blog |
| B-2 | ExecuTorch delegate skeleton | Y | now | Mobile PyTorch runtime; Exynos 2600 explicitly cites ExecuTorch; mandatory mobile path | executorch_repo, samsung_exynos_2600_page |
| B-3 | LiteRT (TFLite) delegate skeleton | Y | now | LiteRT ingests StableHLO; shared internal compiler with B-1/B-2 | liteRT_blog, tflite_delegate_docs |
| B-4 | Descriptor-ring CommandBuffer runtime abstraction | Y | now | Batched dispatch eliminates per-op MMIO sync; tracks IREE Stream dialect; prereq for B-5 | iree_stream_dialect, docs/arch/npu-microarch.md |
| B-5 | Partitioner with op-set + tile-bound table | Y | now | Required to measure cpu_fallback_percent_max: 1 and unsupported_operator_percent_max: 1 | executorch_partitioner_docs, iree_repo |
| B-6 | Flash-Decoding split-K decode scheduling | Y | medium | On-device LLM decode is GEMV-shaped; split-K saturates tile fabric | flashdecoding_paper, flashattention2_paper |
| B-7 | IREE backend as single compiler entry (HAL driver) | Y | spec | Avoid fragmenting compiler effort; declared software target | iree_repo, npu-2028-target.yaml#software_targets.compiler |

Verdict: B-1..B-5 all implementable as new Python modules under compiler/runtime/. No RTL dependencies.
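
The B-5 partitioner idea, an op-set plus a tile-bound table that decides what runs on the NPU and what falls back to the CPU, might look roughly like this. The function, the op names, and the choice to measure fallback by operator count are assumptions for illustration; the actual e1_npu_partitioner module may define the metric differently (for example by runtime share).

```python
def partition(ops, supported_ops, tile_bounds):
    """Split ops into NPU and CPU lists and report the CPU fallback percentage.

    ops: list of (op_name, tile_elems) pairs.
    supported_ops: set of op names the NPU can lower.
    tile_bounds: op_name -> largest tile (in elements) the NPU accepts.
    """
    npu, cpu = [], []
    for name, tile_elems in ops:
        fits = name in supported_ops and tile_elems <= tile_bounds.get(name, 0)
        (npu if fits else cpu).append(name)
    total = len(ops)
    fallback_pct = 100.0 * len(cpu) / total if total else 0.0
    return npu, cpu, fallback_pct


# Hypothetical graph: 199 matmuls that fit, plus one unsupported op.
graph = [("matmul", 4096)] * 199 + [("topk", 128)]
npu_ops, cpu_ops, pct = partition(graph, {"matmul"}, {"matmul": 65536})
within_budget = pct <= 1.0  # compared against cpu_fallback_percent_max: 1
```

Here a single fallback op out of 200 gives 0.5 percent, which stays inside the 1 percent ceiling named in the B-5 row.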

C. CPU subsystem & ISA

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| C-1 | Author docs/spec-db/cpu-2028-target.yaml | Y | now | Symmetry with NPU/process spec-db; gates Phase B selection | research/cpu_subsystem_2026 H1 |
| C-2 | Pin RVV 1.0 as only accepted vector ISA (forbid RVV 0.7.1) | Y | now | Android RV upstream requires RVA22U64+V | rvv_1_0_spec, rise_project |
| C-3 | Pin RVA22U64+V as Android baseline; RVA23 long-term | Y | now | Forecloses non-Android extension drift | rva22_profile, rva23_profile |
| C-4 | Record Zicbom/Zicbop/Zicboz as required cache-maintenance ISA | Y | now | Linux RV upstream DMA cache management uses these; replaces vendor CSRs | kernel.org RV cache-maintenance docs |
| C-5 | Add Ibex as named management/security hart | Y | now | OpenTitan compatibility; aligns with security packet H1 | ibex_repo, opentitan_repo |
| C-6 | Track Saturn vector engine, BOOM Phase B, AIA/Sstc as deferred items | Y | now | Bring up plan without overcommitting | saturn_repo, boom_v4, aia_spec, sstc_spec |
| C-7 | Verification primitives list (Spike, Sail, RISCOF, riscv-formal, riscv-dv) | Y | now | Independent of core selection; survives a Rocket→BOOM swap | spike_repo, sail_riscv_repo, riscof_repo, riscv_formal_repo, riscv_dv_repo |

Verdict: all spec-db / docs work; no RTL changes. Implementable now.
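
A spec-db check for the C-2..C-4 pins could be as simple as the sketch below. The dictionary keys (`android_baseline_profile`, `rvv_version`, `extensions`) are a hypothetical schema, not the real layout of cpu-2028-target.yaml, and `check_cpu_target` is an illustrative name.

```python
REQUIRED_EXTENSIONS = frozenset({"v", "zicbom", "zicbop", "zicboz"})


def check_cpu_target(target):
    """Return a list of violations; an empty list means the pins hold."""
    errors = []
    if target.get("android_baseline_profile") != "RVA22U64":
        errors.append("android_baseline_profile must be RVA22U64")
    if str(target.get("rvv_version")) != "1.0":
        errors.append("rvv_version must be 1.0; RVV 0.7.1 is not accepted")
    present = {ext.lower() for ext in target.get("extensions", [])}
    missing = sorted(REQUIRED_EXTENSIONS - present)
    if missing:
        errors.append("missing required extensions: " + ", ".join(missing))
    return errors


good = {
    "android_baseline_profile": "RVA22U64",
    "rvv_version": "1.0",
    "extensions": ["V", "Zicbom", "Zicbop", "Zicboz"],
}
bad = {
    "android_baseline_profile": "RVA22U64",
    "rvv_version": "0.7.1",
    "extensions": ["V"],
}
```

Returning every violation at once, instead of stopping at the first, keeps the check output useful when several pins drift in the same change.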

D. Memory subsystem & coherent fabric

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| D-1 | Replace AXI-Lite scaffold with TileLink-C (planning + spec-db) | Y | spec now, RTL L2+ | Coherent fabric is the binding gate on cache-coherent CPU submission, IOMMU isolation, SLC; AXI-Lite cannot scale | tilelink_spec, chipyard_constellation, chi_e_spec |
| D-2 | LPDDR6 controller boundary spec (96-128 bit, 12.8-14.4 Gb/s, on-die ECC, link CRC, TRR+RFM) | Y | spec now | Only path to ≥180 GB/s peak / 120 GB/s sustained; satisfies external_memory_bandwidth_gbps_min: 180 | jedec_lpddr6_pre_pub, samsung_lpddr5x_brief, sk_hynix_lpddr5t |
| D-3 | SMMU/IOMMU spec (per-master stream IDs: NPU CMD/DATA, GPU, display, camera, modem, audio) | Y | spec now | L3 gate iommu_isolated_command_buffers; baseline for confidential VM | arm_smmuv3, riscv_iommu |
| D-4 | 32 MiB SLC bank spec (banked 4-8 ways, coherent) | Y | spec | Closes shared_system_cache_mib_min: 32; cache stash entry for NPU command submission | sram_2nm_isscc, tilelink_inclusive_cache, chi_e_spec |
| D-5 | 64 MiB NPU tiled SRAM spec (8-16 tiles, 4 MiB each, SECDED, ping/pong) | Y | spec | Closes local_sram_mib_min: 64 + local_sram_bandwidth_tbps_min: 20; weight-stationary throughput driver | tsmc_2nm_sram_iedm2023, samsung_2nm_sram_isscc2024, eyeriss_v2_paper |
| D-6 | Compression-aware DMA spec (64-element block, INT8/INT4/INT2/FP8 modes) | Y | spec + medium RTL | 2-3× DRAM BW savings on ReLU-heavy feature maps; closes compression_aware_dma | afbc_arm, nvdla |
| D-7 | DRAM controller QoS classes (Isochronous/High/Normal/Best-effort) | Y | spec | Closes QoS_for_camera_display_audio_modem; required for sustained AI+camera+display contention | parbs_paper, atlas_paper, bliss_paper |
| D-8 | RowHammer policy (TRR + RFM + on-die ECC + link CRC counters) | Y | spec | Reliability + security; aligns with M6 in security packet | rowhammer_paper, jedec_rfm_prac |
| D-9 | Cache stash for CPU→NPU command submission | Y | RTL L2+ | Cuts CPU→NPU command latency by ~100 ns; closes cache_coherent_cpu_submission | chi_cache_stash |

Verdict: D-1..D-8 all become spec-db updates now. RTL lands after CPU/AP Phase B.
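
The D-2 bandwidth figures can be sanity-checked with simple arithmetic: peak bandwidth is bus width times per-pin data rate, divided by 8 bits per byte. The sketch below works in integer Mb/s and MB/s to avoid floating-point rounding; it assumes decimal units (1 GB/s = 1000 MB/s) and ignores protocol, refresh, and ECC overhead. The helper names are illustrative.

```python
def peak_mb_per_s(bus_bits, mbit_per_s_per_pin):
    """Peak bandwidth in MB/s for a given bus width and per-pin rate."""
    return bus_bits * mbit_per_s_per_pin // 8


def min_bus_bits(target_mb_per_s, mbit_per_s_per_pin):
    """Smallest bus width (in bits) whose peak reaches the target."""
    needed_mbit = target_mb_per_s * 8
    return -(-needed_mbit // mbit_per_s_per_pin)  # integer ceiling division


TARGET_MB_PER_S = 180_000  # external_memory_bandwidth_gbps_min: 180, read as GB/s

corners = {
    (bits, rate): peak_mb_per_s(bits, rate)
    for bits in (96, 128)
    for rate in (12_800, 14_400)
}
meets_target = {k: v >= TARGET_MB_PER_S for k, v in corners.items()}
```

Under these assumptions the four corners come out at 153.6, 172.8, 204.8, and 230.4 GB/s, so only the 128-bit configurations clear 180 GB/s at peak; at 14.4 Gb/s the floor is reached from a 100-bit interface upward. The 120 GB/s sustained figure is a separate efficiency question that this arithmetic does not address.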

E. Security / Root of Trust

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| E-1 | Spec-db adoption of OpenTitan IP set (rom_ctrl, lc_ctrl, otp_ctrl, keymgr, aes, hmac, entropy_src/csrng/edn, Ibex sec-MCU) | Y | spec now | Apache-2.0 silicon-proven IP; unblocks every BLOCKED row under docs/security/secure-boot-lifecycle-evidence.md | opentitan_rom_ctrl, opentitan_lc_ctrl, opentitan_otp_ctrl, opentitan_keymgr, opentitan_aes, opentitan_hmac, opentitan_entropy_src |
| E-2 | AVB 2.0 / libavb BL2 verifier spec | Y | spec now | AOSP-standard; rollback index landed in OTP partition; covers TC-BOOT-001…008 | libavb_repo, avb_2_0_spec |
| E-3 | Ed25519 verify on OTBN + SHA-256 via HMAC (no software-only crypto on boot path) | Y | spec now | Constant-time; OpenTitan reference programs already verified | opentitan_otbn_ed25519, opentitan_hmac |
| E-4 | dm-verity hashtree + FEC for system/vendor/product | Y | spec now | AOSP default; ~50 MB hashtree on 5 GB system fits BL2 budget | dm_verity_kernel_docs, fs_verity_kernel_docs |
| E-5 | ePMP/Smepmp on every hart + IOPMP on interconnect (deny-by-default) | Y | spec now | Ratified RV standards; provides DMA isolation that the threat model assumes | epmp_spec, iopmp_spec, smepmp_spec |
| E-6 | DICE / Open DICE measurement chain | Y | spec now | KeyMint attestation root; ~1500 LOC Apache-2.0 | open_dice_repo, tcg_dice_spec |
| E-7 | Synthetic OTP for Sky130 prototype (clearly-labelled non-production) | Y | now | Unblocks simulator transcripts without claiming production OTP | research/security_2026 H8 |
| E-8 | PQC verify (hybrid Ed25519 + ML-DSA-65) reserved header_version=2 | Y | spec | Hedges Ed25519; OpenTitan OTBN can run ML-DSA-65 | fips_204_ml_dsa, pqc_hw_paper |

Verdict: all spec/docs items implementable now. RTL adoption is integration work after Phase B.
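
The deny-by-default posture in E-5 means an access is permitted only when some rule explicitly allows it. The sketch below shows that policy shape for DMA masters; it is not the IOPMP entry format or its priority rules, and `Rule`, `is_allowed`, and the stream names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    stream_id: str    # requesting master, e.g. an NPU data stream
    base: int         # first byte of the permitted window
    size: int         # window length in bytes
    perms: frozenset  # subset of {"r", "w"}


def is_allowed(rules, stream_id, addr, length, access):
    """Permit the access only if one rule fully covers it; otherwise deny."""
    end = addr + length
    for rule in rules:
        if (
            rule.stream_id == stream_id
            and rule.base <= addr
            and end <= rule.base + rule.size
            and access in rule.perms
        ):
            return True
    return False  # no matching rule: deny by default


rules = [Rule("npu_data", 0x8000_0000, 0x1000, frozenset({"r", "w"}))]
```

With this rule set, an `npu_data` write inside the window is allowed, while the same address from an unlisted master, or an access that runs past the end of the window, is refused.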

F. BSP / Linux / Android RV

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| F-1 | OpenSBI 1.6 FW_DYNAMIC + U-Boot RV64 + Buildroot rv64gc qemu-virt smoke recipe (READMEs + capture scripts) | Y | now | First real software-side execution path; unblocks F-2..F-5 chain | opensbi_1_6, u_boot_rv64, buildroot_riscv64_virt |
| F-2 | aosp_cf_riscv64_phone template at docs/sw/aosp-device/device/eliza/eliza_ai_soc/ | Y | now (skeleton only) | Closes the AOSP simulator-completion gate scaffold without claiming boot | aosp_cuttlefish_rv64, vintf_spec |
| F-3 | libe1_npu + LiteRT delegate + ExecuTorch backend as canonical HAL story | Y | spec | Aligns Android NN path; NNAPI relegated to legacy compat only | liteRT_blog, executorch_repo, aicore_android_16 |
| F-4 | DT-only contract (no ACPI) declared in spec-db | Y | now | Closes any future ACPI ambiguity; matches mainline mobile RV | kernel_riscv_dt |
| F-5 | SBI feature floor (v2.0 + Sscofpmf + DBCN + Sstc) recorded | Y | now | Reproducible OpenSBI builds | opensbi_sbi_3_0_draft, sscofpmf_spec, sstc_spec, dbcn_spec |

Verdict: F-1..F-5 are docs/README + spec-db work. Actual qemu-virt runs need RV toolchain installed locally; we document the recipe and capture script paths so a future contributor can run them.

G. Benchmarks / simulators / formal

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| G-1 | Add Bitwuzla as a second SBY engine across all .sby | Y | now | Closes Workstream E Bitwuzla gap; Boolector is EoM; second SMT solver catches different bugs | bitwuzla_repo, boolector_eom_announcement |
| G-2 | cocotb-coverage + JSON merge step | Y | now | Closes "no coverage report" gap in Workstream A; per-block opcode/MMIO/IRQ/AXI cover-points | cocotb_coverage_repo |
| G-3 | Reset + CDC properties in verify/properties/ | Y | now | Short, high-catch-rate, currently absent | verify/properties/ existing dir |
| G-4 | AXI-Lite open protocol properties + new .sby files | Y | now | Workstream A names this explicitly | open AXI-Lite property file refs |
| G-5 | Accelergy + Timeloop integration into NPU sim flow | Y | now | Emits joules-per-inference column required by benchmark-matrix.md | accelergy_repo, timeloop_paper |
| G-6 | Hypothesis-based property tests for parsers / check scripts | Y | now | Replaces example-based unit tests; better edge coverage | hypothesis_python |
| G-7 | MLPerf Power-style integrated energy schema field | Y | spec | Adds energy_joules_per_inference with calibration metadata | mlperf_power_spec |

Verdict: G-1..G-6 are direct file changes in verify/, benchmarks/, scripts/. All implementable now.
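
The G-2 "JSON merge step" can be pictured as summing per-testbench cover-point hit counts and then failing on any required point that was never hit. The flat `{"cover_point": hits}` report shape and the cover-point names below are assumptions for illustration; they are not the output format of cocotb-coverage or of scripts/check_cocotb_coverage.py.

```python
import json


def merge_coverage(json_reports):
    """Sum hit counts across several JSON reports of the form {point: hits}."""
    merged = {}
    for text in json_reports:
        for point, hits in json.loads(text).items():
            merged[point] = merged.get(point, 0) + int(hits)
    return merged


def uncovered(merged, required_points):
    """Required cover points with zero hits (or absent from every report)."""
    return sorted(p for p in required_points if merged.get(p, 0) == 0)


# Hypothetical reports from two testbenches.
reports = [
    json.dumps({"npu.opcode.dot16": 12, "npu.mmio.status_read": 0}),
    json.dumps({"npu.opcode.dot16": 3, "axi.write_resp": 5}),
]
merged = merge_coverage(reports)
holes = uncovered(merged, ["npu.opcode.dot16", "npu.mmio.status_read", "npu.irq.done"])
```

A gate built on this would fail closed whenever `holes` is non-empty, which here reports both the zero-hit point and the point no testbench mentioned.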

H. Physical design / EDA

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| H-1 | Tighten OpenROAD repair_timing / repair_design margins | Y | now | Reduces 23k max-slew + 442 max-cap violations on 2026-05-19 release run | openroad_repair_timing_docs |
| H-2 | OpenROAD PSM static IR-drop step in OpenLane flow + signoff schema | Y | now | Closes the most-cited gap in physical-power-thermal.md | psm_openroad_docs, physical_power_thermal_workorder |
| H-3 | Explicit PDN topology block in pd/openlane/config.sky130.json | Y | now | Auditable PDN topology per run | pdngen_openroad_docs |
| H-4 | Pin and record tool digests (OpenLane image, Volare PDK, KLayout/Magic/Netgen/OpenROAD/Yosys/ABC) per run | Y | now | Closes Workstream E reproducibility blocker | docker_oci_spec |
| H-5 | Utilization regression gate (fail if util > 1.05) | Y | now | Permanent fail-closed for the historical 771.788% incident | research/pd_eda_2026 H5 |
| H-6 | (Tracked) AutoDMP / CircuitNet ML predictors as informational gates | Y | spec | Comparative baseline for macro placement when hard macros exist | autodmp_repo, circuitnet_2_0 |

Verdict: H-1..H-5 are direct config + script changes. Implementable now.
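
The H-5 gate reduces to a threshold comparison with a third outcome for missing data. The sketch below assumes utilization is stored as a ratio (1.0 = 100 percent), so the historical 771.788% incident corresponds to 7.71788, and it assumes a hypothetical `utilization` key in the run's JSON; neither detail is confirmed by scripts/check_pd_utilization.py.

```python
import json


def util_gate(util_json_text, threshold=1.05):
    """Return "PASS", "FAIL", or "BLOCKED" for one OpenLane run.

    util_json_text: contents of the run's utilization JSON, or None if the
    run produced no such file.
    """
    if util_json_text is None:
        return "BLOCKED"  # no evidence: fail closed instead of passing
    util = json.loads(util_json_text).get("utilization")
    if util is None:
        return "BLOCKED"  # file present but the field is missing
    return "FAIL" if util > threshold else "PASS"
```

For instance, `util_gate('{"utilization": 7.71788}')` is "FAIL", a run at 0.62 is "PASS", and a run with no utilization file at all is "BLOCKED", which matches the locally observed state of make pd-util-check.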

I. Process / packaging (spec-db only)

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| I-1 | Keep frontside-PDN baseline + BSPDN as parallel variant in process-14a-effects.yaml; require IR/EM/thermal per variant | Y | now | Locks the contract to the published foundry roadmap reality | intel_powervia_vlsi2023, tsmc_super_power_rail, samsung_bspdn_iedm2023 |
| I-2 | Bind NanoFlex/FinFLEX cell library variant selection in PD library manifest | Y | now | Captures cell-library DTCO choices the foundry exposes | tsmc_nanoflex, samsung_finflex |
| I-3 | Adopt nanosheet-specific reliability derates (BTI, self-heating, Mo/Ru EM) | Y | now (spec) | Replaces FinFET-era lifetime derates | bti_nanosheet_ted2023, self_heating_nanosheet_edl2024, em_advanced_beol_tdmr2024 |
| I-4 | SRAM Vmin/ECC/repair plan: SECDED + bit-interleaving + repair-fuse policy + BIST | Y | now (spec) | Closes sram_density_vmin_and_ecc blocker | tsmc_2nm_sram_iedm2023, samsung_2nm_sram_isscc2024, soft_error_advanced_node_iolts2024 |
| I-5 | Thermal capture split: vapor-chamber transient vs post-saturation steady-state | Y | now (spec) | Sustained TOPS/W must come from post-saturation phase | vapor_chamber_phone_review, aosp_thermal_hal |
| I-6 | Default monolithic die + InFO_oS memory-on-package; chiplet split is a separate variant | Y | now (spec) | Locks the package contract; avoids CoWoS-class out-of-envelope plans | intel_lunar_lake, snapdragon_x_elite, tsmc_info |

Verdict: all spec-db / docs updates. No RTL or PD impact.
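
For the SECDED part of the I-4 plan, the storage overhead follows from the standard extended-Hamming bound: the smallest r with 2^r >= m + r + 1 gives single-error correction for m data bits, and one extra overall parity bit adds double-error detection. The word widths below are illustrative choices, not values taken from process-14a-effects.yaml.

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming (SECDED) code over data_bits."""
    r = 1
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # +1 overall parity bit for double-error detection


def secded_overhead(data_bits):
    """Check bits as a fraction of data bits."""
    return secded_check_bits(data_bits) / data_bits


widths = {m: secded_check_bits(m) for m in (32, 64, 128)}
```

This gives 7, 8, and 9 check bits for 32-, 64-, and 128-bit words, so a 64-bit word carries 12.5 percent overhead while a 128-bit word carries about 7 percent, which is one reason wider ECC words are attractive for large SRAM arrays.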

J. Mobile platform / board / package

| ID | Item | Useful | Tractable | Benefit | Source IDs |
|---|---|---|---|---|---|
| J-1 | package/display/v0-dsi-720x1280.yaml panel binding | Y | now | First concrete v0 panel evidence; mirrors package/wifi/ pattern | mipi_dsi_2, PinePhone Pro panel refs |
| J-2 | package/pmic/da9063.yaml + package/usb-pd/tps65987.yaml + package/charger/max77860.yaml | Y | now | PMIC rail-to-power-island binding required before board layout | dialog_da9063, ti_tps65987, maxim_max77860 |
| J-3 | docs/board/power-tree.md + pdn-budget.md + antenna-plan.md + thermal-stack.md | Y | now | Closes the explicit board-side blockers in phone-platform.md | research/mobile_platform_2026 H4..H7 |
| J-4 | package/sensors/v0-sensors.yaml (BMI323 + BMP390 + AK09918 + TSL2591) | Y | now | Mainline-driven sensor BOM | Bosch/AK/AMS datasheets |
| J-5 | package/audio/v0-codec.yaml (Realtek/TI codec + Cirrus smart amp + Knowles PDM mics) | Y | now | I2S + PDM bonded pins forecast | Realtek/TI/Cirrus/Knowles datasheets |
| J-6 | KiCad 9 + IPC-2581 + kibot CI skeleton at board/kicad/e1-phone/ | Y | now | Mirrors MNT Reform + PinePhone Pro repos | kicad_9, ipc_2581_b, kibot_repo |

Verdict: all yaml + docs work. No RTL or PD impact.

Items deferred (do not implement now)

These are real but premature for the current phase:

  • A-4 lifted tile-level 2:4 sparsity (waits for Phase B fabric / Gemmini wrapper).
  • A-6 full FlashAttention-2 attention engine (microarch L3).
  • A-9 (R-CIM-SLOT) CIM tile slot (waits for 14A PDK availability).
  • B-7 single IREE backend commitment (decision after B-1..B-5 land and exhibit pain).
  • D-1 actual TileLink-C RTL replacement (Phase B + Chipyard regen).
  • D-5 actual 64 MiB tile SRAM RTL (Phase B + foundry SRAM macro selection).
  • E-1 actual OpenTitan instantiation (Phase B + license accounting + DV).
  • F-1 actual qemu-virt boot capture (needs local RV toolchain run).
  • I-1..I-6 stay as spec-db-only; no PD change.
  • J-1..J-6 stay as planning yaml; no fabrication.

Implementation experiments by sub-agent

Each sub-agent below owns a non-overlapping path scope, may commit to the current develop branch, must respect packages/chip/CLAUDE.md and AGENTS.md, and must keep every claim evidence-backed.

| Sub-agent | Path scope | Items in scope |
|---|---|---|
| npu_rtl_ops | rtl/npu/, verify/cocotb/test_e1_npu*, verify/verilator/test_npu*, compiler/runtime/e1_npu_runtime.py, docs/arch/npu.md, docs/spec-db/e1-npu-runtime-contract.json, scripts/check_e1_npu_runtime_contract.py | A-3 (BitNet ternary), A-8 (perf counters), A-5 (DMA writeback spec wiring) |
| npu_compiler | compiler/runtime/, compiler/runtime/test_* | B-1..B-5 |
| npu_spec | docs/spec-db/e1-npu-runtime-contract.json, docs/spec-db/npu-2028-target.yaml, docs/spec-db/npu-2028-roadmap.yaml, docs/arch/npu.md, docs/arch/npu-microarch.md | A-1 spec, A-2 spec, A-4 spec, A-6 spec, A-7 spec |
| cpu_spec | docs/spec-db/cpu-2028-target.yaml (new), docs/arch/cpu-subsystem.md, docs/arch/linux-capable-cpu-contract.md | C-1..C-7 |
| memory_spec | docs/spec-db/memory-2028-target.yaml (new), docs/arch/memory-subsystem.md, docs/arch/interconnect.md | D-1..D-9 (spec-only) |
| security_spec | docs/arch/security.md, docs/security/*.md, docs/spec-db/security-2028-target.yaml (new) | E-1..E-8 (spec-only) |
| bsp_docs | docs/sw/opensbi/README.md, docs/sw/u-boot/README.md, docs/sw/buildroot/README.md, docs/sw/linux/README.md, docs/sw/aosp-device/device/eliza/eliza_ai_soc/ | F-1..F-5 |
| bench_verify | verify/formal/*.sby, verify/properties/, verify/cocotb/, benchmarks/, scripts/check_cocotb_coverage.py, requirements.txt | G-1..G-7 |
| pd_flow | pd/openlane/config.sky130.json, pd/signoff/run-manifest.schema.json, scripts/check_pd_signoff.py, scripts/check_pd_utilization.py (new) | H-1..H-5 |
| process_spec | docs/spec-db/process-14a-effects.yaml, docs/manufacturing/, docs/arch/memory-subsystem.md, docs/arch/npu-microarch.md | I-1..I-6 |
| platform_spec | package/display/, package/pmic/, package/usb-pd/, package/charger/, package/sensors/, package/audio/, docs/board/, board/kicad/e1-phone/ (skeleton only) | J-1..J-6 |

Each agent must:

  1. Commit changes incrementally on develop per AGENTS.md git rules (no stash, no branch switching, no force-push).
  2. Run the relevant make targets before its final commit: make lint, make typecheck, make docs-check, and the subsystem-specific check.
  3. Report what landed, what was deferred, and what was blocked.
  4. Stay inside the path scope above. If a change touches another scope, it is recorded as a follow-up item, not implemented in this pass.
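
Rule 4 above can be checked mechanically by matching each changed file against the agent's path scope and listing whatever falls outside as follow-up items. This is a sketch under assumptions: scope entries ending in "/" are treated as directory prefixes, all other entries as shell-style patterns, and the helper names and sample file list are hypothetical.

```python
from fnmatch import fnmatchcase


def in_scope(path, scope):
    """True if path is covered by a directory prefix or pattern in scope."""
    for entry in scope:
        if entry.endswith("/"):
            if path.startswith(entry):
                return True
        elif fnmatchcase(path, entry):
            return True
    return False


def follow_ups(changed_files, scope):
    """Changed files outside the scope; to be recorded, not implemented."""
    return [p for p in changed_files if not in_scope(p, scope)]


# Subset of the npu_rtl_ops scope from the table above.
NPU_RTL_OPS = ["rtl/npu/", "verify/cocotb/test_e1_npu*", "docs/arch/npu.md"]
changed = [
    "rtl/npu/e1_npu.sv",
    "verify/cocotb/test_e1_npu.py",
    "docs/spec-db/cpu-2028-target.yaml",
]
outside = follow_ups(changed, NPU_RTL_OPS)
```

In this example only docs/spec-db/cpu-2028-target.yaml falls outside the scope, so it would be reported as a follow-up for the owning agent.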