Power Delivery SOTA — 2028 RISC-V Phone-Class AP

packages/chip/docs/architecture-optimization/sota-2028/power-delivery.md


Sub-report of 2028-sota-integrated-report.md.

A. SOTA snapshot — mobile-class SoC PDN

A.1 PMIC / external regulator landscape

| SoC | Process | Companion PMIC | Public rail count | Notes |
|---|---|---|---|---|
| Snapdragon 8 Elite Gen 5 | TSMC N3P | PMK8550 / PM8550 family (multi-die set: 8550, 8550VS, 8550B, PMR735, PMX75, PM8010, plus separate Hexagon PMIC) | ~6-8 dies, 30-40 LDOs + ~12-16 SMPS bucks | RPMh + RSC + VRM accelerators coordinate per-rail. Linaro DT binding qcom,rpmh-regulator lists discrete buck/LDO control. |
| MediaTek Dimensity 9500 | TSMC N3P | MT6373 + MT6363 (+ MT6362 sub-PMIC) | MT6363 ≈ 7 bucks + 4 VEMC + LDOs; MT6373 ≈ 4 bucks + 16 LDOs | Dual-PMIC partition. Some rails GPU/NPU-only fast-DVFS bucks; others AON / IO / analog LDOs. |
| Apple A19 Pro | TSMC N3P | Custom Apple (no public part #; teardowns show ≥ 2 Apple PMIC + Dialog/STMicro for sub-systems) | ~12-18 primary rails on-die regulated | Apple uses many fine-grained on-die LDOs per cluster; PMIC supplies pre-regulated mid-voltage rails (~1.0 V) that SoC post-regulates per domain. |
| Samsung Exynos 2600 | Samsung SF2 (2 nm GAA) | S2MPS27 + S2MPB02-class | ~8-10 SMPS + 20+ LDOs across pair | I²C / SPMI control. |
| Google Tensor G5 | TSMC 3 nm | Reused Samsung S2MPS-class | ~10-14 primary rails | Upstream DT google,gs201-power-domain hints at coarse layout. |

A.2 On-die regulator and droop sensor practice

  • Intel FIVR (4th-gen Core, 22 nm, 2013) — 140 MHz multi-phase buck with package-trace inductors; on-die MIM caps. 80 MHz unity-gain BW.
  • Intel MIA + FIVR2 (10th-gen, 2020) — magnetic inductor array on-package.
  • Distributed digital LDOs — 28 nm distributed dLDOs published with ~100 mV droop / 500 mA load / settling <20 ns. Newer 22 nm computational dLDOs target 10 A class for big cores with sub-20-ns transient.
  • Adaptive clocking for droop tolerance — IBM POWER9 and AMD 28 nm x86-64 publish full ACS systems: clock stretches on droop detection within 1-3 cycles, recovers within tens of cycles.
  • Apple specifics largely undocumented in primary literature, but iPhone-class chips widely understood (via patents and teardown) to use per-cluster LDOs and adaptive clocking — Apple holds multiple droop-detector / supply-droop-compensation patents (USPTO 10145868, 10320375, 10749513, 11397444).

A.3 Backside power delivery (BSPDN / PowerVia / SPR)

  • Intel PowerVia (production, 18A in 2025) — first production BSPDN. Internal E-core test vehicle with > 90% utilization shows: >30% platform voltage droop improvement, 6% frequency benefit at iso-voltage, looser frontside metal pitch reducing lithography cost.
  • TSMC A14 (production 2028) — first version is frontside-PDN only for mobile/client. Separate A14 + SPR (Super Power Rail) = informally "A12" with BSPDN ships in 2029.
  • Samsung SF2 has BSPDN roadmap; Exynos 2600 reported on SF2 without BSPDN.
  • Thermal trade-off: BSPDN puts power TSVs through bulk, removing thermal contact between active devices and substrate top side. SemiEngineering and IRDS 2024 flag higher local Tj because heat travels through BEOL of carrier rather than directly out back. Cooling at package and board must compensate.

A.4 On-die decap density

  • TSMC SHPMIM caps for N2: > 2× density vs SHDMIM, 50% lower sheet/via resistance.
  • TSMC iCAP (CoWoS interposer DTC) — 340 nF/mm².
  • Mobile-class on-die decap rule-of-thumb (teardown analysis): ~5-10× total Cdecap-to-Cload in package + die combined, with deep-trench MIM contributing the majority of high-frequency response (1-100 MHz); package + board caps cover 10 kHz-10 MHz.

A.5 Mobile SoC power envelope (public + measured)

| SoC | Sustained TDP (independent throttle) | Peak power | Typical Tj |
|---|---|---|---|
| Snapdragon 8 Elite Gen 5 | ~6.5-7.5 W (Galaxy S26 thermal) | ~12-14 W peak burst | 95-110 °C |
| Dimensity 9500 | ~6 W ("56% NPU peak power down" claim) | ~11 W peak | 95-105 °C |
| A19 Pro | ~6 W iPhone 17 Pro; Tom's Guide 15.5 h battery | ~10 W peak | 95-100 °C |
| Exynos 2600 | Provisional; early reviews show poor sustained vs peak | ~12 W peak (S26 tests) | 95-110 °C |

B. Current state in packages/chip

| Aspect | State | Evidence |
|---|---|---|
| PMIC | None. No vendor, no IP, no board placement. | docs/architecture-optimization/physical-power-thermal.md |
| Rail count | 2 (VDDCORE @ 1.8 V, VDDIO @ 3.3 V) for Sky130 demo pad ring | package/e1-demo-pinout.yaml |
| On-die LDOs / IVR | None | grep returns 0 |
| DVFS controller | None. Only narrative in docs/arch/linux-capable-cpu-contract.md | |
| Droop sensors | None | |
| AVFS / adaptive clocking | None | |
| BSPDN | frontside_power_delivery_until_specific_bspdn_option_is_selected in docs/spec-db/process-14a-effects.yaml. Two variants planned: frontside_pdn_a14_class, backside_pdn_or_super_power_rail_follow_on | |
| Decap strategy | OpenLane defaults on Sky130 demo. No package-level or DTC plan. | |
| IR-drop signoff | OpenROAD irdrop.rpt only: VPWR worst 87.6 µV @ TT 25 °C 1.8 V; VGND 105.98 µV, on the 5.5 mW SkyWater 130 nm demo. Not mobile-class workload. | |
| EM signoff | Not produced | |
| UPF / IEEE 1801 | Not authored | |
| Power-management firmware | Not started. No SBI MPXY, RPMI client, SCMI server | |
| Total budget | 5.5 mW (OpenLane demo) vs 4.57 W modeled in soc-optimized-operating-point.yaml | |

Gap: design-document level only on PDN. The scaling distance is ~830× in power (5.5 mW OpenLane demo vs 4.57 W modeled) and roughly 1,600× in current: the demo draws about 3 mA on VDDCORE at 1.8 V, while the 2028 phone-class target draws ~5 A across all core rails at ~0.7 V.
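The ~830× figure lines up with the power ratio rather than the current ratio. A quick check, using only numbers quoted in this report (the variable names are illustrative):

```python
# Figures quoted in this report; the ratios are derived, not measured.
DEMO_POWER_W = 5.5e-3        # OpenLane Sky130 demo total power
DEMO_VDDCORE_V = 1.8         # demo core rail voltage
MODELED_POWER_W = 4.57       # soc-optimized-operating-point.yaml maximum
PHONE_CORE_CURRENT_A = 5.0   # approximate draw across core rails at ~0.7 V

demo_current_a = DEMO_POWER_W / DEMO_VDDCORE_V
power_ratio = MODELED_POWER_W / DEMO_POWER_W
current_ratio = PHONE_CORE_CURRENT_A / demo_current_a

print(f"demo VDDCORE current: {demo_current_a * 1e3:.2f} mA")
print(f"power ratio:   {power_ratio:.0f}x")
print(f"current ratio: {current_ratio:.0f}x")
```

Because the target rails run at well under half the demo voltage, the current gap is about twice the power gap, and current is the number the PDN has to be sized for.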

C.1 Process anchor

  • Primary release: TSMC A14 (frontside PDN), HVM 2028, 2nd-gen GAA nanosheet.
  • Stretch/follow-on: A14 + SPR ("A12") BSPDN variant, HVM 2029 — opt-in DTCO swap (separate pd/openroad/ config), not critical path. Budget shows +6% perf and 30% droop reduction if thermal mitigation closes.
  • Backup: Samsung SF2P (BSPDN) — second-source.

C.2 Rail topology — 16 rails (production 2028 SKU)

Aligned with soc-optimized-operating-point.yaml's 2-core CPU + NPU + memory layout, scaled to phone-class. Budget targets, not measurements.

| # | Rail | Nominal V (TT) | DVFS range | Peak I (A) | Avg I (A) | Regulator | Domain |
|---|---|---|---|---|---|---|---|
| 1 | VDD_CPU_BIG | 0.70 V | 0.55-0.95 V | 3.5 | 1.0 | Ext buck + on-die dLDO/core | 2× big OoO cores @ 3.2 GHz base |
| 2 | VDD_CPU_LITTLE | 0.65 V | 0.50-0.85 V | 1.5 | 0.4 | Ext buck + on-die dLDO | 4× little in-order |
| 3 | VDD_NPU | 0.70 V | 0.55-0.90 V | 2.5 | 1.7 | Ext buck + on-die dLDO | 44 TOPS @ 1.2 W |
| 4 | VDD_GPU | 0.70 V | 0.55-0.90 V | 2.0 | 0.6 | Ext buck | Framebuffer + future GPU |
| 5 | VDD_SOC_FABRIC | 0.75 V | 0.65-0.85 V | 1.2 | 0.5 | Ext buck | NoC, IOMMU, system cache |
| 6 | VDD_SRAM | 0.80 V | 0.70-0.90 V | 1.5 | 0.6 | Ext buck | All on-die SRAM |
| 7 | VDD_LPDDR_VDDQ | 0.50 V | fixed | 0.8 | 0.5 | Ext buck | LPDDR5X IO |
| 8 | VDD_LPDDR_VDD1 | 1.80 V | fixed | 0.3 | 0.15 | Ext LDO | LPDDR5X array |
| 9 | VDD_LPDDR_VDD2H/2L | 1.05/0.50 V | fixed | 0.5 | 0.3 | Ext buck | LPDDR controller |
| 10 | VDD_PHY_ANALOG | 0.85 V | fixed | 0.4 | 0.2 | Ext LDO | LPDDR PHY analog |
| 11 | VDD_AON | 0.75 V | fixed | 0.05 | 0.02 | Ext LDO + on-die retention LDO | AON island, RTC, mgmt |
| 12 | VDD_PMC | 0.80 V | fixed | 0.1 | 0.05 | Ext LDO | Power-mgmt RISC-V (Ibex) |
| 13 | VDD_IO_18 | 1.80 V | fixed | 0.5 | 0.2 | Ext buck | GPIO, audio, sensor IO |
| 14 | VDD_IO_33 | 3.30 V | fixed | 0.2 | 0.1 | Ext buck | Slow IO, eMMC fallback |
| 15 | VDD_USB_PHY / PCIe | 0.85 / 1.20 V | fixed | 0.3 | 0.1 | Ext LDO | USB 3.x + PCIe Gen4 PHY |
| 16 | VDD_RF_REF | 1.80 V | fixed | 0.2 | 0.05 | Ext LDO | WiFi/BT analog ref |

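Budget totals: ~5.0 W peak, ~3.5 W sustained at 95 °C Tj, ~1.0 W idle, set against the 4.57 W maximum modeled in soc-optimized-operating-point.yaml. These totals are not a straight column sum of the table: nominal V × listed current comes to about 12.8 W if all 16 rails peak together and about 5.35 W at the listed averages, so the totals presumably assume non-coincident peaks and sub-nominal DVFS points. The per-rail currents and the package-level totals should be reconciled before the rail plan is frozen.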
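A rough cross-check of the rail table can be sketched by multiplying each rail's nominal voltage by its listed current. The sketch below is an illustration, not a signoff calculation: it assumes every rail sits at its nominal TT voltage, and for the two dual-voltage rows (rails 9 and 15) it uses the first listed voltage, which is an assumption made here for simplicity.

```python
# (rail, nominal_v, peak_a, avg_a) copied from the 16-rail table.
# Assumption: dual-voltage rows (rails 9 and 15) use the first listed voltage.
RAILS = [
    ("VDD_CPU_BIG",        0.70, 3.5,  1.0),
    ("VDD_CPU_LITTLE",     0.65, 1.5,  0.4),
    ("VDD_NPU",            0.70, 2.5,  1.7),
    ("VDD_GPU",            0.70, 2.0,  0.6),
    ("VDD_SOC_FABRIC",     0.75, 1.2,  0.5),
    ("VDD_SRAM",           0.80, 1.5,  0.6),
    ("VDD_LPDDR_VDDQ",     0.50, 0.8,  0.5),
    ("VDD_LPDDR_VDD1",     1.80, 0.3,  0.15),
    ("VDD_LPDDR_VDD2H/2L", 1.05, 0.5,  0.3),
    ("VDD_PHY_ANALOG",     0.85, 0.4,  0.2),
    ("VDD_AON",            0.75, 0.05, 0.02),
    ("VDD_PMC",            0.80, 0.1,  0.05),
    ("VDD_IO_18",          1.80, 0.5,  0.2),
    ("VDD_IO_33",          3.30, 0.2,  0.1),
    ("VDD_USB_PHY/PCIe",   0.85, 0.3,  0.1),
    ("VDD_RF_REF",         1.80, 0.2,  0.05),
]


def total_power_w(rails, column):
    """Sum V * I over all rails; column is 'peak' or 'avg'."""
    idx = {"peak": 2, "avg": 3}[column]
    return sum(rail[1] * rail[idx] for rail in rails)


peak_w = total_power_w(RAILS, "peak")        # every rail at peak simultaneously
avg_w = total_power_w(RAILS, "avg")          # every rail at its listed average
core_avg_a = sum(rail[3] for rail in RAILS[:6])  # rails 1-6 are the core rails

print(f"all rails at peak:    {peak_w:.2f} W")
print(f"all rails at average: {avg_w:.2f} W")
print(f"core rails, average:  {core_avg_a:.1f} A")
```

The simultaneous-peak figure is an upper bound that no real workload should reach; it is useful mainly as a sanity check on how much peak coincidence the package-level budget assumes.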

C.3 On-die regulator strategy

  • Per-core dLDO on big-CPU and NPU. Target <20 ns droop response, ~5% drop @ full step.
  • No FIVR-class buck on-die in v0 — area cost too high for first open mobile SoC. Switched-capacitor 2:1 SC-DC in v1 if pd/openlane supports cap density.
  • AON retention LDO in always-on island to hold mgmt-core state during deep sleep (S3-equivalent).

C.4 Adaptive clocking + droop sensing

  • Droop sensor per voltage domain — ring-oscillator-based, <1 ns droop detect, sampled at 200 MHz. Reference: 22 nm all-digital ADCD (Bowman/Tokunaga).
  • Clock stretcher per CPU/NPU core, 1-cycle response. Implementation: programmable phase-blender.
  • AVFS loop — closed-loop voltage tuning driven by in-situ timing margin monitors (canary FFs); 100 µs update; 6.25 mV voltage delta.
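The AVFS loop described above can be sketched as a bang-bang controller with a dead band. This is a minimal illustration only: the 6.25 mV step and the 100 µs cadence come from the bullet above, while `AvfsState`, `avfs_update`, and the margin thresholds in picoseconds are hypothetical names and values, not part of any existing RTL or firmware.

```python
from dataclasses import dataclass

STEP_V = 0.00625  # 6.25 mV voltage delta per update (from the AVFS bullet)


@dataclass
class AvfsState:
    vdd: float    # current rail setpoint in volts
    v_min: float  # bottom of the rail's DVFS range
    v_max: float  # top of the rail's DVFS range


def avfs_update(state, margin_ps, low_ps=20.0, high_ps=60.0):
    """One loop iteration, intended to run once per 100 us update period.

    margin_ps is the timing slack reported by the canary flip-flops.
    low_ps and high_ps are hypothetical dead-band thresholds.
    """
    if margin_ps < low_ps:        # too little slack: raise the voltage
        target = state.vdd + STEP_V
    elif margin_ps > high_ps:     # excess slack: reclaim power
        target = state.vdd - STEP_V
    else:                         # inside the dead band: hold
        target = state.vdd
    state.vdd = min(state.v_max, max(state.v_min, target))
    return state.vdd


# VDD_CPU_BIG: 0.70 V nominal, 0.55-0.95 V DVFS range.
rail = AvfsState(vdd=0.70, v_min=0.55, v_max=0.95)
for margin in (10.0, 10.0, 40.0, 80.0):
    print(f"margin {margin:5.1f} ps -> {avfs_update(rail, margin) * 1e3:.2f} mV")
```

The dead band keeps the loop from oscillating by one step around the operating point; the fast path for sudden droops is still the clock stretcher, since a 100 µs loop cannot react within a droop event.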

C.5 Decap budget

  • On-die: SHPMIM-class (A14 equivalent) deep-MIM, target 150 nF/mm² average. Hot-rail CPU/NPU islands target 5× ICs/Iavg ratio = ~250 nF on 12 mm² die area-stretch.
  • Package: 0.1-10 µF cap bank between BGA balls, ≥40 caps for 5 W SoC.
  • Board: bulk 22-100 µF tantalum/MLCC near each external buck; ≥4 high-frequency 100 nF MLCCs per ball pair on each core rail.
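For a first-order feel of the on-die numbers, a charge-balance estimate asks how much capacitance must supply a load step until the regulator responds: C = ΔI · Δt / ΔV. The sketch below is a simplification that ignores package inductance, capacitor ESR, and any help from the regulator itself. The inputs are taken from elsewhere in this report (VDD_CPU_BIG peak minus average current, the <20 ns dLDO response target, a 5% droop allowance, and the 150 nF/mm² density target); treating the step as peak minus average is an assumption made here.

```python
def required_decap_f(step_a, response_s, allowed_droop_v):
    """Capacitance that holds droop within limits until the regulator reacts.

    First-order charge balance: C = dI * dt / dV.
    """
    return step_a * response_s / allowed_droop_v


step_a = 3.5 - 1.0              # VDD_CPU_BIG peak minus average current
response_s = 20e-9              # dLDO droop-response target
allowed_droop_v = 0.05 * 0.70   # 5% of the 0.70 V nominal rail
density_f_per_mm2 = 150e-9      # on-die decap density target

cap_f = required_decap_f(step_a, response_s, allowed_droop_v)
area_mm2 = cap_f / density_f_per_mm2

print(f"required decap: {cap_f * 1e6:.2f} uF")
print(f"area at 150 nF/mm^2: {area_mm2:.1f} mm^2")
```

An estimate like this is only a starting point for floor-planning; the vector-driven dynamic IR-drop runs in section D remain the acceptance criterion.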

C.6 Power-management firmware stack

S-mode Linux  --SBI MPXY (sysbus mailbox)-->  M-mode OpenSBI
                                                  |
                                                  | RPMI v1.0
                                                  v
                                          Eliza Power-Mgmt Core (Ibex-32)
                                                  |
                                                  |  SPMI / I2C / RPMSG-equivalent
                                                  v
                                              External PMIC set

  • RISC-V MPXY SBI extension (ratified SBI v3.0) + RPMI v1.0 (ratified): the RISC-V counterpart to Arm SCMI; reuse Linux clk/regulator/cpufreq infrastructure with SBI MPXY mailbox drivers (merged for 6.x).
  • Power-mgmt core: Ibex-class RV32IMC, gated on AON, runs always-on. Owns: PMIC sequencing, DVFS table arbitration, thermal throttle policy, droop telemetry, secure-boot keys.
  • DVFS tables: per-corner (SS/TT/FF + 0/25/85/105 °C), generated at characterization from pd/signoff/sta/*.
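One possible shape for the per-corner DVFS tables is a mapping keyed by (process corner, characterized temperature) to a list of frequency/voltage points. Everything in this sketch is hypothetical: the names `DVFS_TABLES`, `characterized_temp`, and `lookup_voltage`, the placeholder voltages, and the policy of rounding the junction temperature up to the next characterized point. Real values would come from characterization, not from this example.

```python
from bisect import bisect_left

TEMPS_C = (0, 25, 85, 105)  # characterized temperatures listed above

# Hypothetical placeholder points: (max frequency in Hz, voltage in V).
DVFS_TABLES = {
    ("TT", 25): [(1.2e9, 0.55), (2.4e9, 0.70), (3.2e9, 0.85)],
    ("TT", 85): [(1.2e9, 0.58), (2.4e9, 0.73), (3.2e9, 0.88)],
}


def characterized_temp(tj_c):
    """Round Tj up to the next characterized temperature, clamped at the hottest."""
    i = bisect_left(TEMPS_C, tj_c)
    return TEMPS_C[min(i, len(TEMPS_C) - 1)]


def lookup_voltage(corner, tj_c, freq_hz):
    """Return the lowest table voltage that supports freq_hz at this corner."""
    table = DVFS_TABLES[(corner, characterized_temp(tj_c))]
    for max_freq_hz, volts in table:
        if max_freq_hz >= freq_hz:
            return volts
    raise ValueError("requested frequency is above the table range")


print(lookup_voltage("TT", 60, 2.0e9))  # 60 C rounds up to the 85 C table
```

Keeping the tables as plain data makes it straightforward for the management core to arbitrate requests without knowing how the points were generated.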

D. Benchmarks / evaluation / testing

D.1 Pre-silicon

  1. Activity-traced power signoff — real workload VCDs (Geekbench-equivalent int trace, MLPerf Mobile NPU INT8, LLM-7B-INT4 token loop, sustained NPU CNN, idle, display refresh) through PrimePower (commercial) or OpenSTA + Capacitate (open) into pd/signoff/power.rpt. Replaces modeled numbers in soc-optimized-operating-point.yaml.
  2. Static IR-drop signoff — Voltus or RedHawk-SC at SS/TT/FF + 4 thermal corners, all 16 rails. Acceptance: <5% nominal Vdd droop.
  3. Dynamic IR-drop signoff — vector-driven dynamic with worst-case vectors per block (CPU integer burst, NPU GEMM saturation, CPU+NPU+display simultaneous). Acceptance: <10% droop, AVFS-corrected timing must close.
  4. EM signoff — foundry-mandated current density limits on all PG layers + clock/reset signal wires. Lifetime derate ≥10 years at 85 °C avg Tj.
  5. PDN impedance — Z_pdn(f) under 5 mΩ DC and <15 mΩ across 1 kHz-1 GHz. Resonance peaks tracked at package + board boundary.
  6. Anti-resonance + Ldi/dt — simulate worst slew (CPU NEON-equivalent saturation → idle in 1 cycle) with package + board model.
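The impedance limit in item 5 can be related to the droop limits in items 2 and 3 through the usual target-impedance rule, Z_target = (V_nominal × allowed droop fraction) / load step. The sketch below applies it to VDD_CPU_BIG, assuming the load step is peak minus average current from the rail table; that choice of step is an assumption made here, not a figure from the report.

```python
def target_impedance_ohm(nominal_v, droop_fraction, step_a):
    """Largest PDN impedance that keeps droop within the allowed fraction."""
    return nominal_v * droop_fraction / step_a


step_a = 3.5 - 1.0  # VDD_CPU_BIG peak minus average current (assumed step)

z_static = target_impedance_ohm(0.70, 0.05, step_a)   # 5% droop limit
z_dynamic = target_impedance_ohm(0.70, 0.10, step_a)  # 10% droop limit

print(f"5% droop:  {z_static * 1e3:.1f} mOhm")
print(f"10% droop: {z_dynamic * 1e3:.1f} mOhm")
```

Under these assumptions the 5% case lands just below the <15 mΩ acceptance limit, which suggests the limit is sized for the big-CPU rail; rails with smaller steps would tolerate a higher impedance.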

D.2 Post-silicon

  1. Power-virus workload — synthesize custom RTL "current bomb" (mprime + stressapptest + INT8-GEMM saturation analog) calibrated to peak modeled rail current.
  2. Sustained perf-vs-temp — 30-min runs at 25 °C ambient → measure Tj, throttle response, sustained Geekbench, MLPerf, LLM tok/s. Compare to:
    • A19 Pro sustained Geekbench multi (~8500 after ~10 min)
    • Snapdragon 8 Elite Gen 5 sustained Geekbench multi (~9200)
    • Dimensity 9500 sustained Geekbench multi (~8400)
  3. DVFS table tuning per silicon corner — characterize each chip at SS/TT/FF and bin into 3 voltage tables.
  4. Droop event capture — on-die telemetry of droop sensor events during workload transitions; expect <1 event/sec at production V/F.
  5. Skin-temperature correlation — chamber + free-air with phone-class enclosure; cross-reference benchmarks/power/manifests/e1-npu-sustained-capture.template.json.
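The droop-event expectation in item 4 can be turned into a simple pass/fail check on captured telemetry. This is a sketch under assumptions: the function names are hypothetical, and the input is taken to be a list of event timestamps in seconds, since no telemetry format is defined yet.

```python
def droop_events_per_second(timestamps_s, window_s):
    """Average droop-event rate over a capture window of window_s seconds."""
    if window_s <= 0:
        raise ValueError("capture window must be positive")
    return len(timestamps_s) / window_s


def passes_droop_budget(timestamps_s, window_s, limit_per_s=1.0):
    """True when the event rate is below the <1 event/sec expectation."""
    return droop_events_per_second(timestamps_s, window_s) < limit_per_s


quiet_run = [0.4, 3.1, 7.9]               # 3 events in a 10 s capture
noisy_run = [i * 0.5 for i in range(20)]  # 20 events in a 10 s capture

print(passes_droop_budget(quiet_run, 10.0))
print(passes_droop_budget(noisy_run, 10.0))
```

An average rate hides bursts, so a production check would probably also bound the worst one-second window; that refinement is left out here.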

E. Optimizations: has / should / needs

Has

  • Two-rail demo padframe with explicit IO/core separation.
  • Operating-point optimizer with corner sweep (make soc-optimization).
  • Process effects contract distinguishing FSPDN and BSPDN variants (docs/spec-db/process-14a-effects.yaml).
  • Local IR-drop reporting from OpenROAD on Sky130 demo.

Should (P1, 2028 target)

  • Per-domain power gating with retention FFs on caches/regfiles. UPF (IEEE 1801) for every island.
  • Per-cluster fast DVFS via on-die dLDOs (CPU big, NPU).
  • AVFS loop + droop sensors + clock stretchers.
  • Thermal-aware DVFS with on-die DTSs (≥8 sensors, one per power island).
  • SBI MPXY + RPMI power-management firmware on Ibex management core.
  • DTC + MIM on-die decap sized 5× Cload, per-rail allocation.

Definitely needs (P0, gates the chip)

  • Pick a PMIC vendor or design discrete bucks/LDOs. Open mobile-class PMIC IP does not exist publicly. Options:
    1. Buy Renesas/MPS/TI/Maxim mobile PMIC catalog parts and use 4-6 in parallel.
    2. License closed IP (Synaptics, Dialog) — slow, expensive.
    3. Custom analog design — requires analog team and separate older-node tapeout (realistic for v0: discrete PMIC daughtercard from 8-12 catalog regulators, hop to integrated for v1).
  • Authoritative rail list and UPF. Today's 2-rail padframe good only for Sky130 demo. 14A SKU must publish 16-rail map and freeze before RTL closes.
  • Power signoff EDA path. OpenROAD static IR-drop is triage. Buy Voltus or RedHawk-SC seats; gap to open EDA in dynamic IR/EM is years.
  • Activity-traced power — replace OpenLane metrics.json mW with VCD-driven power.rpt for: NPU INT8 GEMM saturation, CPU integer burst, idle, display refresh.
  • Package model with bond-wire / BGA inductance in PDN sim. Pad-frame R/L from pd/openlane/runs/.../padframe_inclusive_lvs must feed back into IR-drop.

F. Risks and open questions

| Risk | Severity | Mitigation |
|---|---|---|
| No open mobile-class PMIC IP exists. Synopsys/Renesas/Maxim/TI/MPS catalog parts are closed; "open PMIC" in 2025 limited to academic ASICs and few RISC-V-controlled industrial parts (Silergy, Allwinner T536). | High | v0 SKU uses 6-8 catalog buck/LDO ICs on daughtercard, controlled via I²C/SPMI by mgmt core. v1 internalizes. |
| A14 production 2028 ships without BSPDN. Mobile A14 is frontside-only; BSPDN ("A12") slips to 2029. | Medium | Plan FSPDN as primary release. Treat BSPDN as 2029 re-spin, not 2028 commitment. |
| Power signoff EDA is closed. OpenROAD lacks vector-driven dynamic IR-drop. Voltus / RedHawk-SC required for tapeout-grade. | High | Budget for commercial EDA seats during signoff. Document open-EDA fallback gating release if Voltus unavailable: static-only IR + worst-case vectorless dynamic with 2× extra margin. |
| Droop response at >3.5 GHz requires fast custom loops. Public 22 nm dLDO numbers at lower clocks; mobile big-core 3.2 GHz + NPU peak switching events stress the loop. | Medium | Mandate adaptive clocking (1-cycle stretch) so droop tolerance is not solely on regulator response. |
| BSPDN thermal penalty if 2029 variant. Active layer buried in BEOL of carrier; local Tj rises 5-10 °C at same power. | Medium | Reserve 5 °C thermal headroom in BSPDN variant; require enclosure rework before that SKU. |
| SBI MPXY + RPMI is new: kernel drivers landed 2025 in 6.x, ABI settling. | Low-Medium | Track Linux mainline carefully; pin OpenSBI release used at silicon bring-up; document fallback to direct PSCI-style calls. |
| No droop sensor IP today. All-digital droop detectors well-published but Eliza has none in RTL. | High | Allocate one engineer-month to port public 22 nm-style ADCD design into our PDK; bind into rtl/power/. |
| PMIC-to-SoC interface (SPMI vs I²C vs RPMSG) must be chosen before pad ring close. | Medium | Standardize on SPMI v2.0 for v0 (industry default), plus I²C fallback for bring-up board. |
| Decap density at 14A: actual DTC area cost competes with logic placement. | Medium | Floor-plan decap allocations in pd/openroad/ early; mark hot rails for DTC priority. |

Concrete next moves (≤ 4 weeks)

  1. Author docs/pd/rail-plan-2028.yaml listing 16 rails, nominal V, DVFS range, peak/avg I, regulator type, decoupling target. Bind to pd/signoff/manifest.yaml.
  2. Add docs/pd/pmic-selection.md with 3 candidate paths (catalog-daughtercard / closed-IP / custom analog); pick v0 path.
  3. UPF skeleton in pd/upf/e1_soc_top.upf: 16 power domains, isolation cells, retention per island. UPF gate added to make pd-check.
  4. Stand up rtl/power/droop_sensor.sv + rtl/power/clock_stretcher.sv ports of public 22 nm designs; cocotb tests for droop event injection.
  5. scripts/check_pdn_workload_signoff.py — fails closed if pd/signoff/<RUN>/reports/ir_drop.rpt is not vector-driven, multi-corner, signed by Voltus/RedHawk OR explicitly waived with open-flow fallback margin.
  6. Wire soc-optimized-operating-point.yaml to rail-plan; gate operating-point report on rail-plan hash, so future modeled-power changes invalidate the claim.
  7. RFC: choose SBI MPXY + RPMI as power-management ABI. Open issue against docs/project/spec-rtl-sw-pd-handoff-work-order.yaml.
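Items 5 and 6 both describe gates that should fail closed. A minimal sketch of the decision logic follows, assuming the report metadata has already been parsed into a dictionary. The field names (`vector_driven`, `corners`, `tool`, `waiver`, `open_flow_margin`) and function names are hypothetical, since neither the ir_drop.rpt format nor the rail-plan schema is defined yet; the 2× waiver margin mirrors the open-EDA fallback in the risk table.

```python
import hashlib

ACCEPTED_TOOLS = {"voltus", "redhawk-sc"}


def rail_plan_hash(rail_plan_text):
    """Stable fingerprint of the rail plan contents."""
    return hashlib.sha256(rail_plan_text.encode("utf-8")).hexdigest()


def operating_point_is_current(recorded_hash, rail_plan_text):
    """True only if the operating-point report was built from this rail plan."""
    return recorded_hash == rail_plan_hash(rail_plan_text)


def signoff_passes(meta):
    """Fail closed: a missing or malformed field rejects the run."""
    waiver = meta.get("waiver")
    margin = waiver.get("open_flow_margin") if isinstance(waiver, dict) else None
    if isinstance(margin, (int, float)) and margin >= 2.0:
        return True  # explicit waiver with open-flow fallback margin
    corners = meta.get("corners")
    return (
        meta.get("vector_driven") is True
        and isinstance(corners, list)
        and len(corners) > 1
        and str(meta.get("tool", "")).lower() in ACCEPTED_TOOLS
    )


commercial = {"vector_driven": True, "corners": ["SS", "TT", "FF"], "tool": "Voltus"}
static_only = {"vector_driven": False, "corners": ["TT"], "tool": "openroad"}
waived = {"tool": "openroad", "waiver": {"open_flow_margin": 2.0}}

for name, meta in (("commercial", commercial), ("static", static_only), ("waived", waived)):
    print(name, signoff_passes(meta))
```

Defaulting to rejection means an empty or partially written report cannot pass by accident, which is the property the work items ask for.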

Sources