Process Node SOTA — 2028 RISC-V Phone-Class AP

packages/chip/docs/architecture-optimization/sota-2028/process-nodes.md

Sub-report of 2028-sota-integrated-report.md.

A. SOTA snapshot

The 2025-2028 leading-edge logic cohort consolidates four shifts: (1) FinFET → GAA / RibbonFET / MBCFET, (2) frontside → backside power delivery (PowerVia, Super Power Rail, SF2Z BSPDN), (3) NanoFlex / NanoFlex Pro DTCO cell mixing, and (4) High-NA EUV adoption, which happens only at Intel 14A; TSMC skips High-NA through A14.

| Node | Foundry | HVM | Transistor | BSPDN | High-NA | HD density (MTr/mm²) | HD SRAM bitcell (µm²) / density | Perf-or-power gain vs prior | Wafer (300 mm) | Lead customer |
|---|---|---|---|---|---|---|---|---|---|---|
| N3 / N3E / N3P | TSMC | 2022 / 2023 / 2H 2025 | FinFET | no | no | ~215-220 (N3E HD) | 0.021 (≈ N5) | N3P: ~5% perf or ~5-10% power vs N3E | ~$19.5-25 k | Apple A17/A18/A19, S8E Gen 4/5, D9400/9500, Tensor G5 |
| N2 | TSMC | 2H 2025 | GAA NanoSheet (NanoFlex) | no | no | 313 (HD) | 0.021 (HD), 38.1 Mb/mm² macro | ~10-15% perf @ iso-power or ~25-30% power @ iso-perf vs N3E | ~$30 k | Apple (>50% of initial N2); NVIDIA, AMD follow 2026-2027 |
| N2P | TSMC | 2H 2026 | GAA NanoSheet | no | no | ~313+ | same as N2 | Modest lift over N2 | ~$30-33 k | Apple A20-class 2026-2027, Qualcomm/MTK transition 2027 |
| A16 | TSMC | 2027 (slipped from 2H 2026) | GAA NanoSheet | yes (Super Power Rail) | no | ~1.07-1.10× N2P | minor | 8-10% perf @ iso-power or 15-20% power @ iso-perf, +7-10% density vs N2P | not public | HPC / AI first (NVIDIA), then mobile |
| A14 | TSMC | 2028 (ahead of plan) | 2nd-gen GAA (NanoFlex Pro) | A14 baseline frontside; A14P (SPR) 2029 | no (TSMC explicit) | >1.20× N2 logic | scaling resumed at N2 family | +15% speed @ iso-power, or -30% power @ iso-speed, +20% logic density vs N2 | est. $40-45 k | Apple, NVIDIA, AMD; mobile mid-cycle 2028-2029 |
| A14P | TSMC | 2029 | 2nd-gen GAA + SPR BSPDN | yes | no | A14 + density gain | minor | additional perf @ iso-power over A14 baseline | est. $45 k+ | HPC, AI, then flagship mobile |
| A13 / N2U | TSMC | 2029 | optical shrink of A14 / DTCO refresh of N2 | inherits | no | ~+6% over A14 (A13) | minor | N2U: +3-4% perf or 8-10% power, +2-3% density | TBD | cost-down follow-on |
| 18A | Intel | HVM Dec 2025 | RibbonFET GAA | yes (PowerVia, 1st-gen) | no | 238 (HD) | competitive, less than N2 | PowerVia ~30% IR drop reduction, +6% Fmax, +5-10% std-cell utilisation vs frontside; ~10% perf / 25% power vs Intel 3 | private | Intel Panther Lake; foundry ramping |
| 18A-PT | Intel | 2026 | RibbonFET + 3D stacking | yes | no | similar | similar | enables Foveros / hybrid-bond stacking | private | Intel HPC, foundry |
| 14A | Intel | risk 2027, HVM 2028+ | RibbonFET 2nd gen | yes, 2nd-gen PowerVia | yes (industry first) | not public; targets > N2/A16 | n/a | Intel claim: ~15-20% perf @ iso-power, ~25-30% power @ iso-perf vs 18A | private | DARPA, US gov, hyperscaler diversification |
| SF3 / SF3P | Samsung | 2022 / 2024 | GAA MBCFET (1st) | no | no | ~170-200 (est) | ~0.026 | yield issues; limited external uptake | lower than TSMC | Exynos 2400/2500, internal |
| SF2 / SF2P | Samsung | 2H 2025 / 2026 | GAA MBCFET (3rd) | no | no | 231 (HD) | n/a public | +25% power efficiency @ iso-clock vs SF3P | competitive | Exynos 2600 (Galaxy S26), Exynos 2800 on SF2P+ |
| SF2Z | Samsung | 2027 | GAA MBCFET + BSPDN | yes | no | density lift via BSPDN cell-height shrink | n/a | further IR / power | n/a | foundry play vs TSMC A16 |
| SF1.4 | Samsung | delayed 2028-2029, public 2029 | "Vertical-GAA" | inherits | optional | not public | n/a | de-prioritised for 2nm yield in 2025; slipped | n/a | Exynos late-2029 flagship at earliest |

Three SOTA observations:

  1. TSMC N2 HD density (313 MTr/mm²) is ahead of Intel 18A (238) and Samsung SF2 (231). But Intel 18A delivers PowerVia at least ~12 months before TSMC's first BSPDN node (A16), so 18A's perf/W is closer to N2/N2P than density alone suggests.
  2. SRAM scaling stalled at N3 (0.021 µm² bitcell, ~5% over N5) and only resumed at N2: a 38.1 Mb/mm² macro using the same 0.021 µm² HD bitcell with reorganised assist. Cache-heavy mobile designs that stay within the N3 family gain no SRAM area; the gain arrives only at N2 and later.
  3. High-NA EUV is not on TSMC's path to A14. TSMC has stated repeatedly that A14 ships without High-NA; Intel 14A is the only public node committed to High-NA, and only in 2027-2028. This caps Intel-foundry capacity scaling on 14A within our 2028 window.

Reference 2025-class flagship dies:

  • Apple A19 Pro: 98.68 mm², P-core 2.97 mm² (5.49 mm² with L2+shared), E-core 0.78 mm² (2.22 mm² w/ L2), SLC 11.03 mm² — N3P.
  • Apple A19: ~81.9 mm² on N3P.
  • Snapdragon 8 Elite Gen 5: ~126.2 mm² on N3P.
  • Dimensity 9500: N3P, 1+3+4 ARM cores at 4.21 / 3.5 / 2.7 GHz.

B. Current state in packages/chip

  • pd/openlane/config.sky130.json points at the sky130A PDK, sky130_fd_sc_hd, met5, a 2500 × 2500 µm die, and a 100 ns clock. Real and runnable on open tooling, but 130 nm: roughly ten full-node generations (130 → 90 → 65 → 45 → 32 → 22 → 14 → 10 → 7 → 5 → 3 nm) behind today's mobile flagships, and further still from the 2028 target.
  • docs/spec-db/process-14a-effects.yaml is a fail-closed planning contract: it forbids any "14A tapeout ready" / "1.4 nm power/performance" / "Pixel-class 2028 efficiency" claim until pd/signoff/manifest.yaml, benchmarks/power/workload-plan.yaml, NanoSheet variability evidence, and frontside-vs-backside PDN tradeoffs are populated. The selected option is blocked_until_foundry_pdk_and_library_selection.
  • research/alpha_chip_macro_placement/06_e1_notes/openlane_full_release_2026-05-19.md: 3.24 mm² die, 142,274 std-cells, 0 macros, clean DRC/LVS, 23,099 max-slew + 442 max-cap + small hold-TNS violations. No real SRAM/CPU/NPU hard macros.
  • docs/spec-db/competitor-2028-target.md sets envelope: 4-8 RV64GC Linux-capable cores, 16-24 GB LPDDR5X/6, 120-180 GB/s sustained, 16-32 MB SLC, 80 TOPS dense INT8 sustained / 160 TOPS peak, 64 MiB NPU SRAM.

Summary: the package has a Sky130 PD scaffold and a complete claims-gate skeleton for a 14A target, but zero advanced-node access, zero qualified hard IP, zero LPDDR/USB/MIPI PHY, zero characterised SRAM macro at any target node, and no Linux-capable RV64GC AP integrated (only the tiny CVA6 wrapper).
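The fail-closed behaviour of the planning contract can be illustrated with a minimal sketch. It assumes the gate is a simple "evidence file exists and is non-empty" check, which the source does not confirm; the function names are hypothetical, and only the two evidence items that have a file path in the text are checked (the NanoSheet variability and PDN tradeoff items have no path given, so they are left out).

```python
from pathlib import Path

# Evidence files named in the text. Paths are taken verbatim from the report;
# treating them as relative to a repo root is an assumption.
REQUIRED_EVIDENCE = (
    "pd/signoff/manifest.yaml",
    "benchmarks/power/workload-plan.yaml",
)

# Claim phrases the contract forbids until evidence is populated.
FORBIDDEN_CLAIMS = (
    "14A tapeout ready",
    "1.4 nm power/performance",
    "Pixel-class 2028 efficiency",
)


def missing_evidence(repo_root: Path, required=REQUIRED_EVIDENCE) -> list[str]:
    """Return the required evidence files that are absent or empty."""
    missing = []
    for rel in required:
        path = repo_root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


def claim_allowed(claim: str, repo_root: Path) -> bool:
    """Fail closed: a gated claim passes only when no evidence is missing."""
    gated = any(phrase.lower() in claim.lower() for phrase in FORBIDDEN_CLAIMS)
    if not gated:
        return True
    return not missing_evidence(repo_root)
```

The design point is that absence of evidence blocks the claim by default; nothing has to be explicitly marked as failing.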

Primary: TSMC N2P (HVM 2H 2026, mobile-ready by 2028)

  • N2P has broadest 2028 mobile customer surface — Apple A20 / A21, Qualcomm flagship after Elite Gen 5, MediaTek post-D9500. Tool/IP/PHY ecosystem co-evolves with these customers.
  • HD density 313 MTr/mm² + 38.1 Mb/mm² SRAM — first node since N5 where SRAM scaling resumes. Mandatory for 64 MiB NPU SRAM + 16-32 MB SLC envelope.
  • Frontside power delivery keeps debug, thermal modelling, and DFM tractable. The BSPDN tax (warpage during HPA, thermal coupling through thinned silicon, two-sided test access) is real and adds 6-12 months of bring-up risk.
  • At ~$30-33 k per wafer, N2P is the cheapest 2 nm-class entry point.

Stretch: TSMC A14 (HVM 2028, baseline frontside) or Intel 14A (HVM 2027-2028, BSPDN+High-NA)

A14 baseline (no SPR) delivers +15% perf @ iso-power or -30% power @ iso-perf vs N2 with +20% logic density, without the BSPDN tax. It is the realistic 2028-flagship sweet spot if the project has Apple/NVIDIA-tier wafer allocation and is willing to pay $40-45 k per wafer. The A14P variant (with SPR) pushes to 2029.

Intel 14A is a strategic second source: Intel courts non-Apple customers for foundry diversification, and 14A is the only path to 18A-class PowerVia BSPDN in our 2028 window, provided foundry-level subsidy or government-programme participation is available. The process is unproven for mobile-AP-class designs, and the hard-IP ecosystem (LPDDR PHY, MIPI, USB) is much thinner at 14A.
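The primary-versus-stretch trade can be put in rough per-transistor terms using only the wafer prices and density ratio quoted in this report. The sketch below assumes N2P logic density is equal to N2, equal yield on both nodes, and a logic-only die; none of these holds exactly, so the result is an indicative range rather than a cost model.

```python
def relative_cost_per_transistor(wafer_price_ratio: float,
                                 density_ratio: float) -> float:
    """Cost per logic transistor on the new node relative to the old one."""
    return wafer_price_ratio / density_ratio


# Figures quoted in this report (USD per 300 mm wafer).
N2P_WAFER_USD = (30_000, 33_000)
A14_WAFER_USD = (40_000, 45_000)
A14_LOGIC_DENSITY_VS_N2 = 1.20  # ">1.20x N2 logic", taken at its floor

# Best case for A14: cheapest A14 wafer against the priciest N2P wafer.
best = relative_cost_per_transistor(
    A14_WAFER_USD[0] / N2P_WAFER_USD[1], A14_LOGIC_DENSITY_VS_N2)

# Worst case for A14: priciest A14 wafer against the cheapest N2P wafer.
worst = relative_cost_per_transistor(
    A14_WAFER_USD[1] / N2P_WAFER_USD[0], A14_LOGIC_DENSITY_VS_N2)

print(f"A14 logic transistor cost vs N2P: {best:.2f}x to {worst:.2f}x")
```

Under these assumptions A14 does not lower cost per logic transistor; its case rests on the speed and power gains, which is consistent with treating it as a stretch target.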

Multi-process portability requirements

The PD flow must abstract PDK-specific assumptions into a single configuration surface:

  • pd/openlane/config.<node>.json per target (sky130, gf180, ihp-sg13, asap7-predictive, n2p-stub, a14-stub).
  • Per-node corner manifest: 5 PVT corners minimum at advanced node (SS/TT/FF × low/high V × extreme T) plus aging, EM, SI/IR; multi-Vt mix (LVT/SVT/HVT).
  • Per-node hard-IP manifest: SRAM compiler version, LPDDR PHY version, USB / MIPI PHY versions, PLL/PMIC.
  • Encapsulated cell-library swap, so that synthesis/place/route tooling differences (OpenLane → Cadence Innovus / Synopsys Fusion Compiler at advanced node) are isolated to a single adapter layer.
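One way to picture the single configuration surface is a small per-node record plus a generated corner list. This is a sketch only: the class and field names are hypothetical, the voltage and temperature labels are placeholders, and the full SS/TT/FF × V × T cross product is one reading of the "5 PVT corners minimum" requirement, not a foundry-defined corner set.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class Corner:
    process: str      # SS / TT / FF
    voltage: str      # placeholder label, e.g. "vlow"
    temperature: str  # placeholder label, e.g. "tcold"

    @property
    def name(self) -> str:
        return f"{self.process}_{self.voltage}_{self.temperature}"


@dataclass
class NodeTarget:
    node: str  # e.g. "sky130", "asap7-predictive", "n2p-stub"
    vt_flavours: tuple[str, ...] = ("LVT", "SVT", "HVT")
    hard_ip: dict[str, str] = field(default_factory=dict)  # IP name -> version

    @property
    def config_path(self) -> str:
        # Follows the pd/openlane/config.<node>.json pattern from the text.
        return f"pd/openlane/config.{self.node}.json"


def enumerate_corners(processes=("SS", "TT", "FF"),
                      voltages=("vlow", "vhigh"),
                      temperatures=("tcold", "thot")) -> list[Corner]:
    """Full cross product of process, voltage, and temperature."""
    return [Corner(p, v, t) for p, v, t in product(processes, voltages, temperatures)]


def check_min_corners(corners: list[Corner], minimum: int = 5) -> None:
    """Raise if the manifest has fewer corners than the stated minimum."""
    if len(corners) < minimum:
        raise ValueError(f"{len(corners)} corners, need at least {minimum}")


target = NodeTarget("sky130")
corners = enumerate_corners()
check_min_corners(corners)
print(target.config_path, len(corners), corners[0].name)
```

Aging, EM, and SI/IR views would sit alongside this list as further manifest entries; they are omitted here to keep the sketch short.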

Hard-IP partnerships (process-matched, non-negotiable)

| IP | 2028 requirement | Source |
|---|---|---|
| LPDDR5X / LPDDR6 PHY+ctrl | 9600-10667 Mbps, 64-bit | Synopsys DesignWare LPDDR5X (proven at 9600 on 3 nm; N2-ready), Cadence LPDDR6/5X (10.7 Gbps), Rambus |
| USB 3.2 / USB4 PHY | 20-40 Gbps | Synopsys / Cadence / Rambus |
| MIPI D-PHY v3.x + C-PHY v2.x + DSI-2 / CSI-2 | flagship cameras + display | Synopsys / Mixel / Lattice |
| PCIe Gen4/5 PHY | optional, NVMe | Synopsys / Cadence |
| Multi-port SRAM compiler | up to 32 MB SLC, 64 MiB NPU local | TSMC SRAM compiler at selected node (closed) |
| PLL / clock | multi-domain, low-jitter | Synopsys / Cadence |
| Analog (PMIC, ADC, temp, eFuse) | mobile-class | foundry reference + 3rd-party |

Reticle / package / cost assumptions

  • Monolithic die assumed at 2028 mobile. CoWoS-class 3D stacked SLC (Apple-style fused cache stack) is stretch and out of cost envelope for open project.
  • Reticle limit at N2/A14 is ~858 mm² (26 × 33 mm) — well over a single mobile AP die.
  • Mobile AP die-area budget: 90-130 mm² for flagship envelope.
    • At N2 density (313 MTr/mm² HD logic, ~38 Mb/mm² SRAM):
      • Big OoO RISC-V (A19-Pro P-core equivalent): 2.5-3.5 mm² each
      • Little IoT/efficiency: 0.6-1.0 mm² each
      • NPU: 4-6 mm² compute logic + 64 MiB local SRAM at ~1.7 mm²/MiB = ~110 mm² for SRAM alone if naively flat — forces NPU memory hierarchy (small dense SRAM 8 MiB local + connection to 16-24 MB SLC and LPDDR)
      • SLC: 8-12 mm² for 16-24 MB
      • LPDDR PHY: 6-10 mm² (PHY does not scale with logic)
      • GPU (Imagination or RISC-V SIMT): 6-10 mm²
      • Modem, ISP, codecs, AON: 6-12 mm² combined
  • Mask + NRE: $40 M mask set at N2/A14 + design/verification/IP licensing → single-tapeout NRE $250-400 M for SoC-class, $542 M IBS-2018-style upper bound at 5 nm adjusted up for 2 nm. The economic blocker; only realistic vehicles are (a) hyperscaler/government anchor customer, (b) MPW shuttles (effectively closed at N2), or (c) partnership with existing high-volume customer.
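The block-area ranges above can be rolled up against the 90-130 mm² budget with a few lines of arithmetic. The sketch assumes a hypothetical 2 big + 6 little core split (the envelope only says 4-8 cores), reads "Mb" as 10^6 bits, and sizes NPU SRAM from the 38.1 Mb/mm² macro density. On that reading 64 MiB comes to about 14 mm², far below the ~110 mm² obtained from the ~1.7 mm²/MiB figure in the list; the source does not explain the gap, so the density is left as a parameter.

```python
MBIT = 1_000_000       # assumption: "Mb" in 38.1 Mb/mm² means 10^6 bits
MIB_BITS = 8 * 2**20   # bits in one MiB


def sram_area_mm2(capacity_mib: float,
                  macro_density_mb_per_mm2: float = 38.1) -> float:
    """SRAM area implied by a macro-level density figure."""
    return capacity_mib * MIB_BITS / MBIT / macro_density_mb_per_mm2


# name -> (count, low mm², high mm²); ranges are from the list above,
# core counts are a hypothetical split within the 4-8 core envelope.
BLOCKS = {
    "big_core": (2, 2.5, 3.5),
    "little_core": (6, 0.6, 1.0),
    "npu_logic": (1, 4.0, 6.0),
    "slc": (1, 8.0, 12.0),
    "lpddr_phy": (1, 6.0, 10.0),
    "gpu": (1, 6.0, 10.0),
    "modem_isp_codecs_aon": (1, 6.0, 12.0),
}


def total_area(blocks: dict, npu_sram_mib: float) -> tuple[float, float]:
    """Return (low, high) total area in mm², including NPU local SRAM."""
    sram = sram_area_mm2(npu_sram_mib)
    low = sum(n * lo for n, lo, _ in blocks.values()) + sram
    high = sum(n * hi for n, _, hi in blocks.values()) + sram
    return low, high


low, high = total_area(BLOCKS, npu_sram_mib=8)
print(f"64 MiB at macro density: {sram_area_mm2(64):.1f} mm²")
print(f"Roll-up with 8 MiB NPU SRAM: {low:.1f}-{high:.1f} mm²")
```

The roll-up excludes interconnect, IO ring, and whitespace, so headroom against the 90-130 mm² budget should not be read as slack.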

D. Benchmarks / evaluation / testing

What we can do today (no advanced-node PDK access):

  1. DTCO sensitivity at open nodes. Same RTL through Sky130A, GF180MCU, IHP SG13G2 — confirms tooling portable. Already partly demonstrated.
  2. PPA modelling against ASAP7 predictive PDK. ASAP7 is academic-only, not manufacturable, but only open PDK with FinFET-era device physics; gives credible relative PPA scaling. Run e1 CPU+NPU+SLC through ASAP7, apply vendor scaling factors (N5 → N3 → N2) to project N2P-class envelope. Document as projections, not signoff.
  3. Process-variation Monte Carlo at open nodes. Sky130 has SS/TT/FF Liberty corners — characterise e1 RTL sensitivity to ±20% Vt shift and ±10% Vdd droop. Shapes translate to advanced nodes.
  4. Die-shot calibration. Compare projected block-area vs published die-shots (TechInsights, Locuza, Cardyak A19 Pro, AnandTech archive). A19 Pro P-core ≈ 2.97 mm² logic in N3P → at N2 density (313 / ~215 = ~1.45×) ≈ 2.04 mm²; RV64GC OoO with vector should land 2.5-3.5 mm² depending on issue width and L2.
  5. Multi-PDK signoff matrix. Sign off same RTL under Sky130A (open), GF180MCU (via Wafer.Space), IHP SG13G2 (Tiny Tapeout 2025/26 shuttle), ASAP7 (predictive). Shows physical-design discipline across PDKs before advanced-node access.
  6. Block-level evidence gates. Per block (CPU big, NPU, SLC, LPDDR PHY interface, MIPI), produce per-block PPA target with pd/signoff/manifest.yaml schema. Track four numbers: max-freq, area, dynamic-power-per-MHz, static leakage.
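The die-shot calibration in item 4 is a straight density-ratio scaling, sketched below with the figures quoted in the text. It assumes logic area scales inversely with HD transistor density and uses the low end (~215 MTr/mm²) of the N3E range; SRAM and analog do not follow this ratio, so it should only be applied to logic-dominated blocks.

```python
def scale_logic_area(area_mm2: float,
                     src_density_mtr_mm2: float,
                     dst_density_mtr_mm2: float) -> float:
    """Project logic area from a source node to a destination node."""
    return area_mm2 * src_density_mtr_mm2 / dst_density_mtr_mm2


def within_range(value: float, low: float, high: float) -> bool:
    """True when a projected area falls inside a target window."""
    return low <= value <= high


A19_PRO_P_CORE_MM2 = 2.97   # published die-shot figure, N3P
N3_HD_DENSITY = 215.0       # MTr/mm², low end of the ~215-220 range
N2_HD_DENSITY = 313.0       # MTr/mm²

projected = scale_logic_area(A19_PRO_P_CORE_MM2, N3_HD_DENSITY, N2_HD_DENSITY)
print(f"A19 Pro P-core projected to N2: {projected:.2f} mm²")

# The RV64GC OoO target window from the text is 2.5-3.5 mm², i.e. the
# report budgets more area than the scaled Apple core.
print(within_range(projected, 2.5, 3.5))
```

The same helper can be reused to compare an ASAP7-derived block area against a published die shot once a density figure for the predictive PDK is chosen.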

What we cannot do without commercial PDK:

  • Real signoff timing under foundry corners.
  • Real LPDDR / USB / MIPI / PCIe PHY layout.
  • DFM / antenna / fill at 2 nm.
  • BSPDN-aware PDN / IR analysis.

E. Optimisations: has / should / needs

Has

  • OpenLane Sky130A end-to-end flow, clean DRC / LVS at 130 nm, runnable release-mode.
  • Fail-closed claim gates: process-14a-effects.yaml, competitor-2028-target.md, pd/signoff/run-manifest.schema.json, OpenLane release-baseline doc — prevent unjustified 2028-class claims.
  • Minimal RV32 e1 datapath + AXI-lite interconnect + MMIO NPU + bootrom + interrupt controller + DMA.

Should (next 6-12 months, no advanced-node spend)

  1. CVA6 (or BOOM / XiangShan) integrated as actual application core in OpenLane release flow — not wrapper. Linux boot on QEMU/Renode/FireSim.
  2. ASAP7 predictive sign-off of CPU big core, NPU compute tile, small SLC slice.
  3. Real SRAM macros integrated into OpenLane — currently 0 macros. OpenRAM Sky130 → move same RTL to GF180/IHP for sanity.
  4. Multi-corner signoff manifest per existing pd/signoff/run-manifest.schema.json — populate SS/TT/FF corners with real Liberty data.
  5. NoC + IOMMU + cache-coherent fabric RTL — RTL-layer work not needing advanced PDK.
  6. NanoFlex-Pro-style cell-mix DTCO study. Sky130 has HD/HS variants (hd, hs, hdll, ms). Same block with cell swaps to characterise design-time tradeoffs NanoFlex Pro automates at N2/A14.

Definitely needs (foundry-wall items)

  • Foundry PDK access at N2P or A14. Realistic paths: CHIPS Act / DARPA programme (Intel 18A/14A has DARPA/RAMP-C subsidy for non-traditional customers); customer-of-record under hyperscaler or major IP vendor; multi-project shuttle at N5/N3 first (Efabless is closed; very few alternatives), then private MPW at N2.
  • Qualified hard IP for selected node: LPDDR5X/6 PHY, USB, MIPI, PLL, SerDes. Process-matched and re-licensed per shrink.
  • 5+ PVT signoff corners + multi-Vt cell mix.
  • BSPDN-aware sign-off methodology if A16 / A14P / 18A / 14A — adds two-sided power-grid extraction, thermal-coupling models for thinned silicon, two-sided DFM rules.
  • DFM, reliability (BTI, HCI, TDDB, EM), aging derate, scan/MBIST/boundary-scan, fuse policy, secure-debug lock — all enumerated in process-14a-effects.yaml; all require PDK.
  • Real package & PCB stackup model — FCBGA mobile substrate + thinned ~100 µm die, full thermal path (TIM, mid-plate, frame, skin), measured-not-modelled correlation.

F. Risks and open questions

  1. Foundry access is binary risk. TSMC N2 booked through 2027-Q2 with Apple holding >50% of initial wafers. Open RISC-V has no leverage at TSMC outside major customer relationship. Intel Foundry more open to non-Apple but unproven at mobile AP class.
  2. NRE economics. $250-400 M single N2/N2P tapeout, possibly $300-500 M at A14. Open-source funding models don't reach this scale.
  3. EUV / High-NA scarcity. Only ASML ships High-NA scanners; first units reserved for Intel and TSMC. Even if 14A nominally accessible, scanner-allocation isn't.
  4. BSPDN test/debug. Two-sided power changes probe/test access; boundary-scan, thermal-IR camera methods, rework all need new methodology. Multi-quarter learning curve even with PDK in hand.
  5. Hard-IP-availability-vs-node coupling. Synopsys LPDDR5X PHY shown at 3 nm and N2; LPDDR6 at N2P/A14 on roadmap, not all available. A project picking A14 in 2028 may be PHY-limited even with logic PDK.
  6. SRAM density realism. N3-family SRAM did not scale (bitcell 0.021 µm² ≈ N5). N2 resumes via macro-level density (38.1 Mb/mm²), not bitcell shrink, requiring redesigned assist circuitry. Cache/NPU-SRAM area estimate at N2P/A14 must use macro density, not bitcell shrink, or will under-budget area by 10-20%.
  7. Yield + defect-density curves. D0 at new 2 nm-class node in early ramp ~0.20-0.30/cm²; improves slowly. 100-130 mm² mobile AP die at D0 = 0.25 has yield 55-65% during 2025-2027 ramp.
  8. Open-silicon pipeline ends at Sky130/GF180/IHP SG13. The 130 nm open-PDK frontier was a real achievement, but it is not moving up the node ladder fast enough: Efabless shut down in March 2025, taking the ChipIgnite shuttle with it. Tiny Tapeout is migrating onto IHP SG13G2 (130 nm) via SwissChips. ASAP7 exists as a predictive academic PDK but is not manufacturable. There is no path from open PDKs to a flagship-class mobile AP that does not go through a commercial foundry NDA.
  9. Software / Android side. Even with silicon, open RISC-V Android AP needs full CTS/GMS pass, kernel BSP, vendor HAL, GPU driver (open Mali / IMG / RISC-V SIMT), camera HAL, modem integration.
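The yield risk in item 7 can be explored with standard random-defect models. The sketch below uses the Poisson and Murphy formulas plus the usual gross-dies-per-wafer approximation; for 100-130 mm² at D0 = 0.25/cm² both models give roughly 72-78%, which is above the 55-65% quoted. The quoted range may fold in systematic or parametric loss that these models ignore; that is an assumption, since the source does not state its yield model.

```python
import math


def poisson_yield(die_area_mm2: float, d0_per_cm2: float) -> float:
    """Random-defect yield, Poisson model: exp(-A * D0)."""
    return math.exp(-(die_area_mm2 / 100.0) * d0_per_cm2)


def murphy_yield(die_area_mm2: float, d0_per_cm2: float) -> float:
    """Random-defect yield, Murphy model: ((1 - exp(-A*D0)) / (A*D0))^2."""
    ad = (die_area_mm2 / 100.0) * d0_per_cm2
    if ad == 0:
        return 1.0
    return ((1.0 - math.exp(-ad)) / ad) ** 2


def gross_dies_per_wafer(die_area_mm2: float,
                         wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area term minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    area_term = math.pi * radius**2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(area_term - edge_term)


def cost_per_good_die(wafer_usd: float, die_area_mm2: float,
                      d0_per_cm2: float) -> float:
    """Wafer price spread over Poisson-yielded dies (no test/package cost)."""
    good = gross_dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, d0_per_cm2)
    return wafer_usd / good


for area in (100.0, 130.0):
    print(area,
          round(poisson_yield(area, 0.25), 3),
          round(murphy_yield(area, 0.25), 3),
          round(cost_per_good_die(30_000, area, 0.25), 2))
```

Per-die silicon cost at these inputs is small next to the NRE in item 2, which is why the economics are dominated by tapeout cost rather than wafer price.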

The node choice is TSMC N2P as primary, TSMC A14 as stretch, and Intel 14A as a strategic second-source or subsidy-driven option. Update docs/spec-db/process-14a-effects.yaml accordingly:

  • marketing_name becomes a range: "N2P / A14 / 14A class".
  • selected_process_option adds per-node short list with three nodes.
  • node_target.transistor_architecture stays nanosheet_or_successor_gate_all_around_required.
  • power_delivery_variant keeps frontside-vs-BSPDN bifurcation; default is frontside (matching N2P / A14-baseline) rather than implying BSPDN.
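The four field changes can be expressed as a small transformation on the loaded contract. This is a sketch under stated assumptions: the real schema of process-14a-effects.yaml is not shown in this report, so the `process_short_list` key and the shape of `power_delivery_variant` are hypothetical, and the contract is represented as a plain dict rather than parsed from YAML.

```python
import copy

REQUIRED_ARCH = "nanosheet_or_successor_gate_all_around_required"
SHORT_LIST = ("N2P", "A14", "14A")


def apply_node_update(contract: dict) -> dict:
    """Return an updated copy of the contract; the input is not mutated."""
    updated = copy.deepcopy(contract)
    updated["marketing_name"] = "N2P / A14 / 14A class"
    # Hypothetical key: the source does not say where the short list lives.
    updated["process_short_list"] = list(SHORT_LIST)
    # Hypothetical shape: frontside default, backside kept as an alternative.
    updated["power_delivery_variant"] = {
        "default": "frontside",
        "alternatives": ["backside"],
    }
    return updated


def check_invariants(contract: dict) -> None:
    """Raise if the update broke a field that is meant to stay fixed."""
    arch = contract.get("node_target", {}).get("transistor_architecture")
    if arch != REQUIRED_ARCH:
        raise ValueError(f"transistor_architecture changed: {arch!r}")
    if contract["power_delivery_variant"]["default"] != "frontside":
        raise ValueError("default power delivery must be frontside")


before = {
    "marketing_name": "14A",  # placeholder value for the sketch
    "selected_process_option": "blocked_until_foundry_pdk_and_library_selection",
    "node_target": {"transistor_architecture": REQUIRED_ARCH},
}
after = apply_node_update(before)
check_invariants(after)
print(after["marketing_name"], after["process_short_list"])
```

Keeping the existing `selected_process_option` value untouched preserves the fail-closed state while the short list is added.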
