docs/research/snap-wayland-gpu-fix-research.md
A subset of Super Productivity Snap users hit a GPU initialization failure on
launch where the app either (a) shows a tray icon with no window, (b)
segfaults, or (c) launches but floods logs with GL errors. The likely root
cause is Mesa ABI drift between Electron's bundled libgbm/Mesa stack and the
Mesa shipped by the gnome-42-2204 content snap's core22-mesa-backports
PPA. The December 2025 spike in user reports correlates with upstream
Chromium 140 (Aug 2025) / Electron 38 (Sept 9, 2025) flipping the default
--ozone-platform-hint to auto, so Electron now runs as a native Wayland
client in any Wayland session (detection via XDG_SESSION_TYPE=wayland).
This exposed the pre-existing Mesa ABI mismatch to far more users.
The recommended fix is to widen the existing Snap-gated --ozone-platform=x11
guard in electron/start-app.ts to cover Snap + Wayland sessions, not only
Snap with a missing/empty gnome-platform directory. This preserves hardware
acceleration via X11/GLX, stays inside electron-builder's snap target (no
snapcraft.yaml rewrite, no auto-connect review), and matches the empirical
breakage pattern reported for peer Electron apps on Snap + Wayland.
Long term, migration to core24 + gpu-2404 is the correct fundamental fix
and should be scheduled for 18.3 or 19.0.
High confidence on direction and on the upstream Electron/Chromium timing (see Section 9).
libgl1-mesa-dri is present in the content snap, yet Mesa reports "DRI driver
not from this Mesa build" (snapcraft forum #40975). Forum #49173 reports a
related mesa-core22 ABI breakage but with a different error string ("Failed
to initialize GLAD") — same root cause, different symptom. In short,
gnome-42-2204's core22-mesa-backports PPA does not reliably match the
Mesa/libgbm ABI expectations of recent Electron Chromium builds.

SP was on Electron 39.2.5 in December 2025 (verified via the tagged
package.json). SP subsequently downgraded to Electron 37.10.3 at
v17.0.0 (2026-01-23) and held that version until bumping to 41.2.0 on
2026-04-17 (one day before this doc was drafted). So the December 2025
reports originated on Electron 39 — which already inherits Chromium 140's
Wayland-auto default from Electron 38. The upstream trigger is Chromium
140 (Aug 2025) flipping --ozone-platform-hint=auto, inherited by
Electron ≥38 (with a regression window in 38.0.0/38.1.0 fixed by
electron/electron#48301;
users on electron-builder#9452
cite Electron ≥38.2.0 as the practical trigger). Combined with ongoing
mesa-backports churn, this exposed the ABI mismatch to many more Snap
users who had previously been silently running X11.

| Population | Affected rate | Confidence |
|---|---|---|
| Snap + Electron with Wayland-default + Mesa GPU + Wayland session | ~95–100% | High |
| Snap + X11 | ~0–5% | High |
| Snap + Nvidia proprietary | Likely unaffected (uses nvidia EGL, not Mesa) | Medium |
| Non-snap (.deb, AppImage, AUR) | Unaffected | High |
The bug is conditional, not universal: Snap + Mesa + Wayland is the trigger combination.
Three observed modes: (a) tray icon with no window, (b) segfault, (c) launch with GL-error log floods (matching the summary above).
Confirmed with nuance. No core22-side Mesa fix has been announced; Canonical's
documented direction is "move to core24 + gpu-2404" (see the Canonical RFC).
We did not find an explicit Canonical statement ruling out a core22 Mesa-ABI
fix — absence of engagement, not a formal position.

graphics-core22 is not formally deprecated. Canonical's own wording is that
gpu-2404 is an "evolution" of graphics-core22 (per
canonical.com/mir/docs/the-gpu-2404-snap-interface). Migration requires a
base bump to core24, not an interface swap. --disable-gpu /
--ozone-platform=x11 are community workarounds, not endorsed.

Verification note: entries below were verified in a follow-up pass
(2026-04-18) against peer-app source repos, GitHub issues, and
Flathub/snapcrafters packaging. File:line citations linked where applicable.
| App | Approach | Verification |
|---|---|---|
| Signal Desktop (snap) | Community-maintained snapcrafters/signal-desktop snap: wrapper at snap/local/usr/bin/signal-desktop-wrapper defaults --disable-gpu ON unless user runs snap set signal-desktop enable-gpu=true. Upstream Signal has no snap packaging. | Verified (snapcrafters repo). |
| Mattermost Desktop (snap) | Community-maintained snapcrafters/mattermost-desktop: command-chain runs fix-hardware-accel-with-no-renderer; it probes glxinfo, and on llvmpipe match patches ${SNAP_USER_DATA}/.config/Mattermost/config.json with jq '.enableHardwareAcceleration = false'. | Verified (snapcrafters repo). |
| VS Code (snap) | No explicit X11 force. The snap crashes on Wayland (sandbox missing Mesa drivers / GLib schemas) and falls back to XWayland implicitly. See microsoft/vscode#202072. | Claim contradicted: outcome is X11, mechanism is not a wrapper. |
| electron-builder #9452 | Title: "Snap package of Electron ≥ 38 crashes at startup under GNOME on Wayland". Maintainer @mmaietta engaged; users andersk and valkirilov confirm --ozone-platform=x11 as the working workaround. Trigger identified as Electron ≥38.2.0. | Verified — strongest external reference. |
| Teams-for-Linux | Sets build.linux.executableArgs: ["--ozone-platform=x11"] and build.snap.executableArgs: [...] in electron-builder config; no afterPack wrapper. The snap-side setting is dead code per electron-builder#4587 — executableArgs is silently ignored for snap builds. | Claim partly contradicted: intended mechanism is executableArgs, which is broken on snap. |
| Obsidian (Flatpak) | Wrapper obsidian.sh probes for Wayland socket; adds --ozone-platform-hint=auto under Wayland, else --ozone-platform=x11; respects OBSIDIAN_DISABLE_GPU env var. Not snap, but illustrates the compositor+GPU-probe wrapper pattern. | Verified (flathub repo). |
What is solid: every peer Electron app with a Wayland/GPU workaround on
Snap uses either an X11 fallback or a GPU-disable; the only
maintainer-endorsed workaround (electron-builder #9452) converges on
--ozone-platform=x11. The dominant actually-working mechanism among
peer snaps is a command-chain wrapper script (Signal, Mattermost).
snap.executableArgs in electron-builder config is broken for snap builds
(electron-builder #4587). SP's existing pattern —
app.commandLine.appendSwitch from the Electron main process — is a third
working mechanism and the one PR #7264 extends.
Three mechanisms exist for applying Chromium flags in an electron-builder snap build, ranked by reliability:

1. app.commandLine.appendSwitch(...) inside the Electron main process
   (before app.whenReady()). SP's existing guard in electron/start-app.ts
   uses this pattern; PR #7264 extends it. Works for any flag Chromium reads
   during init, including --ozone-platform. No packaging changes.
2. An afterPack hook that renames the real binary and drops a wrapper script
   at the same name → a full pre-Electron wrapper, no snapcraft.yaml
   changes. Useful for flags that must be set before the Electron main
   process starts. (Referenced as a pattern in community sources;
   Teams-for-Linux does not actually use it — see Section 5.)
3. snap.executableArgs in electron-builder config is broken for snap
   builds per electron-builder#4587 —
   the flags are silently ignored. Teams-for-Linux's config illustrates
   this: they set executableArgs: ["--ozone-platform=x11"] for both
   build.linux and build.snap, but only the non-snap side takes effect.
   Do not use.

The dominant pattern among peer snaps (Signal, Mattermost) is a
command-chain entry in snap/snapcraft.yaml invoking a wrapper shell
script — equivalent to mechanism #2 but expressed via snapcraft rather than
electron-builder. All three working approaches (mechanism #1 plus the two
wrapper variants) avoid auto-connect requests, store-review friction, and a
base bump.
| # | Option | Fixes errors | Keeps HW accel | Scope | Effort | Evidence alignment |
|---|---|---|---|---|---|---|
| 1 | Narrow: --ozone-platform=x11 via app.commandLine.appendSwitch when Snap + Wayland | Yes for ~95% | Yes (X11/GLX) | Snap only, conditional | ~1 file, ~20 LOC | Strongest — electron-builder #9452 maintainer + users converge on --ozone-platform=x11; matches SP's existing mechanism |
| 2 | Disable GPU default on Snap, opt-in via env/config | Yes | No — loses HW accel for working users | Snap only, unconditional | One-liner + doc | Evidence-backed but blunt |
| 3 | afterPack wrapper: detect GPU at launch, conditionally add flags | Yes when detection works | Yes when works | Snap only | afterPack script + wrapper | GL-probe false negatives are a known failure mode |
| 4 | Migrate to core24 + custom snapcraft.yaml + gpu-2404 | Yes (fundamental) | Yes | All Snap users | 1–2 days + auto-connect wait | Best long-term; orthogonal to this PR |
| 5 | Runtime detection + relaunch (app.on('child-process-gone')) | Yes after 1 bad launch | Yes for working users | Snap only | Medium | Clever, but first-launch UX is bad |
| 6 | Status quo + FAQ | No | Yes | — | Zero | Abandons affected users (issue #5672) |
Option 1: --ozone-platform=x11 conditional on Snap + Wayland, via the
existing guard in electron/start-app.ts. Unlike --disable-gpu,
X11 + GLX still uses the GPU; users only lose Wayland fractional scaling
(a known, documented trade-off). The flag is applied in electron/start-app.ts
via app.commandLine.appendSwitch. SP already has Snap-gated
ozone-platform=x11 logic in electron/start-app.ts (pre-PR: gated on
an empty gnome-platform directory). The only change needed is to
extend the gate to "Snap + Wayland session," with the gnome-platform
probe retained as a secondary OR fallback (belt-and-suspenders for any
non-Wayland Snap users who still hit the ABI drift).
electron-builder.yaml's snap.executableArgs is broken for snap
builds (electron-builder#4587) —
app.commandLine.appendSwitch is the only reliable mechanism for this
from inside electron-builder.

This is what the existing migration plan partially implemented. The plan's
defense-in-depth was intended to catch exactly this scenario; the
gnome-platform emptiness probe doesn't catch the common case because
gnome-platform is populated — just ABI-drifted. Widening the guard to
SNAP + Wayland (with the gnome-platform probe retained as an OR fallback)
matches the empirical breakage pattern.
Disabling GPU entirely makes sense for apps where stability dominates over
compositing quality. Super Productivity is a productivity app — it benefits
from GPU compositing, and forcing --disable-gpu on ~95% of Snap users is a
worse UX than forcing X11.
Runtime GL probes (e.g., glxinfo) produce false negatives when the GPU
content interface isn't connected in Snap, so a launch-time detector can
disable GPU on machines where GPU would in fact have worked. Not a pattern to
build on.
Correct long-term, but 1–2 days of work + auto-connect wait + store review + risk of new regressions right after shipping 18.2.x. Schedule for 18.3 or 19.0.
| Claim | Confidence |
|---|---|
| Direction (X11 fallback for Snap + Wayland) | High — converged from multiple independent threads (peer app community reports, GitHub issues, scope matrix, Canonical position, escape hatches) |
| Exact gating predicate (Snap + Wayland vs. just Snap) | Medium-high — Wayland is the proximate trigger, but a few X11 reports exist. Keeping the gnome-platform-empty probe as a fallback is the belt-and-suspenders move |
| core24 migration as the real long-term fix | High on direction, medium on timing |
| Dec 2025 reports correlate with Chromium 140 / Electron ≥38.2 Wayland-default | High — SP was on Electron 39.2.5 in Dec 2025 (verified via tagged package.json); Chromium 140 (Aug 2025) flipped --ozone-platform-hint=auto; electron-builder#9452 independently identifies Electron ≥38.2.0 as the trigger |
| Peer-app implementation details in Section 5 | High — verified in follow-up pass against snapcrafters repos, microsoft/vscode#202072, electron-builder#4587, flathub/md.obsidian.Obsidian; several original claims contradicted and reframed |
Widen the existing guard in electron/start-app.ts (pre-PR: lines 70–88;
post-PR #7264: lines 75–98):

- Current gate: gnome-platform directory missing or empty.
- Widened gate: Wayland session detected (XDG_SESSION_TYPE === 'wayland'
  or WAYLAND_DISPLAY set), with the existing gnome-platform probe retained
  as a secondary fallback.

Estimated diff: ~20 functional LOC in electron/start-app.ts (~35 lines
including comments). No electron-builder.yaml changes required
(snap.executableArgs is broken for snap builds —
electron-builder#4587).
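The widened gate can be sketched as a small predicate. This is a minimal illustration, not SP's actual code: the function name `shouldForceX11` and the injected `gnomePlatformEmpty` flag (standing in for the existing gnome-platform directory probe) are hypothetical.

```typescript
// Hypothetical sketch of the widened Snap+Wayland gate described above.
// `gnomePlatformEmpty` stands in for SP's existing gnome-platform probe.
export const shouldForceX11 = (
  env: Record<string, string | undefined>,
  gnomePlatformEmpty: boolean,
): boolean => {
  if (!env.SNAP) return false; // only the Snap build is affected
  const isWayland =
    env.XDG_SESSION_TYPE === 'wayland' || !!env.WAYLAND_DISPLAY;
  // Widened gate: Wayland session, OR the original empty-dir probe as
  // a belt-and-suspenders fallback.
  return isWayland || gnomePlatformEmpty;
};

// Applied before app.whenReady(), and only when the user did not pass
// their own flag (sketch):
// if (!process.argv.some((a) => a.startsWith('--ozone-platform')) &&
//     shouldForceX11(process.env, gnomePlatformEmpty)) {
//   app.commandLine.appendSwitch('ozone-platform', 'x11');
// }
```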
Escape hatch preserved: a user-supplied --ozone-platform=wayland CLI
override still works — the PR checks process.argv for an existing
--ozone-platform to avoid overriding the user.

Follow-up to issue #7270. Filed
against v18.2.2, which shipped before the Snap+Wayland widening from
PR #7266. Timeline correction (verified 2026-04-19): PR #7266 was
merged to master but is NOT in the v18.2.3 tag
(git merge-base --is-ancestor ac7cf7b853 v18.2.3 returns NOT ANCESTOR;
the v18.2.3:electron/start-app.ts only contains the original
gnome-platform-empty probe). The v18.2.3 release was cut from a branch
that didn't pick up #7266. So 7270's reporter on v18.2.2 is not
helped by updating to v18.2.3 — they need 18.2.4 (or whatever ships
next with #7266 included). This changes PR #7273's positioning: not a
"tail 5%" fallback on top of a shipped primary fix, but potentially the
first released recovery path for confined-Linux users until #7266 ships.
Empirical confirmation (2026-04-19): issue 7270's reporter
(GoZilla192)
verified that superproductivity --ozone-platform=x11 resolves their
launch failure on Ubuntu 22.04 / 18.2.2-snap. Direct evidence that the
Mesa ABI-drift diagnosis is correct and #7266's X11 widening is the
right primary fix. As a result, PR #7273 was initially closed in
favor of #7266, with a revisit condition: reopen only if a real report
came in that X11 widening did not rescue.
Revisit (2026-04-20): the revisit condition fired. Two post-v18.2.4 field reports (§16) show #7266's guard firing correctly and still not rescuing the user — one on Intel Arrow Lake / Ubuntu 24.04, one on AMD Raphael / Ubuntu 25.10. Same Mesa DRI load failure in both. The "speculative defense-in-depth" framing is inverted: #7273 is the mechanism that rescues the population #7266 provably does not.
Presence-based crash marker in userData:

- On launch on confined Linux (SNAP || FLATPAK_ID), check for
  .gpu-launch-incomplete. If present → previous launch failed →
  append --disable-gpu this time; otherwise write the marker.
- On IPC.APP_READY (after Angular init), unlink the marker.
- Env overrides SP_DISABLE_GPU / SP_ENABLE_GPU work on all platforms
  (useful for debugging and for non-Snap/Flatpak Linux users with broken
  GPUs).
Option 5 proposed app.on('child-process-gone') + relaunch — which
requires the first-launch GPU crash to actually fire the event (unreliable
when the process hangs) and a subsequent relaunch inside the same boot
(bad UX, visible flicker, tray races). PR #7273's marker-file approach:

- clears the marker only on a confirmed-good boot (APP_READY).
- does not depend on child-process-gone firing — it also covers hangs,
  arguably the dominant failure mode per the symptom breakdown in Section 3
  ("tray icon appears, no window ever renders").

The "first-launch UX is bad" objection to Option 5 only partly applies: launch #1 still fails, but launch #2 auto-recovers without user action. That's strictly better than status quo (permanent failure) and better than Option 2 (blanket disable on all Snap users).
Why --disable-gpu (not --ozone-platform=x11) here: this is the crucial
mechanism difference. --ozone-platform=x11 keeps the GPU process alive on
the X11/GLX path — it only dodges the Wayland EGL/GBM init. --disable-gpu
avoids the hardware GPU / Mesa DRI driver load path, which is the
ABI-drift source on confined Snap. Correction per §13 verification:
--disable-gpu does NOT guarantee "no GPU process at all" — Chromium may
still run a GPU process in SwiftShader or DisplayCompositor mode (see
13.1§1). But those modes don't dlopen Mesa DRI drivers, which is what
matters for this bug.
Trade-off: software rendering only. For Super Productivity (mostly DOM
and text, little WebGL), the perf loss is negligible; for a broken user
it's strictly better than a non-launching window.
| Layer | Where | Who it helps |
|---|---|---|
| Snap + Wayland → --ozone-platform=x11 | start-app.ts (shipped in 18.2.3) | ~95% of Snap Wayland users; keeps HW accel |
| Snap/Flatpak + previous crash → --disable-gpu | gpu-startup-guard.ts (PR #7273) | Remaining users: Snap X11 with Mesa ABI drift, Flatpak, any future GPU-init regression |
| Env overrides (SP_DISABLE_GPU, SP_ENABLE_GPU) | Both | Debugging, user escape hatches |
| core24 + gpu-2404 migration | Packaging | All Snap users, long term (18.3 / 19.0) |
The research doc's Section 7 framed the options as exclusive. PR #7273 demonstrates they are composable: Option 1 handles the common case with no UX regression; PR #7273 handles the tail with one failed launch as the cost.
| Risk | Likelihood | Mitigation in the PR |
|---|---|---|
| User force-quits during normal boot → marker stays → next launch unnecessarily disables GPU | Medium (OS updates, system sleep, SIGKILL on crash elsewhere) | Marker is removed on next APP_READY, so cost is capped at one GPU-disabled launch |
| APP_READY IPC doesn't fire (renderer hangs post-Angular-init) → marker never cleared → permanent --disable-gpu | Low | Manual escape: SP_ENABLE_GPU=1 env var or delete .gpu-launch-incomplete |
| Marker write fails (read-only userData, NFS quirks) → guard silently skips, but legacy cleanup still runs | Very low on Snap (SNAP_USER_COMMON is always writable) | Errors caught and logged, launch proceeds without guard |
| False positive on first install after upgrade from a build without the guard | None | Fresh install has no marker; upgrade path writes a new marker on first launch only |
| FLATPAK_ID detection misses edge cases (e.g., custom Flatpak manifests that unset the env) | Low | The env override (SP_DISABLE_GPU) still works for those users |
| --disable-gpu breaks a renderer feature we depend on (e.g., WebGL-backed chart) | None identified | SP UI is DOM+text; no WebGL path confirmed |
| Marker path races with the app.setPath('userData', ...) call for Snap (line 149 of start-app.ts) | None — PR places evaluateGpuStartupGuard after the Snap userData redirect | PR comment explicitly flags this invariant |
PR #7273 does not add tests. The logic is pure (input: userData path + env,
output: decision) and trivially unit-testable. Copilot's arena entry
demonstrates the pattern (should-force-snap-ozone-platform-x11.spec.ts).
Recommended before merge: extract evaluateGpuStartupGuard into a
pure function over { userDataPath, env, platform, fs } and add a spec
covering:
- SP_ENABLE_GPU=1 overrides a present marker
- SP_DISABLE_GPU=1 without marker → disableGpu=true, reason='env'

Keep the core24 + gpu-2404 migration scheduled for 18.3/19.0
as the long-term root-cause fix.

Copilot's arena approach (widening the Snap X11 guard to unconditional on Snap, regardless of Wayland detection) is a defensible alternative to Option 1's Snap+Wayland gate — it handles the "a few X11 reports exist" case flagged at medium-high confidence in Section 9. But it sacrifices Wayland-native features for every Snap user unconditionally, whereas PR #7273 only degrades (and only to software rendering) for users who actually failed. PR #7273 is the better defense-in-depth.
Parallel investigation by two independent research agents. Codex-CLI and gemini-CLI runs were also fired off; codex exhausted its budget in search without producing a structured section and gemini returned empty — their findings are not represented below. Treat the two subsections as complementary (13.1 = correctness/mechanism, 13.2 = prior-art/testing/long-term).
Does --disable-gpu actually prevent Mesa/libgbm loading? Partially —
Section 12's claim "no GPU process = no Mesa load" is
overstated. Chromium's content/browser/gpu/fallback.md
documents a fallback stack
HARDWARE_VULKAN → HARDWARE_GL → SWIFTSHADER → DISPLAY_COMPOSITOR.
--disable-gpu pops the hardware entries but does not eliminate the
GPU process — it is re-spawned in SwiftShader (CPU, no DRI) or
DISPLAY_COMPOSITOR mode. Corroborated by
chromium-discuss: "The GPU process still runs with --disable-gpu"
and electron/electron#28164.
The standard workaround is --disable-gpu --disable-software-rasterizer
together (the PR uses only the first).
What this means for SP: in SwiftShader mode the GPU process does not
open /dev/dri/* or load Mesa DRI drivers (SwiftShader is a pure-CPU JIT
rasterizer; see Chromium SwiftShader docs),
which is the ABI-drift source we care about. However, Ozone platform
init (Wayland client) and GL-context probing still occur before fallback
— whether libgbm.so is dlopen'd on the SwiftShader path specifically
on Linux/Ozone is unverified; the fallback doc is silent on Linux-
desktop specifics. On the evidence we have, --disable-gpu is likely
sufficient to avoid the core22-mesa-backports DRI-driver ABI mismatch
signature (which is Mesa DRI, not GBM), but Section 12's "no Mesa, no
libgbm, no DRI" bullet should be softened to "no hardware Mesa DRI
driver load" — not "no GPU process."
Recommendation: append --disable-software-rasterizer alongside
--disable-gpu in evaluateGpuStartupGuard's positive branch to
genuinely suppress GPU-process spawn, eliminating the theoretical
SwiftShader-GPU-process-init path. Cost is nil (SP has no WebGL
dependency).
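The recommended two-flag combination can be sketched as a tiny helper. The helper name `gpuRecoverySwitches` is illustrative (the real change is a two-line addition in the guard's positive branch):

```typescript
// Hypothetical helper: the Chromium switches the crash-recovery branch
// should append, per the recommendation above.
export const gpuRecoverySwitches = (): string[] => [
  'disable-gpu',
  // Without this second flag, Chromium respawns the GPU process in
  // SwiftShader mode — still a GPU process, still running Ozone init.
  'disable-software-rasterizer',
];

// At the call site (sketch):
// for (const s of gpuRecoverySwitches()) app.commandLine.appendSwitch(s);
```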
The APP_READY lifecycle, verified from SP source:

- The renderer sends APP_READY synchronously from AppComponent's
  constructor via this._startupService.init() →
  window.ea.informAboutAppReady()
  (src/app/core/startup/startup.service.ts:136, called from
  src/app/app.component.ts:195). This is before deferred init
  (plugins, storage checks) — DEFERRED_INIT_DELAY_MS = 1000 runs
  after.
- The main process handles it at electron/main-window.ts:278 (unchanged by
  PR #7273).
- The window is shown on ready-to-show
  (electron/main-window.ts:245-246), which fires when the first frame
  is ready regardless of Angular bootstrap success. So Section 12's
  claim that the marker doesn't clear "on blank/broken renderers that
  still fire ready-to-show" is correct.

Consequence: if Angular boots but any later feature crashes the
renderer after APP_READY, the marker is already gone — next launch
is treated as clean (correct behavior: Angular init succeeded, so GPU
init also succeeded). If the renderer crashes before APP_READY but
after the window appears, the user sees a broken window and the next
launch disables GPU. This is desired for GPU init failures, but the same
signal fires for any crash during Angular bootstrap (dependency
injection error, CSP violation, corrupt IndexedDB). The false-positive
rate is non-zero but bounded — one GPU-disabled next launch, then
self-heals.
| Signal | More precise? | Verdict |
|---|---|---|
| app.on('child-process-gone', {type:'GPU', reason:'launch-failed'}) | Yes — distinguishes GPU-init from generic renderer crashes (electronjs.org/docs/latest/api/app — launch-failed = "Process never successfully launched") | Useful complement, but unreliable when the GPU process hangs rather than exits (Section 12 notes this is the dominant failure mode per Section 3). Also fires mid-launch, forcing a relaunch with its own UX costs. |
| app.on('render-process-gone', reason:'crashed') | No — fires for any renderer crash | Same false-positive surface as the marker, fires mid-launch. |
| app.getGPUInfo('complete') at startup | No — promise is reported to never settle on some broken systems (electron#17187); Electron docs don't guarantee this behavior | Reject — would hang the app on affected systems. |
| gpu-info-update + getGPUInfo('basic') | No — basic info always reports softwareRendering: false (electron#17447) | Reject. |
Best pattern: marker as primary + child-process-gone with
type:'GPU' writing a second marker with reason: 'launch-failed' to
distinguish genuine GPU crashes from generic bootstrap failures in logs.
The PR's current design is sound; adding the event listener is additive
and low risk.
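The additive listener can be sketched as follows. The `AppLike` shape and `writeCrashInfo` callback are hypothetical stand-ins (injected so the logic runs without Electron); the event name and `details.type`/`details.reason` fields match Electron's documented child-process-gone payload:

```typescript
// Sketch of the complementary listener: when the GPU process genuinely
// crashes, record *why*, so logs can distinguish a real GPU crash
// (e.g. reason 'launch-failed') from a generic bootstrap failure.
type GoneDetails = { type: string; reason: string };
type AppLike = {
  on: (
    event: 'child-process-gone',
    listener: (e: unknown, details: GoneDetails) => void,
  ) => void;
};

export const recordGpuCrashes = (
  app: AppLike,
  writeCrashInfo: (json: string) => void,
): void => {
  app.on('child-process-gone', (_event, details) => {
    if (details.type === 'GPU') {
      // Additive to the presence marker: stores the crash reason.
      writeCrashInfo(JSON.stringify({ ts: Date.now(), reason: details.reason }));
    }
  });
};
```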
Concrete ways the marker gets left behind without a GPU-init crash:

- systemd SIGKILL after DefaultTimeoutStopSec (90s default) during
  logout/reboot; see systemd#4206, Arch forum on session-c1.scope.
  Common during OS updates and hibernation-resume cycles.
- Process termination mid-session (e.g., snap refresh mid-session).
- Power loss or hard crash before APP_READY.
- kill -9 during dev: leaves marker.

Cost per incident: one unnecessary --disable-gpu launch. Marker
self-heals on next APP_READY. Section 12's risk table rates this
"Medium" — accurate.
The pattern — "write a sentinel on entry, clear on success; if present next launch, take a safer path" — is well-established but has no single canonical name. Common terms in the literature: "launch-crash detection" (BugSnag), "crash loop breaker" (Sentry), and "startup-crash marker" (Firefox internals).
| Implementation | Mechanism | Source |
|---|---|---|
| Firefox | toolkit.startup.recent_crashes pref is incremented on startup-without-clean-shutdown and compared against max_resumed_crashes to auto-offer Troubleshoot/Safe Mode. Handled in nsAppRunner.cpp via XRE_mainInit. | Bugzilla 294260, Bugzilla 745154, nsAppRunner.cpp (searchfox) |
| Chromium | GpuProcessHost::RecordProcessCrash() maintains an in-process crash counter; after kGpuFallbackCrashCount crashes it pops the next mode off GpuDataManagerImplPrivate::fallback_modes_ (HW Vulkan → HW GL → SwiftShader → DisplayCompositor). State is not disk-persisted across browser restarts — this is the gap PR #7273 fills for Electron apps. | fallback.md |
| BugSnag | 5-second window after Bugsnag.start(); exposes lastRunInfo.crashedDuringLaunch so apps can self-remediate. | BugSnag — Identifying crashes at launch (Android) |
| Sentry Cocoa | Open feature request for a native crash-loop detector; ecosystem confirms the pattern is general. | sentry-cocoa #3639 |
| VS Code / Discord / Slack / Obsidian / Figma | No automatic self-healing found. All rely on manual user action (--disable-gpu, settings toggle, delete GPUCache). | vscode FAQ, microsoft/vscode #214446 |
SP PR #7273 is therefore novel in the Electron ecosystem but follows
an established browser-native pattern (Firefox's recent_crashes,
Chromium's in-process GpuMode stack). Confidence: high.
FLATPAK_ID is reliably set by Flatpak's run machinery and is used
throughout Flatpak's own docs to construct ~/.var/app/$FLATPAK_ID
paths (Flatpak sandbox-permissions docs).
A more authoritative signal is the presence of /.flatpak-info inside
the sandbox (same doc); recommend adding it as an OR fallback for the
few manifests that unset env vars. AppImage/.deb are not worth
guarding — they don't have the Mesa-ABI-drift failure mode (they use
the host's Mesa, not a bundled content snap).
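The detection with the recommended /.flatpak-info fallback can be sketched in a few lines. The function name is illustrative, and `existsSync` is injected so the check is testable:

```typescript
// Sketch of confined-Linux detection: env vars first, with the
// /.flatpak-info file as an OR fallback for manifests that unset
// FLATPAK_ID. `existsSync` is an injected stand-in for fs.existsSync.
export const isConfinedLinux = (
  platform: string,
  env: Record<string, string | undefined>,
  existsSync: (p: string) => boolean,
): boolean =>
  platform === 'linux' &&
  (!!env.SNAP || !!env.FLATPAK_ID || existsSync('/.flatpak-info'));
```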
Snap+NVIDIA-proprietary (nvidia-core22) uses Nvidia's EGL
implementation, not Mesa (canonical/nvidia-core22)
— the same class of crash-at-init failure can occur (driver/X-server
mismatch), so the guard triggering there is acceptable collateral, not
a bug.
SP has no electron/*.spec.ts files today and Karma runs in
ChromeHeadless (src/karma.conf.js), which lacks Node fs. Two viable
options: (a) inject fs/env/platform as parameters so the function
is pure and testable under Karma with jasmine.createSpyObj; (b) add a
dedicated Node-side test runner. Option (a) is lower-risk and matches
existing SP util test patterns (see src/app/util/real-timer.spec.ts).
// electron/gpu-startup-guard.spec.ts (requires the DI refactor below)
import { evaluateGpuStartupGuard } from './gpu-startup-guard';
type FakeFs = Pick<
typeof import('fs'),
'existsSync' | 'writeFileSync' | 'unlinkSync' | 'mkdirSync'
>;
const makeFs = (initial: Record<string, boolean> = {}) => {
const files: Record<string, boolean> = { ...initial };
const fs: FakeFs = {
existsSync: (p) => !!files[p as string],
writeFileSync: (p) => {
files[p as string] = true;
},
unlinkSync: (p) => {
if (!files[p as string]) {
throw Object.assign(new Error('ENOENT'), { code: 'ENOENT' });
}
delete files[p as string];
},
mkdirSync: () => undefined as any,
};
return { fs, files };
};
describe('evaluateGpuStartupGuard', () => {
const USER = '/u';
const CONFINED = { SNAP: '/snap/sp', XDG_SESSION_TYPE: 'wayland' };
it('confined + no marker → writes marker, does not disable GPU', () => {
const { fs, files } = makeFs();
const d = evaluateGpuStartupGuard({
userDataPath: USER,
env: CONFINED,
platform: 'linux',
fs,
});
expect(d.disableGpu).toBeFalse();
expect(files[`${USER}/.gpu-launch-incomplete`]).toBeTrue();
});
it('confined + marker present → disables GPU with reason=crash-recovery', () => {
const { fs } = makeFs({ [`${USER}/.gpu-launch-incomplete`]: true });
const d = evaluateGpuStartupGuard({
userDataPath: USER,
env: CONFINED,
platform: 'linux',
fs,
});
expect(d).toEqual(
jasmine.objectContaining({ disableGpu: true, reason: 'crash-recovery' }),
);
});
it('SP_ENABLE_GPU=1 overrides a present marker', () => {
const { fs } = makeFs({ [`${USER}/.gpu-launch-incomplete`]: true });
const d = evaluateGpuStartupGuard({
userDataPath: USER,
env: { ...CONFINED, SP_ENABLE_GPU: '1' },
platform: 'linux',
fs,
});
expect(d.disableGpu).toBeFalse();
});
it('SP_DISABLE_GPU=1 on non-confined Linux → env reason, no marker', () => {
const { fs, files } = makeFs();
const d = evaluateGpuStartupGuard({
userDataPath: USER,
env: { SP_DISABLE_GPU: '1' },
platform: 'linux',
fs,
});
expect(d.disableGpu).toBeTrue();
expect(d.reason).toBe('env');
expect(files[`${USER}/.gpu-launch-incomplete`]).toBeUndefined();
});
it('non-confined Linux → noop, markerPath=null', () => {
const { fs } = makeFs();
const d = evaluateGpuStartupGuard({
userDataPath: USER,
env: {},
platform: 'linux',
fs,
});
expect(d).toEqual({ disableGpu: false, reason: null, markerPath: null });
});
it('unlinks legacy marker files on confined Linux', () => {
const { fs, files } = makeFs({
[`${USER}/.gpu-startup-state`]: true,
[`${USER}/.gpu-startup-state.json`]: true,
});
evaluateGpuStartupGuard({ userDataPath: USER, env: CONFINED, platform: 'linux', fs });
expect(files[`${USER}/.gpu-startup-state`]).toBeUndefined();
expect(files[`${USER}/.gpu-startup-state.json`]).toBeUndefined();
});
it('fs.writeFileSync throwing does not break the decision', () => {
const { fs } = makeFs();
fs.writeFileSync = () => {
throw new Error('EROFS');
};
expect(() =>
evaluateGpuStartupGuard({
userDataPath: USER,
env: CONFINED,
platform: 'linux',
fs,
}),
).not.toThrow();
});
});
Required refactor: change evaluateGpuStartupGuard(userDataPath) to
evaluateGpuStartupGuard({ userDataPath, env, platform, fs }) with
defaults from process/fs at the call-site in start-app.ts. No
behavior change, fully unit-testable.
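For reference, a hedged implementation consistent with the spec above — a sketch of what the guard's logic plausibly looks like after the DI refactor, not PR #7273's actual code. The file name, `FsLike` shape, and legacy marker names mirror the spec; everything else is an assumption:

```typescript
// electron/gpu-startup-guard.ts — illustrative sketch matching the spec.
type FsLike = {
  existsSync: (p: string) => boolean;
  writeFileSync: (p: string, data: string) => void;
  unlinkSync: (p: string) => void;
};

export interface GpuGuardDecision {
  disableGpu: boolean;
  reason: 'env' | 'crash-recovery' | null;
  markerPath: string | null;
}

// Markers from earlier iterations of the guard, cleaned up on sight.
const LEGACY_MARKERS = ['.gpu-startup-state', '.gpu-startup-state.json'];

export const evaluateGpuStartupGuard = (opts: {
  userDataPath: string;
  env: Record<string, string | undefined>;
  platform: string;
  fs: FsLike;
}): GpuGuardDecision => {
  const { userDataPath, env, platform, fs } = opts;

  // Env override applies on every platform (debugging escape hatch).
  if (env.SP_DISABLE_GPU === '1') {
    return { disableGpu: true, reason: 'env', markerPath: null };
  }
  const isConfined = platform === 'linux' && (!!env.SNAP || !!env.FLATPAK_ID);
  if (!isConfined) {
    return { disableGpu: false, reason: null, markerPath: null };
  }

  // One-time cleanup of legacy markers; missing files are fine.
  for (const legacy of LEGACY_MARKERS) {
    try {
      fs.unlinkSync(`${userDataPath}/${legacy}`);
    } catch {}
  }

  const markerPath = `${userDataPath}/.gpu-launch-incomplete`;
  if (env.SP_ENABLE_GPU === '1') {
    // User asserts GPU is fine; markerPath is still returned so a later
    // markStartupSuccess() can unlink any stale marker on APP_READY.
    return { disableGpu: false, reason: null, markerPath };
  }
  if (fs.existsSync(markerPath)) {
    // Previous launch never reached APP_READY → recover with GPU off.
    return { disableGpu: true, reason: 'crash-recovery', markerPath };
  }
  try {
    fs.writeFileSync(markerPath, String(Date.now()));
  } catch {
    // Read-only userData etc.: guard silently skips, launch proceeds.
  }
  return { disableGpu: false, reason: null, markerPath };
};
```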
- SP_ENABLE_GPU=1 with a present marker: PR #7273 returns early
  before the marker is written — but the marker path is still
  computed (isConfinedLinux branch above the env checks), so
  markStartupSuccess() later unlinks it on APP_READY. Net effect:
  the override does clear the marker on success. That is correct:
  the user asserted "GPU is fine now," a successful boot confirms it,
  and fresh crash tracking starts from zero. If the override is
  removed and GPU fails again, the next launch writes a fresh marker
  and the one after that triggers recovery — two failed launches to
  re-trigger, one more than without the override. Acceptable tradeoff;
  document it. Confidence: high.
- Legacy-marker cleanup: unlinkSync with swallowed ENOENT is safe. Recommend
  time-limiting the cleanup: keep it through 18.3, remove in 19.0 —
  leaving unused fs.unlinkSync calls in a hot startup path is
  clutter. Risk of leaving it permanent: near zero (two extra stat
  calls on Snap/Flatpak launch).

PR #7273 is a genuine stopgap, not a replacement for core24 +
gpu-2404 migration. Rationale:

- --disable-gpu forces software rendering — fine for SP's DOM/text
  UI but still a visible perf regression vs. HW-accelerated X11/GLX
  (SP's v18.2.3 path).
- core24 + gpu-2404 fixes the root cause (Mesa ABI drift),
  keeping HW accel for all Snap users without the one-failed-launch
  penalty.
- The layered design (marker guard + X11 fallback + gpu-2404) is robust:
  even after the migration, the marker guard remains cheap insurance for
  future Chromium/Mesa regressions (e.g., the recurring Electron
  38/Tahoe-style breakages — AppleInsider 2025-10).

Recommendation: keep the 18.3/19.0 gpu-2404 migration scheduled; treat
PR #7273 as permanent defense-in-depth, not a delete-later hack.
Confidence: high.
A third independent agent (codex CLI, read-only) reviewed the same material and converged on the same core findings as 13.1 and 13.2. Notable agreement:
- --disable-gpu overclaim — codex independently cites Chromium's own
  GPU integration tests, which expect a GPU process under --disable-gpu
  on Linux and test --disable-gpu --disable-software-rasterizer together
  as the "no GPU process" case. Three independent sources (Claude
  agents 1 + 2, codex) converge on the same recommendation: append
  --disable-software-rasterizer.
- APP_READY framing: codex proposes clearer wording — APP_READY means
  "startup succeeded enough to use the app," not "all later renderer
  failures are covered." Recommend applying this in-line in Section 12.
- The userData variant isn't in peer Electron apps. Confidence:
  medium, not high.
- SP_ENABLE_GPU=1 marker-clearing: codex independently reaches the
  same conclusion as 13.2 (a successful boot acknowledges the prior
  crash; matches browser convention). If a one-shot diagnostic override
  is ever wanted that does not acknowledge success, it should be a
  separate env var.

Codex's unique contributions:

- A structured marker payload { ts, reason?, gpuChildGone? } populated
  from app.on('child-process-gone', {type:'GPU'}) and
  render-process-gone listeners fired before APP_READY. Cost: same fs
  call path, same semantics; gain: post-incident forensics without a
  telemetry system. This is a cleaner upgrade path than the two-marker
  scheme proposed in 13.1§3.
- A max_resumed_crashes-style crash threshold (per the Firefox behavior
  cited elsewhere in this doc). Downside: delays recovery by one extra
  failed launch. Defer unless warranted by real reports.

Ordered by importance:
1. Append --disable-software-rasterizer alongside --disable-gpu in
   start-app.ts when the guard triggers. Without it, Chromium respawns
   the GPU process in SwiftShader mode — still a GPU process, still
   runs Ozone init. See 13.1§1.
2. Refactor evaluateGpuStartupGuard to take an options object so fs,
   env, and platform can be injected — enables Karma unit tests (see
   13.2§Testing). Then add the .spec.ts above.
3. Add a /.flatpak-info existence check as an OR fallback to the
   FLATPAK_ID detection. Cheap, covers manifests that unset the env
   var.
4. Add an app.on('child-process-gone', …) listener that logs reason to
   the main-process log when type: 'GPU' — gives telemetry (in logs)
   without building a telemetry system, and confirms the guard is
   firing for the intended cause.
5. A structured marker payload { ts, reason?, gpuChildGone? } populated
   from child-process-gone/render-process-gone listeners. Cost: same
   code path; gain: post-incident forensics without telemetry.

None of these block merging. #1 is the highest-impact correctness fix —
it closes the gap where --disable-gpu alone still lets Chromium respawn
the GPU process in SwiftShader mode (independently identified by all
three research agents).
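The flag-pair append from item 1 can be sketched as a small pure helper (the appendSwitch shape mirrors Electron's app.commandLine API; the helper name and return value are illustrative, not SP's actual code):

```typescript
// Append the recovery flag pair. --disable-gpu alone still lets Chromium
// respawn the GPU process in SwiftShader mode; the pair is what Chromium's
// own integration tests treat as the "no GPU process" configuration.
function applyGpuRecoveryFlags(commandLine: {
  appendSwitch: (name: string) => void;
}): string[] {
  const flags = ['disable-gpu', 'disable-software-rasterizer'];
  for (const flag of flags) {
    commandLine.appendSwitch(flag);
  }
  return flags;
}
```

Keeping the pair in one helper also makes the guard's output trivially assertable in a unit test via a recording stub.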
Four independent agents (two Claude research-architects, one Claude code-reviewer, one codex CLI) adversarially reviewed Sections 12–13 and PR #7273. The findings below are verified (citations fetched, code grepped) or explicitly rejected where agents disagreed.
- Baseline verified with git merge-base.
- --disable-gpu claim: softened from "not spawn a GPU process at all"
  to "avoids the hardware GPU / Mesa DRI driver load path." Chromium
  still spawns a GPU process in SwiftShader or DisplayCompositor modes.
- getGPUInfo('complete'): softened from "documented to never settle"
  to "reported."
- The --disable-gpu / SwiftShader behavior claim is currently
  attributed to Chromium's fallback.md. It should be attributed to the
  chromium-discuss thread and the GPU process integration test (which
  explicitly expects a GPU process under --disable-gpu on Linux, and
  tests --disable-gpu --disable-software-rasterizer as "no GPU
  process"). fallback.md documents the mode stack but not the Linux
  --disable-gpu behavior.
- --disable-software-rasterizer strength: codex's verification
  cautions that DISPLAY_COMPOSITOR is still a GPU-process mode, so the
  flag doesn't guarantee "no GPU process" either. Keep the flag as
  cheap belt-and-braces (no WebGL dep in SP) but drop the framing that
  it fully suppresses the GPU process.
- "SP_ENABLE_GPU=1 crash leaves no marker → no recovery" (Agents B and
  C): rejected. A stale marker from a previous crash persists across
  the override path — the early return at pr7273.diff:54 does not clear
  the marker, it just returns early before potentially writing a fresh
  one. Sequence: crash → marker written → override-launch with
  SP_ENABLE_GPU=1 → early return, marker stays → crash again → next
  launch without override → existing marker triggers recovery. Codex
  correctly traced this. Agents B and C overstated the problem.
Remaining edge case: first-ever launch where the user sets
SP_ENABLE_GPU=1 AND a crash occurs AND no marker has ever been
written — costs +1 extra crashed launch before recovery. Acceptable;
document in PR.
Oscillation for a genuinely broken GPU with no user action: an
every-other-launch pattern (crash → recover → retry → crash →
recover…). That's the designed retry-after-recovery behavior — the
cost is that half of all launches are bad until the root cause is
fixed or the user sets SP_DISABLE_GPU=1. A reasonable tradeoff; note
it in the PR docs.
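The every-other-launch cadence can be made concrete with a toy simulation (marker semantics as described in this section; this is an illustration, not SP code):

```typescript
// Simulate repeated launches against a permanently broken GPU.
// Launches without a marker crash and write one; launches that find the
// marker disable GPU, boot successfully, and clear it on APP_READY.
function simulateLaunches(count: number): string[] {
  let markerPresent = false;
  const outcomes: string[] = [];
  for (let i = 0; i < count; i++) {
    if (markerPresent) {
      outcomes.push('recovered'); // guard fires, software rendering, boot OK
      markerPresent = false; // APP_READY clears the marker
    } else {
      outcomes.push('crashed'); // GPU retried and failed again
      markerPresent = true; // marker written before the crash
    }
  }
  return outcomes;
}
```

The simulation shows the steady state directly: outcomes alternate, so exactly half of launches are bad until the user intervenes.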
"APP_READY fires from AppComponent constructor" (Agent B,
corrected by Agent D): reworded. APP_READY is sent from the
synchronous body of StartupService.init() (async function; no
awaits precede it on the Electron path today). The constructor
calls init() but doesn't fire APP_READY itself. Brittle to
upstream refactor (adding an await earlier would shift timing).
"_initBackups() is awaited and can strand APP_READY" (Agent
C): rejected. startup.service.ts:104 is this._initBackups();
(no await) — fire-and-forget. informAboutAppReady() at line 136
runs in the same microtask.
A marker mtime older than ~5 minutes suggests a systemd shutdown
SIGKILL, not a fast GPU crash.
- The mkdirSync is load-bearing: on first-ever Snap install,
  $SNAP_USER_COMMON/.config/superproductivity does not exist.
  Electron's app.setPath('userData', …) does NOT create the directory.
  The PR's fs.mkdirSync(userDataPath, {recursive: true}) on line 80 is
  what makes first launch work. Worth a comment/invariant. Add a test
  case.
- markerPath not reset on the non-confined path: pr7273.diff:27 is
  let markerPath: string | null = null; but the non-confined early
  return on line 61 returns markerPath: null without resetting the
  module variable. Adds a subtle bug if the function is called twice
  (tests, reinit). Set markerPath = null on the non-confined branch.
- isTruthyEnv asymmetry: SP_DISABLE_GPU=0 is treated as unset (regex
  /^(1|true|yes|on)$/i). Users may intuitively set SP_DISABLE_GPU=0
  expecting to force GPU back on — wrong; use SP_ENABLE_GPU=1.
  Document.
- --disable-gpu-sandbox as an intermediate step: on Snap-confined
  Electron, GPU sandbox init can fail independently of Mesa ABI drift.
  A 2-step ladder (first crash → --disable-gpu-sandbox; second crash →
  --disable-gpu) would preserve HW accel for sandbox-only failures.
  Defer unless reports come in.

Agent A verified the high-risk citations. Summary:
- Verified: fallback.md stack order; the GPU integration test
  (_GpuProcess_disable_gpu_and_swiftshader + _GpuProcess_disable_gpu);
  electron/electron #28164, #17187, #17447; Bugzilla 294260; BugSnag
  docs; the chromium-discuss thread; canonical gpu-2404 ("evolution"
  wording); electron-builder #9452; the snapcrafters signal-desktop
  wrapper (--disable-gpu default ON); snapcrafters mattermost-desktop
  (glxinfo llvmpipe probe + jq config patch); the AppleInsider Tahoe
  article (confirmed via secondary sources).
- fallback.md does NOT document the --disable-gpu Linux behavior.
  Reattribute to chromium-discuss + the integration test (see
  "Outstanding corrections" above).
- The electron-builder issue concerns build.linux.executableArgs, not
  build.snap.executableArgs directly — the snap-scoped brokenness is
  inferred from the same root cause. Clarify in §6.
- Could not fetch Firefox's nsAppRunner.cpp (file too large). The
  logic exists per Bug 294260; cite a specific searchfox anchor
  instead of the whole file.

Verdict: Approve with changes. The design is sound; the implementation
has three real bugs, two documentation gaps, and one mechanically-wrong
code comment. None are blockers.
PR code comment in start-app.ts (lines 145–154 of the diff) is
mechanically wrong. It says --disable-gpu "suppresses GPU-process
spawn." That's false on Linux — Chromium respawns the GPU process in
SwiftShader or DisplayCompositor mode. Reword to: --disable-gpu
avoids the hardware Mesa DRI driver load path, which is the source
of the Snap ABI-drift crash. The GPU process may still run in
software mode.
Module-level markerPath not reset on the non-confined early
return (pr7273.diff:27, 60-62). Add markerPath = null; before
the non-confined return. Makes the function idempotent — otherwise
a second call from a test or reinit retains the previous value.
First-launch mkdirSync invariant undocumented. The
fs.mkdirSync(userDataPath, {recursive: true}) on line 80 is
load-bearing for fresh Snap installs (Electron's app.setPath
doesn't create the directory). Add a comment; add a test case.
SP_ENABLE_GPU=1 semantics: document that (a) overriding
with a crash during that launch means +1 extra bad launch before
recovery kicks in on the next normal launch, not infinite
oscillation; (b) SP_DISABLE_GPU=0 does NOT turn recovery off —
it's parsed as unset; use SP_ENABLE_GPU=1 for that.
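The (b) asymmetry follows directly from the truthiness regex quoted in the findings; a minimal reproduction:

```typescript
// Mirrors the parse described in the findings: only affirmative values
// count; '0' (like 'false' or 'off') is indistinguishable from unset.
const isTruthyEnv = (value: string | undefined): boolean =>
  value !== undefined && /^(1|true|yes|on)$/i.test(value);
```

So SP_DISABLE_GPU=0 parses to false — the same as not setting the variable at all — which is why "force GPU back on" must go through SP_ENABLE_GPU=1 instead.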
Oscillation behavior for a genuinely broken GPU: the
every-other-launch pattern is by design (retry after each recovery).
Note it in the PR body so users understand the expected experience
until they fix the root cause or set SP_DISABLE_GPU=1 persistently.
Add --disable-software-rasterizer alongside --disable-gpu
(§13.4 item 1): cheap belt-and-braces; SP has no WebGL dependency.
Drop the "fully suppresses GPU process" framing — at most claim
"avoids software-GL fallback initialization."
Extract evaluateGpuStartupGuard to a pure function with
injected fs/env/platform and add the unit test file from
§13.2. This is the largest correctness gap — there are no tests
today. The refactor is mechanical and doesn't change behavior.
Time-bound the marker: if fs.statSync(markerPath).mtime is
older than N minutes (5–10), assume systemd SIGKILL / snap refresh
rather than a GPU crash and skip recovery. Cut false-positive rate
on suspended laptops. Defer until reports confirm this is noisy.
--disable-gpu-sandbox as intermediate step: Chromium-style
2-step ladder. Defer until a sandbox-specific failure is reported.
Structured JSON marker payload (§13.3 codex suggestion):
{ ts, reason?, gpuChildGone? } populated from
child-process-gone/render-process-gone listeners. Cleaner
forensics than a zero-byte marker. Defer — can be added without
breaking compatibility.
Add /.flatpak-info existence check as OR fallback to
FLATPAK_ID. Covers manifests that unset env vars. Cheap.
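The two-signal detection is a one-liner; sketched here with an injected existsSync so it stays unit-testable (the helper name is illustrative):

```typescript
// FLATPAK_ID can be unset by custom manifests; /.flatpak-info is the
// marker file Flatpak mounts into every sandbox, so OR the two signals.
function isFlatpak(
  env: Record<string, string | undefined>,
  existsSync: (path: string) => boolean,
): boolean {
  return !!env.FLATPAK_ID || existsSync('/.flatpak-info');
}
```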
Time-box the legacy-marker cleanup (remove in 19.0) with a TODO comment.
- Schedule the core24 + gpu-2404 migration for 18.3/19.0 as the
  root-cause fix.
- Known unknowns remain (child-process-gone forensics, Flatpak, X11
  users with ABI-drifted Mesa).
- Whether --disable-software-rasterizer meaningfully improves the
  recovery path is uncertain — the evidence base is a single
  chromium-discuss thread and an integration test, both of uncertain
  currency against Chromium 146.

Two post-release field reports on the Snap+Wayland X11 widening shipped
in v18.2.4 (PR #7266). First reporter
in v18.2.4 (PR #7266). First reporter
DerEchteKoschi
labels their install as 18.2.3, but the attached log contains the
"Snap: forcing X11 (wayland=true, gnomePlatformMissing=false, ..."
string which only exists in v18.2.4 (verified via
git show v18.2.3:electron/start-app.ts vs v18.2.4). Treat this as a
v18.2.4 report. Second reporter
nekufa
is on snap revision 3482 (latest/edge, v18.2.4) — the same log string
confirms the guard is active.
DerEchteKoschi:

- Ubuntu 24.04.
- Intel Arrow Lake (i915/xe) — Intel's late-2024 GPU arch, not covered
  by the core22-mesa-backports PPA's Mesa.
- Wayland session (XDG_SESSION_TYPE=wayland, WAYLAND_DISPLAY=wayland-0).

nekufa:

- Ubuntu 25.10 (questing) — even further from core22's Mesa baseline.
- Snap revision 3482 (latest/edge), confined.
- AMD Raphael (amdgpu) — a 2022 part, not new hardware.
- Wayland session (XDG_SESSION_TYPE=wayland, WAYLAND_DISPLAY=wayland-0).

The two reports span both GPU vendors and two Ubuntu releases newer
than 22.04. The failure pattern is identical; host-GPU generation is
not the discriminator.
Both logs share the same failure pattern:
- Snap: forcing X11 (wayland=true, gnomePlatformMissing=false,
  XDG_SESSION_TYPE=wayland, WAYLAND_DISPLAY=set).
- MESA-LOADER: failed to open dri:
  /usr/lib/x86_64-linux-gnu/gbm/dri_gbm.so: cannot open shared object
  file — repeated N times on both the pre-X11-init and post-X11-init
  log lines.
- GPU process exited unexpectedly: exit_code=139 (SIGSEGV) at least 3
  times within ~400ms. Even with ozone-platform=x11 applied, the GPU
  process is segfaulting because Mesa DRI can't load.
- [ERROR:ui/base/x/x11_software_bitmap_presenter.cc:147]
  XGetWindowAttributes failed for window 1 — the X11 presenter also
  fails; the system compositor context is not usable to Chromium from
  inside this snap sandbox.
- vaInitialize failed: unknown libva error — VA-API broken on both
  (DerEchteKoschi via the i965/Intel path; nekufa via the
  radeonsi_drv_video.so/AMD path).
- dbus-send: ... libdbus-1.so.3: version LIBDBUS_PRIVATE_1.12.20 not
  found (required by dbus-send) — the bundled libdbus in the snap is
  older than what the copied dbus-send expects. A runtime mismatch
  inside the snap itself. Reproduces on both 24.04 and 25.10.

nekufa-specific caveat: the user's CLI invocation was
superproductivity --ozon-platform=x11 (typo: missing e). Per
electron/start-app.ts:73-75, hasOzoneOverride only matches
--ozone-platform, so the programmatic appendSwitch still ran. The log
therefore reflects the default/programmatic path, not a CLI override —
it's a clean test of what v18.2.4 ships. A correctly-spelled retest has
been requested on the thread.
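Why the typo slipped through can be shown with a minimal re-implementation of the override scan (the real start-app.ts logic may differ in detail; this mirrors the behavior the caveat describes):

```typescript
// Only an exactly spelled --ozone-platform (bare or =value form) counts
// as a user override; '--ozon-platform=x11' fails the match, so the
// programmatic appendSwitch path still runs.
const hasOzoneOverride = (argv: string[]): boolean =>
  argv.some(
    (arg) => arg === '--ozone-platform' || arg.startsWith('--ozone-platform='),
  );
```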
Section 2 Scope table — lower-bound correction. "Snap + Electron with Wayland-default + Mesa GPU + Wayland session: ~95–100% fixed" is optimistic. A more honest framing:
| Population | Fixed by #7266 alone | Needs #7273 or manual flag |
|---|---|---|
| Snap+Wayland, core22-mesa-backports Mesa aligned with Electron's libgbm | ~high | — |
| Snap+Wayland, host Mesa/libgbm drifted from core22 baseline (any vendor, any Ubuntu ≥ 24.04) | No | Yes |
| Snap+Wayland, Ubuntu 24.04+ host + core22 snap runtime mismatch (libdbus, libva, pixbuf) | Partially | Likely yes |
| Snap+X11 users with drifted Mesa | No (guard doesn't fire) | Yes |
The "~95%" estimate in §2/§7/§9 was derived from peer-app reports, not from SP field data. The two reports together are evidence that the tail is larger than assumed on any Ubuntu ≥ 24.04 host whose Mesa/libgbm has drifted from the core22 baseline — vendor (Intel/AMD) and GPU generation are not the discriminator.
Section 8 recommendation — stands. X11 widening is still the right primary fix because it preserves HW accel for everyone it rescues. This report doesn't invalidate the primary; it validates the need for layered defense (§12–15, PR #7273).
Section 13.1§1 --disable-gpu correctness prediction — supported.
The log shows the gbm/dri_gbm.so load attempt fires regardless of
ozone platform. A --disable-gpu (+--disable-software-rasterizer)
path would skip that load entirely. This report strengthens the case
for appending --disable-software-rasterizer in #7273 (§13.4 item 1).
PR #7273 value — upgraded from "tail defense" to "load-bearing coverage for the Ubuntu 24.04+ / drifted-Mesa tail." Without #7273, users in this population currently need the manual CLI flag as a permanent workaround.
core24 + gpu-2404 urgency — upgraded. Ubuntu 24.04 (LTS) has been out
for roughly two years and 25.10 is shipping with the same core22 snap
mismatch pattern (n=2 reports, one on each release). Users on a 24.04+
host with a core22-runtime snap will continue to accumulate host/snap
mismatches (dbus, libva, Mesa, pixbuf). Recommend moving the migration
from "18.3 / 19.0" to explicitly 18.3 and tracking it as a scoped task,
not a long-term aspiration.
Open question — CLI flag vs programmatic appendSwitch

DerEchteKoschi states superproductivity --ozone-platform=x11 launches
successfully (n=1; nekufa's CLI attempt used the --ozon-platform typo
and so doesn't count toward this question either way). Per the code at
electron/start-app.ts:73-77, passing that flag on the CLI skips the
programmatic appendSwitch block (the hasOzoneOverride short-circuit) —
Chromium sees the ozone flag only from argv in that path. In the
"plain call" path, Chromium sees the ozone flag from
app.commandLine.appendSwitch (called before app.whenReady()). Per
Electron docs these should be equivalent. Three hypotheses for the
behavioral difference:
1. User reporting artifact.
2. Switch-ordering interaction: start-app.ts appends
   enable-speech-dispatcher (line 56) and gtk-version=3 (line 61)
   before the ozone switch. Unlikely to interact with ozone, but not
   proven.
3. process.argv parsing difference: Chromium's argv parser may pick up
   --ozone-platform=x11 before Electron's app.commandLine.appendSwitch
   is applied, giving the CLI path a marginal timing advantage on
   slow-startup snaps. Unverified.

Not worth a code change until reproduced in a controlled environment.
Documenting as an open question for future diagnosis.
- Upgrade §13.4 item 1 (append --disable-software-rasterizer) from
  "recommended" to "do before 18.2.5". Two independent logs are direct
  evidence that the unmitigated DRI load path is what's crashing.
- Flag nekufa's --ozon-platform typo and request a retest with
  --ozone-platform=x11 (correct spelling). This is the only way to
  distinguish whether they're in the "X11 widening would rescue them"
  bucket or the "X11 path still segfaults" bucket.
- Commit to the core24 + gpu-2404 migration for 18.3, not 18.3/19.0
  with an open end.
- Track separately: the Ctrl+Shift+X global shortcut failure on Ubuntu
  25.10 (nekufa log). Orthogonal to #7270 — likely a GNOME 46/47
  binding collision in questing. Do not let it pollute the GPU thread.

After the n=2 field data (§16) triggered the revisit condition and PR
#7273's 5 commits were cherry-picked onto the working branch, a
two-layer verification pass was run:
R2 — --disable-gpu flag pair has a Flatpak+Wayland gap. The
research on content/browser/gpu/ + electron-builder#9452 +
Kong/insomnia#9346 showed that on Chromium 140+/Electron 38+, Ozone
Wayland auto-detection in the browser process can dlopen libEGL,
which transitively triggers the GBM driver load — before the
GPU-process gate that --disable-gpu actually governs. On Snap the
existing X11 widening block at start-app.ts:80-100 already fires on
process.env.SNAP and closes this, but Flatpak users get neither the
X11 widening (the block requires SNAP) nor full coverage from the
flag pair alone. Fix (shipped): append
--ozone-platform=x11 inside the if (gpuDecision.disableGpu) branch.
Redundant on Snap (last-flag-wins), load-bearing on Flatpak.
R1 — Content-based marker with staleness bound + version gating.
The research on Mesa's alternate-GBM-backend discovery in
src/gbm/main/backend.c (commit 21ce1ca8) plus the pre-Mesa-24.3
libgbm ABI instability reports (NixOS discourse 61015, Canonical
mesa-core22/mesa-2404 testing thread) confirmed the root cause is
ABI mismatch between core22 Mesa 23.2.1 and host Mesa 24.x/25.x,
not simply "Mesa version drift." The corollary for the marker: a
systemd-SIGKILL mid-boot or a post-upgrade residue from a different
Electron version should not force GPU-disabled forever. Fix
(shipped): write JSON { ts, electronVersion } and ignore markers
older than 5 min or from a different Electron version. Drops two
false-negative classes without adding dependencies.
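The shipped R1 rule can be sketched as a small validity predicate (field names from the description above; the 5-minute bound and the treat-unparseable-as-stale choice are assumptions of this sketch):

```typescript
interface GpuCrashMarker {
  ts: number; // epoch ms at marker write
  electronVersion: string;
}

// A marker is actionable only if it is fresh and from this Electron build;
// systemd-SIGKILL residue and post-upgrade leftovers are both ignored.
function markerIsActionable(
  raw: string,
  nowMs: number,
  currentElectronVersion: string,
  maxAgeMs: number = 5 * 60 * 1000,
): boolean {
  try {
    const marker = JSON.parse(raw) as Partial<GpuCrashMarker>;
    return (
      typeof marker.ts === 'number' &&
      nowMs - marker.ts <= maxAgeMs &&
      marker.electronVersion === currentElectronVersion
    );
  } catch {
    return false; // legacy zero-byte or corrupt marker: do not force recovery
  }
}
```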
Codex W-C2 — Silent unlink failure left the app stuck in recovery
mode. markGpuStartupSuccess's catch {} swallowed every error
including EACCES/EROFS/NFS-quirks. A non-ENOENT failure meant the
marker stayed, next launch re-entered recovery, and nothing in the
log explained it. Fix (shipped): log non-ENOENT errors at warn.
Same pattern applied to the legacy-marker cleanup and the marker
write.
Architecture S1+S2 — Naming and precondition documentation.
markStartupSuccess at a call site in main-window.ts gives no clue
it's gated on a module-level state set by another file's function.
Fix (shipped): renamed to markGpuStartupSuccess and added a
JSDoc line spelling out the precondition (must follow
evaluateGpuStartupGuard in the same process; no-op otherwise).
Codex W-C1 vs. Research R2 + six Claude agents — --disable-gpu +
--disable-software-rasterizer pair. Codex flagged that pairing the
flags "removes Chromium's software fallback on the very launch that
is supposed to recover" and could leave users with a blank window.
Three independent sources (Chromium discuss thread, CEF forum #11953,
OpenFin docs) corroborated by electron/electron#17180/#20702/#28164
confirm the pair is what Chromium's own GPU integration tests treat as
"no GPU process" — DisplayCompositor mode still renders 2D in the
browser process without spawning a GPU child. The project's §13.1 §1
already predicted this correctness. Decision: keep the pair.
Verify with a live SP_DISABLE_GPU=1 npm start once before a formal
release; if Codex turns out to be right for this specific Electron
build, drop --disable-software-rasterizer and the guard degrades
gracefully to SwiftShader.
Architecture vs. Simplicity — module-level markerPath state.
Architecture review said "defensible exception, document the
precondition." Simplicity review said "return markerPath in the
decision object and pass it to markGpuStartupSuccess(markerPath) to
make both functions pure." Git history shows this was already
refactored away once (810f6bffa6 refactor(electron): eliminate GPU guard module-level state) and walked back in the current form. The
alternative threads a handle through createWindow across three
files for no runtime benefit. Decision: module state stays;
precondition now documented on the JSDoc (S2 applied).
- IPC.APP_READY is the right clearing signal. Read
  startup.service.ts:136 (fires after Angular DI + translations),
  preload.ts:168 (_send('APP_READY')), and main-window.ts:279-286
  (ipcMain handler). ready-to-show would trade bounded false-negatives
  for unbounded false-positives (blank/broken renderers paint a first
  frame and would clear the marker on broken boots). §13.1 #2 is
  correct.
- Snap packaging lives in electron-builder.yaml:77-106. Migration to
  base: core24 + gpu-2404 + gnome-46-2404 is medium cost (~3–5
  engineering days), blocked partially by electron-builder#8548 (the
  generator is still core22-shaped) and would require moving to a
  hand-written snap/snapcraft.yaml with explicit content plugs and a
  gpu-2404-wrapper command-chain. No published Electron-app success
  story found — SP would be an early adopter. Scoped out of PR #7273;
  tracked as an 18.3 target.
- Peer-ecosystem survey (including Canonical's gpu-2404-wrapper and
  the Firefox snap): prevailing patterns are manual
  --ozone-platform=x11, proactive env-sniff (Obsidian), edge-channel
  rebuilds (Firefox), or do-nothing (Canonical). PR #7273's reactive
  self-healing fallback appears novel in this ecosystem — not merely
  defense-in-depth but the first instance of the pattern for an
  Electron snap.
- Proactive preflight probe:
  fs.accessSync(/usr/lib/x86_64-linux-gnu/dri/{driver}_dri.so) and
  compare libgbm.so.1 major versions between snap and host; on
  mismatch, apply the fallback bundle on the first launch without
  waiting for a crash. ~2–3 hours. Separate PR — benefits from more
  field data first.
- app.on('child-process-gone') forensic listener: a logging-only
  complement to the marker, useful for post-incident triage. Low
  priority.
- Marker-path symlink hardening: low severity (an attacker with
  $SNAP_USER_COMMON write access already owns SP data); would need an
  O_NOFOLLOW + lstatSync guard. Defensive polish, not a shipping
  blocker.
- Run a live SP_DISABLE_GPU=1 npm start to verify the flag pair
  actually renders (Codex W-C1 test).
- --disable-gpu + --disable-software-rasterizer pair correctness:
  Medium-High — three sources agree, Codex dissents, needs one live
  test to close.

Two post-v18.2.5 reports on issue #7270 escalated §16's open question
from "probably a reporting artifact" to "the defining signal." Summary
of the new evidence:
DerEchteKoschi, v18.2.5 second launch (2026-04-20, Ubuntu 24.04 + Intel Arrow Lake):
- The X11 widening fired (the Snap: forcing X11 (wayland=true, ...)
  line is in the log).
- The reactive guard fired (the Disabling GPU acceleration (reason:
  crash-recovery) line is in the log).
- The GPU process still died with exit_code=139.
- No window appeared, yet IPC.APP_READY fired and
  markGpuStartupSuccess() cleared the marker on the first launch
  despite the user never seeing a window. The second launch therefore
  enters with no marker and no recovery.
- superproductivity --ozone-platform=x11 (CLI flag) continues to work
  on the same machine, unchanged.

nekufa, v18.2.5 (2026-04-21, Ubuntu 25.10 + AMD Raphael):

- Manual --disable-gpu became optional in v18.2.5 (PR #7273 shipped).
- --ozone-platform=x11 (CLI) is still required to get a visible
  window.

The §16 "Open question — CLI flag vs programmatic appendSwitch" is
visible window.The §16 "Open question — CLI flag vs programmatic appendSwitch" is
now settled by independent reports on two different machines, two
vendors, two Ubuntu releases. The programmatic
app.commandLine.appendSwitch('ozone-platform','x11') inside the main
process is not equivalent to the CLI flag for this class of
failure. Hypothesis 1 ("User reporting artifact") from §16 is rejected.
The evidence supports hypothesis 3 in a generalized form:
Chromium's Ozone init in the browser process begins dlopen'ing the
libEGL/libgbm/DRI stack before app.commandLine.appendSwitch takes
effect — the switch is applied to the in-memory CommandLine singleton,
but some Ozone subsystems read their backend from what amounts to the
argv-seeded initial CommandLine, not the post-modify view. When the
flag comes in via argv it is visible to every subsystem from process
startup; when it arrives via appendSwitch, the earliest Ozone probes
have already run against the auto-detected Wayland path.
The exact Chromium source path for this divergence is not yet
pin-cited; the empirical signature (appendSwitch log present, window
absent, CLI flag works) is reproducible on n=2 machines and aligns
with the observation in
electron-builder#9452
that --ozone-platform=x11 works as a CLI flag across affected users.
The §13/§15/§17 reactive GPU-disable guard (PR #7273) has a design
gap surfaced only by field data: its clear signal (IPC.APP_READY)
fires on Angular bootstrap, not on a user-visible window. On the
affected machines Angular does bootstrap — idle tracking, style
probing, and plugin init all run — but Chromium's compositor path
never produces a displayed frame. The marker clears, the guard
exits recovery on next launch, and the user is still looking at an
invisible window. §13.1 #2's framing ("ready-to-show would trade
bounded false-negatives for unbounded false-positives") was correct
in principle but missed this specific shape: APP_READY has the
opposite false-negative problem (clears on broken-but-frontend-alive
renderers).
This is not a reason to revert PR #7273 — it still rescues the
Flatpak case, and the Snap flag bundle inside it
(--disable-gpu --disable-software-rasterizer --ozone-platform=x11)
is still the correct last-resort ladder. But it does mean the guard
cannot carry the Snap+Wayland tail on its own.
Mechanism: rename the main Electron binary to superproductivity-bin
during the build (tools/afterPack.js) and install a shell wrapper
at the original name (build/linux/snap-wrapper.sh). The wrapper
decides whether to inject --ozone-platform=x11 into argv based on
the runtime environment:
```sh
#!/bin/sh
# Resolve through symlinks before deriving the binary path (deb/rpm
# installs launch via a /usr/bin symlink — see the risk table below).
BIN="$(dirname "$(readlink -f "$0")")/superproductivity-bin"
# Inject the flag only for our own snap, and only in a Wayland session.
if [ "$SNAP_NAME" = "superproductivity" ] && { [ "$XDG_SESSION_TYPE" = "wayland" ] || [ -n "$WAYLAND_DISPLAY" ]; }; then
  exec "$BIN" --ozone-platform=x11 "$@"
fi
exec "$BIN" "$@"
```
Four properties:

1. The flag lands in process.argv[1] before Electron or Chromium
   starts. No ambiguity about when Ozone reads the CommandLine.
2. The guard checks $SNAP_NAME = "superproductivity", not just $SNAP
   set — this protects .deb/.rpm installs launched via xdg-open from a
   sibling snap (where $SNAP leaks into the child env). X11 sessions
   pass through untouched. Non-Snap Linux targets pass through
   untouched.
3. If the user already passed --ozone-platform=..., the wrapper passes
   through and lets the user's choice win. The scan stops at -- so
   positional args that resemble flags aren't misread.
4. It survives app.relaunch(). The IPC.RELAUNCH handler explicitly
   points execPath at the sibling wrapper; otherwise Electron would
   default to process.execPath (the renamed ELF) and a relaunched
   instance would lose the flag injection on Snap+Wayland. See
   electron/ipc-handlers/app-control.ts.

Peer precedent: snapcrafters/signal-desktop and
snapcrafters/mattermost-desktop use the same shape
(command-chain script in snap/local/usr/bin/). SP's wrapper is
equivalent in mechanism but lives in afterPack rather than a
hand-written snapcraft.yaml — electron-builder regenerates
snapcraft.yaml each build, so the wrapper-via-rename route is
more robust than hooking the generated yaml.
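The afterPack half of the mechanism could be sketched like this (the function shape and idempotency/permissions details follow this doc's description; the helper name is an assumption, not the shipped tools/afterPack.js):

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Rename the ELF to superproductivity-bin and install the shell wrapper
// at the original name. Idempotent: if -bin already exists, a re-fired
// hook (e.g., per-target invocation) short-circuits instead of
// double-renaming.
export function installArgvWrapper(
  appOutDir: string,
  wrapperSourcePath: string,
): boolean {
  const original = path.join(appOutDir, 'superproductivity');
  const renamed = path.join(appOutDir, 'superproductivity-bin');
  if (fs.existsSync(renamed)) {
    return false; // already installed
  }
  fs.renameSync(original, renamed);
  fs.copyFileSync(wrapperSourcePath, original);
  fs.chmodSync(original, 0o755); // snapd preserves +x through squashfs
  fs.chmodSync(renamed, 0o755);
  return true;
}
```

The boolean return also gives the CI smoke assertion from the risk table something concrete to check.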
Why not linux.executableArgs: electron-builder supports executableArgs
for linux deb/rpm targets via the .desktop Exec= line, but
#4587
confirms snap.executableArgs is silently ignored (see §6). Even if it
worked, it would bake the flag in unconditionally for all sessions —
X11 users would get the flag too, which is wasteful. The shell wrapper
is runtime-conditional and target-agnostic.
The programmatic guard
(app.commandLine.appendSwitch('ozone-platform', 'x11')) still runs
redundantly on the Snap+Wayland path once the wrapper is in place.
Chromium's argv parser is last-wins for duplicate --ozone-platform, so
the combination is harmless. Keeping it provides defense-in-depth
against future regressions — for example, a build where the wrapper
fails to install, or a launch path that bypasses the wrapper. The cost
is ~30 lines of start-app.ts. Keep.
Similarly, the reactive GPU-disable guard stays. It covers Flatpak
(no $SNAP), AppImage on hosts with broken GL drivers, and future
Chromium/Mesa regressions that affect users the wrapper doesn't
redirect. The guard's false-clear flaw documented above is a
known-bounded cost.
| Risk | Mitigation |
|---|---|
| Snap refresh / update path: snapd expects a specific command: target in snap.yaml. Renaming breaks if snapd verifies ELF magic. | snapd's snap pack / snap run treats the command: entry as a file path; no ELF verification. Confirmed by the Signal and Mattermost snaps running the same pattern for years. |
| User invoked via a /usr/bin/superproductivity symlink (deb/rpm install): $0 resolves to /usr/bin/... and dirname misses superproductivity-bin. | The wrapper calls readlink -f "$0" to resolve through symlinks before deriving BIN_DIR. Available in GNU coreutils and BusyBox — guaranteed on every Linux target. |
| Forces XWayland on Snap users whose Wayland currently works (and who lose fractional scaling, per-monitor HiDPI, native IME). | Accepted trade-off: the Snap runtime is core22 / gnome-42-2204 and cannot reliably support native Wayland on post-core22 Mesa hosts. The Wayland-native experience is migrating to the core24 + gpu-2404 target. |
| Chromium's duplicate --ozone-platform resolution is not documented as last-wins. | Empirically last-wins in all tested Chromium versions; the programmatic guard redundantly sets the same value, so the duplication is value-identical and the order doesn't matter. Re-verify after Electron bumps. |
| The afterPack hook silently fails in CI and no one notices until a user reports. | The hook logs [afterPack] Installed argv wrapper: ... on success. Add a CI smoke assertion: after npm run dist -- -l, fail if superproductivity-bin is not present in the linux appOutDir. |
| A future electron-builder version changes afterPack semantics (e.g., fires per-target instead of per-platform) and double-invokes the hook. | The hook is idempotent: it checks for the renamed -bin before renaming and short-circuits if already installed. |
| First-install wrapper permission stripped by snapd squashfs packaging. | fs.chmod(0o755) on both wrapper and renamed binary. snapd preserves the +x bit during squashfs construction. |
| The hypothetical root cause (Chromium Ozone reads the argv-seeded CommandLine before appendSwitch) is not source-cited. If the real mechanism is different, the wrapper still works but for a reason we don't fully understand. | The fix is empirically validated by the field data (n≥3 reports, CLI flag works) even without a pinned Chromium source ref. A source trace can follow; it doesn't block shipping. |
Static:

- npm run checkFile tools/afterPack.js (N/A for .js — use npx prettier,
  which already checks clean).
- sh -n build/linux/snap-wrapper.sh passes.
- node -e "require('./tools/afterPack.js')" loads the hook.

Runtime (pending, before shipping):

- npm run dist -- -l snap — confirm the resulting
  .tmp/app-builds/linux-unpacked/superproductivity is the shell wrapper
  (check the first 2 bytes are #!) and superproductivity-bin is the
  ELF.
- ps aux | grep superproductivity shows --ozone-platform=x11 in argv
  (not just from app.commandLine.appendSwitch).

When to retire the wrapper:
$SNAP, and post-migration
Wayland is no longer the crash path).appendSwitch now
reaches the Ozone backend reliably, the wrapper is redundant with
the programmatic guard. Unlikely in the short term — see §18.7 for
the source-level reason this divergence is structural, not a bug.The CLI-vs-appendSwitch divergence was previously listed as an
"open question" with three candidate hypotheses (§16). A source-trace
pass resolved it: the divergence is strict initialization-order,
not async timing or env-var interaction.
Call order (verified against Electron master + Chromium source):

1. `ElectronBrowserMainParts::PreEarlyInitialization()` calls `SetOzonePlatformForLinuxIfNeeded(*base::CommandLine::ForCurrentProcess())` and then `ui::OzonePlatform::PreEarlyInitialization()` (electron#48301).
2. `ui::OzonePlatform::PreEarlyInitialization` reads `--ozone-platform` from `base::CommandLine::ForCurrentProcess()`, resolves the platform name, and memoizes it in the static `g_selected_platform` (see `ui/ozone/platform_selection.cc`).
3. `main.js` runs later, during `PostEarlyInitialization()`, and calls `app.commandLine.appendSwitch('ozone-platform', 'x11')`. The write succeeds, but the read has already happened and the value is memoized — nobody reads it again.

Consequence: no Electron-main-process code path can affect Ozone platform selection. The argv wrapper is structurally the only fix available from outside the Electron binary.
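The memoization step is the crux. A toy JavaScript model (not Chromium code — the real logic is C++ in `ui/ozone/platform_selection.cc`) shows why a later write is invisible:

```javascript
// Toy model of g_selected_platform memoization (illustrative only).
const commandLine = new Map(); // stand-in for base::CommandLine

let selectedPlatform = null; // stand-in for static g_selected_platform
function getOzonePlatform() {
  if (selectedPlatform === null) {
    // First read resolves and memoizes; argv is never consulted again.
    selectedPlatform = commandLine.get('ozone-platform') ?? 'auto';
  }
  return selectedPlatform;
}

// 1. PreEarlyInitialization: platform is selected from the argv-seeded
//    command line before any JS has run.
commandLine.set('ozone-platform', 'wayland');
console.log(getOzonePlatform()); // "wayland"

// 2. main.js later calls app.commandLine.appendSwitch('ozone-platform', 'x11').
//    The write succeeds...
commandLine.set('ozone-platform', 'x11');

// 3. ...but the memoized value never changes.
console.log(getOzonePlatform()); // "wayland" — the appendSwitch came too late
```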
Rejected alternatives:

- `ELECTRON_OZONE_PLATFORM_HINT` env var — removed as dead code in Electron 39 (electron#47983). Does nothing on Electron ≥39.
- Setting it in `start-app.ts` before `require('electron')` — the C++ `main()` has already returned from `PreEarlyInitialization` by the time any JS runs. Too late.
- `XDG_SESSION_TYPE=x11` in electron-builder.yaml's `snap.environment:` block — would work (this is the officially documented replacement for `ELECTRON_OZONE_PLATFORM_HINT`), but would also fool SP's own IdleTimeHandler, which reads `XDG_SESSION_TYPE` to choose an idle-detection method. Forcing it to x11 would silently break GNOME Wayland idle detection on affected hosts. The argv wrapper is preferred because it only touches argv, leaving env vars intact.

Residual unknown: the GPU child process inherits its CommandLine from the parent after user JS has run. A late `appendSwitch` in the parent might therefore propagate to the GPU child even though the parent's Ozone selection is already locked. This could explain partial-success reports (e.g., nekufa's original "appendSwitch works, but…"). Not verified from source in this pass; it does not change the conclusion: the wrapper is structurally correct.
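For illustration, the wrapper's gating predicate can be restated in JavaScript — the shipped artifact is `build/linux/snap-wrapper.sh`, a shell script; this function name and shape are assumptions. The point is that it reads env vars without mutating them and extends only argv:

```javascript
// Illustrative JS restatement of the snap-wrapper gating logic.
function extraArgs(env) {
  // SNAP_NAME gate prevents firing inside a sibling snap
  // (the "bleed-through" scenario).
  const inOurSnap = env.SNAP_NAME === 'superproductivity';
  const onWayland = env.XDG_SESSION_TYPE === 'wayland';
  // Only argv is extended; env vars (including XDG_SESSION_TYPE, which
  // IdleTimeHandler reads) pass through untouched.
  return inOurSnap && onWayland ? ['--ozone-platform=x11'] : [];
}

console.log(extraArgs({ SNAP_NAME: 'superproductivity', XDG_SESSION_TYPE: 'wayland' }));
// → [ '--ozone-platform=x11' ]  (forced-X11 branch)
console.log(extraArgs({ SNAP_NAME: 'some-other-snap', XDG_SESSION_TYPE: 'wayland' }));
// → []  (passthrough branch)
```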
Confidence:

- Wrapper gating: fires only when `$SNAP_NAME` = `superproductivity` and the session is Wayland; passthrough branches tested locally (including the sibling-snap bleed-through scenario).
- That the claimed mechanism (Ozone platform selection in `PreEarlyInitialization`, before V8 loads `main.js`) is accurate: high (~85%) — upgraded from ~40% after the §18.7 source trace. Verified via the electron#48301 diff + Chromium `ui/ozone/platform_selection.cc`. Residual uncertainty is in GPU child-process propagation (not the platform-selection path).

Follow-ups:

- CI assertion that `superproductivity-bin` exists in the `linux-unpacked` output — guards against silent afterPack regressions.
- Write up the `appendSwitch`-vs-CLI-flag divergence for `--ozone-platform` with the §18 reproduction steps. Useful for other Electron projects even if SP no longer needs it.

References:

- `--ozone-platform=x11` confirmed working
- `snap.executableArgs` silently ignored for snap builds (why mechanism #3 in Section 6 is unusable)
- `{driver}_gbm.so` lookup that produces the `dri_gbm.so` error string; §17.1 R1
- `dri_gbm.so` (OpenChrom, ogra reply) — Canonical staff confirmation of the gpu-2404 regression
- `g_selected_platform` memoization; §18.7
- `LD_LIBRARY_PATH` + libproxy issues; §17.3 R4
- `obsidian.sh` — peer app's proactive env-sniff approach (no crash-loop marker); §17.3 R3
- `gpu-2404-wrapper` — Canonical's own wrapper lacks crash detection; §17.3 R3
- `content/browser/gpu/fallback.md` — documents the HARDWARE_VULKAN → HARDWARE_GL → SWIFTSHADER → DISPLAY_COMPOSITOR fallback stack; why `--disable-gpu` alone doesn't eliminate the GPU process
- `--disable-gpu` doesn't suppress the GPU process
- `getGPUInfo('complete')` never settles on broken systems
- `getGPUInfo('basic')` always reports `softwareRendering: false`
- `app` API docs — `child-process-gone` event, `launch-failed` reason
- `toolkit.startup.recent_crashes`
- `recent_crashes` auto-safe-mode in debug builds (weak reference; 294260 is the authoritative source)
- `nsAppRunner.cpp` (searchfox) — implementation of the startup-crash marker
- `lastRunInfo.crashedDuringLaunch` pattern
- `FLATPAK_ID` env and `/.flatpak-info` inside the sandbox
- `snapctl get enable-gpu` toggle
- `obsidian.sh` — Flatpak wrapper with compositor+GPU probe
- `package.json`
- `electron/start-app.ts` — existing Snap guard widened by PR #7264