plugins/plugin-local-inference/MLX_IN_PROCESS_PLAN.md
Status (2026-05-19): Decided — watch upstream node-mlx; no in-process
MLX runtime today. mlxBackendEligible() returns eligible: false
with a reason citing this document, which is the runtime behavior on
develop. No further work in this repo is planned until either (a) a
usable node-mlx Node binding stabilizes with mlx_lm text-generation
coverage, or (b) somebody picks up the libelizainference MLX backend
path described below.
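
For orientation, a minimal sketch of what an always-ineligible check could look like. The `BackendEligibility` shape, the `MLX_PLAN_DOC` constant, and the wording of the reason string are assumptions for illustration; only the function name and the `eligible: false` result come from the status note above.

```typescript
// Hypothetical result shape; the real type in this repo may differ.
interface BackendEligibility {
  eligible: boolean;
  reason?: string;
}

// Path to this plan, cited in the reason so diagnostics point at the decision.
const MLX_PLAN_DOC = "plugins/plugin-local-inference/MLX_IN_PROCESS_PLAN.md";

// Always reports ineligible: there is no in-process MLX runtime to select.
function mlxBackendEligible(): BackendEligibility {
  return {
    eligible: false,
    reason: `No in-process MLX runtime is available; see ${MLX_PLAN_DOC}.`,
  };
}
```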
Local inference must stay in-process: no subprocesses, no TCP. The
previous mlx-server.ts that spawned python -m mlx_lm.server and
spoke HTTP to it has been deleted outright — the file is gone, not
stubbed. No production callsite ever invoked it.
plugins/plugin-mlx/ is an independent plugin that targets a
user-managed external mlx_lm.server. It's unrelated to this
in-process surface.
MLX is Apple's Python-first ML framework. Its text-generation stack (mlx_lm) is Python, and there is no public C/C++ inference API we can wrap directly. To run MLX inference inside the agent process we'd need one of:
Path 1: libelizainference MLX backend (preferred when picked up)
Add an mlx target under plugins/plugin-local-inference/native/configs/gpu/.
Link against mlx-c (the upstream C API for the MLX framework) and
implement the streaming/sampling glue against eliza_token_trie_sampler.h.
Expose the same FFI symbols the llama.cpp backend exposes, so
FfiStreamingRunner can drive it without a code change.
Constraints:
ELIZA_LOCAL_ALLOW_STOCK_KV=1.
ELIZA_LOCAL_MLX=1 / ELIZA_LOCAL_BACKEND=mlx-server.
Effort: 1–2 weeks of native + JS work.
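
To illustrate why exposing the same FFI symbols lets the runner stay unchanged, here is a minimal sketch of a runner that depends only on a binding interface. Everything named below (`StreamingBindingSketch`, its three methods, `runStreaming`, `FakeBackend`) is a hypothetical stand-in and not the actual FfiStreamingRunner or FFI surface in this repo.

```typescript
// Hypothetical minimal binding surface. The real symbol set is whatever the
// llama.cpp backend already exposes; these three methods are an assumption.
interface StreamingBindingSketch {
  start(prompt: string): void;
  nextToken(): string | null; // null signals end of stream
  close(): void;
}

// The runner knows only the interface, so any backend that implements it
// (llama.cpp today, an MLX target later) can be driven without changes here.
function runStreaming(
  binding: StreamingBindingSketch,
  prompt: string,
  maxTokens: number,
): string {
  binding.start(prompt);
  const parts: string[] = [];
  try {
    for (let i = 0; i < maxTokens; i++) {
      const token = binding.nextToken();
      if (token === null) break;
      parts.push(token);
    }
  } finally {
    // Always release backend resources, even if token generation throws.
    binding.close();
  }
  return parts.join("");
}

// In-memory backend that replays canned tokens, standing in for a native one.
class FakeBackend implements StreamingBindingSketch {
  closed = false;
  private tokens: string[];
  private index = 0;

  constructor(tokens: string[]) {
    this.tokens = tokens;
  }

  start(_prompt: string): void {
    this.index = 0;
    this.closed = false;
  }

  nextToken(): string | null {
    const token = this.tokens[this.index];
    if (token === undefined) return null;
    this.index++;
    return token;
  }

  close(): void {
    this.closed = true;
  }
}
```

Swapping `FakeBackend` for a second implementation requires no edit to `runStreaming`, which is the property Path 1 relies on.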
Path 2: MLXSwift via the Capacitor host
Add MLXSwift as a SwiftPM dep in the Capacitor host. Wire a new
ComputerUse method (e.g. mlxGenerate) analogous to
foundationModelGenerate. Build an adapter under
plugins/plugin-local-inference/src/backends/ that delegates through
that bridge. Stays in-process (Capacitor is not a subprocess — it's
the same app process).
Effort: ~1 week of Swift + JS work. iOS/macOS only; useless on Linux/Windows. Only consider if iOS/macOS MLX is a product priority.
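
A minimal sketch of the JS side of this path, assuming the bridge ends up exposing a promise-returning `mlxGenerate`. The `MlxBridge` interface, its option and result shapes, and `MlxBridgeAdapter` are all hypothetical; no such method exists today, and the real signature would follow whatever foundationModelGenerate uses.

```typescript
// Hypothetical bridge surface. `mlxGenerate` does not exist yet; the option
// and result shapes here are assumptions for illustration only.
interface MlxBridge {
  mlxGenerate(options: { prompt: string; maxTokens: number }): Promise<{ text: string }>;
}

// Adapter that delegates generation through the injected bridge. Taking the
// bridge as a constructor argument keeps the adapter testable off-device.
class MlxBridgeAdapter {
  private bridge: MlxBridge;

  constructor(bridge: MlxBridge) {
    this.bridge = bridge;
  }

  async generate(prompt: string, maxTokens: number = 256): Promise<string> {
    if (prompt.length === 0) {
      throw new Error("prompt must not be empty");
    }
    const result = await this.bridge.mlxGenerate({ prompt, maxTokens });
    return result.text;
  }
}

// In-memory fake standing in for the native bridge; records each call.
const recordedCalls: Array<{ prompt: string; maxTokens: number }> = [];
const fakeBridge: MlxBridge = {
  mlxGenerate(options) {
    recordedCalls.push(options);
    return Promise.resolve({ text: "echo:" + options.prompt });
  },
};
```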
Path 3: node-mlx / mlx-c Node binding (passive)
Watch upstream. If a usable Node binding lands with mlx_lm
text-generation coverage (sampling loop, KV cache, tokenizer glue),
wire it as a third option. Don't depend on this — it's external.
Verified absent today: rg "(mlx-c|node-mlx|mlx-swift|mlx-js)" -g package.json → no hits across the monorepo.
Decision: Path 3, wait for upstream. Rationale: MLX is not a kernel-aware path (it can never satisfy §3's TurboQuant/QJL/PolarQuant contract), so the marginal value of building a custom integration is low. The llama.cpp Metal backend already covers Apple Silicon for the verified-kernel publish path. MLX-in-process is a "nice to have" for unverified text-only generation, not a blocker.
If product priorities change (e.g. an iOS/macOS app specifically needs MLX models for some reason), Path 2 is the most direct unblock. Path 1 is the right architectural fit but is the largest effort.
ELIZA_LOCAL_MLX=1 / ELIZA_LOCAL_BACKEND=mlx-server
and outside the verified-kernel contract.
mlx-server.ts deleted (commit 3f38613fd8b).
mlxBackendEligible() lives in… well, nowhere now: it was inlined
into the diagnostic surface and the deletion took its callers with it.
If a future MLX integration lands, it'll reintroduce eligibility
reporting under its own naming.
ELIZA_LOCAL_MLX=1 / ELIZA_LOCAL_BACKEND=mlx-server env vars are
recognized by the engine config but have no effect: there's no MLX
backend to activate. Set values are silently ignored. (If we want
these to throw instead of being silent no-ops, that's a 5-line
change in engine.ts's env parsing.)
If you're hitting this doc because you want MLX inference, pick Path 1
or Path 2 above based on your platform constraint. The integration
seam (FfiStreamingRunner consuming LlmStreamingBinding) is ready;
plugging a new backend into it is mechanical once the C/Swift side
exists.
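
The env-var note above mentions that making the MLX opt-in variables throw would be a small change in engine.ts's env parsing. A minimal sketch of such a check follows, assuming the env is passed in as a plain record. `findInertMlxEnv` and `assertNoInertMlxEnv` are hypothetical names, and this is not the actual engine.ts code.

```typescript
type Env = Record<string, string | undefined>;

// Lists the MLX opt-in settings that are present but currently do nothing.
function findInertMlxEnv(env: Env): string[] {
  const inert: string[] = [];
  if (env["ELIZA_LOCAL_MLX"] === "1") {
    inert.push("ELIZA_LOCAL_MLX=1");
  }
  if (env["ELIZA_LOCAL_BACKEND"] === "mlx-server") {
    inert.push("ELIZA_LOCAL_BACKEND=mlx-server");
  }
  return inert;
}

// Strict variant: fail loudly instead of silently ignoring the settings.
function assertNoInertMlxEnv(env: Env): void {
  const inert = findInertMlxEnv(env);
  if (inert.length > 0) {
    throw new Error(
      `${inert.join(", ")} set, but there is no MLX backend to activate; ` +
        "see plugins/plugin-local-inference/MLX_IN_PROCESS_PLAN.md",
    );
  }
}
```

Keeping detection separate from the throw leaves room for a softer mode, such as logging a warning, without duplicating the matching logic.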