README_0.10.0.md

llamafile 0.10.0 has been a work in progress for a while. Now that we are merging its code with main, we are leaving this document in place to record both the reasons for this work and the process behind it.

Everything started with the goal of replicating a cosmopolitan llama.cpp build from scratch, so we could get the best of both worlds. On the one hand, the characteristic features of llamafile: portability across different systems and architectures, and the ability to bundle model weights inside a llamafile executable. On the other hand, the features and model support available in the most recent versions of llama.cpp.
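The weight-bundling trick works because an APE executable is also a valid zip archive: tools like zipalign append the GGUF weights as uncompressed zip entries, so the same file can be both executed and unpacked. A minimal sketch of the idea in Python, using a stub byte string in place of a real binary (file and entry names here are illustrative only):

```python
import zipfile

# Stand-in for an APE executable; a real llamafile would be the actual binary.
with open("demo.llamafile", "wb") as f:
    f.write(b"#!/bin/sh\necho stub executable\n")

# Append the weights as an uncompressed (STORED) entry, as zipalign does,
# so the runtime can map them directly without decompression.
with zipfile.ZipFile("demo.llamafile", "a", compression=zipfile.ZIP_STORED) as z:
    z.writestr("model.gguf", b"GGUF...")  # placeholder for real model bytes

# The file still starts with the executable bytes, yet also reads as a zip archive.
with zipfile.ZipFile("demo.llamafile") as z:
    print(z.namelist())  # ['model.gguf']
```

The real zipalign additionally aligns entries on page boundaries so the weights can be memory-mapped; Python's zipfile does not do that, which is why this is only an illustration of the file layout.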

We realise that what makes a llamafile is not just an APE executable, so before merging this code with main we wanted to bring back more of its features into the new build. We believe there's still work to do, but now that the main features are in place we can let you play with a more modern llamafile and ask you directly what you'd most like to see in its future versions.

Older builds (and llamafiles built on them) will remain available; check out our releases and our Example Llamafiles page.

Updates

Here are the features we brought into our development branch before merging with main. Most of them were carried over from previous versions of llamafile, and all credit goes to their original authors <3. Some (including a new build system for easier syncing with upstream llama.cpp, mtmd API support, integration tests, skill docs, and an HTTP chat client for combined mode) are new.

20260317

20260219

20260202

  • Added zipalign as a GitHub submodule (so we can get the latest updates from Justine’s repo)
  • Brought back CUDA support on Linux
  • Added support for the mtmd API in the TUI (so you can now access modern multimodal models directly from the llamafile chat)
  • Tested new llamafiles running models trained for tool calling (e.g. Qwen3, gpt-oss-20b) and multimodal models such as LLaVA 1.6, Qwen3-VL and Ministral 3
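With mtmd support in place, a multimodal model served by the bundled llama.cpp server can also be queried programmatically through its OpenAI-compatible chat endpoint, with the image inlined as a base64 data URL. A sketch of building such a request body (the function name is ours, and the endpoint path in the comment assumes llama.cpp's default OpenAI-compatible route):

```python
import base64
import json

def build_vision_request(prompt: str, image_bytes: bytes) -> str:
    """Build an OpenAI-style chat completion body with an inline image."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    body = {
        "messages": [{
            "role": "user",
            # Mixed content: one text part and one image part.
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }
    return json.dumps(body)

# POST this to the server's chat endpoint, e.g. /v1/chat/completions.
payload = build_vision_request("What is in this image?", b"\x89PNG...")
```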

20251218

  • added Metal support: GPU on macOS ARM64 is supported by compiling a small module using the Xcode Command Line Tools, which need to be installed. Check our support docs for more info.
  • Metal works both in llamafile (invoked either as a TUI or with the --server flag) and in llama-server.

20251215

  • added TUI support: you can now chat with the chosen LLM directly from the terminal, or run the llama.cpp server using the --server parameter
  • simplified the build by removing all tools/deps except those required by the new llamafile code (they will be added back as we reintroduce functionality)
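When llamafile is started with --server, the llama.cpp HTTP server it launches can be scripted against as well as chatted with. A minimal sketch of a single text-only chat turn (the helper names are ours, and the localhost port and /v1/chat/completions path are assumptions based on llama.cpp's defaults):

```python
import json
import urllib.request

def build_chat_body(prompt: str) -> bytes:
    """JSON body for a single-turn chat completion request."""
    return json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode("utf-8")

def chat(prompt: str, host: str = "http://localhost:8080") -> str:
    """Send one chat turn to a llama.cpp-style server and return the reply text."""
    req = urllib.request.Request(
        host + "/v1/chat/completions",
        data=build_chat_body(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires a running server): print(chat("Say hello in one word."))
```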

20251209

  • added BUILD.mk so we can do without CMake
  • the build works with cosmocc 4.0.2
  • dependencies are all taken from the llama.cpp/vendor directory
  • building now works on both Linux and macOS

20251208

  • updated to llama.cpp commit dbc15a79672e72e0b9c1832adddf3334f5c9229c

20251124

  • first version, relying on CMake for the build

What's missing

  • GPU support for Windows (and for whisperfile)
  • Stable Diffusion (the code is there, but has not been ported to the new build format yet)
  • some features triggered by extra arguments in CLI mode
  • pledge() / SECCOMP sandboxing
  • localscore
  • llamafiler for embeddings (we rolled back to llama.cpp's embeddings endpoint instead)
  • ... please let us know if there's anything else you'd like to see in the new build!