docs/example_llamafiles.md
We provide example llamafiles for a variety of models, so you can easily try out llamafile with different kinds of LLMs. The following table lists llamafiles bundled with the latest available version of the server (v0.10.*). The smaller the file, the more easily it will run on your computer, even without a GPU (as a reference, Qwen3.5 0.8B Q8 generates text on a Raspberry Pi 5 at ~8 tokens/sec).
If you prefer the "classic llamafile experience" from previous versions (0.9.*), here's a list of llamafiles bundled with the older server executable.
As described in the Getting Started section, macOS, Linux, and BSD users will need to use the "chmod" command to grant execution permissions to the file before running these llamafiles for the first time.
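For example, assuming you downloaded the LLaVA llamafile mentioned below (substitute the name of whichever file you actually downloaded), the commands look like this:

```sh
# Grant execute permission (only needed once per file).
chmod +x llava-v1.5-7b-q4.llamafile

# Run the llamafile; it starts a local web server you can chat with in your browser.
./llava-v1.5-7b-q4.llamafile
```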
Unfortunately, Windows users cannot make use of many of these example llamafiles because Windows has a maximum executable file size of 4GB, and all of these examples exceed that size. (The LLaVA llamafile works on Windows because it is 30MB shy of the size limit.) But don't lose heart: llamafile allows you to use external weights; this is described in the Getting Started section.
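As a rough sketch of that workflow (both file names below are placeholders, not files we distribute in the table above): keep the llamafile launcher itself, which is well under the 4GB limit, and pass the model weights as a separate GGUF file using the `-m` flag.

```sh
# Windows (PowerShell): the launcher stays under the 4GB limit; the weights
# live in a separate .gguf file loaded at startup. File names are placeholders.
.\llamafile.exe -m mistral-7b-instruct-v0.2.Q4_0.gguf
```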
Having trouble? See the Troubleshooting page.
The example llamafiles provided above should not be interpreted as endorsements or recommendations of specific models, licenses, or data sets on the part of Mozilla.