# Example Llamafiles

We provide example llamafiles for a variety of models, so you can easily try out llamafile with different kinds of LLMs. The following table lists llamafiles bundled with the latest available version of the server (v0.10.*). The smaller the file, the more easily it will run on your computer, even without a GPU (as a reference point, Qwen3.5 0.8B Q8_0 generates text on a Raspberry Pi 5 at ~8 tokens/sec).

| Model | Size | License | llamafile |
| --- | --- | --- | --- |
| Qwen3.5 0.8B Q8_0 | 1.6 GB | Apache 2.0 | Qwen3.5-0.8B-Q8_0.llamafile |
| Qwen3.5 2B Q8_0 | 3.2 GB | Apache 2.0 | Qwen3.5-2B-Q8_0.llamafile |
| Ministral 3 3B Instruct 2512 Q4_K_M | 3.4 GB | Apache 2.0 | Ministral-3-3B-Instruct-2512-Q4_K_M.llamafile |
| Qwen3.5 4B Q5_K_S | 4.1 GB | Apache 2.0 | Qwen3.5-4B-Q5_K_S.llamafile |
| llava v1.6 mistral 7b Q4_K_M | 5.3 GB | Apache 2.0 | llava-v1.6-mistral-7b-Q4_K_M.llamafile |
| Apertus 8B Instruct 2509 | 5.9 GB | Apache 2.0 | Apertus-8B-Instruct-2509.llamafile |
| Qwen3.5 9B Q5_K_S | 7.4 GB | Apache 2.0 | Qwen3.5-9B-Q5_K_S.llamafile |
| Ministral 3 3B Instruct 2512 BF16 | 7.8 GB | Apache 2.0 | Ministral-3-3B-Instruct-2512-BF16.llamafile |
| llava v1.6 mistral 7b Q8_0 | 8.4 GB | Apache 2.0 | llava-v1.6-mistral-7b-Q8_0.llamafile |
| gpt-oss 20b mxfp4 | 12 GB | Apache 2.0 | gpt-oss-20b-mxfp4.llamafile |
| gpt-oss 20b Q5_K_S | 12 GB | Apache 2.0 | gpt-oss-20b-Q5_K_S.llamafile |
| LFM2 24B A2B Q5_K_M | 16 GB | lfm1.0 | LFM2-24B-A2B-Q5_K_M.llamafile |
| Qwen3.5 27B Q5_K_S | 19 GB | Apache 2.0 | Qwen3.5-27B-Q5_K_S.llamafile |

## Legacy llamafiles

If you prefer the "classic llamafile experience" from previous versions (0.9.*), here's a list of llamafiles bundled with the older server executable.

| Model | Size | License | llamafile | other quants |
| --- | --- | --- | --- | --- |
| LLaMA 3.2 1B Instruct | 1.11 GB | LLaMA 3.2 | Llama-3.2-1B-Instruct-Q6_K.llamafile | See HF repo |
| LLaMA 3.2 3B Instruct | 2.62 GB | LLaMA 3.2 | Llama-3.2-3B-Instruct.Q6_K.llamafile | See HF repo |
| LLaMA 3.1 8B Instruct | 5.23 GB | LLaMA 3.1 | Llama-3.1-8B-Instruct.Q4_K_M.llamafile | See HF repo |
| Gemma 3 1B Instruct | 1.32 GB | Gemma 3 | gemma-3-1b-it.Q6_K.llamafile | See HF repo |
| Gemma 3 4B Instruct | 3.50 GB | Gemma 3 | gemma-3-4b-it.Q6_K.llamafile | See HF repo |
| Gemma 3 12B Instruct | 7.61 GB | Gemma 3 | gemma-3-12b-it.Q4_K_M.llamafile | See HF repo |
| QwQ 32B | 7.61 GB | Apache 2.0 | Qwen_QwQ-32B-Q4_K_M.llamafile | See HF repo |
| R1 Distill Qwen 14B | 9.30 GB | MIT | DeepSeek-R1-Distill-Qwen-14B-Q4_K_M | See HF repo |
| R1 Distill Llama 8B | 5.23 GB | MIT | DeepSeek-R1-Distill-Llama-8B-Q4_K_M | See HF repo |
| LLaVA 1.5 | 3.97 GB | LLaMA 2 | llava-v1.5-7b-q4.llamafile | See HF repo |
| Mistral-7B-Instruct v0.3 | 4.42 GB | Apache 2.0 | mistral-7b-instruct-v0.3.Q4_0.llamafile | See HF repo |
| Granite 3.2 8B Instruct | 5.25 GB | Apache 2.0 | granite-3.2-8b-instruct-Q4_K_M.llamafile | See HF repo |
| Phi-3-mini-4k-instruct | 7.67 GB | Apache 2.0 | Phi-3-mini-4k-instruct.F16.llamafile | See HF repo |
| Mixtral-8x7B-Instruct | 30.03 GB | Apache 2.0 | mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile | See HF repo |
| OLMo-7B | 5.68 GB | Apache 2.0 | OLMo-7B-0424.Q6_K.llamafile | See HF repo |
| **Text Embedding Models** | | | | |
| E5-Mistral-7B-Instruct | 5.16 GB | MIT | e5-mistral-7b-instruct-Q5_K_M.llamafile | See HF repo |
| mxbai-embed-large-v1 | 0.7 GB | Apache 2.0 | mxbai-embed-large-v1-f16.llamafile | See HF repo |
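
The two text embedding models at the bottom of the table produce embedding vectors rather than chat text. As a minimal sketch of how you might query one (assuming your llamafile version honors the llama.cpp-style `--server`, `--embedding`, and `--port` flags and exposes the `/embedding` route; check your version's `--help` output and the Getting Started section):

```sh
# Start an embedding model in server mode (flags follow llama.cpp server
# conventions; assumption: your llamafile version supports them).
./e5-mistral-7b-instruct-Q5_K_M.llamafile --server --embedding --port 8080

# In another terminal, request an embedding vector for a piece of text.
curl http://localhost:8080/embedding \
  -H "Content-Type: application/json" \
  -d '{"content": "llamafile makes LLMs easy to run"}'
```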

As described in the Getting Started section, macOS, Linux, and BSD users will need to use the "chmod" command to grant execution permissions to the file before running these llamafiles for the first time.
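
For example, using the smallest llamafile from the table above (the same two steps apply to any of them):

```sh
# One-time step on macOS, Linux, and BSD: mark the file as executable.
chmod +x Qwen3.5-0.8B-Q8_0.llamafile

# Then run it like any other program.
./Qwen3.5-0.8B-Q8_0.llamafile
```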

Unfortunately, Windows users cannot run many of these example llamafiles directly, because Windows has a maximum executable file size of 4 GB and most of these examples exceed it. (The LLaVA llamafile works on Windows because it is 30 MB shy of the limit.) But don't lose heart: llamafile allows you to use external weights, as described in the Getting Started section and sketched below.
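
As a minimal sketch of the external-weights workaround, assume you have a bare llamafile executable (well under the 4 GB cap) plus a separate GGUF weights file; the version number and weights filename below are placeholders, and `-m` is the llama.cpp-style flag llamafile uses to load weights stored outside the executable:

```sh
# Windows: rename the small llamafile so Windows will execute it
# (the version number here is a placeholder).
copy llamafile-0.10.1 llamafile.exe

# Load external GGUF weights with the -m flag (placeholder filename).
.\llamafile.exe -m Qwen3.5-9B-Q5_K_S.gguf
```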

Having trouble? See the Troubleshooting page.

## A note about models

The example llamafiles provided above should not be interpreted as endorsements or recommendations of specific models, licenses, or data sets on the part of Mozilla.