docs/skills/llamafile/building.md
Complete guide to the llamafile build system and toolchain.
Llamafile uses the Cosmopolitan C/C++ compiler (cosmocc) to create Actually Portable Executables (APE). The toolchain is downloaded automatically when make setup is run, but it can also be fetched manually with:

```sh
build/download-cosmocc.sh .cosmocc/4.0.2 4.0.2 85b8c37a406d862e656ad4ec14be9f6ce474c1b436b9615e91a55208aced3f44
```
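Arguments:
- .cosmocc/4.0.2: destination directory for the toolchain
- 4.0.2: cosmocc version to download
- the 64-character hex string: expected SHA-256 checksum of the download

Three main dependencies are git submodules: llama.cpp, whisper.cpp, and stable-diffusion.cpp.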
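Before the first build, initialize and configure the dependencies:

```sh
make setup
```

This command:
- downloads the cosmocc toolchain if it is not already present
- initializes the git submodules
- applies the llamafile-specific patches from the `<submodule>.patches/` directories

Important: re-run make setup after pulling changes that update the submodules or their patches.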
To build, run:

```sh
.cosmocc/4.0.2/bin/make -j $(nproc) # or: llamafile:build
```

The -j $(nproc) flag runs one compile job per available CPU core. nproc is a GNU coreutils command, so adapt it to the OS you are building on (for example, sysctl -n hw.physicalcpu on macOS).
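As a minimal sketch, assuming a POSIX shell, the per-OS job count can be wrapped in one function. cpu_count is a hypothetical helper, not part of the llamafile build: it tries nproc, then the macOS sysctl key mentioned above, then getconf, and falls back to a single job.

```sh
# Hypothetical helper (not part of llamafile): print a job count for make -j.
# Order: GNU nproc, macOS sysctl hw.physicalcpu, POSIX-ish getconf, else 1.
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif command -v sysctl >/dev/null 2>&1 && sysctl -n hw.physicalcpu >/dev/null 2>&1; then
    sysctl -n hw.physicalcpu
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN
  else
    echo 1
  fi
}
```

With the function sourced into the shell, the build command becomes .cosmocc/4.0.2/bin/make -j "$(cpu_count)" on any of these systems.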
Critical: always use .cosmocc/4.0.2/bin/make, not the system make. The cosmocc toolchain ships its own make with Cosmopolitan-specific behavior.
Remove build outputs:

```sh
.cosmocc/4.0.2/bin/make clean # or: llamafile:clean
```

This removes the o/ directory containing all compiled objects and binaries.
To install system-wide:

```sh
sudo .cosmocc/4.0.2/bin/make install PREFIX=/usr/local
```

This installs the binaries and man pages under PREFIX.
The build/ directory holds the build configuration and helper scripts:

```
build/
├── config.mk # Compiler, flags, toolchain version
├── rules.mk # Generic build patterns
├── download-cosmocc.sh # Toolchain download script
├── llamafile-convert # Model conversion script
└── llamafile-upgrade-engine # Engine update script
```
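config.mk defines the compiler, compiler flags, and toolchain version.

rules.mk defines generic patterns for:
- .c → .o compilation
- .a archive creation
- .zip.o asset bundling (embedding files into executables)

Each major component has its own BUILD.mk file defining that component's build targets.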
The top-level Makefile includes all BUILD.mk files to orchestrate the build.
All outputs go to o/$(MODE)/:
```
o/
└── $(MODE)/
    ├── llamafile/
    │   ├── llamafile # Main executable
    │   ├── *.o # Object files
    │   └── *.a # Static libraries
    ├── llama.cpp/
    ├── whisper.cpp/
    ├── stable-diffusion.cpp/
    └── third_party/
        └── zipalign/
            └── zipalign # Asset bundling tool
```
The build system creates universal binaries supporting two CPU architectures: x86-64 and ARM64 (aarch64).
Both architectures are compiled simultaneously and combined into single APE binaries.
Binaries detect CPU features (for example, AVX2 or AVX-512 on x86-64) at runtime and select the best available code path.
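To confirm that a build output is an APE rather than a plain ELF or Mach-O file, one option is to inspect its first bytes. The sketch below assumes the header magic described in Cosmopolitan's APE specification (MZqFpD=' or jartsr='); is_ape is a hypothetical helper, not a llamafile tool.

```sh
# Hypothetical helper: succeed if the file starts with an APE header.
# Assumption: APE files begin with the 8-byte shell-compatible magic
# MZqFpD=' (or jartsr=' for builds without Windows support).
is_ape() {
  # Read the first 8 bytes; strip NULs so shells don't warn on binary input.
  magic="$(head -c 8 "$1" 2>/dev/null | tr -d '\000')"
  case "$magic" in
    "MZqFpD='"|"jartsr='") return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_ape "o/${MODE}/llamafile/llamafile" && echo "APE binary" checks the main executable from the output tree above.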
Files can be embedded into executables using the .zip.o pattern:

```make
o/$(MODE)/path/to/asset.zip.o: path/to/asset
```

The zipalign tool handles bundling. At runtime, embedded assets are readable through the Cosmopolitan virtual filesystem under the /zip/ path prefix.
GPU acceleration (CUDA/ROCm) uses dynamic loading, so GPU support is loaded at runtime rather than linked into the executable.
Make sure you are using the cosmocc make:

```sh
# Wrong
make -j $(nproc)
# Correct
.cosmocc/4.0.2/bin/make -j $(nproc)
# Or use the command directly:
# llamafile:build
```
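One way to avoid accidentally invoking the system make is a small wrapper that only ever runs the pinned toolchain. This is a sketch: cosmo_make and COSMOCC_VERSION are hypothetical names, and the .cosmocc/<version>/bin/make layout is taken from the commands in this guide.

```sh
# Hypothetical wrapper: run the cosmocc make, never the system make.
# Assumption: the toolchain lives at .cosmocc/<version>/bin/make in the repo root.
COSMOCC_VERSION="${COSMOCC_VERSION:-4.0.2}"

cosmo_make() {
  cosmo_make_bin=".cosmocc/${COSMOCC_VERSION}/bin/make"
  if [ ! -x "$cosmo_make_bin" ]; then
    # Fail loudly instead of silently falling back to whatever make is on PATH.
    echo "error: $cosmo_make_bin not found; run 'make setup' first" >&2
    return 1
  fi
  "$cosmo_make_bin" "$@"
}
```

Used from the repository root, cosmo_make -j "$(nproc)" builds and cosmo_make clean removes the outputs.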
If the build fails with missing files in llama.cpp/, whisper.cpp/, or stable-diffusion.cpp/, run:

```sh
make setup
```
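To see which submodules are actually missing before re-running setup, one can parse git submodule status, which prefixes uninitialized submodules with a leading -. The helper names below are hypothetical; both read that status output on stdin.

```sh
# Hypothetical helpers: read `git submodule status` output on stdin.
# Print the path of every submodule whose line starts with '-' (not initialized).
uninitialized_submodules() {
  awk 'substr($0, 1, 1) == "-" { print $2 }'
}

# Succeed (exit 0) when at least one submodule is uninitialized.
needs_setup() {
  [ -n "$(uninitialized_submodules)" ]
}
```

For example: if git submodule status | needs_setup; then make setup; fi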
After significant changes, clean and rebuild:

```sh
.cosmocc/4.0.2/bin/make clean # or: llamafile:clean
.cosmocc/4.0.2/bin/make -j $(nproc) # or: llamafile:build
```

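If download-cosmocc.sh fails verification, check that the checksum argument corresponds to the version argument and that the download was not interrupted, since either problem makes the SHA-256 comparison fail.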
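To check a downloaded toolchain archive by hand, compute its SHA-256 digest and compare it with the checksum argument. A minimal sketch, assuming sha256sum (GNU coreutils) or shasum (macOS) is installed; sha256_of and verify_sha256 are hypothetical helpers, and the archive path in the usage line below is a placeholder, since this guide does not say where the script stores its download.

```sh
# Hypothetical helper: print the lowercase hex SHA-256 digest of a file.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Hypothetical helper: $1 is the file, $2 the expected digest.
# Returns 0 on match; otherwise reports both digests on stderr and returns 1.
verify_sha256() {
  actual="$(sha256_of "$1")"
  if [ "$actual" = "$2" ]; then
    return 0
  fi
  echo "checksum mismatch for $1: expected $2, got $actual" >&2
  return 1
}
```

Usage (placeholder path): verify_sha256 path/to/cosmocc-archive.zip 85b8c37a406d862e656ad4ec14be9f6ce474c1b436b9615e91a55208aced3f44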