Back to Eliza

Ollama Modelfiles for the Eliza-1 series

packages/training/cloud/ollama/README.md

2.0.12.4 KB
Original Source

Ollama Modelfiles for the Eliza-1 series

Three Modelfiles, one per published size, all pulling GGUF artifacts from the public consolidated elizaos/eliza-1 bundle repo on HuggingFace.

FileSizeTarget GPUResident VRAM
Modelfile.eliza-1-2b-q4_k_m2B16 GB consumer (RTX 5080 / 4080)~3 GB
Modelfile.eliza-1-9b-q4_k_m9B24-48 GB workstation (RTX 4090 / 5090)~8-10 GB
Modelfile.eliza-1-27b-q4_k_m27B32+ GB (RTX 5090 / RTX Pro 5000)~22 GB

Build

Each Modelfile ships with the canonical Eliza system prompt, the ChatML stop tokens used by Eliza-1, and per-size context / sampling defaults.

bash
# 2B — local consumer GPU
ollama create eliza-1-2b -f Modelfile.eliza-1-2b-q4_k_m

# 9B — workstation
ollama create eliza-1-9b -f Modelfile.eliza-1-9b-q4_k_m

# 27B — high-VRAM card or datacenter
ollama create eliza-1-27b -f Modelfile.eliza-1-27b-q4_k_m

Ollama pulls the GGUF directly from HuggingFace on first build — no intermediate ollama pull needed. Subsequent builds reuse the cached blob.

Run

bash
ollama run eliza-1-9b

Or expose to Eliza:

bash
# .env
OLLAMA_API_ENDPOINT=http://localhost:11434/api
OLLAMA_LARGE_MODEL=eliza-1-9b
OLLAMA_SMALL_MODEL=eliza-1-2b

When the @elizaos/plugin-ollama plugin is enabled, Eliza sends TEXT_LARGE requests to the model named in OLLAMA_LARGE_MODEL and TEXT_SMALL requests to the model named in OLLAMA_SMALL_MODEL.

Updating to a newer release

When a new fine-tune ships (e.g. eliza-1.1-9b), update the FROM line to the new HF repo and rebuild:

bash
ollama create eliza-1-9b -f Modelfile.eliza-1-9b-q4_k_m   # picks up new FROM

Ollama replaces the local model in place; agents already pointing at eliza-1-9b will use the new weights on the next request without any config change.

Why GGUF + Ollama (and not vLLM) for local

GGUF + llama.cpp is the canonical local-inference path:

  • Cross-platform (CUDA, Metal, CPU).
  • Runs on consumer GPUs without any FP8/PolarQuant kernel availability worries.
  • Ollama exposes an OpenAI-compatible API on :11434 that the @elizaos/plugin-ollama plugin already consumes.

For datacenter / multi-GPU serving, see the sibling vast-pyworker manifests at ../vast-pyworker/ — those use the vLLM + PolarQuant / fp8 path defined in training/scripts/inference/serve_vllm.py.