packages/training/cloud/vast-pyworker/README.md
One JSON manifest per published Eliza-1 size. Each describes the Vast.ai
worker shape (image, GPU class, disk, ports) and the canonical
vllm serve argv (mirroring the per-target defaults from
training/scripts/inference/serve_vllm.py).
| Manifest | GPU target (in serve_vllm.py) | Approx. disk | Recommended cards |
|---|---|---|---|
eliza-1-2b.json | single | 30 GB | RTX 4090 / 5090 / RTX Pro 5000 / L40S |
eliza-1-9b.json | h100-2x | 60 GB | 2x H100 PCIe / H100 SXM / H200 SXM / H200 NVL |
eliza-1-27b.json | h200-2x | 120 GB | 2x H200 SXM / NVL, 2x B200 SXM |
The manifests are deliberately parallel to serve_vllm.py's
GPU_TARGETS table so the same canonical args ship in both paths.
When the underlying registry / vLLM args drift, regenerate by hand from
training/scripts/inference/serve_vllm.py --registry-key <key> --gpu-target <target> --dry-run.
Worker contract — read before launching. The upstream pyworker at
eliza/cloud/services/vast-pyworker/{worker.py,onstart.sh}is hardcoded to a llama-server (GGUF) backend on port 8080. These manifests serve vLLM on port 8000. The siblingonstart-vllm.shin this directory bridges that gap: it reads the per-size manifest'svllm_args+vast_template_env, launchesvllm serve, tails the log forApplication startup complete, and thenexecs the upstreamworker.pywithLLAMA_SERVER_PORTandLLAMA_SERVER_LOGpointed at the vLLM endpoint. The pyworker code stays unchanged — only the launch-and-tail step is swapped.Two deploy paths:
- vLLM (default for this directory) — point the Vast template at
cloud/vast-pyworker/onstart-vllm.shand setvast_template_env.{MODEL_REPO,MODEL_ALIAS,VLLM_PORT,VLLM_REGISTRY_KEY}. Each manifest declares the script via theonstart_scriptfield.- GGUF/llama-server (legacy) — point a Vast template at
MODEL_REPO=elizaos/eliza-1+MODEL_FILE=bundles/<size>/text/...plus the upstreamonstart.sh. llama.cpp resolves the subpath inside the consolidated bundle repo. That matches the pyworker contract without any wrapper, and is whateliza/cloud/scripts/vast/upsert-template.tsdoes for the 27B NEO-CODE deploy.
There is no vastai launch instance subcommand in the official CLI as of
this writing. The actual flow is:
onstart.sh and embeds the per-size env. The existing
eliza/cloud/scripts/vast/upsert-template.ts script handles this for
the 27B GGUF/llama-server path; clone it per size to upsert the
vLLM-based templates.vastai create instance <id> --template_hash <hash> --label eliza-1-<size>. Vast picks an offer that matches the template's
search_params (mirrored verbatim into each manifest above).For autoscaled deployment (recommended for prod), provision a Vast Serverless endpoint that owns the template:
# 1. Upsert the GGUF template (once per size).
VASTAI_API_KEY=vastai_… \
VAST_TEMPLATE_NAME=eliza-cloud-eliza-1-9b-gguf \
PYWORKER_REPO=https://github.com/elizaOS/cloud.git \
PYWORKER_REF=<commit-sha> \
MODEL_REPO=elizaos/eliza-1 \
MODEL_FILE=bundles/9b/text/eliza-1-9b-128k.gguf \
MODEL_ALIAS=vast/eliza-1-9b \
bun eliza/cloud/scripts/vast/upsert-template.ts
# → prints VAST_TEMPLATE_ID=<n>
# 2. Provision the autoscaled endpoint.
VASTAI_API_KEY=vastai_… VAST_TEMPLATE_ID=<n> \
bun eliza/cloud/scripts/vast/provision-endpoint.ts
The eliza/cloud/services/vast-pyworker/ pyworker code itself is
unchanged — it tails its LLAMA_SERVER_LOG for readiness regardless
of the underlying engine. The sibling onstart-vllm.sh in this
directory drives the vLLM path: it pulls the manifest, launches
vllm serve with the manifest's vllm_args, tails for
Application startup complete, and then execs the pyworker with
LLAMA_SERVER_PORT=$VLLM_PORT and LLAMA_SERVER_LOG=/var/log/vllm-server.log
so the heartbeat handlers wire up to the vLLM endpoint. The canonical
elizaos/eliza-1 GGUF bundle repo uses the upstream llama-server onstart.sh
path instead.
onstart-vllm.sh quick referenceRequired env (sourced from the manifest's vast_template_env block):
| Var | Meaning |
|---|---|
MODEL_REPO | HuggingFace repo id for a vLLM-compatible checkpoint. |
MODEL_ALIAS | Display alias the pyworker reports (e.g. vast/eliza-1-9b). |
VLLM_PORT | Port vLLM binds to (must match the manifest port). |
VLLM_REGISTRY_KEY | Training-side key, used for log lines + manifest auto-resolve. |
Optional env: ELIZA_VAST_MANIFEST (override manifest path), HUGGING_FACE_HUB_TOKEN,
PYWORKER_REPO, PYWORKER_REF, MODEL_DIR, PYWORKER_DIR, VLLM_LOG,
VLLM_READY_TIMEOUT. See the script header for full defaults.
The script is idempotent (re-runs detect a live /health on $VLLM_PORT
and skip relaunch) and writes to /var/log/vllm-server.log so Vast template
log tailing works.
Pick an offer manually:
# Find a 2x H200 box for the 27B GGUF path.
vastai search offers 'gpu_name=H200_SXM num_gpus>=2 reliability>=0.95 verified=true' \
--order 'dph+' --limit 5
# Launch using one of the offer IDs.
vastai create instance <offer-id> \
--image ghcr.io/ggml-org/llama.cpp:server-cuda \
--disk 120 \
--label eliza-1-27b \
--env '-p 8080:8080 -e MODEL_REPO=elizaos/eliza-1 -e MODEL_FILE=bundles/27b/text/eliza-1-27b-128k.gguf -e MODEL_ALIAS=vast/eliza-1-27b' \
--onstart-cmd "$(cat ../../cloud-services/vast-pyworker/onstart.sh)"
# Get the public endpoint.
vastai show instance <instance-id>
# → forward 8080 → http://<public-ip>:<mapped-port>/v1
ghcr.io/ggml-org/llama.cpp:server-cuda is the image used for the canonical
GGUF deploy path.
The cloud Worker expects model IDs like vast/eliza-1-9b to forward
through eliza/cloud/packages/lib/providers/vast.ts. Adding a new
size to the cloud catalog requires one new entry in
eliza/cloud/packages/lib/models/catalog.ts per Vast alias —
structurally identical to the existing vast/eliza-1-27b
row. See ../README.md for the integration overview.