packages/training/scripts/dflash/README.md
The DFlash drafter pipeline distills a small "drafter" model that proposes
N tokens per step for the target Eliza-1 text model to verify. The
acceptance window must be ≥ the per-tier gate (see
distill_dflash_drafter.py::ACCEPTANCE_GATE) before the drafter is
publish-eligible.
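The gate check can be pictured as a per-tier threshold comparison. The sketch below is illustrative only: it assumes the measured acceptance window and the gate are plain numbers on the same scale, and `EXAMPLE_ACCEPTANCE_GATE` and `is_publish_eligible` are hypothetical names with made-up values. The real per-tier numbers live in distill_dflash_drafter.py::ACCEPTANCE_GATE.

```python
# Hypothetical gate values, for illustration only. The real per-tier gates
# are defined in distill_dflash_drafter.py::ACCEPTANCE_GATE.
EXAMPLE_ACCEPTANCE_GATE = {"0_8b": 0.40, "2b": 0.50}


def is_publish_eligible(tier, measured_acceptance, gate=EXAMPLE_ACCEPTANCE_GATE):
    """Return True when the measured acceptance meets or exceeds the tier's gate."""
    if tier not in gate:
        # An unknown tier is an error, not a silent "not eligible".
        raise KeyError(f"no acceptance gate defined for tier {tier!r}")
    return measured_acceptance >= gate[tier]
```

The comparison is inclusive (≥), matching the wording above: a drafter that lands exactly on the gate is eligible.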
The catalog at packages/shared/src/local-inference/catalog.ts
(ELIZA_1_TIER_IDS) is the canonical, user-facing tier identifier set.
Today this is:
0_8b, 2b, 4b, 9b, 27b
These IDs name the target text models — the bundles a user downloads from
elizaos/eliza-1 under bundles/<tier>/ on Hugging Face.
DFlash drafter training covers the same tier set:
0_8b, 2b, 4b, 9b, 27b
0_8b uses the smallest/tiny drafter recipe. Its runtime DFlash launch remains
gated until acceptance/tg is validated on real hardware, but the artifact is no
longer optional or undefined.
Two ideas not to conflate:
- The target tier names the text model by its size (e.g. eliza-1-0_8b is a 0.8B target). The catalog
  bundle ships it as text/eliza-1-<tier>-<ctx>.gguf.
- The drafter ships as dflash/drafter-<tier>.gguf for drafter-enabled tiers. The
  drafter's own parameter count is usually smaller than the target (e.g. a
  0.3B Qwen3.5 student drafts for a 2B target). See
  distill_dflash_drafter.py::DEFAULT_STUDENT_BASE /
  DEFAULT_STUDENT_CONFIG for the per-tier student recipe.

The tier ID in every filename, env var, CLI flag, and script name
refers to the target tier, never the drafter's own size. So
dflash/drafter-2b.gguf is "the drafter that ships with the
eliza-1-2b target bundle" — even though that drafter's weights are a
0.3B Qwen3.5 distillation.
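As a minimal sketch of this naming rule, the helpers below build both bundle-relative paths from the target tier alone. The function names and the `TIERS` tuple are hypothetical (they are not taken from the repository); the path patterns and tier IDs are the ones stated above.

```python
# Tier IDs as listed in this README; the canonical source is ELIZA_1_TIER_IDS
# in catalog.ts. This tuple is a local copy for illustration.
TIERS = ("0_8b", "2b", "4b", "9b", "27b")


def _check_tier(tier):
    if tier not in TIERS:
        raise ValueError(f"unknown target tier: {tier!r}")


def target_gguf_relpath(tier, ctx):
    """Bundle-relative path of the target text model, e.g. ctx='128k'."""
    _check_tier(tier)
    return f"text/eliza-1-{tier}-{ctx}.gguf"


def drafter_gguf_relpath(tier):
    """Bundle-relative path of the drafter that ships with the given target tier."""
    _check_tier(tier)
    return f"dflash/drafter-{tier}.gguf"
```

Note that neither helper takes the drafter's parameter count: the 0.3B student for the 2B target is still addressed as `drafter_gguf_relpath("2b")`.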
Each jobs/distill_dflash_<tier>.sh script:
1. Sources _lib.sh for shared validation + log routing.
2. Sets TIER to the catalog tier (e.g. TIER="2b").
3. Sets default hyperparameters (EPOCHS, BATCH_SIZE, GRAD_ACCUM, LR,
   MAX_SEQ_LEN). These are starting points, not gospel; tune
   empirically per release.
4. Calls dflash_run_distill "$@".

The distill_dflash_0_8b.sh wrapper trains the 0.1B drafter for the smallest
target tier. The distill_dflash_2b.sh wrapper trains the 0.3B drafter for
the 2B target tier.
To run a script:
# Real run (requires GPU + dataset + target checkpoint + target GGUF).
TARGET_CHECKPOINT=checkpoints/eliza-1-2b/final \
TARGET_GGUF=out/eliza-1-2b/text/eliza-1-2b-128k.gguf \
DATASET=out/distill/eliza-1-2b/train.jsonl \
bash packages/training/scripts/dflash/jobs/distill_dflash_2b.sh
# Synthetic smoke (no GPU, no real models — exercises the pipeline).
bash packages/training/scripts/dflash/jobs/distill_dflash_2b.sh \
--synthetic-smoke
The smoke flag exports DFLASH_SMOKE=1, bypasses input validation, exercises
the CLI/control-flow path, and exits zero without writing release artifacts.
This is what CI exercises.
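The control flow described here can be sketched as follows. This is a Python illustration under assumptions, not the wrapper's actual code (the real logic lives in the shell scripts): `main` and `REQUIRED_INPUTS` are hypothetical names, the required variables are taken from the real-run example above, and the non-zero exit code for missing inputs is an assumption.

```python
import argparse
import os
import sys

# Inputs used by the real-run example in this README.
REQUIRED_INPUTS = ("TARGET_CHECKPOINT", "TARGET_GGUF", "DATASET")


def main(argv, env=None):
    """Return a process exit code for the given CLI arguments and environment."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--synthetic-smoke", action="store_true")
    args = parser.parse_args(argv)

    if args.synthetic_smoke:
        # Smoke mode: mark the run, skip input validation, write no artifacts.
        env["DFLASH_SMOKE"] = "1"
        return 0

    # Real run: every input must be set before any training starts.
    missing = [name for name in REQUIRED_INPUTS if not env.get(name)]
    if missing:
        print(f"missing inputs: {', '.join(missing)}", file=sys.stderr)
        return 2  # assumed exit code, for illustration
    return 0  # the real pipeline would launch distillation here
```

The point of the design is that the smoke path shares argument parsing with the real path but never touches a GPU, dataset, or checkpoint, so it is cheap enough to run on every CI job.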
When the catalog adds a new canonical tier:
1. Add it to ELIZA_1_TIERS in
   packages/training/scripts/manifest/eliza1_manifest.py.
2. Add its entries to TEXT_QUANT_BY_TIER, CONTEXTS_BY_TIER,
   SUPPORTED_BACKENDS_BY_TIER, VOICE_QUANT_BY_TIER, and
   REQUIRED_PLATFORM_EVIDENCE_BY_TIER in eliza1_platform_plan.py /
   eliza1_manifest.py.
3. Add its entries to distill_dflash_drafter.py::DEFAULT_STUDENT_BASE /
   DEFAULT_STUDENT_CONFIG / ACCEPTANCE_GATE / DEFAULT_TARGET_MODEL.
4. Add a KNOWN_TIERS entry in
   prepare_distill_dataset.py (the validator reads the gate from
   distill_dflash_drafter.py).
5. Copy one of the existing jobs/distill_dflash_<tier>.sh scripts and change the TIER= line and the
   hyperparameters.
6. Regenerate ELIZA_1_GGUF_READINESS.md with
   uv run python -m scripts.manifest.eliza1_platform_plan (from
   packages/training/).
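Because the checklist above touches several per-tier tables, it is easy to miss one. A small consistency check along these lines could catch that. It is a sketch only: `tables_missing_tier` is a hypothetical helper, and the example table contents are invented. In practice you would pass the real objects imported from the modules named above.

```python
from typing import Container, Mapping


def tables_missing_tier(tier: str, tables: Mapping[str, Container[str]]) -> list:
    """Return the sorted names of per-tier tables that have no entry for `tier`."""
    return sorted(name for name, table in tables.items() if tier not in table)


# Invented contents, for illustration only.
example_tables = {
    "ACCEPTANCE_GATE": {"2b": 0.5, "4b": 0.5},
    "DEFAULT_STUDENT_BASE": {"2b": "student-a"},
    "CONTEXTS_BY_TIER": {"2b": ["128k"], "4b": ["128k"]},
}

print(tables_missing_tier("4b", example_tables))  # prints ['DEFAULT_STUDENT_BASE']
```

An empty result means every table passed in already knows about the tier. The check works for dicts and for plain tuples or lists of tier IDs, since it only relies on `in`.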