v3/docs/adr/ADR-173-remote-gpu-distillation-weight-eft.md
ID: ADR-173
Status: Proposed — implemented on feat/agenticow-integration (ships in 3.21.0)
Date: 2026-07-04
Authors: rUv (drafted with Claude Code)
Related ADRs:
@metaharness/[email protected] turns an agent's run archive into portable LoRA training data (SFT + DPO JSONL) with contamination / reward-hacking / long-context guards — all $0, offline. Its value proposition is closing the cost flywheel: fine-tune a cheap open-weights model (GLM/Qwen/DeepSeek) on successful runs so the cascade escalates to frontier models less often.
But the flywheel did not close, for reasons verified end-to-end:
train emits the ruvllm microlora command string and deliberately never spawns it (no GPU job, no paid call — by design).The unlock: a real GPU host — ruvultra on tailscale (Linux, SSH open, no HTTP inference port). The tune can run there over SSH.
Ship ruflo neural distill export | plan | eval | train, where train --remote <host> constructs an SSH-based tune on a user-supplied GPU host — but every spending / executing path is off by default and explicitly gated.
export — captured run records → DarwinTrajectory[] → weight-eft SFT/DPO JSONL + guard report. $0, offline.plan — print the ruvllm microlora GPU training plan. $0 dry-run.eval — cost-Pareto delta between two cascade outcome sets. $0.train --remote <host> [--base <model>]:
ruvllm microlora sft && dpo over SSH, fetch the adapter back.ruvllm --version / nvidia-smi on remote, wrapped so failure is reported not fatal). No data leaves the machine, no compute runs.--execute and --yes (or an interactive confirm). Even then the command preview is shown first.--remote / RUFLO_DISTILL_REMOTE) — never hard-coded. ruvultra / its IP appear in no source file; it is the user's infra, passed at runtime.No command, help text, or doc claims ruflo "trains a model" or "reduces escalation" as a $0/local capability. It is: export audited training data + measure the cost-Pareto + print the plan, and — as an explicit, user-triggered, remote-GPU, spend-gated step — run the tune. The residual gaps are stated plainly: (a) resolved-gold is Tier-graded (ADR-171), so SFT quality is only as good as its provenance; (b) a real adapter must clear a hand-verified holdout before it is trusted in routing.
--execute --yes away once the user confirms the box has the GPU toolchain and accepts the spend.