Add TTS Engine

Goal

Integrate a new text-to-speech engine into Voicebox end-to-end: dependency research, backend protocol implementation, frontend UI wiring, PyInstaller bundling, and frozen-build verification. The user should only need to test the final build locally.

Reference Doc

The full phased guide lives at docs/content/docs/developer/tts-engines.mdx. Read this file in its entirety before starting. It contains:

Phase 0: Dependency research (mandatory before writing code)
Phase 1: Backend implementation (TTSBackend protocol)
Phase 2: Route and service integration (usually zero changes)
Phase 3: Frontend integration (5 files)
Phase 4: Dependencies (requirements.txt, justfile, CI, Docker)
Phase 5: PyInstaller bundling (build_binary.py + server.py)
Phase 6: Common upstream workarounds
Implementation checklist (gate between phases)

Workflow

1. Read the guide

bash

# Read the full TTS engines doc
cat docs/content/docs/developer/tts-engines.mdx

Internalize all phases, especially Phase 0 and Phase 5. The v0.2.3 release was three patch releases because Phase 0 was skipped.

2. Dependency research (Phase 0)

Clone the model library into a temporary directory and audit it. Do NOT skip this.

bash

mkdir /tmp/engine-research && cd /tmp/engine-research
git clone <model-library-url>

Run the grep searches from Phase 0.2 in the guide against the cloned source and its transitive dependencies. Produce a written dependency audit covering:

PyPI vs non-PyPI packages
PyInstaller directives needed (--collect-all, --copy-metadata, --hidden-import)
Runtime data files that must be bundled
Native library paths that need env var overrides in frozen builds
Monkey-patches needed (torch.load, float64, MPS, HF token)
Sample rate
Model download method (from_pretrained vs snapshot_download + from_local)

Test model loading and generation on CPU in the throwaway venv before proceeding.

3. Implement (Phases 1–4)

Follow the guide's phases in order. Key files to modify:

Backend (Phase 1):

Create backend/backends/<engine>_backend.py
Register in backend/backends/__init__.py (ModelConfig + TTS_ENGINES + factory)
Update regex in backend/models.py

Frontend (Phase 3):

app/src/lib/api/types.ts — engine union type
app/src/lib/constants/languages.ts — ENGINE_LANGUAGES
app/src/components/Generation/EngineModelSelector.tsx — ENGINE_OPTIONS, ENGINE_DESCRIPTIONS
app/src/lib/hooks/useGenerationForm.ts — Zod schema, model-name mapping
app/src/components/ServerSettings/ModelManagement.tsx — MODEL_DESCRIPTIONS

Dependencies (Phase 4):

backend/requirements.txt
justfile (setup-python, setup-python-release targets)
.github/workflows/release.yml
Dockerfile (if applicable)

4. PyInstaller bundling (Phase 5)

--hidden-import for the backend module and model package
--collect-all for packages using inspect.getsource, shipping data files, or native libraries
--copy-metadata for packages using importlib.metadata

If the engine has native data paths, add os.environ.setdefault() in backend/server.py inside the if getattr(sys, 'frozen', False): block.

5. Verify in dev mode

bash

just dev

Test the full chain: model download → load → generate → voice cloning.

6. Use the checklist

Walk through the Implementation Checklist at the bottom of tts-engines.mdx. Every item must be checked before handing the build to the user.

Key Lessons (from v0.2.3)

These are the most common failure modes. Phase 0 research catches all of them:

Pattern	Symptom in Frozen Build	Fix
`@typechecked` / `inspect.getsource()`	"could not get source code"	`--collect-all <package>`
Package ships pretrained model files	`FileNotFoundError` for `.pth.tar`, `.yaml`	`--collect-all <package>`
C library with hardcoded system paths	`FileNotFoundError` for `/usr/share/...`	`--collect-all` + env var in `server.py`
`importlib.metadata.version()`	"No package metadata found"	`--copy-metadata <package>`
`torch.load` without `map_location`	CUDA device not available on CPU build	Monkey-patch `torch.load`
`torch.from_numpy` on float64 data	dtype mismatch RuntimeError	Cast to `.float()`
`token=True` in HF download calls	Auth failure without stored HF token	Use `snapshot_download(token=None)` + `from_local()`

Notes

The route and service layers have zero per-engine dispatch points. main.py requires zero changes.
The model config registry in backends/__init__.py handles all dispatch automatically.
Use get_torch_device() and model_load_progress() from backends/base.py — don't reimplement device detection or progress tracking.
Always test with a clean HuggingFace cache (no pre-downloaded models from dev).
Do NOT push or create a release. Hand the build to the user for local testing.