Back to Voicebox

Add TTS Engine

.agents/skills/add-tts-engine/SKILL.md

0.5.05.0 KB
Original Source

Add TTS Engine

Goal

Integrate a new text-to-speech engine into Voicebox end-to-end: dependency research, backend protocol implementation, frontend UI wiring, PyInstaller bundling, and frozen-build verification. The user should only need to test the final build locally.

Reference Doc

The full phased guide lives at docs/content/docs/developer/tts-engines.mdx. Read this file in its entirety before starting. It contains:

  • Phase 0: Dependency research (mandatory before writing code)
  • Phase 1: Backend implementation (TTSBackend protocol)
  • Phase 2: Route and service integration (usually zero changes)
  • Phase 3: Frontend integration (5 files)
  • Phase 4: Dependencies (requirements.txt, justfile, CI, Docker)
  • Phase 5: PyInstaller bundling (build_binary.py + server.py)
  • Phase 6: Common upstream workarounds
  • Implementation checklist (gate between phases)

Workflow

1. Read the guide

bash
# Read the full TTS engines doc
cat docs/content/docs/developer/tts-engines.mdx

Internalize all phases, especially Phase 0 and Phase 5. The v0.2.3 release was three patch releases because Phase 0 was skipped.

2. Dependency research (Phase 0)

Clone the model library into a temporary directory and audit it. Do NOT skip this.

bash
mkdir /tmp/engine-research && cd /tmp/engine-research
git clone <model-library-url>

Run the grep searches from Phase 0.2 in the guide against the cloned source and its transitive dependencies. Produce a written dependency audit covering:

  1. PyPI vs non-PyPI packages
  2. PyInstaller directives needed (--collect-all, --copy-metadata, --hidden-import)
  3. Runtime data files that must be bundled
  4. Native library paths that need env var overrides in frozen builds
  5. Monkey-patches needed (torch.load, float64, MPS, HF token)
  6. Sample rate
  7. Model download method (from_pretrained vs snapshot_download + from_local)

Test model loading and generation on CPU in the throwaway venv before proceeding.

3. Implement (Phases 1–4)

Follow the guide's phases in order. Key files to modify:

Backend (Phase 1):

  • Create backend/backends/<engine>_backend.py
  • Register in backend/backends/__init__.py (ModelConfig + TTS_ENGINES + factory)
  • Update regex in backend/models.py

Frontend (Phase 3):

  • app/src/lib/api/types.ts — engine union type
  • app/src/lib/constants/languages.ts — ENGINE_LANGUAGES
  • app/src/components/Generation/EngineModelSelector.tsx — ENGINE_OPTIONS, ENGINE_DESCRIPTIONS
  • app/src/lib/hooks/useGenerationForm.ts — Zod schema, model-name mapping
  • app/src/components/ServerSettings/ModelManagement.tsx — MODEL_DESCRIPTIONS

Dependencies (Phase 4):

  • backend/requirements.txt
  • justfile (setup-python, setup-python-release targets)
  • .github/workflows/release.yml
  • Dockerfile (if applicable)

4. PyInstaller bundling (Phase 5)

Register the engine in backend/build_binary.py:

  • --hidden-import for the backend module and model package
  • --collect-all for packages using inspect.getsource, shipping data files, or native libraries
  • --copy-metadata for packages using importlib.metadata

If the engine has native data paths, add os.environ.setdefault() in backend/server.py inside the if getattr(sys, 'frozen', False): block.

5. Verify in dev mode

bash
just dev

Test the full chain: model download → load → generate → voice cloning.

6. Use the checklist

Walk through the Implementation Checklist at the bottom of tts-engines.mdx. Every item must be checked before handing the build to the user.

Key Lessons (from v0.2.3)

These are the most common failure modes. Phase 0 research catches all of them:

PatternSymptom in Frozen BuildFix
@typechecked / inspect.getsource()"could not get source code"--collect-all <package>
Package ships pretrained model filesFileNotFoundError for .pth.tar, .yaml--collect-all <package>
C library with hardcoded system pathsFileNotFoundError for /usr/share/...--collect-all + env var in server.py
importlib.metadata.version()"No package metadata found"--copy-metadata <package>
torch.load without map_locationCUDA device not available on CPU buildMonkey-patch torch.load
torch.from_numpy on float64 datadtype mismatch RuntimeErrorCast to .float()
token=True in HF download callsAuth failure without stored HF tokenUse snapshot_download(token=None) + from_local()

Notes

  • The route and service layers have zero per-engine dispatch points. main.py requires zero changes.
  • The model config registry in backends/__init__.py handles all dispatch automatically.
  • Use get_torch_device() and model_load_progress() from backends/base.py — don't reimplement device detection or progress tracking.
  • Always test with a clean HuggingFace cache (no pre-downloaded models from dev).
  • Do NOT push or create a release. Hand the build to the user for local testing.