# Local TTS
Run text-to-speech locally for free, private podcast generation using OpenAI-compatible TTS servers.
| Benefit | Description |
|---|---|
| Free | No per-character costs after setup |
| Private | Audio never leaves your machine |
| Unlimited | No rate limits or quotas |
| Offline | Works without internet |
## Setting Up Speaches

Speaches is an open-source, OpenAI-compatible TTS server.
💡 Ready-made Docker Compose files are available:

- `docker-compose-speaches.yml`: Speaches + Open Notebook
- `docker-compose-full-local.yml`: Speaches + Ollama (100% local setup)

These include complete setup instructions and configuration examples. Just copy and run!
Create a folder and add a `docker-compose.yml`:

```yaml
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped

volumes:
  hf-hub-cache:
```
```bash
# Start Speaches
docker compose up -d

# Wait for startup
sleep 10

# Download the voice model (~500 MB)
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
```
Generate a test clip:

```bash
curl "http://localhost:8969/v1/audio/speech" -s \
  -H "Content-Type: application/json" \
  --output test.mp3 \
  --data '{
    "input": "Hello! Local TTS is working.",
    "model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
    "voice": "af_bella"
  }'
```

Play `test.mp3` to verify.
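The same request can be made from Python. Below is a minimal sketch using only the standard library; the endpoint and payload mirror the curl test above, and the base URL assumes the port mapping from the compose file in this guide.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8969/v1"  # port mapping from the compose file above

def build_speech_request(text, voice="af_bella",
                         model="speaches-ai/Kokoro-82M-v1.0-ONNX"):
    """Build the POST request for the /v1/audio/speech endpoint."""
    body = json.dumps({"input": text, "model": model, "voice": voice}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example (requires a running Speaches server):
# with urllib.request.urlopen(build_speech_request("Hello!")) as resp:
#     open("test.mp3", "wb").write(resp.read())
```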
## Connecting Open Notebook

Via Settings UI (Recommended): set the TTS Base URL to `http://host.docker.internal:8969/v1` (Docker) or `http://localhost:8969/v1` (local).

Legacy (deprecated): environment variables.

```yaml
# In your Open Notebook docker-compose.yml
environment:
  - OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1
```

```bash
# Local development
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1
```
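If you still rely on the legacy environment variable, application code can read it with a local fallback. A small sketch (the variable name is the one above; the fallback URL is an assumption for local development):

```python
import os

def tts_base_url(default="http://localhost:8969/v1"):
    """Return the TTS base URL, preferring the legacy env var if set."""
    return os.environ.get("OPENAI_COMPATIBLE_BASE_URL_TTS", default).rstrip("/")
```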
Then add the model in Open Notebook with provider `openai_compatible`, model name `speaches-ai/Kokoro-82M-v1.0-ONNX`, and a display name such as "Local TTS".

## Available Voices

The Kokoro model includes multiple voices:
American female voices:

| Voice ID | Description |
|---|---|
| af_bella | Clear, professional |
| af_sarah | Warm, friendly |
| af_nicole | Energetic, expressive |

American male voices:

| Voice ID | Description |
|---|---|
| am_adam | Deep, authoritative |
| am_michael | Friendly, conversational |

British voices:

| Voice ID | Description |
|---|---|
| bf_emma | British female, professional |
| bm_george | British male, formal |
Preview several voices before choosing:

```bash
for voice in af_bella af_sarah am_adam am_michael; do
  curl "http://localhost:8969/v1/audio/speech" -s \
    -H "Content-Type: application/json" \
    --output "test_${voice}.mp3" \
    --data "{
      \"input\": \"Hello, this is the ${voice} voice.\",
      \"model\": \"speaches-ai/Kokoro-82M-v1.0-ONNX\",
      \"voice\": \"${voice}\"
    }"
done
```
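The Kokoro voice IDs follow a prefix convention visible in the tables above (`a` = American, `b` = British; `f` = female, `m` = male). A small helper to decode them — note the scheme is inferred from the IDs listed here, not taken from official Speaches documentation:

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}

def describe_voice(voice_id):
    """Decode a Kokoro voice ID like 'af_bella' into accent and gender."""
    prefix = voice_id.split("_", 1)[0]
    accent = ACCENTS.get(prefix[0], "unknown")
    gender = GENDERS.get(prefix[1], "unknown")
    return f"{accent} {gender}"
```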
## GPU Acceleration

For faster generation with NVIDIA GPUs, use the CUDA image:
```yaml
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  hf-hub-cache:
```
## Network Configuration

When configuring your OpenAI-Compatible credential in Settings → API Keys, use the appropriate TTS base URL for your setup:

- Open Notebook in Docker (Mac/Windows): `http://host.docker.internal:8969/v1`
- Open Notebook in Docker (Linux), option 1: use the Docker bridge IP, `http://172.17.0.1:8969/v1`
- Linux, option 2: use host networking mode (`docker run --network host ...`), then use `http://localhost:8969/v1`
- Speaches on a different machine: `http://server-ip:8969/v1` (replace with your server's IP)
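The scenarios above differ only in the host part of the URL. A helper that assembles the base URL for each setup (the scenario names are illustrative, not part of any API):

```python
def speaches_base_url(scenario, server_ip=None, port=8969):
    """Pick the TTS base URL for the deployment scenarios described above."""
    hosts = {
        "docker-desktop": "host.docker.internal",  # Docker on Mac/Windows
        "docker-bridge": "172.17.0.1",             # default Linux bridge IP
        "host-network": "localhost",               # --network host or local dev
        "remote": server_ip,                       # Speaches on another machine
    }
    host = hosts[scenario]
    if host is None:
        raise ValueError("remote scenario requires server_ip")
    return f"http://{host}:{port}/v1"
```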
## Podcast Speaker Voices

Configure different voices for each speaker:

- Speaker 1 (Host): model `speaches-ai/Kokoro-82M-v1.0-ONNX`, voice `af_bella`
- Speaker 2 (Guest): model `speaches-ai/Kokoro-82M-v1.0-ONNX`, voice `am_adam`
- Speaker 3 (Narrator): model `speaches-ai/Kokoro-82M-v1.0-ONNX`, voice `bf_emma`
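In code, the per-speaker configuration above is just a mapping from speaker role to model and voice. A sketch (the role keys mirror the list above and are not a fixed Open Notebook schema):

```python
KOKORO = "speaches-ai/Kokoro-82M-v1.0-ONNX"

SPEAKER_VOICES = {
    "host": {"model": KOKORO, "voice": "af_bella"},
    "guest": {"model": KOKORO, "voice": "am_adam"},
    "narrator": {"model": KOKORO, "voice": "bf_emma"},
}

def voice_for(role):
    """Return the TTS settings for a podcast speaker role."""
    return SPEAKER_VOICES[role.lower()]
```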
## Troubleshooting

Speaches won't start:

```bash
# Check logs
docker compose logs speaches

# Verify the port is available
lsof -i :8969

# Restart
docker compose down && docker compose up -d
```

Open Notebook can't reach Speaches:

```bash
# Test that Speaches is running
curl http://localhost:8969/v1/models

# From inside the Open Notebook container
docker exec -it open-notebook curl http://host.docker.internal:8969/v1/models
```

Model missing:

```bash
# List downloaded models
docker compose exec speaches uv tool run speaches-cli model list

# Download if missing
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
```
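To check programmatically that the model is available, you can parse the `/v1/models` response. Speaches is OpenAI-compatible, so the response is assumed to follow the standard OpenAI list shape (`{"data": [{"id": ...}]}`):

```python
import json

def model_available(models_json, model_id="speaches-ai/Kokoro-82M-v1.0-ONNX"):
    """Check an OpenAI-style /v1/models response for a given model ID."""
    data = json.loads(models_json)
    return any(m.get("id") == model_id for m in data.get("data", []))
```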
Audio too fast or too slow: adjust the `speed` parameter in the request (values around `0.9` to `1.2` work well).

Generation is slow:

| Solution | How |
|---|---|
| Use GPU | Switch to latest-cuda image |
| More CPU | Allocate more cores in Docker |
| Faster model | Use smaller/quantized models |
| SSD storage | Move Docker volumes to SSD |
## System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 2 GB | 4+ GB |
| Storage | 5 GB | 10 GB (for multiple models) |
| GPU | None | NVIDIA (optional) |
To cap resource usage, add limits to the Speaches service:

```yaml
services:
  speaches:
    # ... other config
    mem_limit: 4g
    cpus: 2
```

Monitor usage:

```bash
docker stats speaches
```
## Local vs. Cloud TTS

| Aspect | Local (Speaches) | Cloud (OpenAI/ElevenLabs) |
|---|---|---|
| Cost | Free | $0.015-0.10/min |
| Privacy | Complete | Data sent to provider |
| Speed | Depends on hardware | Usually faster |
| Quality | Good | Excellent |
| Setup | Moderate | Simple API key |
| Offline | Yes | No |
| Voices | Limited | Many options |
## Other OpenAI-Compatible Servers

Any OpenAI-compatible TTS server works. The key requirements:

- It implements the `/v1/audio/speech` endpoint
- You configure it in Open Notebook with the `openai_compatible` provider