Local Text-to-Speech Setup

Run text-to-speech locally for free, private podcast generation using OpenAI-compatible TTS servers.


Why Local TTS?

| Benefit | Description |
|---|---|
| Free | No per-character costs after setup |
| Private | Audio never leaves your machine |
| Unlimited | No rate limits or quotas |
| Offline | Works without internet |

Quick Start with Speaches

Speaches is an open-source, OpenAI-compatible TTS server.

💡 Ready-made Docker Compose files available:

These include complete setup instructions and configuration examples. Just copy and run!

Step 1: Create Docker Compose File

Create a folder and add docker-compose.yml:

```yaml
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped

volumes:
  hf-hub-cache:
```

Step 2: Start and Download Model

```bash
# Start Speaches
docker compose up -d

# Wait for startup
sleep 10

# Download voice model (~500MB)
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
```

Step 3: Test

```bash
curl "http://localhost:8969/v1/audio/speech" -s \
  -H "Content-Type: application/json" \
  --output test.mp3 \
  --data '{
    "input": "Hello! Local TTS is working.",
    "model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
    "voice": "af_bella"
  }'
```

Play test.mp3 to verify.

Step 4: Configure Open Notebook

Via Settings UI (Recommended):

  1. Go to Settings → API Keys
  2. Click Add Credential → Select OpenAI-Compatible
  3. Enter base URL for TTS: http://host.docker.internal:8969/v1 (Docker) or http://localhost:8969/v1 (local)
  4. Click Save, then Test Connection

Legacy (Deprecated) — Environment variables:

```yaml
# In your Open Notebook docker-compose.yml
environment:
  - OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1
```

```bash
# Local development
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1
```

Step 5: Add Model in Open Notebook

  1. Go to Settings → Models
  2. Click Add Model in Text-to-Speech section
  3. Configure:
    • Provider: openai_compatible
    • Model Name: speaches-ai/Kokoro-82M-v1.0-ONNX
    • Display Name: Local TTS
  4. Click Save
  5. Set as default if desired

Available Voices

The Kokoro model includes multiple voices:

Female Voices

| Voice ID | Description |
|---|---|
| af_bella | Clear, professional |
| af_sarah | Warm, friendly |
| af_nicole | Energetic, expressive |

Male Voices

| Voice ID | Description |
|---|---|
| am_adam | Deep, authoritative |
| am_michael | Friendly, conversational |

British Accents

| Voice ID | Description |
|---|---|
| bf_emma | British female, professional |
| bm_george | British male, formal |

Test Different Voices

```bash
for voice in af_bella af_sarah am_adam am_michael; do
  curl "http://localhost:8969/v1/audio/speech" -s \
    -H "Content-Type: application/json" \
    --output "test_${voice}.mp3" \
    --data "{
      \"input\": \"Hello, this is the ${voice} voice.\",
      \"model\": \"speaches-ai/Kokoro-82M-v1.0-ONNX\",
      \"voice\": \"${voice}\"
    }"
done
```

GPU Acceleration

For faster generation with NVIDIA GPUs:

```yaml
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  hf-hub-cache:
```

Docker Networking

When configuring your OpenAI-Compatible credential in Settings → API Keys, use the appropriate TTS base URL for your setup:

Open Notebook in Docker (macOS/Windows)

TTS Base URL: http://host.docker.internal:8969/v1

Open Notebook in Docker (Linux)

TTS Base URL (Option 1 — Docker bridge IP): http://172.17.0.1:8969/v1

Option 2: Use host networking mode (docker run --network host ...), then use: http://localhost:8969/v1

Remote Server

Run Speaches on a different machine:

TTS Base URL: http://server-ip:8969/v1 (replace with your server's IP)
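On Linux, an alternative to the bridge IP is to make host.docker.internal resolve inside the Open Notebook container via Compose's extra_hosts with the special host-gateway value (Docker 20.10+). A sketch — the service name open_notebook is an assumption, match it to your own compose file:

```yaml
# Sketch: make host.docker.internal work on Linux (Docker 20.10+).
# "open_notebook" is a placeholder service name - use your own.
services:
  open_notebook:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

With this mapping in place, the same http://host.docker.internal:8969/v1 base URL used on macOS/Windows also works on Linux.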


Multi-Speaker Podcasts

Configure different voices for each speaker:

Speaker 1 (Host):
  Model: speaches-ai/Kokoro-82M-v1.0-ONNX
  Voice: af_bella

Speaker 2 (Guest):
  Model: speaches-ai/Kokoro-82M-v1.0-ONNX
  Voice: am_adam

Speaker 3 (Narrator):
  Model: speaches-ai/Kokoro-82M-v1.0-ONNX
  Voice: bf_emma
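To preview how each speaker will sound before generating a full podcast, the per-speaker requests can be sketched as a small shell loop (assumes Speaches is reachable at localhost:8969; the speaker:voice pairs mirror the mapping above):

```shell
#!/bin/sh
# Generate one short clip per speaker/voice pair for auditioning.
MODEL="speaches-ai/Kokoro-82M-v1.0-ONNX"
for pair in host:af_bella guest:am_adam narrator:bf_emma; do
  speaker=${pair%%:*}   # text before the colon
  voice=${pair##*:}     # text after the colon
  curl --max-time 10 -s "http://localhost:8969/v1/audio/speech" \
    -H "Content-Type: application/json" \
    --output "${speaker}.mp3" \
    --data "{\"input\": \"Preview of the ${speaker} voice.\", \"model\": \"${MODEL}\", \"voice\": \"${voice}\"}" \
    || echo "Could not reach Speaches for ${speaker}"
done
```

Play the resulting host.mp3, guest.mp3, and narrator.mp3 to compare the voices side by side.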

Troubleshooting

Service Won't Start

```bash
# Check logs
docker compose logs speaches

# Verify port available
lsof -i :8969

# Restart
docker compose down && docker compose up -d
```

Connection Refused

```bash
# Test Speaches is running
curl http://localhost:8969/v1/models

# From inside Open Notebook container
docker exec -it open-notebook curl http://host.docker.internal:8969/v1/models
```

Model Not Found

```bash
# List downloaded models
docker compose exec speaches uv tool run speaches-cli model list

# Download if missing
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
```

Poor Audio Quality

  • Try different voices
  • Adjust speed: set "speed" between 0.9 and 1.2 in the request body
  • Check model downloaded completely
  • Allocate more memory
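The speed adjustment can be sketched against the same endpoint (this assumes Speaches honors the OpenAI-style "speed" field; values below 1.0 slow speech down, values above 1.0 speed it up):

```shell
#!/bin/sh
# Build the payload once so the speed value is easy to tweak.
SPEED="0.9"
PAYLOAD="{\"input\": \"Testing playback speed.\", \"model\": \"speaches-ai/Kokoro-82M-v1.0-ONNX\", \"voice\": \"af_bella\", \"speed\": ${SPEED}}"
echo "$PAYLOAD"
curl --max-time 10 -s "http://localhost:8969/v1/audio/speech" \
  -H "Content-Type: application/json" \
  --output test_speed.mp3 \
  --data "$PAYLOAD" || echo "Could not reach Speaches"
```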

Slow Generation

| Solution | How |
|---|---|
| Use GPU | Switch to the latest-cuda image |
| More CPU | Allocate more cores in Docker |
| Faster model | Use smaller/quantized models |
| SSD storage | Move Docker volumes to SSD |

Performance Tips

| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 2 GB | 4+ GB |
| Storage | 5 GB | 10 GB (for multiple models) |
| GPU | None | NVIDIA (optional) |

Resource Limits

```yaml
services:
  speaches:
    # ... other config
    mem_limit: 4g
    cpus: 2
```

Monitor Usage

```bash
docker stats speaches
```

Comparison: Local vs Cloud

| Aspect | Local (Speaches) | Cloud (OpenAI/ElevenLabs) |
|---|---|---|
| Cost | Free | $0.015-0.10/min |
| Privacy | Complete | Data sent to provider |
| Speed | Depends on hardware | Usually faster |
| Quality | Good | Excellent |
| Setup | Moderate | Simple API key |
| Offline | Yes | No |
| Voices | Limited | Many options |

When to Use Local

  • Privacy-sensitive content
  • High-volume generation
  • Development/testing
  • Offline environments
  • Cost control

When to Use Cloud

  • Premium quality needs
  • Multiple languages
  • Time-sensitive projects
  • Limited hardware

Other Local TTS Options

Any OpenAI-compatible TTS server works. The key is:

  1. Server implements /v1/audio/speech endpoint
  2. Add an OpenAI-Compatible credential in Settings → API Keys with the TTS base URL
  3. Add model with provider openai_compatible
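A quick way to check whether a candidate server satisfies the contract is to hit both endpoints directly. A sketch — BASE_URL is whatever you would enter as the credential's TTS base URL, and the model/voice IDs are placeholders for your server's actual values:

```shell
#!/bin/sh
# Replace with your server's OpenAI-compatible base URL.
BASE_URL="http://localhost:8969/v1"
SPEECH_URL="${BASE_URL%/}/audio/speech"
echo "$SPEECH_URL"

# 1. The server should list its models...
curl --max-time 5 -s "${BASE_URL%/}/models" || echo "models endpoint unreachable"

# 2. ...and synthesize speech at /v1/audio/speech.
curl --max-time 10 -s "$SPEECH_URL" \
  -H "Content-Type: application/json" \
  --output contract_check.mp3 \
  --data '{"input": "Contract check.", "model": "your-model-id", "voice": "your-voice-id"}' \
  || echo "speech endpoint unreachable"
```

If both requests succeed, the server should work with the openai_compatible provider.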