# Local Speech-to-Text (STT)
Run speech-to-text locally for free, private audio/video transcription using OpenAI-compatible STT servers.
## Why Local STT?

| Benefit | Description |
|---|---|
| Free | No per-minute costs after setup |
| Private | Audio never leaves your machine |
| Unlimited | No rate limits or quotas |
| Offline | Works without internet |
## Speaches (Recommended)

Speaches is an open-source, OpenAI-compatible server that supports both TTS and STT. It uses faster-whisper for transcription.
💡 Ready-made Docker Compose files are available:

- `docker-compose-speaches.yml` - Speaches + Open Notebook
- `docker-compose-full-local.yml` - Speaches + Ollama (100% local setup)

These include complete setup instructions and configuration examples. Just copy and run!
Create a folder and add a `docker-compose.yml`:
```yaml
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped

volumes:
  hf-hub-cache:
```
```bash
# Start Speaches
docker compose up -d

# Wait for startup
sleep 10

# Download Whisper model (~500MB for small)
docker compose exec speaches uv tool run speaches-cli model download Systran/faster-whisper-small
```
Models can also be downloaded automatically on first use, but pre-downloading avoids delays.
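If you script the setup, the fixed `sleep` above can be replaced by polling the models endpoint until the server answers. A minimal sketch:

```bash
# Wait until Speaches responds instead of sleeping a fixed time
until curl -sf http://localhost:8969/v1/models > /dev/null; do
  sleep 2
done
```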
```bash
# Create a test audio file (or use your own)
# Then transcribe it:
curl "http://localhost:8969/v1/audio/transcriptions" \
  -F "[email protected]" \
  -F "model=Systran/faster-whisper-small"
```
You should see the transcribed text in the response.
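With the default response format, it arrives as JSON along these lines (the exact text depends on your audio):

```json
{
  "text": "This is the transcribed speech from test.mp3."
}
```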
## Connecting Open Notebook

**Via Settings UI (Recommended):**

In Settings → API Keys, add an OpenAI-Compatible credential with the STT base URL set to `http://host.docker.internal:8969/v1` (Open Notebook running in Docker) or `http://localhost:8969/v1` (running locally).

**Legacy (Deprecated) — Environment variables:**
```yaml
# In your Open Notebook docker-compose.yml
environment:
  - OPENAI_COMPATIBLE_BASE_URL_STT=http://host.docker.internal:8969/v1
```

```bash
# Local development
export OPENAI_COMPATIBLE_BASE_URL_STT=http://localhost:8969/v1
```
Then register the model in Open Notebook: provider `openai_compatible`, model name `Systran/faster-whisper-small` (displayed as "Local Whisper", for example).

## Whisper Model Options

Speaches supports various Whisper model sizes. Larger models are more accurate but slower:
| Model | Size | Speed | Accuracy | VRAM (GPU) |
|---|---|---|---|---|
| `Systran/faster-whisper-tiny` | ~75 MB | Fastest | Basic | ~1 GB |
| `Systran/faster-whisper-base` | ~150 MB | Fast | Good | ~1 GB |
| `Systran/faster-whisper-small` | ~500 MB | Medium | Better | ~2 GB |
| `Systran/faster-whisper-medium` | ~1.5 GB | Slow | Great | ~5 GB |
| `Systran/faster-whisper-large-v3` | ~3 GB | Slowest | Best | ~10 GB |
| `Systran/faster-distil-whisper-small.en` | ~400 MB | Fast | Good (English only) | ~2 GB |
To list every STT model available in the registry:

```bash
docker compose exec speaches uv tool run speaches-cli registry ls --task automatic-speech-recognition
```
Recommendations:

- Fastest, lowest resource use: `Systran/faster-whisper-tiny` or `Systran/faster-whisper-base`
- Balanced: `Systran/faster-whisper-small` (recommended)
- Best accuracy: `Systran/faster-whisper-large-v3`

## GPU Acceleration

For faster transcription with NVIDIA GPUs:
```yaml
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    environment:
      - WHISPER__TTL=-1  # Keep model in VRAM (recommended if you have enough memory)
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  hf-hub-cache:
```
## Keeping Models Loaded

By default, Speaches unloads models after a period of inactivity. To keep the Whisper model loaded for instant transcription:
```yaml
environment:
  - WHISPER__TTL=-1  # Never unload
```
This is recommended if you have enough RAM/VRAM, as loading the model can take a few seconds.
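A quick way to see the effect: time two back-to-back requests. With `WHISPER__TTL=-1`, the second one should no longer pay the model-load cost (sketch, assuming the test file from earlier):

```bash
# First request may include model load time; the second should be noticeably faster
time curl -s "http://localhost:8969/v1/audio/transcriptions" \
  -F "[email protected]" -F "model=Systran/faster-whisper-small" > /dev/null
time curl -s "http://localhost:8969/v1/audio/transcriptions" \
  -F "[email protected]" -F "model=Systran/faster-whisper-small" > /dev/null
```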
## Network Configuration

When configuring your OpenAI-Compatible credential in Settings → API Keys, use the appropriate STT base URL for your setup:

**Open Notebook in Docker (Docker Desktop on Mac/Windows):**

- STT Base URL: `http://host.docker.internal:8969/v1`

**Open Notebook in Docker (Linux):**

- Option 1 — Docker bridge IP: `http://172.17.0.1:8969/v1`
- Option 2 — Host networking mode (`docker run --network host ...`): `http://localhost:8969/v1`

**Speaches on a different machine:**

- STT Base URL: `http://server-ip:8969/v1` (replace with your server's IP)
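Before saving the credential, it's worth confirming the URL is reachable from wherever Open Notebook runs; the models endpoint works as a simple probe (replace `server-ip` with yours):

```bash
# A successful response with a model list means the server is reachable
curl http://server-ip:8969/v1/models
```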
## Language Support

Whisper supports 99+ languages. Specify the language for better accuracy:
curl "http://localhost:8969/v1/audio/transcriptions" \
-F "[email protected]" \
-F "model=Systran/faster-whisper-small" \
-F "language=ru"
Common language codes:

- `en` - English
- `ru` - Russian
- `es` - Spanish
- `fr` - French
- `de` - German
- `zh` - Chinese
- `ja` - Japanese

## Troubleshooting

**Speaches won't start:**

```bash
# Check logs
docker compose logs speaches

# Verify the port is not already in use
lsof -i :8969

# Restart
docker compose down && docker compose up -d
```
**Open Notebook can't reach Speaches:**

```bash
# Test Speaches is running
curl http://localhost:8969/v1/models

# From inside the Open Notebook container
docker exec -it open-notebook curl http://host.docker.internal:8969/v1/models
```
**Model download fails:**

Models are downloaded automatically on first use. If a download fails:
```bash
# Check available disk space
df -h

# Check Docker logs for errors
docker compose logs speaches

# Restart and try again
docker compose restart speaches
```
**Slow transcription** (especially with `faster-whisper-medium` or `large-v3`):

| Solution | How |
|---|---|
| Use GPU | Switch to latest-cuda image |
| Smaller model | Use faster-whisper-tiny or base |
| More CPU | Allocate more cores in Docker |
| SSD storage | Move Docker volumes to SSD |
## Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 2 GB | 8+ GB |
| Storage | 5 GB | 10 GB (for multiple models) |
| GPU | None | NVIDIA (optional, much faster) |
To cap Speaches' resource usage, add limits to the service:

```yaml
services:
  speaches:
    # ... other config
    mem_limit: 4g
    cpus: 2
```
Monitor actual usage with:

```bash
docker stats speaches
```
## Local vs Cloud STT

| Aspect | Local (Speaches) | Cloud (OpenAI Whisper) |
|---|---|---|
| Cost | Free | $0.006/min |
| Privacy | Complete | Data sent to provider |
| Speed | Depends on hardware | Usually faster |
| Quality | Excellent (same Whisper) | Excellent |
| Setup | Moderate | Simple API key |
| Offline | Yes | No |
| Languages | 99+ | 99+ |
## One Server for TTS and STT

Speaches supports both TTS and STT in one server. In Settings → API Keys, add a single OpenAI-Compatible credential and configure both the TTS and STT base URLs to point to the same Speaches server (e.g., `http://localhost:8969/v1`).
See Local TTS Setup for TTS configuration.
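A sketch of what that looks like against one server: the STT call matches the earlier examples, while the TTS model and voice names below are only illustrative (install a TTS model per the Local TTS guide first):

```bash
BASE="http://localhost:8969/v1"

# Speech-to-text (same endpoint used throughout this guide)
curl "$BASE/audio/transcriptions" \
  -F "[email protected]" \
  -F "model=Systran/faster-whisper-small"

# Text-to-speech from the same server (model/voice are example values)
curl "$BASE/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"model": "speaches-ai/Kokoro-82M-v1.0-ONNX", "input": "Hello from Speaches", "voice": "af_heart"}' \
  -o hello.mp3
```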
## Alternative STT Servers

Any OpenAI-compatible STT server works:
| Server | Description |
|---|---|
| Speaches | TTS + STT in one (recommended) |
| faster-whisper-server | Lightweight STT only |
| whisper.cpp | C++ implementation with server mode |
| LocalAI | Multi-model local AI server |
The key requirements:

- An OpenAI-compatible `/v1/audio/transcriptions` endpoint
- Registered in Open Notebook with the `openai_compatible` provider