backend/python/speaker-recognition/README.md
Speaker (voice) recognition backend for LocalAI. The audio analog to
insightface — produces speaker embeddings and supports 1:1 voice
verification and voice demographic analysis.
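
As a rough illustration of what 1:1 verification over speaker embeddings involves, the sketch below compares two embedding vectors with cosine similarity. This README does not state which metric or threshold the backend uses, so both are assumptions here; `cosine_similarity`, `same_speaker`, and the `threshold` parameter are hypothetical names, not part of the backend.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float) -> bool:
    """Hypothetical decision rule: accept when similarity reaches the threshold.

    The threshold is model-specific and must be chosen by the caller; no
    default is given because this README does not document one.
    """
    return cosine_similarity(a, b) >= threshold
```

A suitable threshold depends on the embedding model and on the acceptable false-accept rate, so it is usually calibrated on held-out recordings rather than hard-coded.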
files: entry.

Engine selection is gallery-driven: if the model config provides
model_path: / onnx:, the ONNX engine is used; otherwise the
SpeechBrain engine is used.
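
The selection rule can be sketched as a small function. This is a minimal illustration, assuming the model config arrives as a plain mapping and that either key being set is enough to pick ONNX; `select_engine` and `ONNX_KEYS` are hypothetical names, and the backend's real option parsing may differ.

```python
from typing import Any, Mapping

# Config keys named in this README as selecting the ONNX engine.
ONNX_KEYS = ("model_path", "onnx")


def select_engine(model_config: Mapping[str, Any]) -> str:
    """Return "onnx" when the config points at an ONNX model, else "speechbrain"."""
    # Assumption: an empty or missing value counts as "not provided".
    if any(model_config.get(key) for key in ONNX_KEYS):
        return "onnx"
    return "speechbrain"
```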
POST /v1/voice/verify — 1:1 same-speaker check.
POST /v1/voice/embed — extract a speaker embedding vector.
POST /v1/voice/analyze — voice demographics, loaded lazily on
the first analyze call:
Emotion: superb/wav2vec2-base-superb-er
(Apache-2.0), 4-way categorical (neutral / happy / angry / sad).
Age / gender: Wav2Vec2ForSequenceClassification head via
age_gender_model:<repo> in options. The Audeering
age-gender model is not usable as a drop-in because its
multi-task head isn't loadable via AutoModelForAudioClassification.

Both heads are optional. When nothing loads, the engine returns 501.
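
The lazy, optional loading described for the analyze endpoint can be sketched as follows. This is an illustration under assumptions, not the backend's code: `AnalysisHeads`, the loader callables, and the `(status, body)` return shape are all hypothetical, and each head is modelled as a plain callable that takes a wav path.

```python
from typing import Callable, Dict, Optional, Tuple

Head = Callable[[str], object]  # hypothetical: maps a wav path to a result


class AnalysisHeads:
    def __init__(self, loaders: Dict[str, Callable[[], Head]]):
        self._loaders = loaders
        # None means "loading has not been attempted yet".
        self._heads: Optional[Dict[str, Head]] = None

    def _ensure_loaded(self) -> Dict[str, Head]:
        if self._heads is None:
            heads: Dict[str, Head] = {}
            for name, loader in self._loaders.items():
                try:
                    heads[name] = loader()
                except Exception:
                    # Each head is optional: a failed load is skipped.
                    continue
            self._heads = heads
        return self._heads

    def analyze(self, wav_path: str) -> Tuple[int, dict]:
        heads = self._ensure_loaded()  # first call triggers the load
        if not heads:
            return 501, {"error": "no analysis head available"}
        return 200, {name: head(wav_path) for name, head in heads.items()}
```

Deferring the load keeps requests that only need verification or embedding from paying for analysis models they never use.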
Audio is materialised by the HTTP layer to a temp wav before calling the gRPC backend. Accepted input forms on the HTTP side: URL, data-URI, or raw base64. The backend itself always receives a filesystem path.
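
The three accepted input forms can be told apart and written to a temporary file roughly as shown below. This is a Python sketch for illustration only and is not the HTTP layer's actual code; `materialise_audio` is a hypothetical name, the 30-second timeout is arbitrary, and the payload is assumed to already be WAV data (no transcoding is attempted).

```python
import base64
import binascii
import tempfile
import urllib.request


def materialise_audio(source: str) -> str:
    """Write the audio referenced by `source` to a temp .wav file; return its path."""
    if source.startswith(("http://", "https://")):
        # URL form: download the bytes.
        with urllib.request.urlopen(source, timeout=30) as resp:
            payload = resp.read()
    elif source.startswith("data:"):
        # data-URI form: "data:<mediatype>;base64,<payload>".
        header, sep, encoded = source.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("only base64 data URIs are handled in this sketch")
        payload = base64.b64decode(encoded, validate=True)
    else:
        # Anything else is treated as raw base64.
        try:
            payload = base64.b64decode(source, validate=True)
        except binascii.Error as exc:
            raise ValueError("input is not a URL, data URI, or base64") from exc

    # delete=False so the file outlives this function; the caller removes it.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(payload)
        return tmp.name
```

Because the backend only ever sees a filesystem path, input decoding stays in one place and the engines do not need to know how the audio arrived.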