docs/content/features/embeddings.md
+++
disableToc = false
title = "Embeddings"
weight = 13
url = "/features/embeddings/"
+++
LocalAI supports generating embeddings for text or a list of tokens.
For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings
The embedding endpoint is compatible with llama.cpp models, bert.cpp models and sentence-transformers models available on Hugging Face.
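Because the endpoint follows the OpenAI API, any OpenAI-compatible client can be pointed at LocalAI. Below is a minimal sketch with the official `openai` Python package (v1+); the base URL and the model name `text-embedding-ada-002` are assumptions, adjust them to your LocalAI instance and configured models:

```python
# Minimal sketch: query LocalAI's OpenAI-compatible embeddings endpoint.
# Assumes LocalAI listens on localhost:8080 and has an embedding-enabled
# model configured under the name "text-embedding-ada-002".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.embeddings.create(
    model="text-embedding-ada-002",  # any embedding-enabled model you configured
    input="My text to embed",
)

vector = response.data[0].embedding
print(len(vector), vector[:5])
```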
LocalAI provides a model gallery with pre-configured embedding models. Example gallery models:

- `qwen3-embedding-4b` - Qwen3 Embedding 4B model
- `qwen3-embedding-8b` - Qwen3 Embedding 8B model
- `qwen3-embedding-0.6b` - Qwen3 Embedding 0.6B model

To use a gallery model, install it from the gallery and call the embeddings endpoint with its name:

```bash
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "My text to embed",
  "model": "qwen3-embedding-4b",
  "dimensions": 2560
}'
```
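Once a gallery model is installed, the returned vectors can be used directly for semantic similarity. The following is an illustrative sketch (not from the LocalAI docs) that embeds two texts with the `requests` library and compares them with cosine similarity; it assumes the `qwen3-embedding-4b` model from the example above and the OpenAI-style response shape:

```python
# Illustrative sketch: compare two texts via embeddings from LocalAI.
# Assumes LocalAI on localhost:8080 with qwen3-embedding-4b installed.
import math
import requests

def embed(text: str) -> list[float]:
    resp = requests.post(
        "http://localhost:8080/embeddings",
        json={"input": text, "model": "qwen3-embedding-4b"},
    )
    resp.raise_for_status()
    # OpenAI-style response: {"data": [{"embedding": [...], ...}], ...}
    return resp.json()["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embed("A cat sits on the mat"), embed("A feline rests on a rug")))
```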
Create a YAML config file in the models directory. Specify the backend and the model file.
```yaml
name: text-embedding-ada-002 # The model name used in the API
parameters:
  model: <model_file>
backend: "<backend>"
embeddings: true
```
To use sentence-transformers models from Hugging Face, you can use the `sentencetransformers` embedding backend.
```yaml
name: text-embedding-ada-002
backend: sentencetransformers
embeddings: true
parameters:
  model: all-MiniLM-L6-v2
```
The `sentencetransformers` backend uses the Python sentence-transformers library. For a list of all available pre-trained models, see: https://github.com/UKPLab/sentence-transformers#pre-trained-models
{{% notice note %}}
The `sentencetransformers` backend is an optional backend of LocalAI and uses Python. If you are running LocalAI from the containers you are good to go and it should already be configured for use. Otherwise, point LocalAI at the backend with the `EXTERNAL_GRPC_BACKENDS` environment variable:

```
EXTERNAL_GRPC_BACKENDS="sentencetransformers:/path/to/LocalAI/backend/python/sentencetransformers/sentencetransformers.py"
```

The `sentencetransformers` backend supports only embeddings of text, not of tokens. If you need to embed tokens you can use the bert backend or llama.cpp. No models need to be downloaded before using the `sentencetransformers` backend: models are downloaded automatically the first time the API is used.
{{% /notice %}}
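As with the other backends, the model is then queried through the same `/embeddings` endpoint. A small sketch, assuming the `text-embedding-ada-002` configuration above and that the endpoint accepts a list of strings as `input` (as the OpenAI API does):

```python
# Sketch: batch several inputs in one request against the
# sentencetransformers-backed model configured above.
import requests

resp = requests.post(
    "http://localhost:8080/embeddings",
    json={
        "model": "text-embedding-ada-002",
        "input": ["first sentence to embed", "second sentence to embed"],
    },
)
resp.raise_for_status()
for item in resp.json()["data"]:
    print(item["index"], len(item["embedding"]))
```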
Embeddings with llama.cpp are supported with the `llama-cpp` backend; they need to be enabled by setting `embeddings` to `true` in the model configuration.
```yaml
name: my-awesome-model
backend: llama-cpp
embeddings: true
parameters:
  model: ggml-file.bin
```
Then you can use the API to generate embeddings:
```bash
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "My text",
  "model": "my-awesome-model"
}' | jq "."
```
Symptoms: the model fails to load, or the embeddings endpoint returns an error.

Common Causes:

- Incorrect model filename: ensure you're using the correct filename from the gallery or your model file location (e.g., `Qwen3-Embedding-4B-Q4_K_M.gguf`).
- Context size mismatch: ensure your `context_size` setting doesn't exceed the model's maximum context length.
- Missing `embeddings: true` flag: the model configuration must have `embeddings: true` set.
Correct Configuration Example:
```yaml
name: qwen3-embedding-4b
backend: llama-cpp
embeddings: true
context_size: 32768
parameters:
  model: Qwen3-Embedding-4B-Q4_K_M.gguf
```
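A quick way to verify such a configuration (an illustrative sketch, assuming the model name above and a LocalAI instance on `localhost:8080`) is to request a single embedding and check that a non-empty vector of the expected size comes back:

```python
# Sanity check: confirm the model loads and returns a non-empty vector.
# Model name and expected dimension follow the example configuration above.
import requests

resp = requests.post(
    "http://localhost:8080/embeddings",
    json={"input": "test", "model": "qwen3-embedding-4b"},
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
assert len(embedding) > 0, "empty embedding: check that embeddings: true is set"
print("dimension:", len(embedding))  # 2560 expected for qwen3-embedding-4b
```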
Symptoms: the returned embedding vectors do not have the expected number of dimensions.

Solution: use the `dimensions` parameter in your API request to specify the output dimension:

```bash
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "My text",
  "model": "qwen3-embedding-4b",
  "dimensions": 1024
}'
```
Symptoms: requests fail because the model cannot be found.

Solution: use the model name as defined in the `name` field in the configuration.

The Qwen3 Embedding series models have these characteristics:
| Model | Parameters | Max Context | Max Dimensions | Supported Languages |
|---|---|---|---|---|
| qwen3-embedding-0.6b | 0.6B | 32k | 1024 | 100+ |
| qwen3-embedding-4b | 4B | 32k | 2560 | 100+ |
| qwen3-embedding-8b | 8B | 32k | 4096 | 100+ |
All models support 100+ languages and flexible output dimensions (selectable via the `dimensions` parameter).