Khoj uses an embedding model to understand documents. Multilingual embedding models improve the search quality for documents not in English. This affects both search and chat with docs experiences across Khoj.
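As a rough illustration of what "using an embedding model to understand documents" means for search, here is a minimal Python sketch. The query and each document are embedded with the same model, and documents are ranked by cosine similarity to the query. The `TOY_EMBEDDINGS` table is hypothetical. It stands in for a real multilingual model, which maps a sentence and its translation to nearby vectors. Khoj's actual retrieval code is not shown here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query: str,
    docs: List[str],
    encode: Callable[[str], Vector],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Embed the query and every doc with the same encoder, then rank docs by similarity."""
    query_vec = encode(query)
    scored = [(doc, cosine_similarity(query_vec, encode(doc))) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for a real multilingual model:
# sentences with the same meaning in different languages get nearby vectors.
TOY_EMBEDDINGS = {
    "where is the train station?": [0.9, 0.1, 0.0],
    "Wo ist der Bahnhof?": [0.88, 0.12, 0.02],
    "Ich esse gerne Äpfel.": [0.05, 0.1, 0.95],
    "Das Wetter ist heute schön.": [0.1, 0.9, 0.1],
}

results = semantic_search(
    "where is the train station?",
    ["Ich esse gerne Äpfel.", "Wo ist der Bahnhof?", "Das Wetter ist heute schön."],
    encode=TOY_EMBEDDINGS.__getitem__,
    top_k=2,
)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

The English query ranks the German sentence with the same meaning first. That only works if the encoder places both languages in a shared vector space, which is exactly what a multilingual model provides and an English-only model generally does not.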
To improve search and chat quality for non-English documents, you can use a multilingual model.
For example, the `paraphrase-multilingual-MiniLM-L12-v2` model supports 50+ languages and offers decent search quality and speed on a consumer machine. To use it:
- Set the `bi_encoder` field to `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
- Set the `cross_encoder` field to `mixedbread-ai/mxbai-rerank-xsmall-v1`

:::info[Note]
Modern embedding models like `mixedbread-ai/mxbai-embed-large-v1` expect a prefix on the query (or document) string to improve encoding. To improve the search quality of these models, set the `bi_encoder_query_encode_config` field of your embedding model to `{"prompt": "<prefix-prompt>"}`.

For example, use `{"prompt": "Represent this query for searching documents"}`. You can pass any valid JSON object whose keys are keyword arguments accepted by the SentenceTransformer `encode` function.
:::
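The sketch below shows how such an encode config might be applied. It assumes the JSON object is forwarded to `encode` as keyword arguments, which is what the note above implies. `fake_encode` is a hypothetical stand-in that only mimics how sentence-transformers prepends `prompt` to each input, so it returns the text that would be embedded rather than a vector. `load_encode_config` and `encode_query` are illustrative helpers, not Khoj APIs.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def load_encode_config(raw: str) -> Dict[str, Any]:
    """Parse the JSON object stored in the encode-config field (empty means no options)."""
    config = json.loads(raw) if raw.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("encode config must be a JSON object")
    return config


def fake_encode(sentences: List[str], prompt: Optional[str] = None, **kwargs: Any) -> List[str]:
    """Hypothetical stand-in for SentenceTransformer.encode.

    It only mimics the `prompt` handling (prepend the prompt to every input) and
    returns the text that would be embedded instead of embedding vectors.
    Other keyword arguments are accepted and ignored.
    """
    if prompt is not None:
        sentences = [prompt + sentence for sentence in sentences]
    return sentences


def encode_query(query: str, encode: Callable[..., Any], config: Dict[str, Any]) -> Any:
    # Each key of the config becomes a keyword argument of encode().
    # With a real model this would look like:
    #   SentenceTransformer(model_name).encode([query], **config)
    return encode([query], **config)


config = load_encode_config('{"prompt": "Represent this query for searching documents: "}')
text_to_embed = encode_query("Wo ist der Bahnhof?", fake_encode, config)
print(text_to_embed)
```

The prompt is prepended verbatim. Ending it with a separator such as `: ` therefore keeps it from running directly into the query text.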