Back to Manticoresearch

Conversational search

manual/english/Searching/Conversational_search.md

27.2.08.7 KB
Original Source

Conversational search

Conversational search lets Manticore Buddy answer questions over an existing vectorized table. Buddy retrieves the most relevant rows with KNN search, turns those rows into context, and sends the context plus the conversation history to an LLM.

It is managed from SQL with:

  • CREATE CHAT MODEL
  • SHOW CHAT MODELS
  • DESCRIBE CHAT MODEL
  • DROP CHAT MODEL
  • CALL CHAT

Before you start

You need a vectorized table and an LLM provider. The table requirements are covered below. Provider credentials can be set in CREATE CHAT MODEL with api_key, or supplied through the matching environment variable, such as OPENAI_API_KEY.

How it works

When CALL CHAT runs, Buddy builds a retrieval-augmented answer in this order:

  1. Buddy loads the chat model.
  2. Buddy loads the conversation history for the supplied conversation UUID. If no UUID is supplied, it creates one.
  3. Buddy inspects the target table and chooses a FLOAT_VECTOR field.
  4. The LLM decides how to handle the message: search again, answer from the previous search context, or answer without retrieval.
  5. Buddy runs KNN search with the selected vector field when retrieval is needed.
  6. Buddy builds the LLM context from the vector field's from='...' source fields.
  7. The configured LLM generates the answer.
  8. Buddy saves the user message and the assistant reply in the conversation history.

The fifth argument of CALL CHAT is called fields internally, but for conversational search it means the vector field used by knn(...). It is not a list of fields to return. Buddy selects rows with SELECT *, then removes vector columns from the sources payload so the response does not include large embedding values.

Table requirements

The table must have at least one FLOAT_VECTOR field configured for auto embeddings. The vector field must include from='...', because Buddy uses those source fields as LLM context.

The examples below use onnx-models/all-MiniLM-L12-v2-onnx, which runs through the recommended ONNX path and does not require an embedding API key.

<!-- example conversational_search_create_table --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
CREATE TABLE docs (
    id BIGINT,
    title TEXT,
    content TEXT,
    embedding FLOAT_VECTOR
        knn_type='hnsw'
        hnsw_similarity='cosine'
        model_name='onnx-models/all-MiniLM-L12-v2-onnx'
        from='title,content'
) TYPE='rt';

INSERT INTO docs(id, title, content) VALUES
    (1, 'Vector search', 'Vector search compares embeddings to find semantically similar documents.'),
    (2, 'Full-text search', 'Full-text search matches terms and phrases in indexed text.');
<!-- end -->

If CALL CHAT does not specify a vector field, Buddy uses the first FLOAT_VECTOR field found in the table definition.

Creating a chat model

Use CREATE CHAT MODEL to store the LLM provider, model id, and retrieval settings.

<!-- example conversational_search_create_model --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
CREATE CHAT MODEL assistant (
    model='openai:gpt-4o-mini'
);
<!-- end -->

You can also set provider options and retrieval limits:

<!-- example conversational_search_create_model_extended --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
CREATE CHAT MODEL support_assistant (
    model='openai:gpt-4o-mini',
    api_key='your-provider-api-key',
    base_url='http://host.docker.internal:8787/v1',
    timeout=60,
    retrieval_limit=5,
    max_document_length=3000
);
<!-- end -->

Common options:

OptionRequiredDescription
modelYesLLM model id in provider:model format.
descriptionNoStored description.
api_keyNoProvider API key passed to the llm extension.
base_urlNoProvider or proxy base URL.
timeoutNoLLM request timeout, 1..65536.
retrieval_limitNoNumber of documents requested from KNN, 1..50; default is 5.
max_document_lengthNoPer-document context limit. 0 disables truncation; 100..65536 truncates; default is 2000.

Chat model names may contain only letters, numbers, and underscores.

The model option must use provider:model format:

sql
model='openai:gpt-4o-mini'

Provider api_key is optional if the provider key is already available in Buddy's environment. For example, a Docker Compose service can pass provider keys like this:

yaml
environment:
  - OPENAI_API_KEY=${OPENAI_API_KEY}
  - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}

If api_key is not set in CREATE CHAT MODEL, the llm extension can use the matching provider environment variable. Set api_key in the chat model only when you need this model to use a different key.

CALL CHAT syntax

sql
CALL CHAT(
    'query',
    'table',
    'model_name',
    'conversation_uuid',
    'vector_field'
);

Arguments are positional only:

PositionArgumentRequiredDescription
1queryYesUser question.
2tableYesTable to search.
3model_nameYesChat model name.
4conversation_uuidNoExisting conversation id, or an empty string.
5fields / vector fieldNoFLOAT_VECTOR field used in knn(...).

The table argument must be a plain table identifier, optionally qualified as database.table. The vector field argument must be a plain field identifier.

Asking questions

Use CALL CHAT with a query, a table, and a chat model.

<!-- example conversational_search_call_chat --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
CALL CHAT(
    'What is vector search?',
    'docs',
    'assistant'
);
<!-- end -->

To continue a conversation, pass the same conversation UUID:

<!-- example conversational_search_continue_chat --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
CALL CHAT(
    'Can you explain it with an example?',
    'docs',
    'assistant',
    'docs-chat-001'
);
<!-- end -->

To search a specific vector field, pass it as the fifth argument:

<!-- example conversational_search_vector_field --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
CALL CHAT(
    'Find documents where the title is about vector search',
    'docs',
    'assistant',
    '',
    'title_embedding'
);
<!-- end -->

When the fifth argument is present, Buddy checks that the field exists and is a FLOAT_VECTOR. If the argument is omitted, Buddy detects the first FLOAT_VECTOR field from SHOW CREATE TABLE.

Search and context details

When Buddy needs retrieval, it runs KNN search on the selected vector field and returns up to retrieval_limit rows. The default distance threshold is 0.8.

Buddy uses the retrieved rows as LLM context. The same rows are returned in sources, with knn_dist included and FLOAT_VECTOR columns removed.

max_document_length limits how much text from each source row can be sent to the LLM. Use 0 to disable truncation; otherwise use a value from 100 to 65536.

Response

CALL CHAT returns one row:

ColumnDescription
conversation_uuidExisting or generated conversation id.
user_queryOriginal user query.
search_queryStandalone search query used for retrieval.
responseLLM answer.
sourcesJSON string containing retrieved source rows.

Example response shape:

json
{
  "conversation_uuid": "docs-chat-001",
  "user_query": "What is vector search?",
  "search_query": "vector search, embeddings, similarity search",
  "response": "Vector search finds similar items by comparing embeddings...",
  "sources": "[{\"id\":1,\"title\":\"Vector Search\",\"content\":\"...\",\"knn_dist\":0.12}]"
}

Vector fields are not included in sources.

Managing chat models

List models:

<!-- example conversational_search_show_models --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
SHOW CHAT MODELS;
<!-- end -->

Describe a model:

<!-- example conversational_search_describe_model --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
DESCRIBE CHAT MODEL assistant;
<!-- end -->

Drop a model:

<!-- example conversational_search_drop_model --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
DROP CHAT MODEL assistant;
<!-- end -->

Drop safely:

<!-- example conversational_search_drop_model_if_exists --> <!-- intro -->
SQL:
<!-- request SQL -->
sql
DROP CHAT MODEL IF EXISTS assistant;
<!-- end -->

SHOW CHAT MODELS returns name, model, and created_at. DESCRIBE CHAT MODEL returns property and value; stored API keys are shown as HIDDEN.

Dropping a chat model also drops that model's conversation history table. Conversation history is stored per model and written with a 30-day TTL.