Ollama - MindsDB

This documentation describes the integration of MindsDB with Ollama, a tool that enables local deployment of large language models. The integration allows for the deployment of Ollama models within MindsDB, providing the models with access to data from various data sources.

Prerequisites

Before proceeding, ensure the following prerequisites are met:

1. Install MindsDB locally via Docker or Docker Desktop.
2. To use Ollama within MindsDB, install the required dependencies by following these instructions.
3. Download Ollama and run models locally by following these instructions.

<Info> Here are the recommended system specifications:

A working Ollama installation, as in point 3.
For 7B models, at least 8GB RAM is recommended.
For 13B models, at least 16GB RAM is recommended.
For 70B models, at least 64GB RAM is recommended. </Info>

Setup

Create an AI engine from the Ollama handler.

```sql
CREATE ML_ENGINE ollama
FROM ollama;
```
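
Before creating models on the new engine, it can help to confirm that the engine was registered. A minimal check, assuming the engine was created under the name `ollama` as above:

```sql
-- List the ML engines known to MindsDB; the new engine should appear by name.
SHOW ML_ENGINES;
```

If `ollama` is missing from the result, the `CREATE ML_ENGINE` statement most likely failed, for example because the handler dependencies from the prerequisites are not installed.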

Create a model using ollama as an engine.

```sql
CREATE MODEL ollama_model
PREDICT completion
USING
   engine = 'ollama',   -- engine name as created via CREATE ML_ENGINE
   model_name = 'model-name',  -- model run with 'ollama run model-name'
   ollama_serve_url = 'http://localhost:11434';
```

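<Tip> If you run Ollama and MindsDB in separate Docker containers, `localhost` inside the MindsDB container refers to that container itself, so point `ollama_serve_url` at an address where the Ollama server is reachable. For example, `ollama_serve_url = 'http://host.docker.internal:11434'`. </Tip>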

Available models are listed in the Ollama model library.

Usage

The following examples use the `ollama` engine to create a model with the `CREATE MODEL` statement.

Deploy and use the llama3 model.

First, make sure Ollama is installed, then download the model locally by executing `ollama pull llama3`.

Now deploy this model within MindsDB.

```sql
CREATE MODEL llama3_model
PREDICT completion
USING
   engine = 'ollama',
   model_name = 'llama3';
```
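
Model creation is not necessarily instantaneous, so it is worth checking the model's status before querying it. A minimal sketch using `DESCRIBE`; the exact columns returned may differ between MindsDB versions:

```sql
-- Inspect the model; wait until its status is 'complete' before querying.
-- A status of 'error' usually comes with an error message to investigate.
DESCRIBE llama3_model;
```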

<Tip> Models can be run in either the 'generate' or 'embedding' modes. The 'generate' mode is used for text generation, while the 'embedding' mode is used to generate embeddings for text.

However, these modes can only be used with models that support them. For example, the moondream model supports both modes.

If the mode is not specified, a model that supports multiple modes runs in 'generate' mode, and a model that supports only one mode runs in that mode.

To specify the mode, use the `mode` parameter in the `CREATE MODEL` statement, for example `mode = 'embedding'`. </Tip>
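
As an illustration of the `mode` parameter, the statement below sketches how a model could be deployed in embedding mode. It uses moondream because that model is described above as supporting both modes, and it assumes the model has already been pulled with `ollama pull moondream`. The model name `moondream_embedding_model` and the target column name `embeddings` are illustrative assumptions, not names required by MindsDB.

```sql
-- Sketch: deploy moondream in 'embedding' mode instead of the default 'generate'.
-- 'moondream_embedding_model' and the target column 'embeddings' are hypothetical names.
CREATE MODEL moondream_embedding_model
PREDICT embeddings
USING
   engine = 'ollama',
   model_name = 'moondream',
   mode = 'embedding';
```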

Query the model to get predictions.

```sql
SELECT text, completion
FROM llama3_model
WHERE text = 'Hello';
```

Here is the output:

```sql
+-------+--------------------------------------------------------------------------------------+
| text  | completion                                                                           |
+-------+--------------------------------------------------------------------------------------+
| Hello | Hello back to you! Is there something I can help you with or would you like to chat? |
+-------+--------------------------------------------------------------------------------------+
```

You can override the prompt for a single query with the `prompt_template` parameter:

```sql
SELECT text, completion
FROM llama3_model
WHERE text = 'Hello'
USING
   prompt_template = 'Answer using exactly five words: {{text}}:';
```

Here is the output:

```sql
+-------+------------------------------+
| text  | completion                   |
+-------+------------------------------+
| Hello | Warmly welcome to our space. |
+-------+------------------------------+
```
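
Because the integration gives models access to data from connected data sources, predictions can also be made in bulk by joining a table with the model. The sketch below assumes a hypothetical data source `my_datasource` with a table `questions` that has a `text` column; neither name comes from this guide, so replace them with your own.

```sql
-- Hypothetical data source and table: my_datasource.questions(text).
-- Each row's text is passed to the model, which returns one completion per row.
SELECT t.text, m.completion
FROM my_datasource.questions AS t
JOIN llama3_model AS m
LIMIT 3;
```

The `LIMIT` clause keeps the first run small, which is useful because each row triggers a call to the locally running model.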

<Tip> **Next Steps**

Go to the Use Cases section to see more examples. </Tip>