docs/integrations/vector-db-integrations/pgvector.mdx
This is the implementation of the PGVector for MindsDB.
PGVector is an open-source vector similarity search for Postgres. It supports the following:
This handler uses pgvector Python library.
To connect to a PGVector instance, use the following statement:
CREATE DATABASE pvec
WITH
ENGINE = 'pgvector',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"database": "postgres",
"user": "user",
"password": "password",
"distance": "cosine"
};
The required arguments to establish a connection are the following:
host: The host name or IP address of the postgres instance.port: The port to use when connecting.database: The database to connect to.user: The user to connect as.password: The password to use when connecting.distance: It defines how the distance between vectors is calculated. Available methods include cosine (default), l1, l2, ip, hamming, jaccard. Learn more here.where you have postgres installed run the following commands to install the pgvector extension
cd /tmp git clone --branch v0.4.4 https://github.com/pgvector/pgvector.git cd pgvector make make install
Ensure you install all from requirements.txt in the pgvector_handler folder
You can create a database connection like you would for a regular postgres database, the only difference is that you need to specify the engine as pgvector
CREATE DATABASE pvec
WITH
ENGINE = 'pgvector',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"database": "postgres",
"user": "user",
"password": "password"
};
You can insert data into a new collection like so
CREATE TABLE pvec.embed
(SELECT embeddings FROM mysql_demo_db.test_embeddings
);
CREATE ML_ENGINE openai
FROM openai
USING
api_key = 'your-openai-api-key';
CREATE MODEL openai_emb
PREDICT embedding
USING
engine = 'openai',
model_name='text-embedding-ada-002',
mode = 'embedding',
question_column = 'review';
create table pvec.itemstest (
SELECT m.embedding AS embeddings, t.review content FROM mysql_demo_db.amazon_reviews t
join openai_emb m
);
You can query a collection within your PGVector as follows:
SELECT *
FROM pvec.embed
Limit 5;
SELECT *
FROM pvec.itemstest
Limit 5;
You can query on semantic search like so:
SELECT *
FROM pvec3.items_test
WHERE embeddings = (select * from mindsdb.embedding) LIMIT 5;