changelogs/drizzle-orm/0.31.0-beta.md
The previous Drizzle+PostgreSQL indexes API was incorrect and was not aligned with the PostgreSQL documentation. The good thing is that it was not used in queries, and drizzle-kit didn't support all properties for indexes. This means we can now change the API to the correct one and provide full support for it in drizzle-kit
Previous API
.on..using and .on in our case are the same thing, so the API is incorrect here..asc(), .desc(), .nullsFirst(), and .nullsLast() should be specified for each column or expression on indexes, but not on an index itself.// Index declaration reference
index('name')
.on(table.column1, table.column2, ...) or .onOnly(table.column1, table.column2, ...)
.concurrently()
.using(sql``) // sql expression
.asc() or .desc()
.nullsFirst() or .nullsLast()
.where(sql``) // sql expression
Current API
// First example, with `.on()`
index('name')
.on(table.column1.asc(), table.column2.nullsFirst(), ...) or .onOnly(table.column1.desc().nullsLast(), table.column2, ...)
.concurrently()
.where(sql``)
.with({ fillfactor: '70' })
// Second Example, with `.using()`
index('name')
.using('btree', table.column1.asc(), sql`lower(${table.column2})`, table.column1.op('text_ops'))
.where(sql``) // sql expression
.with({ fillfactor: '70' })
There is no specific code to create an extension inside the Drizzle schema. We assume that if you are using vector types, indexes, and queries, you have a PostgreSQL database with the
pg_vectorextension installed.
You can now specify indexes for pg_vector and utilize pg_vector functions for querying, ordering, etc.
Let's take a few examples of pg_vector indexes from the pg_vector docs and translate them to Drizzle
// CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
// CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
const table = pgTable('items', {
embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
l2: index('l2_index').using('hnsw', table.embedding.op('vector_l2_ops'))
ip: index('ip_index').using('hnsw', table.embedding.op('vector_ip_ops'))
cosine: index('cosine_index').using('hnsw', table.embedding.op('vector_cosine_ops'))
}))
// CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
// CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);
const table = pgTable('table', {
embedding: vector('embedding', { dimensions: 3 })
}, (table) => ({
l1: index('l1_index').using('hnsw', table.embedding.op('vector_l1_ops'))
hamming: index('hamming_index').using('hnsw', table.embedding.op('bit_hamming_ops'))
bit: index('bit_jaccard_index').using('hnsw', table.embedding.op('bit_jaccard_ops'))
}))
For queries, you can use predefined functions for vectors or create custom ones using the SQL template operator.
You can also use the following helpers:
import { l2Distance, l1Distance, innerProduct,
cosineDistance, hammingDistance, jaccardDistance } from 'drizzle-orm'
l2Distance(table.column, [3, 1, 2]) // table.column <-> '[3, 1, 2]'
l1Distance(table.column, [3, 1, 2]) // table.column <+> '[3, 1, 2]'
innerProduct(table.column, [3, 1, 2]) // table.column <#> '[3, 1, 2]'
cosineDistance(table.column, [3, 1, 2]) // table.column <=> '[3, 1, 2]'
hammingDistance(table.column, '101') // table.column <~> '101'
jaccardDistance(table.column, '101') // table.column <%> '101'
If pg_vector has some other functions to use, you can replicate implimentation from existing one we have. Here is how it can be done
export function l2Distance(
column: SQLWrapper | AnyColumn,
value: number[] | string[] | TypedQueryBuilder<any> | string,
): SQL {
if (is(value, TypedQueryBuilder<any>) || typeof value === 'string') {
return sql`${column} <-> ${value}`;
}
return sql`${column} <-> ${JSON.stringify(value)}`;
}
Name it as you wish and change the operator. This example allows for a numbers array, strings array, string, or even a select query. Feel free to create any other type you want or even contribute and submit a PR
Let's take a few examples of pg_vector queries from the pg_vector docs and translate them to Drizzle
import { l2Distance } from 'drizzle-orm';
// SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
db.select().from(items).orderBy(l2Distance(items.embedding, [3,1,2]))
// SELECT embedding <-> '[3,1,2]' AS distance FROM items;
db.select({ distance: l2Distance(items.embedding, [3,1,2]) })
// SELECT * FROM items ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
const subquery = db.select({ embedding: items.embedding }).from(items).where(eq(items.id, 1));
db.select().from(items).orderBy(l2Distance(items.embedding, subquery)).limit(5)
// SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
db.select({ innerProduct: sql`(${maxInnerProduct(items.embedding, [3,1,2])}) * -1` }).from(items)
// and more!