apps/docs/content/guides/ai/going-to-prod.mdx
This guide will help you prepare your application for production. We'll provide actionable steps to help you scale your application and ensure that it is reliable, can handle the load, and delivers optimal accuracy for your use case.
See our Engineering for Scale guide for more information about engineering at scale.
Sequential scans will result in significantly higher latencies and lower throughput, although they guarantee 100% accuracy and are not RAM bound.
There are a couple of cases where you might not need indexes:

- Your table holds a small number of vectors, so even an exact sequential scan is fast enough.
- You run queries infrequently, so latency and throughput are not a concern.

You don't have to create indexes in these cases and can use sequential scans instead. This type of workload will not be RAM bound and will not require any additional resources, but it will result in higher latencies and lower throughput. Extra CPU cores may help to improve queries per second, but they will not improve latency.
On the other hand, if you need to scale your application, you will need to create indexes. This results in lower latencies and higher throughput, but requires additional RAM to make use of Postgres caching. Indexes also reduce accuracy, since you are replacing exact (KNN) search with approximate (ANN) search.
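For reference, an exact (sequential-scan) similarity query needs no index at all — it is just an ORDER BY over a distance operator. The table and column names below (docs, embedding) and the three-dimensional vector literal are illustrative:

```sql
-- Exact KNN: scans every row, 100% accurate, no index required.
select id
from docs
order by embedding <=> '[0.1, 0.2, 0.3]'  -- cosine distance operator
limit 10;
```

Once an index exists on the same column and operator class, the identical query is served by approximate (ANN) search instead.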
pgvector supports two types of indexes: HNSW and IVFFlat. We recommend using HNSW because of its performance and robustness against changing data.
<Image alt="dbpedia embeddings comparing ivfflat and hnsw queries-per-second using the 4XL compute add-on" src={{ light: '/docs/img/ai/going-prod/dbpedia-ivfflat-vs-hnsw-4xl--light.png', dark: '/docs/img/ai/going-prod/dbpedia-ivfflat-vs-hnsw-4xl--dark.png', }} width={1052} height={796}
/>
HNSW indexes have three tuning parameters: ef_construction, ef_search, and m.

Index build parameters:
m is the number of bi-directional links created for every new element during construction. Higher m is suitable for datasets with high dimensionality and/or high accuracy requirements. Reasonable values for m are between 2 and 100. Range 12-48 is a good starting point for most use cases (16 is the default value).
ef_construction is the size of the dynamic list for the nearest neighbors (used during the construction algorithm). Higher ef_construction will result in better index quality and higher accuracy, but it will also increase the time required to build the index. ef_construction has to be at least 2 * m (64 is the default value). At some point, increasing ef_construction does not improve the quality of the index. You can measure accuracy when ef_search=ef_construction: if accuracy is lower than 0.9, then there is room for improvement.
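As a sketch, both build parameters are supplied in the WITH clause when the index is created (the table and column names here are hypothetical):

```sql
-- Build an HNSW index with explicit m and ef_construction.
-- Higher values improve accuracy at the cost of longer build times.
create index on docs using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
```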
Search parameters:
ef_search is the size of the dynamic list for the nearest neighbors (used during the search). Increasing ef_search will result in better accuracy, but it will also increase the time required to execute a query (40 is the default value).

<Image alt="dbpedia embeddings comparing hnsw queries-per-second using different build parameters" src={{ light: '/docs/img/ai/going-prod/dbpedia-hnsw-build-parameters--light.png', dark: '/docs/img/ai/going-prod/dbpedia-hnsw-build-parameters--dark.png', }} width={1052} height={796}
/>
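Unlike the build parameters, ef_search is set at query time. A minimal sketch (the table and vector literal are illustrative):

```sql
-- Raise ef_search above the default of 40 for better accuracy,
-- at the cost of slower queries. Applies to the current session.
set hnsw.ef_search = 100;

-- Or scope it to a single transaction:
begin;
set local hnsw.ef_search = 100;
select id from docs order by embedding <=> '[0.1, 0.2, 0.3]' limit 10;
commit;
```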
IVFFlat has two tuning parameters: probes and lists.

Indexes used for approximate vector similarity search in pgvector divide a dataset into partitions. The number of these partitions is defined by the lists constant. probes controls how many lists are going to be searched during a query.
The values of lists and probes directly affect accuracy and queries per second (QPS).
- Higher lists means an index will be built more slowly, but you can achieve better QPS and accuracy.
- Higher probes means that select queries will be slower, but you can achieve better accuracy.
- lists and probes are not independent: higher lists means that you will have to use higher probes to achieve the same accuracy.

You can find more examples of how the lists and probes constants affect accuracy and QPS in the pgvector 0.4.0 performance blog post.
<Image alt="multi database" src={{ light: '/docs/img/ai/going-prod/lists-count--light.png', dark: '/docs/img/ai/going-prod/lists-count--dark.png', }} width={1467} height={808}
/>
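The two parameters map directly onto SQL: lists is fixed at index build time, while probes is set per session or transaction. A minimal sketch with hypothetical table and column names:

```sql
-- Build an IVFFlat index with an explicit number of partitions (lists).
create index on docs using ivfflat (embedding vector_cosine_ops)
with (lists = 100);

-- Search more partitions (probes) for better accuracy, slower queries.
set ivfflat.probes = 10;
```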
First, a few generic tips which you can pick and choose from:
- Prefer inner-product to L2 or Cosine distances if your vectors are normalized (like text-embedding-ada-002). If embeddings are not normalized, Cosine distance should give the best results with an index.
- Pre-warm your index into RAM with select pg_prewarm('vecs.docs_vec_idx');. This will help to avoid cold cache issues.
- Increase the m and ef_construction (or lists) constants for the pgvector index to accelerate your queries, at the expense of slower build times. For instance, for benchmarks with 1,000,000 OpenAI embeddings, we set m and ef_construction to 32 and 80, and it resulted in 35% higher QPS than values of 24 and 56 respectively.
- Start with a larger compute size to get the best results for RAM requirements. (We'd recommend at least 8XL if you're using Supabase.)

Then, step by step:

1. Upload your data. If you're using the vecs library, it will automatically generate an index with default parameters.
2. Run a benchmark — you can use the vecs library together with the ann-benchmarks tool. Do it with default values for the index build parameters; you can adjust them later to get the best results.
3. Measure accuracy, again using the vecs library with the ann-benchmarks tool. Tweak ef_search for HNSW or probes for IVFFlat until both accuracy and QPS match your requirements.
4. If you still can't reach your targets, increase m and ef_construction for HNSW or lists for IVFFlat (and consider switching from IVFFlat to HNSW). Rebuild the index with higher values and repeat steps 2-3 to find the combination of m, ef_construction, and ef_search that achieves the best QPS and accuracy. Higher m and ef_construction mean the index will build more slowly, but you can achieve better QPS and accuracy. Higher ef_search means that select queries will be slower, but you can achieve better accuracy.

Don't forget to check out the general Production Checklist to ensure your project is secure, performant, and will remain available for your users.
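The pg_prewarm call mentioned above loads an index into the Postgres buffer cache so that the first queries after a restart or scale-up don't pay cold-cache latency. The index name below follows the vecs naming used in this guide:

```sql
-- Load the HNSW/IVFFlat index into the buffer cache.
create extension if not exists pg_prewarm;
select pg_prewarm('vecs.docs_vec_idx');
```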
You can look at our Choosing Compute Add-on guide to get a basic understanding of how much compute you might need for your workload.
Or take a look at our pgvector 0.5.0 performance and pgvector 0.4.0 performance blog posts to see what pgvector is capable of and how the above technique can be used to achieve the best results.
<Image alt="multi database" src={{ light: '/docs/img/ai/going-prod/size-to-rps--light.png', dark: '/docs/img/ai/going-prod/size-to-rps--dark.png', }} width={1427} height={862}
/>