apps/www/_blog/2023-07-13-pgvector-performance.mdx
🚀 The incorporation of the HNSW index in pgvector v0.5.0 ensures lightning-fast vector searches. We tested it, benchmarked it, and shared everything. Read the new post
</div>There are a few pgvector benchmarks floating around the internet, most recently a pgvector vs Qdrant comparison by NirantK. We wanted to reproduce (or improve!) the results.
There is an obvious bias here: we're a Postgres company. It's not our goal to prove that pgvector is better than Qdrant for running vector workloads. From everything we hear about Qdrant, it's fantastic.
Our goals in this article are:
We've used the ANN Benchmarks methodology, a standard for benchmarking vector databases.
The key elements are:
All compute add-ons available on Supabase were used to run our benchmarks. Each add-on variant has a different allocation of RAM and CPU cores, the details of which are available in our docs. Each Supabase compute add-on comes with a specific set of optimizations (version 2023-07).
| Instance | CPU | Memory |
|---|---|---|
| 2XL | 8-core ARM (dedicated) | 32 GB |
| 4XL | 16-core ARM (dedicated) | 64 GB |
| 8XL | 32-core ARM (dedicated) | 128 GB |
| 12XL | 48-core ARM (dedicated) | 192 GB |
| 16XL | 64-core ARM (dedicated) | 256 GB |
We tested using the same dataset as the Qdrant comparison: dbpedia-entities-openai-1M. This dataset includes 1M embeddings with 1536 dimensions (created using OpenAI). The embeddings are created from Wikipedia articles. It's a great dataset!
<div className="bg-gray-300 rounded-lg px-6 py-2 italic">We also have some useful benchmarks in our docs for gist-960-angular (1M image embeddings, 960 dimensions) and GloVe Reddit comments (1.6M text embeddings, 512 dimensions).
</div>Let's start with NirantK's results as a baseline:
<div> </div>They aren't very flattering! Repeating our statements above, these benchmarks use the defaults for both engines. Our goal now is to replicate the results, and then see what improvements need to be made as developers scale up their workload.
Our tests mirrored NirantK's, but incorporated slight variations:
Same:
Changed:
- inner-product distance function.
- lists constant for an index equal to 2000 instead of 1000.
- … probes = 1).

The resulting figures were significantly different after these changes.
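As a rough illustration of the kind of setup those changes describe, here is a minimal sketch that builds the corresponding pgvector statements as strings. The table name `items`, the columns `id` and `embedding`, and the `%s` driver placeholder are assumptions for illustration and do not come from the benchmark itself; `vector_ip_ops`, `lists`, `ivfflat.probes`, and the `<#>` operator are standard pgvector syntax.

```python
# Hypothetical names: the post does not say which table or columns were used.
TABLE = "items"
COLUMN = "embedding"


def ivfflat_index_sql(table: str, column: str, lists: int) -> str:
    """Build a CREATE INDEX statement for an IVFFlat index using inner product."""
    if lists < 1:
        raise ValueError("lists must be a positive integer")
    return (
        f"CREATE INDEX ON {table} "
        f"USING ivfflat ({column} vector_ip_ops) WITH (lists = {lists});"
    )


def probes_sql(probes: int) -> str:
    """Build the session setting that controls how many lists each query scans."""
    if probes < 1:
        raise ValueError("probes must be a positive integer")
    return f"SET ivfflat.probes = {probes};"


def top_k_sql(table: str, column: str, k: int) -> str:
    """Build a top-k query; <#> is pgvector's negative inner product operator.

    The %s placeholder assumes a driver that binds the query vector
    as a parameter.
    """
    return f"SELECT id FROM {table} ORDER BY {column} <#> %s LIMIT {k};"


if __name__ == "__main__":
    print(ivfflat_index_sql(TABLE, COLUMN, lists=2000))
    print(probes_sql(10))
    print(top_k_sql(TABLE, COLUMN, k=10))
```

Because `ivfflat.probes` is a session setting, it can be changed between benchmark runs without rebuilding the index, whereas a different `lists` value requires a new index.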
With the changes above and probes set to 10, pgvector was faster and more accurate:
When we increased probes from 10 to 40, pgvector was not only substantially faster but also almost as accurate as Qdrant:
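The trade-off between probes, accuracy, and QPS can be explored with a simple sweep. The sketch below is one possible way to do that, not the procedure used for these benchmarks: `measure` is a hypothetical callable you would supply (for example, a wrapper around your own benchmark run) that returns an `(accuracy, qps)` pair for a given probes value.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical signature: measure(probes) -> (accuracy, qps)
Measure = Callable[[int], Tuple[float, float]]


def smallest_probes_meeting_target(
    measure: Measure,
    candidates: Iterable[int],
    target_accuracy: float,
) -> Optional[Tuple[int, float, float]]:
    """Return (probes, accuracy, qps) for the smallest candidate probes value
    whose measured accuracy reaches the target, or None if none does.

    Candidates are tried in ascending order because more probes generally
    means higher accuracy but lower QPS.
    """
    for probes in sorted(candidates):
        accuracy, qps = measure(probes)
        if accuracy >= target_accuracy:
            return probes, accuracy, qps
    return None
```

Stopping at the first value that meets the target keeps QPS as high as possible for the required accuracy.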
Another key takeaway is that performance scales predictably with the size of the database instance. For instance, a 4XL instance achieves accuracy@10 of 0.98 and a QPS of 270 with probes set to 40. An 8XL compute add-on reaches the same accuracy@10 of 0.98 with a QPS of 470, surpassing Qdrant's results.
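For readers who want to compute these two metrics on their own workloads, here is a minimal sketch. It assumes accuracy@10 means recall against exact nearest neighbors, as is usual in ANN Benchmarks; the function names are illustrative and are not taken from the benchmark tooling.

```python
from typing import Hashable, Sequence


def accuracy_at_k(
    returned: Sequence[Hashable], exact: Sequence[Hashable], k: int = 10
) -> float:
    """Fraction of the exact top-k neighbors that appear in the returned top-k."""
    truth = set(exact[:k])
    if not truth:
        raise ValueError("exact neighbors must not be empty")
    hits = len(truth.intersection(returned[:k]))
    return hits / len(truth)


def mean_accuracy_at_k(
    all_returned: Sequence[Sequence[Hashable]],
    all_exact: Sequence[Sequence[Hashable]],
    k: int = 10,
) -> float:
    """Average accuracy@k over a batch of queries."""
    if len(all_returned) != len(all_exact) or not all_returned:
        raise ValueError("need one non-empty result list per query")
    scores = [accuracy_at_k(r, e, k) for r, e in zip(all_returned, all_exact)]
    return sum(scores) / len(scores)


def queries_per_second(num_queries: int, elapsed_seconds: float) -> float:
    """Throughput for a batch of queries that took elapsed_seconds in total."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_queries / elapsed_seconds
```

For example, if an approximate search returns 9 of the 10 true nearest neighbors, `accuracy_at_k` gives 0.9 for that query.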
<div className="bg-gray-300 rounded-lg px-6 py-2 italic">The Qdrant benchmark uses the “default” configuration and is not indicative of its capabilities after modifying the configuration.
</div> <div> </div>Although more compute is required to match Qdrant's accuracy and QPS levels concurrently, this is still a satisfying outcome. It means that it's not a necessity to use another vector database. You can put everything in Postgres to lower your operational complexity.
Putting it all together, we find that we can predictably scale our database to match the performance we need.
With a 64-core, 256 GB server we achieve ~1800 QPS and 0.91 accuracy. This is for pgvector 0.4.0, and we've heard that the latest version (0.4.4) already has significant improvements. We'll release those benchmarks as soon as we have them.
<div> </div>It's been about 5 months since we added pgvector to the platform. Since then we've discovered a few other important things to keep in mind.
Another way to improve performance without throwing more compute at the problem is to increase lists.
We ran a test to measure the impact of list size: we uploaded 90,000 vectors from the Wikipedia dataset and then queried 10,000 vectors from the same dataset. The documentation recommends a lists constant of number of vectors / 1000. In this case, it would be 90.
But as our experiment shows, we can improve QPS if we increase lists (i.e. with more lists in the index, we need to read less index data to get the same accuracy). So for 95% accuracy, we can take any of:
This also has an important caveat: building the index takes longer with more lists. Here we measure the index build time for a dataset containing 900,000 vectors:
So if you can afford an index build time of 1 hour or more, you can go with lists=5000 (number of vectors / 200) or more!
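The rules of thumb in this section all have the form "number of vectors divided by a constant". A small helper makes the arithmetic explicit; the function name and its default are illustrative, not part of pgvector.

```python
def lists_for(num_vectors: int, vectors_per_list: int = 1000) -> int:
    """Suggest a lists value as num_vectors / vectors_per_list, with a floor of 1.

    vectors_per_list=1000 matches the documented default recommendation;
    smaller values (500, 200) give more lists, which means a slower index
    build in exchange for faster queries.
    """
    if num_vectors < 1 or vectors_per_list < 1:
        raise ValueError("both arguments must be positive integers")
    return max(1, num_vectors // vectors_per_list)


if __name__ == "__main__":
    print(lists_for(90_000))          # documented recommendation for 90,000 vectors
    print(lists_for(1_000_000, 500))  # the value used for the 1M-vector benchmark
```

With 90,000 vectors the default rule gives 90, and with 1,000,000 vectors a divisor of 500 gives 2000, matching the figures quoted in this post.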
You may need to increase maintenance_work_mem to be able to create an index with high values for lists. For example:
SET maintenance_work_mem TO '7168 MB';
Keeping in mind that the overall index size stays almost the same and only the index build time increases, we can say that it's better to use more lists for faster select queries.
Embeddings created from “real” data are more likely to be clustered together, whereas random embeddings are more likely to be scattered. In other words, real embeddings are very far from being randomly distributed. This might seem obvious, but it's an important call-out for benchmarks.
Embeddings generated for similarity search using “real world data” will be more correlated, so the accuracy will be higher as well. You can see the difference in this chart using 10,000 Wikipedia embeddings, vs 10,000 randomly-generated embeddings:
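To get a feel for how differently random and clustered vectors behave, here is a toy sketch using only the standard library. It compares the average pairwise cosine similarity of purely random vectors with that of vectors scattered around a shared center. The "center plus noise" model is a deliberate simplification and an assumption of this example, not a description of how real embeddings are distributed, and it illustrates only the clustering difference, not the effect on index accuracy.

```python
import math
import random
from itertools import combinations
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pairwise_cosine(vectors: List[Vector]) -> float:
    """Average cosine similarity over all distinct pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def random_vectors(n: int, dim: int, rng: random.Random) -> List[Vector]:
    """Independent Gaussian vectors: nearly orthogonal in high dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def clustered_vectors(
    n: int, dim: int, rng: random.Random, noise: float = 0.5
) -> List[Vector]:
    """Toy stand-in for correlated data: one shared center plus small noise."""
    center = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [[c + noise * rng.gauss(0.0, 1.0) for c in center] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print("random   :", mean_pairwise_cosine(random_vectors(20, 256, rng)))
    print("clustered:", mean_pairwise_cosine(clustered_vectors(20, 256, rng)))
```

In this toy setup the random vectors have a mean similarity close to zero, while the clustered ones sit much closer together, which is why benchmarks on random data are not a reliable proxy for real workloads.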
<div> </div>Armed with all this information, we can safely give you a few tips and strategies for optimizing your pgvector workloads.
First, a few generic tips which you can pick and choose from:
- … inner-product to L2 or Cosine distances if your vectors are normalized (like text-embedding-ada-002). If embeddings are not normalized, Cosine distance should give the best results with an index.
- … lists constant of 2000 (number of vectors / 500) as opposed to the suggested 1000 (number of vectors / 1000).
- … probes constant, balancing accuracy with QPS.

Before running your pgvector workload in production, here are a few steps you can take to maximize performance.
- … 5, but it's better to start with a larger size to get the best results for RAM requirements. (We'd recommend at least 8XL if you're using Supabase.)
- … vecs library, it will automatically generate an index with default parameters.
- … vecs library with the ann-benchmarks tool. Do it with probes set to 10 (default) and then with probes set to 100 or more, so QPS will be lower than 10.
- … vecs library for that as well with the ann-benchmarks tool. Do it with probes set to 10 (default) and then gradually increase or decrease the probes value until both accuracy and QPS match your requirements.
- … lists constantly. You have to rebuild the index with a higher lists value and repeat steps 6-7 to find the best combination of lists and probes constants to achieve the best QPS and accuracy values. Higher lists means that the index will build slower, but you can achieve better QPS and accuracy. Higher probes means that select queries will be slower, but you can achieve better accuracy.

pgvector is still early in development. As with any open source tool, it needs time and resources to make it better. Supabase plans to continue supporting Andrew with his development of pgvector.
What's next on the roadmap? Andrew has an impressive list of features planned for v0.5.0:
✅ Adding HNSW: an index with better speed & accuracy than IVFFlat (at a higher memory cost)