doc/development/database/query_performance.md
This document describes various guidelines to follow when optimizing SQL queries.
When you are optimizing your SQL queries, there are two dimensions to pay attention to:
| Query Type | Maximum Query Time | Notes |
|---|---|---|
| General queries | 100ms | This is not a hard limit, but if a query is getting above it, it is important to spend time understanding why it can or cannot be optimized. |
| Queries in a migration | 100ms | This is different than the total migration time. |
| Concurrent operations in a migration | 5min | Concurrent operations do not block the database, but they block the GitLab update. This includes operations such as add_concurrent_index, add_concurrent_foreign_key, and validate constraint (e.g. adding text limit via add_text_limit). |
| Concurrent operations in a post migration | 20min | Concurrent operations do not block the database, but they block the GitLab post update process. This includes operations such as add_concurrent_index, add_concurrent_foreign_key, and validate constraint (e.g. adding text limit via add_text_limit). If index creation exceeds 20 minutes, consider async index creation. |
| Background migrations | 1s | |
| Service Ping | 1s | See the Metrics Instrumentation docs for more details. |
When evaluating query performance it is important to understand the difference between cold and warm cached queries.
The first time a query is made, it is made on a "cold cache". Meaning it needs to read from disk. If you run the query again, the data can be read from the cache, or what PostgreSQL calls shared buffers. This is the "warm cache" query.
When analyzing an EXPLAIN plan, you can see
the difference not only in the timing, but by looking at the output for Buffers
by running your explain with EXPLAIN(analyze, buffers). Database Lab
automatically includes these options.
If you are making a warm cache query, you see only the shared hits.
For example, using Database Lab:
Shared buffers:
- hits: 36467 (~284.90 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
Or in the explain plan from psql:
Buffers: shared hit=7323
If the cache is cold, you also see reads.
Using Database Lab:
Shared buffers:
- hits: 17204 (~134.40 MiB) from the buffer pool
- reads: 15229 (~119.00 MiB) from the OS file cache, including disk I/O
In psql:
Buffers: shared hit=7202 read=121
We often build filtered list views and APIs in GitLab which need to have many different filter and sorting options. All these options are usually encapsulated in finders and exposed by API/GraphQL arguments. While we have many possible pagination performance optimizations , there is often no way to make all combinations of sorting and filtering performant. Attempts to make many options performant might involve adding too many indexes which sacrifices performance of our primary database. This is only justified for common use cases and should not be considered as a way to make all permutations of filter and sort performant. What this means practically is that there will likely be filtered views and API requests that timeout when certain sorting or filtering options are applied. We still allow them to be added by teams where they benefit certain customers with specific combinations of filtering/sorting, but we need to accept that they will timeout for some users.