This guide explains how Cortex evaluates PromQL queries, details how time series data is stored and retrieved, and offers strategies to write performant queries — particularly in high-cardinality environments.
Note: If you are new to PromQL, it is recommended to start with the Querying basics documentation.
Prometheus employs a straightforward data model: each time series is uniquely identified by a metric name and a set of key-value pairs called labels.
Label matchers define the selection criteria for time series within the TSDB. Consider the following PromQL expression:
```promql
http_requests_total{cluster="prod", job="envoy"}
```

the label matchers are:

- `__name__="http_requests_total"`
- `cluster="prod"`
- `job="envoy"`

Prometheus supports four types of label matchers:
| Type | Syntax | Example |
|---|---|---|
| Equal | `label="value"` | `job="envoy"` |
| Not Equal | `label!="value"` | `job!="prometheus"` |
| Regex Equal | `label=~"regex"` | `job=~"env.*"` |
| Regex Not Equal | `label!~"regex"` | `status!~"4.."` |
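As an illustration, the four matcher types can be combined in a single selector. The metric and label values below are hypothetical, chosen only to show the syntax:

```promql
# Equality, negative equality, regex, and negative regex matchers together:
# prod cluster, any job except "prometheus", only 5xx statuses, no OPTIONS requests
http_requests_total{cluster="prod", job!="prometheus", status=~"5..", method!~"OPTIONS"}
```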
Cortex uses Prometheus's Time Series Database (TSDB) for storing time series data. The Prometheus TSDB is time partitioned into blocks. Each TSDB block is made up of the following files:
- ID - ID of the block (a ULID)
- `meta.json` - Contains the metadata of the block
- `index` - A binary file that contains the index
- `chunks` - Directory containing the chunk segment files

More details: TSDB format docs
The index file contains two key mappings for query processing: an inverted index that maps each label name/value pair to the IDs of the series containing it, and a mapping from each series to the chunks that hold its samples.
Given the following time series:
```
http_requests_total{cluster="prod", job="envoy", status="200"}      -> SeriesID(1)
http_requests_total{cluster="prod", job="envoy", status="400"}      -> SeriesID(2)
http_requests_total{cluster="prod", job="envoy", status="500"}      -> SeriesID(3)
http_requests_total{cluster="prod", job="prometheus", status="200"} -> SeriesID(4)
```

The index file would store mappings such as:

```
__name__=http_requests_total → [1, 2, 3, 4]
cluster=prod                 → [1, 2, 3, 4]
job=envoy                    → [1, 2, 3]
job=prometheus               → [4]
status=200                   → [1, 4]
status=400                   → [2]
status=500                   → [3]
```
Each chunk segment file can store up to 512MB of data. Each chunk in the segment file typically holds up to 120 samples.
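A quick sketch of what the 120-samples-per-chunk figure means in wall-clock terms. The 15-second scrape interval below is an assumption for illustration, not a value from this post:

```python
# Rough estimate of how much wall-clock time a single full TSDB chunk
# covers, given that a chunk typically holds up to 120 samples.
SAMPLES_PER_CHUNK = 120

def chunk_span_seconds(scrape_interval_s: float) -> float:
    """Approximate time range covered by one full chunk."""
    return SAMPLES_PER_CHUNK * scrape_interval_s

# With an assumed 15s scrape interval, one chunk spans about 30 minutes.
print(chunk_span_seconds(15) / 60)  # → 30.0 minutes
```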
To optimize PromQL queries effectively, it is essential to understand how queries are executed within Cortex. Consider the following example:
```promql
sum(rate(http_requests_total{cluster="prod", job="envoy"}[5m]))
```
Cortex first identifies the TSDB blocks that fall within the query's time range. This lookup is fast and adds little overhead to query execution.
Next, Cortex uses the inverted index to retrieve the set of matching series IDs for each label matcher. For example:
```
__name__="http_requests_total" → [1, 2, 3, 4]
cluster="prod"                 → [1, 2, 3, 4]
job="envoy"                    → [1, 2, 3]
```
The intersection of these sets yields:
```
http_requests_total{cluster="prod", job="envoy", status="200"}
http_requests_total{cluster="prod", job="envoy", status="400"}
http_requests_total{cluster="prod", job="envoy", status="500"}
```
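The matcher intersection above can be sketched in a few lines. This is a simplification using the example postings from this post; the real TSDB stores sorted posting lists and merges them lazily rather than materializing Python sets:

```python
# Minimal sketch of inverted-index lookup and intersection.
# Keys are (label, value) pairs; values are the matching series IDs.
postings = {
    ('__name__', 'http_requests_total'): {1, 2, 3, 4},
    ('cluster', 'prod'): {1, 2, 3, 4},
    ('job', 'envoy'): {1, 2, 3},
    ('job', 'prometheus'): {4},
    ('status', '200'): {1, 4},
    ('status', '400'): {2},
    ('status', '500'): {3},
}

def select_series(matchers):
    """Intersect posting sets for a list of (label, value) equality matchers."""
    sets = [postings.get(m, set()) for m in matchers]
    return set.intersection(*sets) if sets else set()

ids = select_series([('__name__', 'http_requests_total'),
                     ('cluster', 'prod'),
                     ('job', 'envoy')])
print(sorted(ids))  # → [1, 2, 3]
```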
The mapping from series to chunks is used to identify the relevant chunks from the chunk segment files. These chunks are decoded to retrieve the underlying time series samples.
Using the retrieved series and samples, the PromQL engine evaluates the query. There are two modes of running queries: instant queries, which evaluate an expression at a single point in time, and range queries, which evaluate it at every step across a time range.
Several factors influence the latency and resource usage of PromQL queries. This section highlights the key contributors and practical strategies for improving performance.
High cardinality increases the number of time series that must be scanned and evaluated.
The number of samples fetched impacts both memory usage and CPU time for decoding and processing.
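A back-of-the-envelope estimate makes this concrete. The series count and scrape interval below are assumed numbers for illustration:

```python
def samples_fetched(num_series: int, range_seconds: int, scrape_interval_s: int) -> int:
    """Rough count of raw samples a query must fetch and decode."""
    return num_series * (range_seconds // scrape_interval_s)

# 10,000 series queried over 24 hours at an assumed 15s scrape interval:
# 10,000 * 5,760 = 57.6 million samples to decode.
print(samples_fetched(10_000, 24 * 3600, 15))  # → 57600000
```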
Until downsampling is implemented, increasing the scrape interval can help lower the number of samples to be processed, but this comes at the cost of reduced resolution.
The number of evaluation steps for a range query is computed as:
```
num_steps = 1 + (end - start) / step
```
Example: A 24-hour query with a 1-minute step results in 1,441 evaluation steps.
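The step arithmetic can be checked with a short sketch:

```python
def num_steps(start_s: int, end_s: int, step_s: int) -> int:
    """Number of evaluation steps for a range query, per the formula above."""
    return 1 + (end_s - start_s) // step_s

# 24-hour query with a 1-minute step:
print(num_steps(0, 24 * 3600, 60))  # → 1441
```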
Grafana can automatically set the step size based on the time range. If a query is slow, manually increasing the step parameter can reduce computational overhead.
Wider time ranges amplify the effects of cardinality, sample volume, and evaluation steps.
Subqueries, nested expressions, and advanced functions may lead to substantial CPU consumption.
While Prometheus has optimized regex matching, such queries remain CPU-intensive.
Queries returning large datasets (>100MB) can incur significant serialization and network transfer costs.
```promql
pod_container_info                                    # No aggregation
sum by (pod) (rate(container_cpu_seconds_total[1m]))  # High-cardinality result
```
The key optimization techniques, drawing on the factors above, are:

- Reduce cardinality by aggregating away high-cardinality labels and using precise label matchers.
- Limit the number of samples fetched by narrowing the query's time range.
- Increase the query step for wide time ranges to reduce the number of evaluation steps.
- Prefer equality matchers over regex matchers where possible.
- Keep result sets small by aggregating before returning data.