This guide helps you troubleshoot errors that occur when querying logs from Loki. When Loki rejects or fails query requests, it's typically due to query syntax errors, exceeding limits, timeout issues, or storage access problems.
Before you begin, ensure you have access to your Loki configuration (for adjusting limits) and a way to run queries, such as Grafana Explore or LogCLI.
Query errors can be observed using these Prometheus metrics:
- loki_request_duration_seconds - Query latency by route and status code
- loki_logql_querystats_bytes_processed_per_seconds - Bytes processed during queries
- loki_frontend_query_range_duration_seconds_bucket - Frontend query latency

You can set up alerts on 4xx and 5xx status codes to detect query problems early. This can be helpful when tuning limits configurations.
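As an illustration, a minimal Prometheus alerting rule on the query-path 5xx rate might look like the following sketch (the alert name, route pattern, and threshold are placeholders to adapt):

groups:
  - name: loki-query-errors
    rules:
      - alert: LokiQueryErrorRateHigh # placeholder name
        expr: |
          sum(rate(loki_request_duration_seconds_count{route=~"loki_api_v1_query.*", status_code=~"5.."}[5m]))
            /
          sum(rate(loki_request_duration_seconds_count{route=~"loki_api_v1_query.*"}[5m])) > 0.05
        for: 10m
        annotations:
          summary: More than 5% of Loki query requests are returning 5xx errors.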
Parse errors occur when the LogQL query syntax is invalid. Loki returns HTTP status code 400 Bad Request for all parse errors.
Error message:
failed to parse the log query
Or with position details:
parse error at line <line>, col <col>: <message>
Cause:
The LogQL query contains syntax errors. Common causes are unbalanced braces or brackets, unquoted filter strings, invalid duration units, and incorrect operators.
Common examples:
| Invalid Query | Error | Fix |
|---|---|---|
| `{app="foo"` | Missing closing brace | `{app="foo"}` |
| `{app="foo"} \|= test` | Unquoted filter string | `{app="foo"} \|= "test"` |
| `rate({app="foo"}[5minutes])` | Invalid duration unit | `rate({app="foo"}[5m])` |
| `{app="foo"} json` | Missing pipe symbol before parser | `{app="foo"} \| json` |
Resolution:
Start with a simple stream selector: {job="app"}, then add filters and operations incrementally to identify syntax issues.
- Ensure all braces, parentheses, and brackets ({, }, (, ), [, ]) are properly closed.
- Use valid duration units: ns, us, ms, s, m, h, d, w, y. For example, 5m, not 5minutes.
- Use valid operators (=, !=, =~, !~). Check the LogQL documentation for correct operator usage.
Error message:
parse error : queries require at least one regexp or equality matcher that does not have an empty-compatible value. For instance, app=~".*" does not meet this requirement, but app=~".+" will
Cause:
The query uses only negative matchers (!=, !~) or matchers that match empty strings (=~".*"), which would select all streams. This is prevented to protect against accidentally querying the entire database.
Invalid examples:
{foo!="bar"}
{app=~".*"}
{foo!~"bar|baz"}
Valid examples:
{foo="bar"}
{app=~".+"}
{app="baz", foo!="bar"}
Resolution:
Use .+ instead of .* in regex matchers to require at least one character, or include at least one matcher with a non-empty value, for example, {app="baz", foo!="bar"}.
Error message:
only label matchers are supported
Cause:
The query was passed to an API that only accepts label matchers (like the series API), but included additional expressions like line filters or parsers.
Resolution:
Use only stream selectors for APIs that don't support full LogQL:
# Valid for series API
{app="foo", env="prod"}
# Invalid for series API
{app="foo"} |= "error"
Error message:
log queries are not supported as an instant query type, please change your query to a range query type
Cause:
A log query (one that returns log lines rather than metrics) was submitted to the instant query endpoint (/loki/api/v1/query). Log queries must use the range query endpoint.
Resolution:
Convert log queries to range queries with a time range. Range queries are the default in Grafana Explore.
Use the range query endpoint /loki/api/v1/query_range for log queries.
Convert to a metric query if you need to use instant queries:
# This is a log query (returns logs)
{app="foo"} |= "error"
# This is a metric query (can be instant)
count_over_time({app="foo"} |= "error"[5m])
Use Grafana Assistant - If you are a Cloud Logs user, you can use Grafana Assistant to write or revise your query.
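As a sketch, here is how the two endpoints are called directly (the host and timestamps are placeholders):

# Range query endpoint - required for log queries
curl -G -s "http://loki:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={app="foo"} |= "error"' \
  --data-urlencode 'start=2024-01-01T00:00:00Z' \
  --data-urlencode 'end=2024-01-01T01:00:00Z'

# Instant query endpoint - metric queries only
curl -G -s "http://loki:3100/loki/api/v1/query" \
  --data-urlencode 'query=count_over_time({app="foo"} |= "error"[5m])'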
Error message:
parse error : invalid aggregation sum_over_time without unwrap
Cause:
Aggregation functions like sum_over_time, avg_over_time, min_over_time, max_over_time require an unwrap expression to extract a numeric value from log lines.
Resolution:
Add an unwrap expression to extract the numeric label:
# Invalid
sum_over_time({app="foo"} | json [5m])
# Valid - unwrap a numeric label
sum_over_time({app="foo"} | json | unwrap duration [5m])
Error message:
parse error : invalid aggregation count_over_time with unwrap
Cause:
The count_over_time function doesn't use unwrapped values - it just counts log lines. Using it with unwrap is invalid.
Resolution:
Remove the unwrap expression for count_over_time:
# Invalid
count_over_time({app="foo"} | json | unwrap duration [5m])
# Valid
count_over_time({app="foo"} | json [5m])
Use sum_over_time if you want to sum unwrapped values.
These errors occur when queries exceed configured resource limits. They return HTTP status code 400 Bad Request.
Error message:
maximum number of series (<limit>) reached for a single query; consider reducing query cardinality by adding more specific stream selectors, reducing the time range, or aggregating results with functions like sum(), count() or topk()
Cause:
The query matches more unique label combinations (series) than the configured limit allows. This protects against queries that would consume excessive memory.
Default configuration:
max_query_series: 500 (default)
Resolution:
Add more specific stream selectors to reduce cardinality:
# Too broad
{job="ingress-nginx"}
# More specific
{job="ingress-nginx", namespace="production", pod=~"ingress-nginx-.*"}
Reduce the time range of the query.
Use line filters to narrow down results: {job="app"} |= "error"
Use aggregation functions to reduce cardinality:
sum by (status) (rate({job="nginx"} | json [5m]))
Increase the limit if resources allow:
limits_config:
max_query_series: 1000 #default is 500
Error message:
cardinality limit exceeded for {}; 100001 entries, more than limit of 100000
Cause:
The query produces results with too many unique label combinations. This protects against queries that would generate excessive memory usage and slow performance.
Default configuration:
cardinality_limit: 100000
Resolution:
Use more specific label selectors to reduce the number of unique streams.
Apply aggregation functions to reduce cardinality:
sum by (status) (rate({job="nginx"}[5m]))
Use by() or without() clauses to group results and reduce dimensions:
sum by (status, method) (rate({job="nginx"} | json [5m]))
Alternatively, use drop or keep to reduce the number of labels and hence the cardinality:
# Drop high-cardinality labels like request_id or trace_id
{job="nginx"} | json | drop request_id, trace_id, session_id
# Keep only the labels you need
{job="nginx"} | json | keep status, method, path
Increase the limit if needed:
limits_config:
cardinality_limit: 200000 #default is 100000
Error message:
max entries limit per query exceeded, limit > max_entries_limit_per_query (<requested> > <limit>)
Cause:
The query requests more log entries than the configured maximum. This applies to log queries (not metric queries).
Default configuration:
max_entries_limit_per_query: 5000
Resolution:
Reduce the limit parameter in your query request.
Add more specific filters to return fewer results:
{app="foo"} |= "error"
Reduce the time range of the query.
Increase the limit if needed:
limits_config:
max_entries_limit_per_query: 10000 #default is 5000
Error message:
the query would read too many bytes (query: <size>, limit: <limit>); consider adding more specific stream selectors or reduce the time range of the query
Cause:
The estimated data volume for the query exceeds the configured limit. This is determined before query execution using index statistics.
Default configuration:
max_query_bytes_read: 0B (disabled by default)
Resolution:
Add more specific stream selectors to reduce data volume.
Reduce the time range of the query.
Increase the limit if resources allow:
limits_config:
max_query_bytes_read: 10GB
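Because this limit is enforced from index statistics, you can check a query's estimated footprint up front with the index stats endpoint (host, selector, and times are placeholders):

curl -G -s "http://loki:3100/loki/api/v1/index/stats" \
  --data-urlencode 'query={app="foo"}' \
  --data-urlencode 'start=2024-01-01T00:00:00Z' \
  --data-urlencode 'end=2024-01-02T00:00:00Z'
# Returns the estimated streams, chunks, entries, and bytes the selector would read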
Error message:
the query hit the max number of chunks limit (limit: 2000000 chunks)
Cause:
The number of chunks that the query would read exceeds the configured limit. This protects against queries that would scan excessive amounts of data and consume too much memory.
Default configuration:
max_chunks_per_query: 2000000
Resolution:
Narrow stream selectors to reduce the number of matching chunks:
# Too broad
{job="app"}
# More specific
{job="app", environment="production", namespace="api"}
Reduce the query time range to scan fewer chunks.
Increase the limit if resources allow:
limits_config:
max_chunks_per_query: 5000000 #default is 2000000
Error message:
max streams matchers per query exceeded, matchers-count > limit (1500 > 1000)
Cause:
The query contains too many stream matchers. This limit prevents queries with excessive complexity that could impact query performance.
Default configuration:
max_streams_matchers_per_query: 1000
Resolution:
Simplify your query by using fewer label matchers.
Combine multiple queries instead of using many OR conditions.
Use regex matchers to consolidate multiple values:
# Good: 3 matchers using regex patterns
{cluster="prod", namespace=~"api|web", pod=~"nginx-.*"}
Increase the limit if needed:
limits_config:
max_streams_matchers_per_query: 2000 #default is 1000
Error message:
query too large to execute on a single querier: (query: <size>, limit: <limit>); consider adding more specific stream selectors, reduce the time range of the query, or adjust parallelization settings
Or for un-shardable queries:
un-shardable query too large to execute on a single querier: (query: <size>, limit: <limit>); consider adding more specific stream selectors or reduce the time range of the query
Cause:
Even after query splitting and sharding, individual query shards exceed the per-querier byte limit.
Default configuration:
max_querier_bytes_read: 150GB (per querier)
Resolution:
Add more specific stream selectors.
Reduce the time range, or break large queries into smaller time ranges (see the sketch after this list).
Simplify the query if possible - some queries cannot be sharded.
Increase the limit (requires more querier resources):
limits_config:
max_querier_bytes_read: 200GB # default is 150GB
Scale querier resources.
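A minimal shell sketch for splitting one large query into day-sized pieces (GNU date; the selector, dates, and output path are placeholders):

# Query one day at a time instead of a whole month
for day in $(seq 0 29); do
  start=$(date -u -d "2024-01-01 + ${day} days" +%Y-%m-%dT00:00:00Z)
  end=$(date -u -d "2024-01-01 + $((day + 1)) days" +%Y-%m-%dT00:00:00Z)
  logcli query '{app="foo"}' --from="$start" --to="$end" >> results.log
done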
Error message:
[interval] value exceeds limit
Cause:
The range vector interval (in brackets like [5m]) exceeds configured limits.
Resolution:
Reduce the range interval in your query:
# If [1d] is too large, try smaller intervals
rate({app="foo"}[1h])
Check your configuration for max_query_length limits. The default is 30d1h.
These errors relate to the time range specified in queries.
Error message:
the query time range exceeds the limit (query length: <duration>, limit: <limit>)
Cause:
The difference between the query's start and end time exceeds the maximum allowed query length.
Default configuration:
max_query_length: 721h (30 days + 1 hour)
Resolution:
Reduce the query time range:
# Instead of querying 60 days
logcli query '{app="foo"}' --since=1440h
# Query 30 days or less
logcli query '{app="foo"}' --since=720h
Increase the limit if storage retention supports it:
limits_config:
max_query_length: 2160h # 90 days
Error message:
this data is no longer available, it is past now - max_query_lookback (<duration>)
Cause:
The entire query time range falls before the max_query_lookback limit. This happens when trying to query data older than the configured lookback period.
Default configuration:
max_query_lookback: 0 (The default value of 0 does not set a limit.)
Resolution:
Query more recent data within the lookback window.
Adjust the lookback limit if the data should be queryable:
limits_config:
max_query_lookback: 8760h # 1 year
{{< admonition type="caution" >}} The lookback limit should not exceed your retention period. {{< /admonition >}}
Error message:
invalid query, through < from (<end> < <start>)
Cause:
The query end time is before the start time, which is invalid.
Resolution:
Correct the time range so that the end time is later than the start time. Check for swapped start and end parameters (for example, --from and --to in LogCLI) or a misconfigured time picker in your client.
These errors occur when queries don't meet configured label requirements.
Error message:
stream selector is missing required matchers [<required_labels>], labels present in the query were [<present_labels>]
Cause:
The tenant is configured to require certain label matchers in all queries, but the query doesn't include them.
Default configuration:
required_labels: [] (none required by default)
Resolution:
Check with your administrator about which labels are required.
Add the required labels to your query:
# If 'namespace' is required
{app="foo", namespace="production"}
Error message:
stream selector has less label matchers than required: (present: [<labels>], number_present: <count>, required_number_label_matchers: <required>)
Cause:
The tenant is configured to require a minimum number of label matchers, but the query has fewer.
Default configuration:
minimum_labels_number: 0 (no minimum by default)
Resolution:
Add more label matchers to meet the minimum requirement:
# If minimum is 2, add another selector
{app="foo", namespace="production"}
Timeout errors occur when queries take too long to execute.
Error message:
request timed out, decrease the duration of the request or add more label matchers (prefer exact match over regex match) to reduce the amount of data processed
Or:
context deadline exceeded
Cause:
The query exceeded the configured timeout. This can happen when the time range is too broad, the selectors or filters match too much data, the cluster is under-resourced, or there are network issues between components.
Default configuration:
query_timeout: 1m
server.http_server_read_timeout: 30s
server.http_server_write_timeout: 30s
Resolution:
Reduce the time range of the query.
Add more specific filters to reduce data processing:
# Less specific (slower)
{namespace=~"prod.*"}
# More specific (faster)
{namespace="production"}
Prefer exact matchers over regex when possible.
Add line filters early in the pipeline:
{app="foo"} |= "error" | json | level="error"
Increase timeout limits (if resources allow):
limits_config:
query_timeout: 5m
server:
http_server_read_timeout: 5m
http_server_write_timeout: 5m
For exploratory queries, limit the number of results and the lookback window instead of scanning everything:
logcli query '{job="app"}' --limit=10 --since=15m
Check for network issues between components.
Error message:
the request was cancelled by the client
Cause:
The client closed the connection before receiving a response. This is typically caused by a client-side timeout shorter than the query duration, a user cancelling the query (for example, navigating away from a Grafana dashboard), or a proxy or load balancer closing the connection.
Resolution:
Increase the client or proxy timeout so it is at least as long as Loki's query_timeout, or narrow the query so it completes faster.
These errors occur when queries are administratively blocked.
Error message:
query blocked by policy
Cause:
The query matches a configured block rule. Administrators create tenant policies and rate limiting rules to block specific queries or query patterns to protect the cluster from expensive or problematic queries.
Resolution:
Contact your Loki administrator to find out why the query pattern is blocked, and rewrite the query so it no longer matches the block rule.
Configuration reference:
limits_config:
blocked_queries:
- pattern: ".*" # Regex pattern to match
regex: true
types: # Query types to block
- metric
- filter
hash: 0 # Or block specific query hash
Error message:
querying is disabled, please contact your Loki operator
Cause:
Query parallelism is set to 0, effectively disabling queries for the tenant.
Resolution:
Contact your Loki administrator to enable querying.
Check configuration for max_query_parallelism:
limits_config:
max_query_parallelism: 32 #(the default)
Multi-variant queries are an experimental feature that lets you run multiple query variants over the same underlying data, for example, both a rate() and a count_over_time() query over the same range selector.
Error message:
multi variant queries are disabled for this instance
Cause:
The query uses the variants feature, but it's disabled for the tenant or instance.
Resolution:
Remove variant expressions from the query.
Enable the feature if needed:
limits_config:
enable_multi_variant_queries: true #default is false
Pipeline errors occur during log line processing but don't cause query failures. Instead, affected log lines are annotated with error labels.
When a pipeline stage fails (for example, parsing JSON that isn't valid JSON), Loki:
- Adds an __error__ label with the error type
- Adds an __error_details__ label with more information

| Error Label Value | Cause |
|---|---|
JSONParserErr | Log line is not valid JSON |
LogfmtParserErr | Log line is not valid logfmt |
SampleExtractionErr | Failed to extract numeric value for metrics |
LabelFilterErr | Label filter operation failed |
TemplateFormatErr | Template formatting failed |
To see logs with errors:
{app="foo"} | json | __error__!=""
To see error details:
{app="foo"} | json | __error__!="" | line_format "Error: {{.__error__}} - {{.__error_details__}}"
To exclude logs with parsing errors:
{app="foo"} | json | __error__=""
To exclude specific error types:
{app="foo"} | json | __error__!="JSONParserErr"
To remove error labels from results:
{app="foo"} | json | drop __error__, __error_details__
These errors occur when connecting to Loki, often when using LogCLI.
Error message:
no org id
Cause:
Multi-tenancy is enabled but no tenant ID was provided in the request.
Resolution:
Add the X-Scope-OrgID header in your request.
For LogCLI, use the --org-id flag:
logcli query '{app="foo"}' --org-id="my-tenant"
In Grafana, configure the tenant ID in the data source settings.
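For direct API calls, pass the tenant ID as a header. A sketch (host and tenant are placeholders):

curl -s -H "X-Scope-OrgID: my-tenant" \
  -G "http://loki:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={app="foo"}'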
Error message:
at most one of HTTP basic auth (username/password), bearer-token & bearer-token-file is allowed to be configured
Or:
at most one of the options bearer-token & bearer-token-file is allowed to be configured
Cause:
Multiple authentication methods are configured simultaneously in LogCLI.
Resolution:
Use only one authentication method:
# Basic auth
logcli query '{app="foo"}' --username="user" --password="pass"
# OR bearer token
logcli query '{app="foo"}' --bearer-token="token"
# OR bearer token file
logcli query '{app="foo"}' --bearer-token-file="/path/to/token"
Error message:
run out of attempts while querying the server
Cause:
LogCLI exhausted all retry attempts when trying to reach Loki. This usually indicates that the Loki server is down or unreachable, there are network problems between the client and Loki, or requests are being rejected (for example, due to invalid credentials).
Resolution:
Check Loki server availability.
Verify network connectivity.
Check authentication credentials.
Increase retries if transient issues are expected:
logcli query '{app="foo"}' --retries=5
Error message:
websocket: close 1006 (abnormal closure): unexpected EOF
Cause:
When tailing logs, the WebSocket connection was closed unexpectedly. This can happen if a proxy or load balancer in front of Loki doesn't support WebSocket upgrades or closes idle connections, or if the Loki component serving the tail restarts.
Resolution:
Configure proxies and load balancers between the client and Loki to allow WebSocket connections and to use idle timeouts longer than the expected tail duration, then reconnect and resume tailing.
These errors occur when requested data is not available.
Error message:
no data found
Or an empty result set with no error message.
Cause:
The query time range contains no matching log data. This can happen if no logs were ingested for that period, the stream selectors don't match any existing streams, or the data has been removed by retention.
Resolution:
Verify the time range contains data for your streams.
Check if log ingestion is working correctly:
# Check if any data is being ingested
logcli query '{job=~".+"}'
Verify stream selectors match existing log streams:
# List available streams matching a selector
curl -G http://loki:3100/loki/api/v1/series --data-urlencode 'match[]={job=~".+"}'
Check data retention settings to ensure logs are still available.
Use broader selectors to test if any data exists:
{job=~".+"}
Error message:
index not ready
Or:
index gateway not ready for time range
Cause:
The index for the requested time range is not yet available for querying. This can happen when Loki has recently started and is still downloading or building index files, or when the index gateway hasn't yet synchronized the index for that period.
Default configuration:
query_ready_index_num_days: 0 (all indexes are considered ready)
Resolution:
Wait for the index to become available - this is often a temporary issue during startup.
Query more recent data that's available in ingesters:
{app="foo"} # Query last few hours instead of older data
Check the configuration for index readiness:
query_range:
query_ready_index_num_days: 7 #default is 0
Verify index synchronization is working correctly by checking ingester and index gateway logs.
Error message:
max concurrent tail requests limit exceeded, count > limit (10 > 5)
Cause:
The tenant has exceeded the maximum number of concurrent streaming (tail) requests. This limit protects the cluster from excessive resource consumption by real-time log streaming.
Default configuration:
max_concurrent_tail_requests: 10
Resolution:
Reduce the number of concurrent tail/streaming queries.
Use batch queries instead of real-time streaming where possible:
# Instead of tailing in real-time
# Use periodic range queries
{app="foo"} |= "error"
Increase the limit if more concurrent tails are needed:
limits_config:
max_concurrent_tail_requests: 20 #default is 10
These errors occur when Loki cannot read data from storage.
Error message:
failed to load chunk '<chunk_key>'
Cause:
Loki couldn't retrieve a chunk from object storage. Possible causes include transient storage connectivity problems, missing permissions on the bucket or container, or the chunk having been deleted outside of Loki.
Resolution:
Check connectivity and credentials for the object storage backend, verify the configured bucket or container is correct, and review storage-side logs for the missing chunk. Transient storage errors often resolve on retry.
Error message:
object not found in storage
Cause:
The requested chunk or object doesn't exist in storage. This might happen if retention or compaction deleted the object while the index still references it, or if Loki is pointed at the wrong bucket or container.
Resolution:
Verify that the query time range is within your retention period and that the storage configuration points at the correct bucket. If recent data is affected, check for index and storage inconsistencies (for example, a misconfigured compactor).
Error message:
failed to decode chunk '<chunk_key>' for tenant '<tenant>': <error>
Cause:
A chunk was retrieved from storage but couldn't be decoded. This indicates chunk corruption.
Resolution:
Check the storage backend for corrupted or truncated objects. A corrupted chunk generally can't be repaired; you may need to delete the affected object so queries can proceed, accepting the loss of that data.
Follow this workflow when investigating query issues:
Check the error message - Identify which category of error you're encountering.
Review query syntax - Use the LogQL documentation to validate your query.
Check query statistics - In Grafana, enable "Query Inspector" to see statistics such as bytes processed, execution time, and the number of lines returned.
Simplify the query - Start with a basic selector and add complexity:
# Start simple
{app="foo"}
# Add filters
{app="foo"} |= "error"
# Add parsing
{app="foo"} |= "error" | json
# Add label filters
{app="foo"} |= "error" | json | level="error"
Check metrics for query performance:
# Query latency
histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, route))
# Query errors
sum by (status_code) (rate(loki_request_duration_seconds_count[5m]))
Review Loki logs for detailed error information:
kubectl logs -l app=loki-read --tail=100 | grep -i error
Test with LogCLI for more detailed output:
logcli query '{app="foo"}' --stats --limit=10