docs/sources/operations/troubleshooting/troubleshoot-operations.md
This guide helps you troubleshoot errors that occur during Loki operations, including configuration issues, storage backend problems, cluster communication failures, and service component errors. These errors are distinct from ingestion (write path) and query (read path) errors covered in separate troubleshooting topics.
Before you begin, ensure you have the following:
Configuration errors occur during Loki startup or when loading runtime configuration. These errors prevent Loki from starting or operating correctly.
Error message:
MULTIPLE CONFIG ERRORS FOUND, PLEASE READ CAREFULLY
<list of configuration errors>
Cause:
Multiple configuration validation errors were detected during startup. Loki aggregates all configuration errors rather than failing on the first one.
Resolution:
Review all listed errors carefully - each error message describes a specific configuration problem.
Check your configuration file for syntax errors and invalid values.
Validate your configuration before applying:
loki -config.file=/path/to/config.yaml -verify-config
Properties:
Error message:
too many storage configs provided in the common config, please only define one storage backend
Cause:
Multiple storage backends are configured in the common configuration section. Loki requires a single storage backend for the common config.
Resolution:
Use only one storage backend in your common config:
common:
  storage:
    # Choose only ONE of the following:
    s3:
      endpoint: s3.amazonaws.com
      bucketnames: loki-data
    # OR
    gcs:
      bucket_name: loki-data
    # OR
    azure:
      container_name: loki-data
For multiple storage backends, configure them explicitly in specific sections rather than common config.
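A quick pre-flight check can catch this condition before Loki starts. The following is a minimal sketch, assuming the configuration has already been parsed into a Python dictionary. The helper name `configured_backends` and the list of backend keys are illustrative, taken from the examples in this guide rather than from Loki's source.

```python
# Backend keys as they appear in this guide's examples (assumed, not exhaustive).
STORAGE_BACKENDS = ("s3", "gcs", "azure", "swift", "filesystem", "bos")


def configured_backends(config: dict) -> list:
    """Return the storage backends that are set under common.storage."""
    storage = (config.get("common") or {}).get("storage") or {}
    return [name for name in STORAGE_BACKENDS if storage.get(name) is not None]


# Hypothetical parsed configuration with two backends set by mistake.
config = {
    "common": {
        "storage": {
            "s3": {"endpoint": "s3.amazonaws.com", "bucketnames": "loki-data"},
            "gcs": {"bucket_name": "loki-data"},
        }
    }
}

found = configured_backends(config)
if len(found) > 1:
    print(f"more than one common storage backend configured: {found}")
```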
Properties:
Error message:
if persist_tokens is true, path_prefix MUST be defined
Cause:
The persist_tokens option is enabled for a ring but no path_prefix is specified. Loki needs a path to store the token file.
Resolution:
Set the path prefix:
common:
  path_prefix: /var/loki
  persist_tokens: true
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
    tokens_file_path: /var/loki/tokens
Or disable persist_tokens if you don't need token persistence:
common:
  persist_tokens: false
Properties:
Error message:
both `grpc_client_config` and (`query_frontend_grpc_client` or `query_scheduler_grpc_client`) are set at the same time. Please use only `query_frontend_grpc_client` and `query_scheduler_grpc_client`
Cause:
Both the deprecated grpc_client_config and the newer specific gRPC client configs are set. These are mutually exclusive.
Resolution:
Remove the deprecated config and use specific gRPC client configs:
# Remove this:
# grpc_client_config: ...
# Use these instead:
query_frontend_grpc_client:
  max_recv_msg_size: 104857600
query_scheduler_grpc_client:
  max_recv_msg_size: 104857600
Properties:
Error message:
CONFIG ERROR: schema v13 is required to store Structured Metadata and use native OTLP ingestion, your schema version is <version>. Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update to schema v13 or newer before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure
Cause:
Structured metadata is enabled but the active schema version is older than v13. Structured metadata requires schema v13 or newer.
Resolution:
Disable structured metadata temporarily:
limits_config:
  allow_structured_metadata: false
Update your schema config to v13 or newer:
schema_config:
  configs:
    - from: "2024-04-01"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
Re-enable structured metadata after the schema migration is complete.
Properties:
Error message:
CONFIG ERROR: `tsdb` index type is required to store Structured Metadata and use native OTLP ingestion, your index type is `<type>` (defined in the `store` parameter of the schema_config). Set `allow_structured_metadata: false` in the `limits_config` section or set the command line argument `-validation.allow-structured-metadata=false` and restart Loki. Then proceed to update the schema to use index type `tsdb` before re-enabling this config, search for 'Storage Schema' in the docs for the schema update procedure
Cause:
Structured metadata is enabled but the active index type is not TSDB. Structured metadata requires the TSDB index type.
Resolution:
Disable structured metadata temporarily and migrate to the TSDB index type:
limits_config:
  allow_structured_metadata: false
schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: s3
      schema: v13
Re-enable structured metadata after migrating to TSDB.
Properties:
Error message:
CONFIG ERROR: `tsdb` index type is configured in at least one schema period, however, `storage_config`, `tsdb_shipper`, `active_index_directory` is not set, please set this directly or set `path_prefix:` in the `common:` section
Or:
CONFIG ERROR: `tsdb` index type is configured in at least one schema period, however, `storage_config`, `tsdb_shipper`, `cache_location` is not set, please set this directly or set `path_prefix:` in the `common:` section
Cause:
The TSDB index type is configured in the schema but required local directories for index files are not set.
Resolution:
Set the common path prefix (simplest approach):
common:
  path_prefix: /var/loki
Or configure directories explicitly:
storage_config:
  tsdb_shipper:
    active_index_directory: /var/loki/tsdb-index
    cache_location: /var/loki/tsdb-cache
Properties:
Error message:
CONFIG ERROR: `compactor:` `working_directory:` is empty, please set a valid directory or set `path_prefix:` in the `common:` section
Cause:
The compactor requires a working directory for index compaction, but none is configured.
Resolution:
Set the common path prefix:
common:
  path_prefix: /var/loki
Or set the compactor working directory explicitly:
compactor:
  working_directory: /var/loki/compactor
Properties:
Error message:
CONFIG ERROR: the active index is <type> which is configured to use an `index_cache_validity` (TTL) of <duration>, however the chunk_retain_period is <duration> which is LESS than the `index_cache_validity`. This can lead to query gaps, please configure the `chunk_retain_period` to be greater than the `index_cache_validity`
Cause:
The chunk retain period is shorter than the index cache validity (TTL), which can cause query gaps where data exists in the index cache but the chunks have already been flushed and removed from ingesters.
Resolution:
Increase the chunk retain period to be greater than the index cache validity:
ingester:
  chunk_retain_period: 15m # Must be > index_cache_validity
storage_config:
  index_cache_validity: 5m
Properties:
Error message:
CONFIG ERROR: invalid target, cannot run backend target with legacy read mode
Cause:
The backend target is configured while legacy read mode is enabled. These are incompatible deployment configurations.
Resolution:
Disable legacy read mode if using the backend target:
# Remove or set to false:
legacy_read_mode: false
Or use a different target compatible with legacy read mode.
Properties:
Error message:
unrecognized `store` (index) type `<type>`, choose one of: <supported_types>
Or:
unrecognized `object_store` type `<type>`, which also does not match any named_stores. Choose one of: <supported_types>. Or choose a named_store
Cause:
The schema configuration references an index type or object store type that Loki does not recognize.
Resolution:
Use a supported index type: tsdb (recommended) or boltdb-shipper
Use a supported object store type: s3, gcs, azure, swift, filesystem, bos
Or reference a valid named store defined in your configuration:
storage_config:
  named_stores:
    aws:
      my-store:
        endpoint: s3.amazonaws.com
        bucketnames: my-bucket
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: my-store # References the named store
Properties:
Error message:
overrides-exporter has been enabled, but no runtime configuration file was configured
Cause:
The overrides-exporter target is enabled but no runtime configuration file is provided. The overrides-exporter needs a runtime config to expose tenant-specific limit overrides as metrics.
Resolution:
Configure a runtime configuration file:
runtime_config:
  file: /etc/loki/runtime-config.yaml
Or disable the overrides-exporter if not needed by removing it from your target list.
Properties:
Error message:
invalid override for tenant <tenant>: <details>
Cause:
The runtime configuration file contains an invalid override for a specific tenant. The override failed validation.
Resolution:
Properties:
Error message:
retention period must be >= 24h was <duration>
Cause:
A stream-level retention rule specifies a retention period shorter than 24 hours, which is the minimum allowed.
Resolution:
Set retention periods to at least 24 hours:
limits_config:
  retention_stream:
    - selector: '{namespace="dev"}'
      priority: 1
      period: 24h # Must be >= 24h
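To find offending rules before Loki rejects them, the 24-hour minimum can be checked in a script. The sketch below assumes the retention rules have already been loaded as a list of dictionaries. The duration parser is a simplified stand-in written for this example (it accepts only integer values with the units `ms`, `s`, `m`, `h`, `d`, and `w`), not Loki's own parser, and the function names are hypothetical.

```python
import re
from datetime import timedelta

# Seconds per unit for the simplified duration syntax handled here.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_TOKEN = re.compile(r"(\d+)(ms|[smhdw])")
MIN_RETENTION = timedelta(hours=24)


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '24h' or '1h30m' into a timedelta."""
    if not re.fullmatch(r"(?:\d+(?:ms|[smhdw]))+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    seconds = sum(int(value) * _UNIT_SECONDS[unit] for value, unit in _TOKEN.findall(text))
    return timedelta(seconds=seconds)


def short_retention_rules(rules: list) -> list:
    """Return the rules whose period is below the 24h minimum."""
    return [rule for rule in rules if parse_duration(rule["period"]) < MIN_RETENTION]


# Hypothetical rules: the first one would trigger the error above.
rules = [
    {"selector": '{namespace="dev"}', "priority": 1, "period": "12h"},
    {"selector": '{namespace="prod"}', "priority": 2, "period": "7d"},
]
for rule in short_retention_rules(rules):
    print(f"retention too short for {rule['selector']}: {rule['period']}")
```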
Properties:
Error message:
it is an error to specify a non zero `query_store_max_look_back_period` value when using any object store other than `filesystem`
Cause:
The query_store_max_look_back_period is set to a non-zero value with a storage backend other than filesystem. This setting only applies to local filesystem storage.
Resolution:
Remove the setting if using object storage:
# Remove or set to 0:
query_store_max_look_back_period: 0
Or use filesystem storage if this setting is needed for local development.
Properties:
Authentication and tenant errors occur when requests are missing required tenant identification or when tenant IDs are invalid. In multi-tenant mode, every request must include a valid tenant ID.
Error message:
no org id
Cause:
A request was made to Loki without the required X-Scope-OrgID header. In multi-tenant mode, every request must identify the tenant.
Resolution:
Add the X-Scope-OrgID header to your requests:
curl -H "X-Scope-OrgID: my-tenant" http://loki:3100/loki/api/v1/push ...
For Grafana, configure the tenant ID in the Loki data source settings under "HTTP Headers".
For Alloy, set the tenant ID in the loki.write component:
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
    tenant_id = "my-tenant"
  }
}
Disable multi-tenancy for single-tenant deployments:
auth_enabled: false
Properties:
Error message:
multiple org IDs present
Cause:
The request contains multiple different tenant IDs, but the operation requires a single tenant. This can happen when a request is forwarded through multiple proxies that each inject a tenant ID.
Resolution:
Ensure only one tenant ID is set in the X-Scope-OrgID header.
Check proxy configurations for conflicting tenant ID injection.
For cross-tenant queries, use pipe-separated tenant IDs only where supported:
curl -H "X-Scope-OrgID: tenant1|tenant2" http://loki:3100/loki/api/v1/query ...
Properties:
Error message:
tenant ID is too long: max 150 characters
Cause:
The tenant ID exceeds the maximum allowed length of 150 characters.
Resolution:
Properties:
Error message:
tenant ID is '.' or '..'
Cause:
The tenant ID is set to . or .., which are reserved filesystem path components and could cause path traversal issues.
Resolution:
Use a tenant ID other than . or ..
Properties:
Error message:
tenant ID '<id>' contains unsupported character '<char>'
Cause:
The tenant ID contains characters that are not allowed. Tenant IDs must consist of alphanumeric characters, hyphens, underscores, and periods.
Resolution:
Use only alphanumeric characters, hyphens (-), underscores (_), and periods (.) in tenant IDs.
Properties:
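The three tenant ID rules described above (length, reserved names, and allowed characters) can be checked on the client side before a request is sent. This is a minimal sketch that follows the rules as stated in this guide; Loki's actual validation may accept a slightly different character set, and the function name `check_tenant_id` is hypothetical.

```python
import re
from typing import Optional

MAX_TENANT_ID_LENGTH = 150
# Allowed characters as described in this guide: ASCII alphanumerics, '-', '_' and '.'.
_ALLOWED_CHAR = re.compile(r"[A-Za-z0-9._-]")


def check_tenant_id(tenant_id: str) -> Optional[str]:
    """Return an error message for an invalid tenant ID, or None if it passes."""
    if len(tenant_id) > MAX_TENANT_ID_LENGTH:
        return "tenant ID is too long: max 150 characters"
    if tenant_id in (".", ".."):
        return "tenant ID is '.' or '..'"
    for char in tenant_id:
        if not _ALLOWED_CHAR.fullmatch(char):
            return f"tenant ID '{tenant_id}' contains unsupported character '{char}'"
    return None


for candidate in ("my-tenant", "..", "team/a"):
    print(candidate, "->", check_tenant_id(candidate) or "ok")
```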
Error message:
deletion is not available for this tenant
Cause:
A delete request was submitted for a tenant that does not have deletion enabled. Log deletion must be explicitly enabled per tenant.
Resolution:
Enable deletion for the tenant in the runtime configuration:
overrides:
  my-tenant:
    deletion_mode: filter-and-delete # Or "filter-only"
Valid deletion modes:
disabled - Deletion is not allowed (default)
filter-only - Lines matching delete requests are filtered at query time but not physically deleted
filter-and-delete - Lines are filtered at query time and physically deleted during compaction
Ensure the compactor is configured for retention:
compactor:
  retention_enabled: true
  delete_request_store: s3
Properties:
Storage backend errors occur when Loki cannot communicate with or properly configure object storage (Amazon S3, Google Cloud Storage, Microsoft Azure, Swift, or filesystem).
Error message:
unsupported storage backend
Cause:
The specified storage backend type is not recognized. This typically occurs when a typo exists in the storage type configuration.
Resolution:
Use a valid storage backend type:
s3 - Amazon S3 or S3-compatible storage
gcs - Google Cloud Storage
azure - Azure Blob Storage
swift - OpenStack Swift
filesystem - Local filesystem
bos - Baidu Object Storage
storage_config:
  boltdb_shipper:
    shared_store: s3 # Must be one of the valid types
Properties:
Error message:
storage prefix contains invalid characters, it may only contain digits, English alphabet letters and dashes
Cause:
The storage path prefix contains invalid characters. Only alphanumeric characters and dashes are allowed.
Resolution:
Use valid characters in your storage prefix:
storage_config:
  # Invalid: prefix_with_underscore_or/special chars
  # Valid: my-loki-data or lokilogs123
  aws:
    s3: s3://my-bucket/my-loki-data
Properties:
Error message:
unsupported S3 SSE type
Cause:
The S3 server-side encryption (SSE) type is not supported. Loki supports specific SSE types.
Resolution:
Use a supported SSE type:
storage_config:
  aws:
    sse:
      type: SSE-S3 # Or SSE-KMS
Supported types:
SSE-S3 - Server-side encryption with Amazon S3-managed keys
SSE-KMS - Server-side encryption with AWS KMS-managed keys
Properties:
Error message:
invalid S3 SSE encryption context
Cause:
The SSE-KMS encryption context is malformed and cannot be parsed as valid JSON.
Resolution:
Provide valid JSON for the encryption context:
storage_config:
  aws:
    sse:
      type: SSE-KMS
      kms_key_id: alias/my-key
      kms_encryption_context: '{"key": "value"}' # Valid JSON
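The encryption context string can be checked locally before it goes into the configuration. The sketch below uses only Python's standard `json` module. It assumes the context should be a flat JSON object, since KMS encryption contexts are key-value pairs; the function name `check_encryption_context` is hypothetical.

```python
import json


def check_encryption_context(raw: str) -> dict:
    """Parse the encryption context, raising ValueError if it is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"encryption context is not valid JSON: {err}") from err
    if not isinstance(parsed, dict):
        raise ValueError("encryption context must be a JSON object")
    return parsed


print(check_encryption_context('{"key": "value"}'))

try:
    # Unquoted keys are a common mistake and are not valid JSON.
    check_encryption_context("{key: value}")
except ValueError as err:
    print(err)
```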
Properties:
Error message:
the endpoint must not prefixed with the bucket name
Cause:
The S3 endpoint incorrectly includes the bucket name as a prefix. This can cause path-style vs virtual-hosted-style URL issues.
Resolution:
Remove the bucket name from the endpoint and configure it separately:
storage_config:
  aws:
    # Incorrect:
    # endpoint: my-bucket.s3.amazonaws.com
    # Correct:
    endpoint: s3.amazonaws.com
    bucketnames: my-bucket
Properties:
Error message:
sts-endpoint must be a valid url
Cause:
The AWS STS (Security Token Service) endpoint URL is malformed or invalid.
Resolution:
Provide a valid URL for the STS endpoint:
storage_config:
  aws:
    sts_endpoint: https://sts.us-east-1.amazonaws.com
Properties:
Error message:
connection string is either blank or malformed. The expected connection string should contain key value pairs separated by semicolons. For example 'DefaultEndpointsProtocol=https;AccountName=<accountName>;AccountKey=<accountKey>;EndpointSuffix=core.windows.net'
Cause:
The Azure storage connection string is missing or doesn't follow the expected format.
Resolution:
Use a valid connection string format:
storage_config:
  azure:
    # Use account credentials:
    account_name: myaccount
    account_key: mykey
    # Or connection string:
    connection_string: "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net"
Verify the connection string in Azure Portal under Storage Account > Access Keys.
Properties:
Error message:
unrecognized named storage config <name>
Or for specific backends:
unrecognized named s3 storage config <name>
unrecognized named gcs storage config <name>
unrecognized named azure storage config <name>
unrecognized named filesystem storage config <name>
unrecognized named swift storage config <name>
Or for an unrecognized store type:
unrecognized named storage type: <storeType>
Cause:
A named storage configuration referenced in the schema config doesn't exist in the named stores configuration.
Resolution:
Define the named store in your configuration:
storage_config:
  named_stores:
    aws:
      my-s3-store: # This name must match the reference
        endpoint: s3.amazonaws.com
        bucketnames: my-bucket
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: my-s3-store # References the named store above
Check spelling of the store name in both the definition and reference.
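Mismatches between a schema period and the named stores can also be found with a short script. The following sketch assumes the configuration is already parsed into a dictionary shaped like the example above. The set of built-in store types is taken from the list earlier in this guide and may not be exhaustive, and the function name `unknown_object_stores` is hypothetical.

```python
# Built-in object store types as listed in this guide (assumed, may not be exhaustive).
BUILT_IN_STORES = {"s3", "gcs", "azure", "swift", "filesystem", "bos"}


def unknown_object_stores(config: dict) -> list:
    """Return object_store values that match neither a built-in type nor a named store."""
    named = set()
    named_stores = (config.get("storage_config") or {}).get("named_stores") or {}
    for stores in named_stores.values():
        named.update(stores or {})
    periods = (config.get("schema_config") or {}).get("configs") or []
    known = BUILT_IN_STORES | named
    return [
        period["object_store"]
        for period in periods
        if "object_store" in period and period["object_store"] not in known
    ]


# Hypothetical configuration with a misspelled reference in the second period.
config = {
    "storage_config": {
        "named_stores": {"aws": {"my-s3-store": {"bucketnames": "my-bucket"}}}
    },
    "schema_config": {
        "configs": [
            {"from": "2024-01-01", "store": "tsdb", "object_store": "my-s3-store"},
            {"from": "2024-06-01", "store": "tsdb", "object_store": "my-s3-stor"},
        ]
    },
}
print(unknown_object_stores(config))
```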
Properties:
Cache errors occur when Loki cannot connect to or communicate with caching backends (Memcached, Redis).
Error message:
redis client setup failed: <details>
Cause:
Loki cannot establish a connection to the Redis server. Common causes include:
Resolution:
Verify Redis connectivity from the Loki host:
redis-cli -h <REDIS-HOST> -p <REDIS-PORT> ping
Check the Redis endpoint configuration:
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      timeout: 500ms
Configure authentication if required:
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      password: ${REDIS_PASSWORD}
Properties:
Error message:
could not lookup host: <hostname>
Cause:
DNS resolution failed for the Redis hostname.
Resolution:
Verify DNS resolution:
nslookup redis-host
Use an IP address if DNS is not available:
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: 10.0.0.100:6379
Check your DNS configuration and network settings.
Properties:
Error message:
redis: Unexpected PING response "<response>"
Cause:
The Redis server returned an unexpected response to a PING command. This could indicate:
Resolution:
Verify the endpoint is actually a Redis server.
Check Redis health:
redis-cli -h <HOST> -p <PORT> INFO
Review proxy configurations if using a load balancer in front of Redis.
Properties:
Error message:
use of multiple cache storage systems is not supported
Cause:
Both Memcached and Redis cache backends are configured for the same cache type. Only one caching backend is supported per cache type.
Resolution:
Choose one cache backend per cache type:
chunk_store_config:
  chunk_cache_config:
    # Use either memcached OR redis, not both
    redis:
      endpoint: redis:6379
    memcached: {} # Remove this
Properties:
Error message:
no cache configured
Cause:
A results cache is required for the query frontend but no cache configuration was provided.
Resolution:
Configure a cache backend:
query_range:
  results_cache:
    cache:
      memcached:
        expiration: 1h
      memcached_client:
        addresses: memcached:11211
Or disable results caching if not needed:
query_range:
  cache_results: false
Properties:
Error message:
msg="error loading cache generation numbers" err="unexpected status code: 403"
Or from the compactor HTTP client:
msg="error getting cache gen numbers from the store" err="unexpected status code: 403"
Metric: loki_delete_cache_gen_load_failures_total
Cause:
Loki uses cache generation numbers to invalidate query caches when log deletion requests are processed. The cache generation number loader periodically fetches these numbers from the compactor. When the compactor returns HTTP 403, it means the deletion API is not enabled for the tenant. The loader logs this error and increments the loki_delete_cache_gen_load_failures_total metric.
Other non-403 causes include:
Resolution:
Check if deletion is intentionally disabled. If you don't use log deletion for this tenant, these errors are harmless but noisy. You can verify by sending a GET request to the compactor's cache generation number endpoint:
curl -s -H "X-Scope-OrgID: <tenant>" http://compactor:3100/loki/api/v1/cache/generation_numbers
If the response is "deletion is not available for this tenant", the deletion API is not enabled for the tenant.
Enable deletion for the tenant if log deletion is required:
overrides:
  <tenant>:
    deletion_mode: filter-and-delete
Check compactor connectivity if the error includes a non-403 status code or a connection error:
curl -s http://compactor:3100/ready
Verify the compactor address is correctly configured. Queriers and other components that use the cache generation loader need to reach the compactor:
compactor:
  compactor_address: http://compactor:3100
Properties:
GenNumberLoader)
deletion_mode in tenant overrides)
Ring errors occur when Loki components cannot properly communicate through the hash ring, which is used to distribute work across instances. The ring is fundamental to Loki's distributed operation.
Error message:
too many unhealthy instances in the ring
Cause:
The ring contains too many unhealthy instances to satisfy the replication factor. For example, with a replication factor of 3, at least 3 healthy instances must be available.
Resolution:
Check the health of ring members:
curl -s http://loki:3100/ring | jq '.shards[] | select(.state != "ACTIVE")'
Restart unhealthy instances that are stuck in a bad state.
Scale up instances if there aren't enough healthy members.
Check resource constraints (CPU, memory, disk) on unhealthy instances.
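The jq filter in the first step above can be reproduced in a script when the ring output needs further processing. This sketch only mirrors that filter: the `shards` and `state` keys come from the jq expression, while the `id` field and the sample response body are assumptions made for illustration.

```python
import json


def non_active_members(ring_json: str) -> list:
    """Return ring entries whose state is anything other than ACTIVE."""
    ring = json.loads(ring_json)
    return [shard for shard in ring.get("shards", []) if shard.get("state") != "ACTIVE"]


# Hypothetical response body; a real one would be fetched from the /ring endpoint.
sample = (
    '{"shards": ['
    '{"id": "ingester-0", "state": "ACTIVE"}, '
    '{"id": "ingester-1", "state": "LEAVING"}'
    "]}"
)
for member in non_active_members(sample):
    print(member["id"], member["state"])
```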
Properties:
Error message:
empty ring
Cause:
No instances are registered in the ring. This typically occurs during initial cluster startup, when ingesters are crashing (for example, from out-of-memory errors), or because of misconfiguration.
Resolution:
Wait for instances to register during initial startup.
Check ingesters to make sure they are running.
Check that all instances can communicate over the configured ports.
Verify ring configuration across all components, especially memberlist configuration:
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 3
Check KV store health (Consul, etcd, or memberlist):
# For memberlist
curl -s http://loki:3100/memberlist
Properties:
Error message:
instance <id> not found in the ring
Cause:
A specific instance is expected to be in the ring but isn't registered. This can happen after a restart if the instance hasn't re-joined the ring yet.
Resolution:
Properties:
Error message:
this instance owns no tokens
Cause:
The instance has joined the ring but hasn't claimed any tokens. Without tokens, the instance cannot receive any work. This can happen if:
Resolution:
Wait for token assignment during startup.
Check the ring status for the instance: Open a browser and navigate to http://localhost:3100/ring. You should see the Loki Ring Status page.
OR
curl -s http://loki:3100/ring
Restart the instance if tokens are not assigned after startup completes.
Check KV store connectivity and health.
Properties:
Error message:
error talking to the KV store
Cause:
The instance cannot communicate with the key-value store used for ring state. The KV store (Consul, etcd, or memberlist) is required for ring coordination.
Resolution:
Check KV store health and connectivity:
# For Consul
curl http://consul:8500/v1/status/leader
# For etcd
etcdctl endpoint health
Verify network connectivity between Loki instances and the KV store.
Check firewall rules allow traffic on KV store ports.
For memberlist, verify that gossip ports are accessible between all instances:
memberlist:
  bind_port: 7946
  join_members:
    - loki-memberlist:7946
Properties:
Error message:
no ring returned from the KV store
Cause:
The KV store responded but returned an empty or invalid ring descriptor. This can happen if the KV store was recently initialized or its data was cleared.
Resolution:
Properties:
Error message:
failed to join memberlist cluster on startup
Or:
joining memberlist cluster failed
Cause:
The instance could not join the memberlist gossip cluster. Common causes:
Resolution:
Check that join members are reachable:
# Test connectivity to seed nodes
nc -zv loki-memberlist 7946
Verify DNS resolution for join addresses:
nslookup loki-memberlist
Check memberlist configuration:
memberlist:
  bind_port: 7946
  join_members:
    - loki-gossip-ring.loki.svc.cluster.local:7946
Ensure firewall rules allow UDP and TCP traffic on the gossip port (default 7946).
For Kubernetes, verify that the headless service for memberlist is configured correctly.
Properties:
Error message:
re-joining memberlist cluster failed
Cause:
After being disconnected from the memberlist cluster, the instance failed to rejoin. This can happen during network partitions or after prolonged network issues.
Resolution:
Properties:
Readiness errors occur when Loki components are not ready to serve requests. These errors are returned by the /ready health check endpoint and prevent load balancers from routing traffic to unready instances.
Error message:
Application is stopping
Cause:
Loki is shutting down and no longer accepting new requests. This is normal during graceful shutdown.
Resolution:
Properties:
Error message:
Some services are not Running:
<state>: <count>
<state>: <count>
For example:
Some services are not Running:
Starting: 1
Failed: 2
Cause:
One or more internal Loki services have failed to start or have stopped unexpectedly. The error message lists each service state with a count of services in that state.
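When this response is collected by a health-check script, the per-state counts can be extracted for alerting or logging. The sketch below parses the message format shown in the example above; the function name `parse_service_states` is hypothetical, and the body is hard-coded here rather than fetched from the /ready endpoint.

```python
def parse_service_states(body: str) -> dict:
    """Map each service state in a 'not Running' response to its count."""
    states = {}
    # Skip the first line, which is the "Some services are not Running:" header.
    for line in body.splitlines()[1:]:
        state, sep, count = line.partition(":")
        if sep and count.strip().isdigit():
            states[state.strip()] = int(count)
    return states


body = "Some services are not Running:\nStarting: 1\nFailed: 2"
print(parse_service_states(body))
```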
Resolution:
Properties:
Error message:
Ingester not ready: <details>
When the ingester's own state check fails, <details> contains the ingester state, giving the full message:
Ingester not ready: ingester not ready: <state>
Where <state> is the service state, for example Starting, Stopping, or Failed.
Cause:
The ingester is not in a ready state to accept writes or serve reads. The detail message indicates the specific reason, such as:
Starting)
Resolution:
Wait for startup to complete - ingesters take time to join the ring and become ready.
Check ring membership: Open a browser and navigate to http://localhost:3100/ring. You should see the Loki Ring Status page.
OR
curl -s http://ingester:3100/ring
Review logs for startup errors.
Adjust the minimum ready duration if startup is too slow:
ingester:
  lifecycler:
    min_ready_duration: 15s
Properties:
Error message:
Query Frontend not ready: not ready: number of queriers connected to query-frontend is 0
Cause:
The query frontend has no querier workers connected. Without queriers, the frontend cannot process any queries. This typically occurs when:
Resolution:
Check that queriers are running and healthy.
Verify querier configuration points to the correct frontend address:
frontend_worker:
  frontend_address: query-frontend:9095
Check gRPC connectivity between queriers and the frontend:
# Test gRPC port connectivity
nc -zv query-frontend 9095
Review querier logs for connection errors.
Properties:
Error message:
Query Frontend not ready: not ready: number of schedulers this worker is connected to is 0
Cause:
The query frontend worker has no active connections to any query scheduler. This prevents the frontend from dispatching queries.
Resolution:
Check that query schedulers are running and healthy.
Verify scheduler address configuration:
frontend_worker:
  scheduler_address: query-scheduler:9095
Check gRPC connectivity between the frontend and schedulers.
Review query scheduler logs for errors.
Properties:
gRPC errors occur during inter-component communication. Loki components communicate using gRPC for ring coordination, query execution, and data transfer.
Error message:
message size too large than max (<size> vs <max>)
Or for the decompressed body:
decompressed message size too large than max (<size> vs <max>)
Cause:
The compressed or decompressed body of an HTTP push request to the distributor exceeds the configured limit.
Default configuration:
distributor.max_recv_msg_size: 100MB (compressed request body limit)
distributor.max_decompressed_size: 5000MB (decompressed body limit, defaults to 50× max_recv_msg_size)
Resolution:
Increase the distributor receive message size limit:
distributor:
  max_recv_msg_size: 209715200 # 200MB compressed
  max_decompressed_size: 10737418240 # 10GB decompressed
Reduce push batch sizes in your log shipping client (Alloy, etc.) to send smaller individual requests.
Reduce the amount of data per request by lowering the batch size or flush interval in your client.
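The byte values used for these limits are easy to mistype, so it can help to compute them. This short sketch reproduces the numbers shown above, assuming (as the examples in this guide do) that 1MB means 1024 × 1024 bytes. The helper name `mib_to_bytes` is illustrative.

```python
MIB = 1024 * 1024  # bytes per megabyte as used in this guide's examples


def mib_to_bytes(mib: int) -> int:
    """Convert a size in binary megabytes to bytes."""
    return mib * MIB


default_recv = mib_to_bytes(100)
default_decompressed = 50 * default_recv  # the 50x relationship described above

print(default_recv)          # 104857600
print(default_decompressed)  # 5242880000 (5000MB)
print(mib_to_bytes(200))     # 209715200
print(10 * 1024 * MIB)       # 10737418240 (10GB)
```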
Properties:
Error message:
response larger than the max message size (<size> vs <max>)
Cause:
A query result from the querier to the frontend exceeds the maximum allowed gRPC response size. This typically happens with queries that return very large result sets.
Default configuration:
server.grpc_server_max_send_msg_size: 4MB (gRPC server send limit on the querier)
querier.query_frontend_grpc_client.max_recv_msg_size: 100MB (gRPC client receive limit on the querier worker)
Resolution:
Reduce query scope to return fewer results:
Increase gRPC message size limits if needed. Apply these settings to querier nodes:
server:
  grpc_server_max_send_msg_size: 209715200 # 200MB
querier:
  query_frontend_grpc_client:
    max_recv_msg_size: 209715200 # 200MB
Properties:
Error message:
compressed message size <size> exceeds limit <limit>
Cause:
The compressed body of an HTTP push request exceeds the distributor's configured limit. This check runs after the request body has been fully read and validates the total compressed size against the configured maximum.
Default configuration:
distributor.max_recv_msg_size: 100MB
Resolution:
Reduce batch sizes in your log shipping client.
Split large batches into smaller, more frequent requests.
Increase the limit if needed:
distributor:
  max_recv_msg_size: 209715200 # 200MB
Properties:
TLS errors occur when Loki or its clients cannot establish secure connections due to certificate issues.
Error message:
error loading ca cert: <path>
Or:
error loading client cert: <path>
Or:
error loading client key: <path>
Or:
failed to load TLS certificate <cert_path>,<key_path>
Cause:
Loki cannot load TLS certificates from the specified paths. Common causes:
Resolution:
Verify certificate files exist and are readable:
ls -la /path/to/cert.pem /path/to/key.pem /path/to/ca.pem
Check file permissions (the Loki process must be able to read them).
Validate the certificate format:
openssl x509 -in /path/to/cert.pem -noout -text
openssl rsa -in /path/to/key.pem -check
Verify cert and key match:
openssl x509 -noout -modulus -in cert.pem | md5sum
openssl rsa -noout -modulus -in key.pem | md5sum
# Both should produce the same hash
Check your TLS configuration:
server:
  http_tls_config:
    cert_file: /path/to/cert.pem
    key_file: /path/to/key.pem
    client_ca_file: /path/to/ca.pem
  grpc_tls_config:
    cert_file: /path/to/cert.pem
    key_file: /path/to/key.pem
    client_ca_file: /path/to/ca.pem
Properties:
Error message:
error generating http tls config: <details>
Or:
error generating grpc tls config: <details>
Where <details> may include messages such as TLS version %q not recognized, cipher suite %q not recognized, or unknown TLS version: <version>.
Cause:
The TLS configuration is invalid. This can happen when:
Resolution:
Review TLS settings for compatibility issues.
Use supported TLS versions by setting tls_min_version at the top level of the server block:
server:
  tls_min_version: VersionTLS12
Valid values are VersionTLS10, VersionTLS11, VersionTLS12, and VersionTLS13. There is no max_version setting; tls_min_version is the only version constraint.
Check cipher suite configuration if customized.
Properties:
DNS errors occur when Loki cannot resolve hostnames for service discovery or backend connections.
Error message:
msg="failed to resolve server addresses" err="... DNS lookup timeout: [<address>] ..."
Cause:
DNS resolution exceeded the 5-second timeout when trying to resolve addresses for Loki service discovery or backend connections.
This error is emitted by the index gateway and bloom gateway DNS discovery loops.
The DNS lookup timeout: [<address>] string is the context cause embedded within the err field; the full address list is formatted as a Go slice (for example, [dns+loki-index-gateway.loki.svc.cluster.local:9095]).
Resolution:
Check DNS server availability and configuration.
Verify hostname resolution:
nslookup <hostname>
dig <hostname>
Use IP addresses as a workaround if DNS is unreliable:
# Instead of dns+hostname:port
memberlist:
  join_members:
    - 10.0.0.1:7946
    - 10.0.0.2:7946
For Kubernetes, ensure CoreDNS is healthy and headless services are configured correctly.
Properties:
These errors relate to query scheduling, frontend workers, and queue management.
Error message:
scheduler is not running
Cause:
The query scheduler service is not in a running state. This can occur when:
Resolution:
Check scheduler logs for startup errors or crashes.
Verify scheduler health:
curl -s http://scheduler:3100/ready
Check scheduler ring membership if using ring-based scheduling:
curl -s -H "Accept: application/json" http://scheduler:3100/ring | jq
Properties:
Error message:
too many outstanding requests
Cause:
The query queue has reached its maximum capacity. This indicates the system is overloaded with queries.
Resolution:
Add queriers, or raise the number of queries each querier runs concurrently, to drain the queue faster:
querier:
max_concurrent: 10
Increase queue capacity (with caution). The default is 32000; raise it only after confirming that the system can handle the additional load. A larger queue is often needed when large values of tsdb_max_query_parallelism generate many subqueries, but it is generally preferable to add more queriers and leave this setting unchanged.
query_scheduler:
max_outstanding_requests_per_tenant: 64000
Rate limit queries at the client or load balancer level.
Optimize slow queries to reduce queue time.
Properties:
Error message:
querying is disabled, please contact your Loki operator
Cause:
Query parallelism has been set to zero, effectively disabling all queries. This is typically done intentionally during maintenance.
Resolution:
Check the relevant parallelism setting for your index type. For TSDB indexes (the current default), tsdb_max_query_parallelism supersedes max_query_parallelism. Either value being set to zero triggers this error. Verify that both are greater than zero:
limits_config:
max_query_parallelism: 32 # default; applies to non-TSDB schemas
tsdb_max_query_parallelism: 128 # default; applies to TSDB schemas
Size tsdb_max_query_parallelism to your ingest volume. Typical values in production are in the range of 128–2048, proportional to the volume of logs ingested per day:
| Daily ingest volume | Typical value |
|---|---|
| Low–moderate | 128–256 |
| High | 512 |
| Tens of TB/day | 1024–2048 |
Account for the querier capacity this requires. Each unit of parallelism consumes one querier worker slot. With the default querier.max_concurrent of 4, the number of queriers needed to fully parallelize a single query is:
queriers needed = tsdb_max_query_parallelism / max_concurrent
For example, tsdb_max_query_parallelism: 2048 with max_concurrent: 4 requires 512 queriers to run one query fully in parallel. Production deployments supporting many tenants running large queries simultaneously commonly run thousands of queriers.
Contact your administrator if you don't have access to change these settings.
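The sizing arithmetic above can be sketched as a small shell helper. The function name `queriers_needed` is hypothetical and not part of Loki; it rounds up so that a partially used querier counts as a whole one.

```bash
# Hypothetical helper (not part of Loki): queriers required to run one
# query fully in parallel.
# Arguments: tsdb_max_query_parallelism, querier.max_concurrent.
queriers_needed() {
  # Integer ceiling division: (a + b - 1) / b.
  echo $(( ($1 + $2 - 1) / $2 ))
}

queriers_needed 2048 4   # prints 512
queriers_needed 128 4    # prints 32
```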
Properties:
Error message:
no frontend address
Cause:
The scheduler received a request from a frontend but no frontend address was provided for sending responses back.
Resolution:
Check frontend configuration to ensure the address is set:
frontend:
address: query-frontend:9095
Verify gRPC connectivity between frontend and scheduler.
Properties:
Error message:
scheduler is shutting down
Cause:
The frontend scheduler worker detected that the scheduler is in shutdown mode and cannot accept new requests.
Resolution:
Properties:
Index gateway errors occur when queriers cannot communicate with index gateways for index lookups.
Error message:
index-gateway is unhealthy in the ring
Cause:
The index gateway instance detects itself as unhealthy in the ring and refuses to process queries. This is a self-check: before handling tenant requests, the gateway verifies it appears in the set of healthy ring members.
Resolution:
Check index gateway health:
curl -s http://index-gateway:3100/ready
View the ring status: Open a browser and navigate to http://localhost:3100/ring. You should see the Loki Ring Status page.
OR
curl -s http://index-gateway:3100/ring
Check logs for errors preventing the gateway from becoming healthy.
Restart the index gateway if it's stuck in an unhealthy state.
Properties:
Error message:
no index gateway instances found for tenant <tenant>
Cause:
No index gateway instances are available in the ring to serve the tenant's request. This could be due to:
Resolution:
Check if any index gateways are running:
curl -s -H "Accept: application/json" http://index-gateway:3100/ring | jq '.shards | length'
Verify ring mode is configured if using shuffle sharding. The index gateway must run in ring mode and the per-tenant shard size must be set:
index_gateway:
mode: ring
limits_config:
index_gateway_shard_size: 3 # default = 0 (use all instances)
Scale up index gateways if needed.
Properties:
index_gateway_shard_size in limits_config)
Error message:
index client is not initialized likely due to boltdb-shipper not being used
Cause:
The index gateway was queried for operations that require the index client, but the client wasn't initialized because the boltdb-shipper store isn't configured.
Resolution:
Verify your schema config uses the correct index store:
schema_config:
configs:
- from: 2024-01-01
store: tsdb
object_store: s3
schema: v13
index:
prefix: index_
period: 24h
Check if the operation requires boltdb-shipper - some legacy operations may not be supported with TSDB.
Properties:
Compactor errors occur during index compaction or retention enforcement.
Error message:
no chunks found in table, please check if there are really no chunks and manually drop the table or see if there is a bug causing us to drop whole index table
Cause:
The compactor found an empty index table during retention processing. This could indicate:
Resolution:
Verify the table should be empty:
# Check if data exists for the time period
logcli query '{job=~".+"}' --from="<table-start-time>" --to="<table-end-time>" --limit=1
If the table is legitimately empty, manually delete it from object storage.
If data should exist, investigate potential data loss.
Properties:
Error message:
compactor.delete-request-store should be configured when retention is enabled
Cause:
Retention is enabled but no store is configured for tracking delete requests.
Resolution:
Configure the delete request store:
compactor:
retention_enabled: true
delete_request_store: s3
Or disable retention if not needed:
compactor:
retention_enabled: false
Properties:
Error message:
max compaction parallelism must be >= 1
Cause:
The compactor's parallelism setting is configured to zero or a negative number.
Resolution:
Set a valid parallelism value:
compactor:
max_compaction_parallelism: 1 # Must be >= 1
Properties:
Error message:
could not find delete request with given id
Cause:
An attempt to cancel a delete request failed because no matching request exists.
Resolution:
List existing delete requests:
curl -s http://compactor:3100/loki/api/v1/delete | jq
Verify the delete request ID is correct.
Check if the request has already been processed and removed.
Properties:
Error message:
Retention is not enabled
Cause:
A delete request was submitted but retention is not enabled in the compactor configuration. Delete requests require retention to be enabled.
Resolution:
Enable retention in the compactor:
compactor:
retention_enabled: true
delete_request_store: s3
Restart the compactor after changing the configuration.
Properties:
Error message:
invalid start time: require unix seconds or RFC3339 format
Or:
invalid end time: require unix seconds or RFC3339 format
Cause:
The start or end time in a delete request is not in a valid format.
Resolution:
Use Unix seconds or RFC3339 format:
# Unix seconds
curl -X POST http://compactor:3100/loki/api/v1/delete \
-H "X-Scope-OrgID: my-tenant" \
-d "query={app=\"foo\"}" \
-d "start=1704067200" \
-d "end=1704153600"
# RFC3339
curl -X POST http://compactor:3100/loki/api/v1/delete \
-H "X-Scope-OrgID: my-tenant" \
-d "query={app=\"foo\"}" \
-d "start=2024-01-01T00:00:00Z" \
-d "end=2024-01-02T00:00:00Z"
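Before submitting a delete request, the timestamp format can be checked locally. This is a minimal sketch that only tests the shape of the value (all digits, or an RFC3339-looking string); it is not Loki's own parser, and the function name `is_valid_delete_time` is hypothetical.

```bash
# Hypothetical shape check (not Loki's parser): accept Unix seconds or an
# RFC3339-looking timestamp, reject everything else.
is_valid_delete_time() {
  case $1 in
    '') return 1 ;;       # empty value
    *[!0-9]*) ;;          # not purely digits: continue to the RFC3339 check
    *) return 0 ;;        # all digits: treat as Unix seconds
  esac
  printf '%s\n' "$1" | grep -Eq \
    '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})$'
}

is_valid_delete_time 1704067200 && echo "unix seconds accepted"
is_valid_delete_time 2024-01-01T00:00:00Z && echo "RFC3339 accepted"
is_valid_delete_time 2024-01-01 || echo "date-only value rejected"
```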
Properties:
Error message:
deletion of request which is in process or already processed is not allowed
Cause:
An attempt was made to cancel a delete request that is already being processed or has completed processing.
Resolution:
Check the status of the delete request:
curl -s http://compactor:3100/loki/api/v1/delete \
-H "X-Scope-OrgID: my-tenant" | jq
Submit a new delete request if you need to delete additional data.
Properties:
Error message:
invalid max_interval: valid time units are 's', 'm', 'h'
Or:
max_interval can't be greater than <configured-limit>
Or:
max_interval can't be greater than the interval to be deleted (<duration>)
Cause:
The max_interval parameter on a delete request has an invalid value, exceeds the configured delete_max_interval limit, or exceeds the time range of the delete request itself.
Resolution:
Use a valid time format with supported units (s, m, h):
curl -X POST http://compactor:3100/loki/api/v1/delete \
-H "X-Scope-OrgID: my-tenant" \
-d "query={app=\"foo\"}" \
-d "start=1704067200" \
-d "end=1704153600" \
-d "max_interval=1h"
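The three max_interval conditions can be checked before sending the request. The sketch below assumes single-unit durations such as 90s, 30m, or 1h (compound values like 1h30m are not handled) and start and end given as Unix seconds. Both function names are hypothetical, and the limit argument stands in for the configured delete_max_interval.

```bash
# Convert a single-unit duration (s, m, or h) to seconds.
# Assumption: compound durations such as 1h30m are out of scope here.
duration_to_seconds() {
  n=${1%[smh]}                       # strip one trailing unit character
  case $n in ''|*[!0-9]*) return 1 ;; esac
  case $1 in
    *s) echo "$n" ;;
    *m) echo $(( $n * 60 )) ;;
    *h) echo $(( $n * 3600 )) ;;
    *)  return 1 ;;                  # no unit given
  esac
}

# Arguments: max_interval, start, end, limit (for example 1h 1704067200 1704153600 24h).
check_max_interval() {
  want=$(duration_to_seconds "$1") || { echo "invalid unit"; return 1; }
  limit=$(duration_to_seconds "$4") || { echo "invalid limit"; return 1; }
  range=$(( $3 - $2 ))               # length of the delete request in seconds
  [ "$want" -le "$limit" ] || { echo "exceeds configured limit"; return 1; }
  [ "$want" -le "$range" ] || { echo "exceeds delete range"; return 1; }
  echo "ok"
}

check_max_interval 1h 1704067200 1704153600 24h   # prints ok
```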
Properties:
Ruler errors occur when evaluating alerting rules or recording rules.
Error message:
invalid ruler evaluation config: <details>
Cause:
The ruler evaluation mode configuration is invalid.
Resolution:
Use a valid evaluation mode:
ruler:
evaluation:
mode: local # Or "remote"
Properties:
initRuleEvaluator)
Error message:
ruler remote write config: both 'client' and 'clients' options are defined; 'client' is deprecated, please only use 'clients'
Cause:
Both the deprecated client and the new clients configuration options are set for ruler remote write.
Resolution:
Remove the deprecated config and use clients:
ruler:
remote_write:
# Remove this:
# client: {}
# Use this instead:
clients:
primary:
url: http://prometheus:9090/api/v1/write
Properties:
NewRuler)
Error message:
remote-write enabled but no clients URL are configured
Or when multiple clients are configured in the clients map and one entry is missing a URL:
remote-write enabled but client '<name>' URL for tenant <client-id> is not configured
Cause:
Remote write is enabled for the ruler but no destination URL is configured. The first variant occurs when the clients map is empty. The second occurs when a named entry in the clients map has no url set; <client-id> is the map key for that entry, not a tenant ID.
Resolution:
Configure the remote write URL:
ruler:
remote_write:
enabled: true
clients:
primary:
url: http://prometheus:9090/api/v1/write
Or disable remote write:
ruler:
remote_write:
enabled: false
Properties:
Error message:
rule result is not a vector or scalar
Cause:
A rule evaluation returned an unexpected result type. Both recording rules and alerting rules must produce vector or scalar results. A plain log-stream expression (one that returns log lines rather than a numeric metric) triggers this error in either rule type.
Resolution:
Check the rule expression returns a vector or scalar:
# Valid - returns vector:
record: my_metric
expr: sum(rate({job="app"} | json | level="error" [5m]))
# Invalid - returns logs (triggers error for both recording and alerting rules):
# record: my_metric
# expr: '{job="app"}'
Use aggregation functions to produce numeric results from log queries.
Properties:
Error message:
WAL storage closed
Cause:
An operation was attempted on the ruler's write-ahead log (WAL) after it was closed. This typically occurs during shutdown.
Resolution:
Properties:
These errors occur when Loki is configured to use Kafka for ingestion.
Error message:
the Kafka address has not been configured
Cause:
Kafka ingestion is enabled but no Kafka broker address is configured.
Resolution:
Configure the Kafka address:
kafka_config:
topic: loki-logs
reader_config:
address: kafka:9092
writer_config:
address: kafka:9092
Properties:
Error message:
the Kafka topic has not been configured
Cause:
Kafka ingestion is enabled but no topic name is configured.
Resolution:
Configure the Kafka topic:
kafka_config:
topic: loki-logs
reader_config:
address: kafka:9092
writer_config:
address: kafka:9092
Properties:
Error message:
both sasl username and password must be set
Cause:
Only one of the Simple Authentication and Security Layer (SASL) username or password is configured. Both must be set together.
Resolution:
Configure both username and password:
kafka_config:
sasl_username: my-user
sasl_password: ${KAFKA_PASSWORD}
Or remove both if SASL authentication is not required.
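When the credentials come from environment variables, a half-set pair is easy to miss. The following sketch checks the pairing rule before Loki starts; the function name `check_sasl_pair` is hypothetical and the check is not part of Loki.

```bash
# Hypothetical pre-flight check: the SASL username and password must be
# set together or both left empty.
# Arguments: username, password.
check_sasl_pair() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "sasl enabled"
  elif [ -z "$1" ] && [ -z "$2" ]; then
    echo "sasl disabled"
  else
    echo "both sasl username and password must be set"
    return 1
  fi
}

check_sasl_pair my-user example-password   # prints sasl enabled
check_sasl_pair "" ""                      # prints sasl disabled
```

In practice the second argument would be something like `"${KAFKA_PASSWORD:-}"`, so that an unset variable is treated as empty.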
Properties:
Error message:
kafka is enabled in distributor but not in ingester
Cause:
Kafka is configured for the distributor but the ingester isn't configured to read from Kafka. Both must be configured together.
Resolution:
Enable Kafka in both distributor and ingester:
distributor:
kafka_writes_enabled: true
ingester:
kafka_ingestion:
enabled: true
Properties:
Bloom gateway errors occur when using bloom filters for query acceleration.
Error message:
addresses requires a list of comma separated strings in DNS service discovery format with at least one item
Cause:
The bloom_gateway.client.addresses configuration field is empty or unset.
Resolution:
Configure valid addresses:
bloom_gateway:
client:
addresses: dns+bloom-gateway:9095
Valid formats:
dns+hostname:port - DNS-based discovery
host1:port,host2:port - Static list
Properties:
Error message:
request time range must span exactly one day
Cause:
Bloom gateway requests must be for exactly one day of data due to how bloom blocks are organized.
Resolution:
Properties:
Error message:
from time must not be after through time
Cause:
The bloom gateway received a request where the start time (from) is later than the end time (through).
Resolution:
Ensure that from ≤ through.
Properties:
WAL errors occur when the ingester cannot properly manage its write-ahead log.
Error message:
wal is stopped
Cause:
An operation was attempted on the WAL after it was stopped. This typically occurs during shutdown or after a fatal error.
Resolution:
Properties:
Error message:
invalid checkpoint duration: <duration>
Cause:
The WAL checkpoint duration is set to an invalid value (likely zero or negative).
Resolution:
Set a valid checkpoint duration:
ingester:
wal:
checkpoint_duration: 5m
Properties:
Ingester lifecycle errors occur during ingester startup, shutdown, or state transitions.
Error message:
Ingester is shutting down
Cause:
The ingester is in the process of shutting down and is no longer accepting writes. This error (also known as ErrReadOnly) is returned when a push request arrives during graceful shutdown. During this period the ingester may still serve reads for data it holds in memory.
Resolution:
Properties:
Error message:
Ingester is stopping or already stopped.
Cause:
The ingester's shutdown management endpoint (POST /loki/api/v1/ingester/shutdown) was called when the ingester was not in a Running state. This happens when the endpoint is called a second time during an in-progress shutdown or after the ingester has already stopped. This error is returned by the shutdown endpoint, not by the log-write or query paths.
Resolution:
Properties:
Error message:
failed to start partition reader: <details>
Cause:
The ingester could not start its Kafka partition reader. This occurs when Kafka ingestion is enabled but the partition reader fails to initialize.
Resolution:
Check Kafka connectivity from the ingester.
Verify Kafka topic exists and the ingester has appropriate permissions.
Review Kafka configuration:
kafka_config:
topic: loki-logs
reader_config:
address: kafka:9092
Check Kafka broker health.
Properties:
Error message:
failed to start partition ring lifecycler: <details>
Cause:
The ingester could not start its Kafka partition ring lifecycler during startup. This is a separate component from the partition reader; it manages the ingester's membership in the partition ring. This only occurs when Kafka ingestion is enabled.
Resolution:
<details>.
Properties:
Error message:
lifecycler failed: <details>
Cause:
The ingester's lifecycler (which manages ring membership) encountered a fatal error. This prevents the ingester from participating in the ring.
Resolution:
Properties:
Pattern ingester errors occur when using the pattern ingester for automatic log pattern detection.
Error message:
pattern ingester replication factor must be 1
Cause:
The pattern ingester is configured with a replication factor other than 1. Currently, the pattern ingester only supports a replication factor of 1.
Resolution:
Set the replication factor to 1:
pattern_ingester:
lifecycler:
ring:
replication_factor: 1
Properties:
Error message:
retain-for (<duration>) must be greater than or equal to chunk-duration (<duration>)
Cause:
The pattern ingester's retain_for duration is shorter than max_chunk_age, which would cause data loss.
Resolution:
Increase the retain-for duration to be at least as long as max_chunk_age:
pattern_ingester:
retain_for: 15m # Must be >= max_chunk_age
max_chunk_age: 5m
Properties:
Error message:
chunk-duration (<duration>) must be greater than or equal to sample-interval (<duration>)
Cause:
The pattern ingester's max_chunk_age is shorter than pattern_sample_interval. Chunks must span at least one sample interval to hold any data.
Resolution:
Increase max_chunk_age to be at least as long as pattern_sample_interval:
pattern_ingester:
max_chunk_age: 1h # default: 1h; must be >= pattern_sample_interval
pattern_sample_interval: 10s # default: 10s
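The two duration rules for the pattern ingester chain together: retain_for must be at least max_chunk_age, which in turn must be at least pattern_sample_interval. A minimal sketch of that check follows, assuming all three values have already been converted to seconds; the function name is hypothetical.

```bash
# Hypothetical pre-flight check; all three arguments are in seconds.
# Arguments: retain_for, max_chunk_age, pattern_sample_interval.
check_pattern_ingester_durations() {
  [ "$1" -ge "$2" ] || { echo "retain_for must be >= max_chunk_age"; return 1; }
  [ "$2" -ge "$3" ] || { echo "max_chunk_age must be >= pattern_sample_interval"; return 1; }
  echo "ok"
}

check_pattern_ingester_durations 900 300 10   # 15m, 5m, 10s: prints ok
```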
Properties:
Error message:
volume_threshold (<value>) must be between 0 and 1
Cause:
The volume_threshold value is outside the valid range of 0 to 1. This setting controls what fraction of log volume the pattern ingester tracks: only the patterns that together account for that top fraction of log volume are persisted.
Resolution:
Set volume_threshold to a value between 0 and 1 (default is 0.99):
pattern_ingester:
volume_threshold: 0.99
Properties:
These errors occur when API requests contain invalid parameters.
Error message:
invalid direction '<value>'
Cause:
The direction query parameter contains an invalid value.
Resolution:
Use a valid direction value:
forward - Oldest to newest
backward - Newest to oldest (default)
curl "http://loki:3100/loki/api/v1/query_range?query={job=\"app\"}&direction=forward"
Properties:
Error message:
limit must be a positive value
Cause:
The limit parameter is zero or negative.
Resolution:
Provide a positive limit:
curl "http://loki:3100/loki/api/v1/query_range?query={job=\"app\"}&limit=100"
Properties:
Error message:
end timestamp must not be before or equal to start time
Cause:
The query's end time is before or equal to its start time.
Resolution:
Ensure end time is after start time:
curl "http://loki:3100/loki/api/v1/query_range?\
query={job=\"app\"}&\
start=2024-01-01T00:00:00Z&\
end=2024-01-02T00:00:00Z"
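The direction, limit, and time-ordering rules for query_range can be checked on the client before a request is sent. This sketch assumes start and end are given as Unix seconds and accepts only the lowercase direction values listed above; the function name `check_query_range_params` is hypothetical and the check is not Loki's own validation.

```bash
# Hypothetical client-side check for query_range parameters.
# Arguments: direction, limit, start, end (start and end as Unix seconds).
check_query_range_params() {
  case $1 in
    forward|backward) ;;
    *) echo "invalid direction '$1'"; return 1 ;;
  esac
  case $2 in
    # Empty, negative, or non-numeric limits all fail the digit test.
    ''|*[!0-9]*) echo "limit must be a positive value"; return 1 ;;
  esac
  [ "$2" -gt 0 ] || { echo "limit must be a positive value"; return 1; }
  [ "$4" -gt "$3" ] || { echo "end timestamp must not be before or equal to start time"; return 1; }
  echo "ok"
}

check_query_range_params forward 100 1704067200 1704153600   # prints ok
```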
Properties:
Error message:
delay_for can't be greater than <max>
Cause:
The delay_for parameter for tailing queries exceeds the maximum allowed value.
Resolution:
Reduce the delay_for value:
curl "http://loki:3100/loki/api/v1/tail?query={job=\"app\"}&delay_for=5"
The maximum value is typically 5 seconds.
Properties:
Error message:
query filtering for deletes requires 'compactor_grpc_address' or 'compactor_address' to be configured
Cause:
Query-time filtering for delete requests is enabled but Loki doesn't know how to reach the compactor to retrieve active delete requests.
Resolution:
Configure the compactor address:
common:
compactor_grpc_address: compactor:9095
Or use the HTTP address:
common:
compactor_address: http://compactor:3100
Properties: