# Troubleshoot ingestion errors
This guide helps you troubleshoot errors that occur when ingesting logs into Loki and writing logs to storage. When Loki rejects log ingestion requests, it's typically due to exceeding rate limits, violating validation rules, or encountering storage issues.
Before you begin, ensure you have access to your Loki configuration and to the Prometheus metrics that Loki exposes.

All ingestion errors are tracked using Prometheus metrics:

- `loki_discarded_samples_total` - Count of discarded log samples, by reason
- `loki_discarded_bytes_total` - Volume of discarded log data, by reason

The `reason` label on these metrics indicates the specific error type. Set up alerts on these metrics to detect ingestion problems.
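For example, the following is a minimal Prometheus alerting rule sketch built on these metrics; the group name, threshold, and `for` duration are assumptions you should tune for your environment:

```yaml
groups:
  - name: loki-ingestion # hypothetical group name
    rules:
      - alert: LokiDiscardingSamples
        # Fires when any tenant has been discarding samples for 15 minutes
        expr: sum by (tenant, reason) (rate(loki_discarded_samples_total[5m])) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Loki tenant {{ $labels.tenant }} is discarding samples (reason: {{ $labels.reason }})"
```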
## Rate limit errors

Rate limits protect Loki from being overwhelmed by excessive log volume. When rate limits are exceeded, Loki returns HTTP status code `429 Too Many Requests`.
### rate_limited

Error message:

```
ingestion rate limit exceeded for user <tenant> (limit: <limit> bytes/sec) while attempting to ingest <lines> lines totaling <bytes> bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased
```
Cause:
The tenant has exceeded their configured ingestion rate limit. This is a global per-tenant limit enforced by the distributor.
Default configuration:

- `ingestion_rate_mb: 4` (4 MB/sec)
- `ingestion_burst_size_mb: 6` (6 MB)
- `ingestion_rate_strategy: "global"` (shared across all distributors)

Resolution:
Increase rate limits (if you have sufficient cluster resources):

```yaml
limits_config:
  ingestion_rate_mb: 8
  ingestion_burst_size_mb: 12
```
Reduce log volume at the source, for example by dropping debug-level logs or filtering noisy streams before they reach Loki, as shown in the sketch below.
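The following is a minimal Alloy sketch of dropping verbose lines inside a `loki.process` pipeline; the regular expression is an assumption and should be adapted to your log format:

```alloy
// Drop any line containing a debug or trace level marker (pattern is an example)
stage.drop {
  expression = ".*level=(debug|trace).*"
}
```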
Switch to the local strategy (if using a single-region deployment):

```yaml
limits_config:
  ingestion_rate_strategy: local
```
{{< admonition type="note" >}} This multiplies the effective limit by the number of distributor replicas. {{< /admonition >}}
### stream_limit

Error message:

```
maximum active stream limit exceeded when trying to create stream <stream_labels>, reduce the number of active streams (reduce labels or reduce label values), or contact your Loki administrator to see if the limit can be increased, user: <tenant>
```
Cause:
The tenant has reached the maximum number of active streams. Active streams are held in memory on ingesters, and excessive streams can cause out-of-memory errors.
Default configuration:

- `max_global_streams_per_user: 5000` (global limit)
- `max_streams_per_user: 0` (no local limit per ingester)

Terminology:

A stream is considered active until it has received no logs for `chunk_idle_period` (default: 30 minutes).

Resolution:
Reduce stream cardinality by removing high-cardinality labels (such as user IDs, request IDs, or pod names) and moving those values into structured metadata or the log line itself.
Increase stream limits (if you have sufficient memory):

```yaml
limits_config:
  max_global_streams_per_user: 10000
```
{{< admonition type="note" >}} Do not increase stream limits to accommodate high-cardinality labels; this can result in Loki flushing extremely large numbers of small files, which makes for extremely poor query performance. As the volume and the size of the infrastructure being monitored increase, it is expected that the stream limit grows too. However, even at hundreds of terabytes of logs per day, you should avoid exceeding 300,000 max global streams per user. {{< /admonition >}}
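To see how many streams a tenant currently holds in memory, you can query the ingester stream metric. A sketch, assuming the metric `loki_ingester_memory_streams` with a `tenant` label (present in recent Loki versions) and a replication factor of 3 (adjust to yours):

```promql
# Approximate active streams for one tenant, deduplicated across replicas
sum(loki_ingester_memory_streams{tenant="example-tenant"}) / 3
```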
## Validation errors

Validation errors occur when log data doesn't meet Loki's requirements. These return HTTP status code `400 Bad Request` and are not retryable.
### line_too_long

Error message:

```
max entry size <max_size> bytes exceeded for stream <stream_labels> while adding an entry with length <entry_size> bytes
```
Cause:
A log line exceeds the maximum allowed size.
Default configuration:

- `max_line_size: 256KB`
- `max_line_size_truncate: false`

Resolution:
Truncate long lines instead of discarding them:

```yaml
limits_config:
  max_line_size: 256KB
  max_line_size_truncate: true
```
Increase the line size limit (not recommended above 256KB):

```yaml
limits_config:
  max_line_size: 256KB
```
{{< admonition type="warning" >}} Loki was built as a large multi-user, multi-tenant database, and this limit is very important for maintaining stability and performance when many users query the database simultaneously. We strongly recommend against increasing `max_line_size`. Doing so makes it very difficult to provide consistent query performance and system stability without allocating extremely large amounts of memory or raising the gRPC message size limits to very high levels, both of which are likely to lead to poorer performance and a worse experience. {{< /admonition >}}
Filter or truncate logs before sending using Alloy processing stages:

```alloy
// Keep only the first 10,000 characters of each line
stage.replace {
  expression = "^(.{10000}).*$"
  replace    = "$1"
}
```
### invalid_labels

Error message:

```
error parsing labels <labels> with error: <parse_error>
```
Cause:
Label names or values contain invalid characters or don't follow Prometheus naming conventions:

- Label names must match the pattern `[a-zA-Z_][a-zA-Z0-9_]*`
- Label names starting with `__` are reserved for Grafana internal use
- Labels must be sent in the `{key="value"}` format

Default configuration:
This validation is always enabled and cannot be disabled.
Resolution:
Fix label names in your log shipping configuration:
# Incorrect
labels:
"123-app": "value" # Starts with number
"app-name": "value" # Contains hyphen
"__internal": "value" # Reserved prefix
# Correct
labels:
app_123: "value"
app_name: "value"
internal_label: "value"
Use Alloy's relabeling stages to fix labels:

```alloy
// Drop a label that fails validation
stage.label_drop {
  values = ["invalid_label"]
}

// Promote an extracted field to a valid label
stage.labels {
  values = {
    app_name = "",
  }
}
```
### missing_labels

Error message:

```
error at least one label pair is required per stream
```
Cause:
A log stream was submitted without any labels. Loki requires at least one label pair for each log stream.
Default configuration:
This validation is always enabled and cannot be disabled. Every stream must have at least one label.
Resolution:
Ensure all log streams have at least one label. Update your log shipping configuration to include labels:

```alloy
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

loki.source.file "logs" {
  targets = [
    {
      __path__    = "/var/log/*.log",
      job         = "myapp",
      environment = "production",
    },
  ]
  forward_to = [loki.write.default.receiver]
}
```
### out_of_order / too_far_behind

These errors occur when log entries arrive in the wrong chronological order.
Error messages:

With `unordered_writes: false`:

```
entry out of order
```

With `unordered_writes: true`:

```
entry too far behind, entry timestamp is: <timestamp>, oldest acceptable timestamp is: <cutoff>
```

Cause:
Logs are being ingested with timestamps that violate Loki's ordering constraints:
- `unordered_writes: false`: Any log older than the most recent log in the stream is rejected
- `unordered_writes: true` (default): Logs older than half of `max_chunk_age` from the newest entry are rejected

Default configuration:

- `unordered_writes: true` (since Loki 2.4)
- `max_chunk_age: 2h` (so the acceptance window for unordered writes is one hour behind the newest entry in the stream)

Resolution:
Ensure timestamps are assigned correctly:

```alloy
// Parse the timestamp from an extracted field instead of using arrival time
stage.timestamp {
  source = "timestamp_field"
  format = "RFC3339"
}
```
Check for clock skew between log sources and Loki. Ensure clocks are synchronized across your infrastructure.
### greater_than_max_sample_age

Error message:
```
entry for stream <stream_labels> has timestamp too old: <timestamp>, oldest acceptable timestamp is: <cutoff>
```
Cause:
The log entry's timestamp is older than the configured maximum sample age. This prevents ingestion of very old logs.
Default configuration:

- `reject_old_samples: true`
- `reject_old_samples_max_age: 168h` (7 days)

Resolution:
Increase the maximum sample age:

```yaml
limits_config:
  reject_old_samples_max_age: 336h # 14 days
```
Fix log delivery delays causing old timestamps.
For historical log imports, temporarily disable the check per tenant, as in the sketch below.
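A minimal sketch of a per-tenant override in Loki's runtime configuration file (typically loaded via `-runtime-config.file`); the tenant name `historical-import` is hypothetical:

```yaml
# Runtime config (overrides) file
overrides:
  historical-import: # hypothetical tenant
    reject_old_samples: false
```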
Check for clock skew between log sources and Loki. Ensure clocks are synchronized across your infrastructure.
Disable old sample rejection (not recommended):

```yaml
limits_config:
  reject_old_samples: false
```
### too_far_in_future

Error message:

```
entry for stream <stream_labels> has timestamp too new: <timestamp>
```
Cause:
The log entry's timestamp is further in the future than the configured grace period allows.
Default configuration:
creation_grace_period: 10 minutesResolution:
Check for clock skew between log sources and Loki. Ensure clocks are synchronized across your infrastructure.
**Verify application timestamps**: Validate timestamp generation in your applications.
Verify timestamp parsing in your log shipping configuration.
### max_label_names_per_series

Error message:

```
entry for stream <stream_labels> has <count> label names; limit <limit>
```
Cause:
The stream has more labels than allowed.
Default configuration:
max_label_names_per_series: 15Resolution:
{{< admonition type="warning" >}}
We strongly recommend against increasing `max_label_names_per_series`; doing so creates a larger index, which hurts query performance and opens the door to cardinality explosions. You should be able to categorize your logs with 15 labels, and typically far fewer. In all our years of running Loki, out of thousands of requests to increase this value, the number of valid exceptions we have seen can be counted on one hand.
{{< /admonition >}}
### label_name_too_long

Error message:

```
stream <stream_labels> has label name too long: <label_name>
```
Cause:
A label name exceeds the maximum allowed length.
Default configuration:
max_label_name_length: 1024 bytesResolution:
Properties:
### label_value_too_long

Error message:

```
stream <stream_labels> has label value too long: <label_value>
```
Cause:
A label value exceeds the maximum allowed length.
Default configuration:
max_label_value_length: 2048 bytesResolution:
Shorten label values in your log shipping configuration to under the configured limit. You can use hash values for very long identifiers.
Use structured metadata for long values instead of labels, as in the sketch below.
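A minimal Alloy sketch that attaches an extracted field as structured metadata rather than a label; the field name `trace_id` is an example:

```alloy
// Send the extracted trace_id field as structured metadata, keeping it out of the index
stage.structured_metadata {
  values = {
    trace_id = "",
  }
}
```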
### duplicate_label_names

Error message:

```
stream <stream_labels> has duplicate label name: <label_name>
```
Cause:
The stream has two or more labels with identical names.
Default configuration:
This validation is always enabled and cannot be disabled. Duplicate label names are never allowed.
Resolution:
**Remove duplicates**: Remove duplicate label definitions in your log shipping configuration to ensure unique label names per stream.

**Check clients**: Check your ingestion pipeline for label conflicts.

**Verify processing**: Verify your label processing and transformation rules.
### request_body_too_large

Error message:

```
Request body too large: <size> bytes, limit: <limit> bytes
```
Cause:
The HTTP push request body exceeds the limit configured on the gateway, reverse proxy, or other service that fronts Loki. Depending on where the limit is enforced, it may apply to the compressed body or to the body after decompression.
Default configuration:

- `gateway.nginxConfig.clientMaxBodySize` (default: 4M)

Resolution:
Split large batches into smaller, more frequent requests.
**Increase the body size limit**: Increase the allowed body size on your gateway or reverse proxy, as in the sketch below. For example, in the Helm chart set `gateway.nginxConfig.clientMaxBodySize`; the default is `4M`.
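A minimal Helm values sketch raising the nginx body size limit for the Loki gateway; the `10M` value is an assumption to adjust for your batch sizes:

```yaml
gateway:
  nginxConfig:
    # Maximum accepted request body size (nginx client_max_body_size)
    clientMaxBodySize: 10M
```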
## Storage errors

Storage errors occur when Loki cannot write data to its storage backend or the Write-Ahead Log (WAL).
### No space left on device

Error message:

```
no space left on device
```
Cause:
Local storage has run out of space. This can affect ingester WAL, chunk cache, or temporary files.
Default configuration:
Disk space is not limited by Loki configuration; it depends on your infrastructure provisioning.
Resolution:

Free up or provision additional disk space, and confirm that chunks are being flushed to object storage so local data can be cleaned up.
### Storage unavailable

Error message:

```
failed to store chunks: storage unavailable
```
Common causes and errors:

- `NoSuchBucket`: Storage bucket doesn't exist
- `AccessDenied`: Invalid credentials or permissions
- `RequestTimeout`: Network or storage latency issues

Resolution:
Verify storage configuration:

```yaml
storage_config:
  aws:
    s3: s3://region/bucket-name
    s3forcepathstyle: true
```
Check credentials and permissions: confirm that the identity Loki runs as can read from, write to, and list the bucket. A sketch of the required permissions follows.
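A minimal AWS IAM policy sketch covering the S3 actions Loki typically needs (`s3:DeleteObject` is required when retention or the compactor removes data); the bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::bucket-name"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::bucket-name/*"]
    }
  ]
}
```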
Monitor storage health by reviewing storage metrics:

- `loki_ingester_chunks_flushed_total`
- `loki_ingester_chunks_flush_errors_total`

Increase retries and timeouts:

```yaml
storage_config:
  aws:
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
```
### WAL disk full

Error logged:

```
Error writing to WAL, disk full, no further messages will be logged for this error
```

Metric: `loki_ingester_wal_disk_full_failures_total`
Cause:
The disk where the WAL is stored has run out of space. When this occurs, Loki continues accepting writes but doesn't log them to the WAL, losing durability guarantees.
Default configuration:

- `-ingester.wal-dir`: WAL directory location
- `-ingester.checkpoint-duration` (default: 5 minutes)

Resolution:
Increase disk space for the WAL directory.
Monitor disk usage and set up alerts:

```promql
loki_ingester_wal_disk_full_failures_total > 0
```
Reduce log volume to decrease WAL growth.
Check WAL checkpoint frequency:

```yaml
ingester:
  wal:
    checkpoint_duration: 5m
```
Verify that WAL cleanup is working: old segments should be deleted after checkpointing.
{{< admonition type="note" >}}
The WAL sacrifices durability for availability: it won't reject writes when the disk is full. After disk space is restored, durability guarantees resume. Use the metric `loki_ingester_wal_disk_usage_percent` to monitor disk usage.
{{< /admonition >}}
### WAL corruption

Error message:

```
encountered WAL read error, attempting repair
```

Metric: `loki_ingester_wal_corruptions_total`
Cause:
The WAL has become corrupted, possibly due to an unclean shutdown while segments were being written, or to disk and filesystem errors.

Default configuration:

- `-ingester.wal-dir`: WAL directory location

Resolution:
Monitor for corruption:

```promql
increase(loki_ingester_wal_corruptions_total[5m]) > 0
```
**Automatic recovery**: Loki attempts to recover readable data and continues starting.
Investigate the root cause: check disk health, filesystem errors, and whether the ingester was terminated uncleanly.
{{< admonition type="note" >}} Loki prioritizes availability over complete data recovery. The replication factor provides redundancy if one ingester loses WAL data. Recovered data may be incomplete if corruption is severe. {{< /admonition >}}
## Network and availability errors

These errors occur due to network issues or service unavailability.
### Connection refused

Error message:

```
connection refused when connecting to loki:3100
```
Cause:
The Loki service is unavailable or not listening on the expected port.
Default configuration:

- HTTP listen port: 3100 (`-server.http-listen-port`)
- gRPC listen port: 9095 (`-server.grpc-listen-port`)

Resolution:

Verify that Loki is running and listening on the expected address and port, and that no firewall or network policy is blocking the connection. A quick check is sketched below.
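A quick connectivity sketch using Loki's readiness endpoint; hostname and port are placeholders:

```bash
# Returns "ready" when the Loki instance can serve traffic
curl -s http://loki:3100/ready
```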
### Context deadline exceeded

Error message:

```
context deadline exceeded
```
Cause:
Requests are timing out due to slow response times or network issues.
Default configuration:

- `-server.http-server-write-timeout` (default: 30s)

Resolution:

Increase timeouts on the client and the Loki server (see the sketch below), reduce batch sizes, and investigate whether slow storage or overloaded ingesters are delaying writes.
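A minimal sketch of raising the server-side HTTP timeouts in the Loki configuration; the 60s values are assumptions:

```yaml
server:
  http_server_read_timeout: 60s
  http_server_write_timeout: 60s
```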
### Service unavailable

Error message:

```
service unavailable (503)
```
Cause:
Loki is temporarily unable to handle requests due to high load or maintenance. This can occur when ingesters are unhealthy or the ring is not ready.
Resolution:

Check that ingesters are healthy and the ring is ready; you can inspect ring membership on the `/ring` endpoint, as sketched below. If the cluster is under sustained high load, scale out the write path or reduce incoming volume.
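A sketch of inspecting the ingester ring; hostname and port are placeholders:

```bash
# Shows ring members and their state (ACTIVE, LEAVING, UNHEALTHY, ...)
curl -s http://loki:3100/ring
```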
## Structured metadata errors

These errors occur when using structured metadata incorrectly.
### Structured metadata disallowed

Error message:

```
stream <stream_labels> includes structured metadata, but this feature is disallowed. Please see limits_config.allow_structured_metadata or contact your Loki administrator to enable it
```
Cause:
Structured metadata is disabled in the Loki configuration. This feature must be explicitly enabled.
Default configuration:
allow_structured_metadata: true (enabled by default in recent versions)Resolution:
Enable structured metadata:

```yaml
limits_config:
  allow_structured_metadata: true
```
Move metadata to regular labels if structured metadata isn't needed.
Contact your Loki administrator to enable the feature.
### Structured metadata too large

Error message:

```
stream '{job="app"}' has structured metadata too large: '70000' bytes, limit: '65536' bytes. Please see limits_config.max_structured_metadata_size or contact your Loki administrator to increase it
```
Cause:
The structured metadata size exceeds the configured limit.
Default configuration:
max_structured_metadata_size: 64 KBResolution:
Reduce the size of structured metadata by removing unnecessary fields.
Increase the limit:

```yaml
limits_config:
  max_structured_metadata_size: 128KB
```
Use compression or abbreviations for metadata values.
### Too many structured metadata entries

Error message:

```
stream '{job="app"}' has too many structured metadata labels: '150', limit: '128'. Please see limits_config.max_structured_metadata_entries_count or contact your Loki administrator to increase it
```
Cause:
The number of structured metadata entries exceeds the limit.
Default configuration:
max_structured_metadata_entries_count: 128Resolution:
Reduce the number of structured metadata entries by consolidating fields.
Increase the limit:

```yaml
limits_config:
  max_structured_metadata_entries_count: 256
```
Combine related metadata into single entries using JSON or other formats.
## Blocked ingestion

These errors occur when ingestion is administratively blocked.
Error message:

```
ingestion blocked for user <tenant> until <time> with status code 260
```
Or for policy-specific blocks:

```
ingestion blocked for user <tenant> (policy: <policy>) until <time> with status code 260
```
Cause:
Ingestion has been administratively blocked for the tenant, typically due to policy violations, billing issues, or maintenance. Status code 260 is a custom Loki status code for blocked ingestion.
Default configuration:

- `blocked_ingestion_status_code: 260` (custom status code)

Resolution:

Contact your Loki administrator to find out why ingestion was blocked and when the block expires.
## General troubleshooting workflow

Follow this workflow when investigating ingestion issues:
Check metrics:

```promql
# See which errors are occurring
sum by (reason) (rate(loki_discarded_samples_total[5m]))

# Identify affected tenants
sum by (tenant, reason) (rate(loki_discarded_bytes_total[5m]))
```
Review logs from distributors and ingesters:

```yaml
# Enable write failure logging per tenant
limits_config:
  limited_log_push_errors: true
```
Test push endpoint:

```bash
curl -H "Content-Type: application/json" \
  -XPOST http://loki:3100/loki/api/v1/push \
  --data-raw '{"streams":[{"stream":{"job":"test"},"values":[["'"$(date +%s)000000000"'","test log line"]]}]}'
```
Check tenant limits (add an `X-Scope-OrgID` header if multi-tenancy is enabled):

```bash
curl http://loki:3100/loki/api/v1/user_stats
```
Review configuration for affected tenant or global limits.
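To confirm which limits are actually in effect, you can also dump the running configuration from Loki's HTTP API:

```bash
# Prints the full effective configuration, including defaults
curl -s http://loki:3100/config
```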
Monitor system resources (CPU, memory, and disk) on distributors and ingesters.