docs/adr/001-cassandra-find-traces-duration.md
The Cassandra spanstore implementation in Jaeger handles trace queries with duration filters (DurationMin/DurationMax) through a separate code path. On that path, only the service name, the operation name, and the start-time range are applied alongside the duration filter. The path cannot efficiently intersect the duration results with other query parameters such as tags. This behavior differs from other storage backends such as Badger and may seem counterintuitive to users.
Cassandra's data model imposes specific constraints on query patterns. The columns of the duration_index table appear in the CQL insert statement in internal/storage/v1/cassandra/spanstore/writer.go:

```sql
INSERT INTO duration_index(service_name, operation_name, bucket, duration, start_time, trace_id)
VALUES (?, ?, ?, ?, ?, ?)
```
The table uses a composite partition key consisting of service_name, operation_name, and bucket (an hourly time bucket), with duration as a clustering column. In Cassandra, every partition key column must be restricted by equality (or IN) in the WHERE clause. Range predicates are only allowed on clustering columns within a partition. As a result, a range query across partitions, or an intersection of results from different partitions, cannot be performed efficiently on the server.
The duration index is bucketed by hour to limit partition size and improve query performance. From internal/storage/v1/cassandra/spanstore/writer.go (line 57):

```go
durationBucketSize = time.Hour
```

When a span is indexed, its start time is rounded to the nearest hour bucket (line 231 in writer.go):

```go
timeBucket := startTime.Round(durationBucketSize)
```
The indexByDuration function (lines 229-243) writes two index entries per span:

```go
indexByOperationName("")                 // index by service name alone
indexByOperationName(span.OperationName) // index by service name and operation name
```
In internal/storage/v1/cassandra/spanstore/reader.go, the findTraceIDs method (lines 275-301) performs an early return when duration parameters are present:

```go
func (s *SpanReader) findTraceIDs(ctx context.Context, traceQuery *spanstore.TraceQueryParameters) (dbmodel.UniqueTraceIDs, error) {
	if traceQuery.DurationMin != 0 || traceQuery.DurationMax != 0 {
		return s.queryByDuration(ctx, traceQuery)
	}
	// ... other query paths
}
```
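Because of this early return, once duration parameters are detected, only the following other query parameters take effect:

- ServiceName
- OperationName
- the start-time range (StartTimeMin/StartTimeMax), which selects the buckets
- NumTraces, which sets the per-query limit

Tags are never consulted on this path. In practice, validateQuery rejects queries that combine duration and tag filters before findTraceIDs is reached.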
The queryByDuration method (lines 333-375) iterates over the hourly buckets within the query time range, newest first, and issues one Cassandra query per bucket:

```go
// Both ends of the window are rounded the same way the writer rounds span start times.
startTimeByHour := traceQuery.StartTimeMin.Round(durationBucketSize)
endTimeByHour := traceQuery.StartTimeMax.Round(durationBucketSize)

// Walk backwards from the newest bucket to the oldest, inclusive.
for timeBucket := endTimeByHour; timeBucket.After(startTimeByHour) || timeBucket.Equal(startTimeByHour); timeBucket = timeBucket.Add(-1 * durationBucketSize) {
	query := s.session.Query(
		queryByDuration,
		timeBucket,
		traceQuery.ServiceName,
		traceQuery.OperationName,
		minDurationMicros, // duration bounds, converted to microseconds earlier in the method
		maxDurationMicros,
		traceQuery.NumTraces*limitMultiple)
	// execute query...
}
```
Each query specifies exact values for bucket, service_name, and operation_name (the partition key components), along with a range filter on duration (the clustering column). The query definition (lines 51-55) is:

```sql
SELECT trace_id
FROM duration_index
WHERE bucket = ? AND service_name = ? AND operation_name = ? AND duration > ? AND duration < ?
LIMIT ?
```
Unlike storage backends such as Badger (which can perform hash joins and arbitrary index intersections), Cassandra's partition-based architecture makes cross-index intersections expensive and impractical:

- Partition key constraints: The duration index requires equality on (service_name, operation_name, bucket). You cannot efficiently query across multiple operations or join with the tag index without scanning many partitions.
- No server-side joins: Cassandra does not support server-side joins. To intersect duration results with tag results, the client would need to:
  1. query the duration index once per hourly bucket;
  2. query the tag index separately;
  3. intersect the two sets of trace IDs in application memory.

  This would be inefficient for large result sets and would require fetching potentially many trace IDs over the network.
- Hourly bucket iteration: The duration query already iterates over hourly buckets. Adding tag intersections would multiply the number of queries and result sets to merge.
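The client-side intersection described above would look roughly like the following sketch. The full trace ID sets from both indices must be materialized in the application before the intersection can be computed, which is where the network and memory cost comes from. The function name `intersectTraceIDs` and the use of plain strings for trace IDs are illustrative assumptions.

```go
package main

import "fmt"

// intersectTraceIDs builds a hash set from the smaller input and probes it
// with the larger one, keeping IDs present in both. Both slices must already
// have been fetched in full: one from the tag index, and one assembled from
// per-bucket duration index queries.
func intersectTraceIDs(durationIDs, tagIDs []string) []string {
	small, large := durationIDs, tagIDs
	if len(small) > len(large) {
		small, large = large, small
	}
	set := make(map[string]struct{}, len(small))
	for _, id := range small {
		set[id] = struct{}{}
	}
	var out []string
	for _, id := range large {
		if _, ok := set[id]; ok {
			out = append(out, id)
			delete(set, id) // emit each matching ID only once
		}
	}
	return out
}

func main() {
	durationIDs := []string{"t1", "t2", "t3"}
	tagIDs := []string{"t3", "t4", "t1", "t5"}
	fmt.Println(intersectTraceIDs(durationIDs, tagIDs)) // order follows the larger input
}
```

A further catch is that applying a LIMIT to either query before intersecting can drop IDs that would have matched. A correct client-side version may therefore need to fetch far more than NumTraces IDs from each index.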
The Badger storage backend handles duration queries differently. In internal/storage/v1/badger/spanstore/reader.go (around line 486), the FindTraceIDs method performs duration queries and then uses the results as a filter (hashOuter) that can be intersected with other index results:

```go
if query.DurationMax != 0 || query.DurationMin != 0 {
	plan.hashOuter = r.durationQueries(plan, query)
}
```
Badger uses an embedded key-value store where range scans and in-memory filtering are efficient, allowing it to merge results from multiple indices. This is a fundamental difference from Cassandra's distributed, partition-oriented design.
The Cassandra spanstore will continue to treat duration queries as a separate query path that does not intersect with tag indices or other non-service/operation filters.
When a TraceQueryParameters contains DurationMin or DurationMax:

- the query is served from the duration_index table exclusively;
- only the ServiceName and OperationName parameters are respected (used as partition key components), together with the start-time range that selects the hourly buckets.

This approach is documented in code comments and in this ADR to set proper expectations.
- When using the Cassandra spanstore: duration filters cannot be combined with tag filters. Rather than silently ignoring the tags, validateQuery (lines 227-229 in reader.go) rejects such a query with ErrDurationAndTagQueryNotSupported, so callers should expect and handle that error.
- For combined filtering needs: consider using the Badger backend, or implement client-side filtering, for example by running the duration and tag queries separately and intersecting the returned trace IDs in the application.
- Query design: structure queries to leverage the available indices. Use ServiceName and OperationName together with duration filters for the best results.
Implementation files:

- internal/storage/v1/cassandra/spanstore/reader.go: query logic and duration query path
- internal/storage/v1/cassandra/spanstore/writer.go: duration index schema and insertion logic
- internal/storage/v1/badger/spanstore/reader.go: Badger implementation for comparison

Cassandra documentation:

Related code:

- durationIndex constant (writer.go lines 47-50): CQL insert statement
- queryByDuration constant (reader.go lines 51-55): CQL select statement
- durationBucketSize constant (writer.go line 57): hourly bucketing
- ErrDurationAndTagQueryNotSupported (reader.go line 77): validation that prevents combining duration and tag queries