scenarios/features/cloudwatch_logs_large_query/SPECIFICATION.md
This feature scenario demonstrates how to perform large-scale queries on Amazon CloudWatch Logs, using a recursive binary search to retrieve more results than the 10,000-result limit of a single query.
Important: This is a complete, self-contained scenario that handles all setup and cleanup automatically. The scenario includes:
For an introduction, see the README.md.
This scenario uses the following CloudWatch Logs API actions:
- StartQuery - Initiates a CloudWatch Logs Insights query
- GetQueryResults - Retrieves results from a query, polling until complete

Location: scenarios/features/cloudwatch_logs_large_query/resources/stack.yaml
Resources Created:
- Log group: /workflows/cloudwatch-logs/large-query
- Log stream: stream1

These files are for reference only. New versions of this example should create and upload logs as part of the scenario.
| Variable Name | Description | Type | Default |
|---|---|---|---|
| stackName | CloudFormation stack name | String | "CloudWatchLargeQueryStack" |
| queryStartDate | Query start timestamp | Long/Integer | From script output |
| queryEndDate | Query end timestamp | Long/Integer | From script output |
| queryLimit | Maximum results per query | Integer | 10000 |
| logGroupName | Log group name (if not using stack) | String | "/workflows/cloudwatch-logs/large-query" |
| logStreamName | Log stream name (if not using stack) | String | "stream1" |
The query itself is a string written in CloudWatch Logs Insights query syntax. The query must return the @timestamp field so that follow-up queries can use it. Here is a sample query string: `fields @timestamp, @message | sort @timestamp asc`. Notice that it sorts in ascending order. You can sort in either asc or desc order, but the recursive strategy described later must match the sort order you choose.
Queries are jobs. StartQuery starts a query and immediately returns its queryId, without waiting for results. You must poll the query using GetQueryResults until it has finished. For the purpose of this example, a query has "finished" when GetQueryResults returns a status of "Complete", "Failed", "Cancelled", "Timeout", or "Unknown".
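The polling step described above can be sketched as follows. This is a minimal illustration, not the scenario's required implementation. It assumes `client` behaves like `boto3.client("logs")`. The function name `wait_for_query` and the `poll_interval` and `max_polls` parameters are hypothetical.

```python
import time

# Statuses that this specification treats as "finished".
TERMINAL_STATUSES = {"Complete", "Failed", "Cancelled", "Timeout", "Unknown"}


def wait_for_query(client, query_id, poll_interval=1.0, max_polls=60):
    """Poll GetQueryResults until the query reaches a terminal status.

    `client` is assumed to expose get_query_results(queryId=...) the way
    boto3.client("logs") does. The polling limits are illustrative.
    """
    for _ in range(max_polls):
        response = client.get_query_results(queryId=query_id)
        if response["status"] in TERMINAL_STATUSES:
            return response
        time.sleep(poll_interval)
    raise TimeoutError(f"Query {query_id} did not finish after {max_polls} polls")
```

A caller would typically pass the `queryId` returned by StartQuery and then inspect `response["status"]` before using `response["results"]`.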
StartQuery responds with an error if the query's start or end date occurs out of bounds of the log group creation date. The error message starts with "Query's end date and time".
Start the query and wait for it to "finish". Store the results. If the number of results is less than the configured LIMIT, return the results. If the number of results is greater than or equal to the limit, go to Recursive queries.
If the result count from the previous step is 10000 (or the configured LIMIT), it is very likely that there are more results. The example must do a binary search of the remaining logs. To do this, get the date of the last log (earliest or latest, depending on sort order). Use that date as the start date of a new date range. The end date can remain the same.
Split that date range in half, resulting in two new date ranges. Call your query function twice; once for each new date range.
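As a rough sketch of the split step, the helper below halves an inclusive range of integer timestamps into two non-overlapping ranges. The name `split_range` is hypothetical, and the choice to start the second half one unit after the midpoint is an assumption. The unit (seconds or milliseconds) is left to the caller.

```python
def split_range(start, end):
    """Split an inclusive [start, end] integer range into two halves.

    The second half starts one unit after the midpoint so that the two
    ranges do not overlap (an assumption made for this sketch).
    """
    if end <= start:
        raise ValueError("range must span at least two timestamps")
    mid = start + (end - start) // 2
    return (start, mid), (mid + 1, end)
```

For example, `split_range(0, 9)` yields `((0, 4), (5, 9))`.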
Concatenate the results of the first query with the results of the two new queries.
The following pseudocode illustrates this.

    func large_query(date_range):
      query_results = get_query_results(date_range)
      if query_results.length < LIMIT
        return query_results
      else
        date_range = [query_results.end, date_range.end]
        d1, d2 = split(date_range)
        return concat(query_results, large_query(d1), large_query(d2))
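A runnable Python sketch of the same recursion is shown below, under several stated assumptions. `run_query` is a hypothetical callable that runs one query and returns at most `limit` rows sorted in ascending order. Each row is simplified to a dict with an integer `"timestamp"` key, which is not the shape GetQueryResults actually returns. The sketch also continues one unit after the last timestamp to avoid returning that row twice. That differs slightly from the pseudocode, and it would skip any further rows that share that exact timestamp.

```python
LIMIT = 10000


def large_query(run_query, start, end, limit=LIMIT):
    """Recursively collect all rows in the inclusive range [start, end].

    run_query(start, end) is a hypothetical callable returning up to
    `limit` rows in ascending order, each a dict with a "timestamp" key.
    """
    results = run_query(start, end)
    if len(results) < limit:
        return results

    # The result set was probably truncated. Continue just after the last
    # row returned (assumption: no other rows share that timestamp).
    next_start = results[-1]["timestamp"] + 1
    if next_start > end:
        return results
    if next_start == end:
        return results + large_query(run_query, end, end, limit)

    # Split the remaining range in half and query each half.
    mid = next_start + (end - next_start) // 2
    return (
        results
        + large_query(run_query, next_start, mid, limit)
        + large_query(run_query, mid + 1, end, limit)
    )
```

Because the rows are sorted in ascending order and the first half is queried before the second, the concatenated result stays in ascending order.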
Purpose: Deploy resources and generate sample data as part of the scenario
resources/stack.yaml

Fully Self-Contained Behavior:
Purpose: Demonstrate recursive large query functionality
Steps:
fields @timestamp, @message | sort @timestamp asc

Purpose: Remove created resources
Interactive Mode Steps:
Display each query execution with the following format:
Query date range: <START_ISO8601> to <END_ISO8601>. Found <COUNT> logs.
Example:
Starting recursive query...
Query date range: 2023-12-22T19:08:42.000Z to 2023-12-22T19:13:41.994Z. Found 10000 logs.
Query date range: 2023-12-22T19:09:41.995Z to 2023-12-22T19:11:41.994Z. Found 10000 logs.
Query date range: 2023-12-22T19:11:41.995Z to 2023-12-22T19:13:41.994Z. Found 10000 logs.
Query date range: 2023-12-22T19:10:41.995Z to 2023-12-22T19:11:11.994Z. Found 5000 logs.
Query date range: 2023-12-22T19:11:11.995Z to 2023-12-22T19:11:41.994Z. Found 5000 logs.
Query date range: 2023-12-22T19:12:41.995Z to 2023-12-22T19:13:11.994Z. Found 5000 logs.
Query date range: 2023-12-22T19:13:11.995Z to 2023-12-22T19:13:41.994Z. Found 5000 logs.
Queries finished in 11.253 seconds.
Total logs found: 50000
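The per-query line format shown above could be produced with a small helper like the one below. This is only a sketch. The names `to_iso8601` and `format_query_line` are hypothetical, and the timestamps are assumed to be integer epoch milliseconds.

```python
from datetime import datetime, timezone


def to_iso8601(epoch_ms):
    """Format epoch milliseconds as an ISO 8601 UTC string with milliseconds."""
    seconds, millis = divmod(epoch_ms, 1000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


def format_query_line(start_ms, end_ms, count):
    """Build the status line displayed after each query execution."""
    return (
        f"Query date range: {to_iso8601(start_ms)} to {to_iso8601(end_ms)}. "
        f"Found {count} logs."
    )
```

For example, `format_query_line(0, 1500, 3)` returns `Query date range: 1970-01-01T00:00:00.000Z to 1970-01-01T00:00:01.500Z. Found 3 logs.`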
After all queries complete, display:
If the user chooses to view sample logs, display the first 10 entries:
Sample logs (first 10 of 50000):
[2023-12-22T19:08:42.000Z] Entry 0
[2023-12-22T19:08:42.006Z] Entry 1
[2023-12-22T19:08:42.012Z] Entry 2
...
| Error Code | Error Message Pattern | Handling Strategy |
|---|---|---|
| InvalidParameterException | "Query's end date and time" | Date range is out of bounds; inform user and adjust dates |
| ResourceNotFoundException | Log group not found | Verify log group exists; prompt user to run setup |
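One way to map the table above onto code is a small classifier over the error code and message. This is a sketch: the function name `classify_query_error` and the returned labels are hypothetical. The `code` and `message` arguments are assumed to come from the SDK's exception, for example `error.response["Error"]["Code"]` and `error.response["Error"]["Message"]` on a botocore `ClientError`.

```python
def classify_query_error(code, message):
    """Map an error code and message to a handling strategy label.

    The labels returned here are illustrative names, not part of any API.
    """
    if code == "InvalidParameterException" and message.startswith(
        "Query's end date and time"
    ):
        # Date range is out of bounds: inform the user and adjust dates.
        return "date_range_out_of_bounds"
    if code == "ResourceNotFoundException":
        # Log group is missing: prompt the user to run setup.
        return "log_group_not_found"
    return "unhandled"
```

Errors that fall through to `"unhandled"` would typically be re-raised or reported as they are.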
| action / scenario | metadata file | metadata key |
|---|---|---|
| GetQueryResults | cloudwatch-logs_metadata.yaml | cloudwatch-logs_GetQueryResults |
| StartQuery | cloudwatch-logs_metadata.yaml | cloudwatch-logs_StartQuery |
| Large Query | cloudwatch-logs_metadata.yaml | cloudwatch-logs_Scenario_LargeQuery |