jstests/query_golden/README.plan_stability.md
The plan_stability tests record the current winning plan for a set of ~1K queries produced by SPM-3816. If those plans ever change, the test fails, at which point a human decides whether the changed plans are for the better or for the worse.
The plan_stability test is a standard golden test:

```shell
buildscripts/resmoke.py run \
    --suites=query_golden_classic \
    --mongodSetParameters='{internalQueryFrameworkControl: forceClassicEngine, featureFlagCostBasedRanker: ..., internalQueryCBRCEMode: ...}' \
    jstests/query_golden/plan_stability.js
```
There are several resmoke suites predefined for the different plan ranking modes; with these, there is no need to pass any mongod parameters:

- query_golden_cbr_automatic
- query_golden_cbr_automatic_no_multiplanning_results
- query_golden_cbr_automatic_cost_choice
- query_golden_cbr_sampling
- query_golden_cbr_histogram

```shell
buildscripts/resmoke.py run --suites=query_golden_cbr_automatic jstests/query_golden/plan_stability.js
```
To obtain a diff that contains an individual diff fragment for each changed plan:

1. Add the following line to `$HOME/.config/git/attributes`:

   ```
   **/plan_stability* diff=plan_stability
   ```

2. Configure `~/.golden_test_config.yml` to use a customized diff command:

   ```yaml
   diffCmd: 'git -c diff.plan_stability.xfuncname=">>>pipeline" diff --unified=0 --function-context --no-index "{{expected}}" "{{actual}}"'
   ```

Then run `buildscripts/golden_test.py diff` as usual and the output will look like this:

```diff
...
@@ -8137,7 +8137,7 @@ >>>pipeline
 {">>>pipeline": [{"$match":{"i_compound":{"$ne":15},"z_compound":{"$nin":[6,7]}}},{"$skip":12},{"$project":{"_id":0,"a_compound":1,"h_idx":1}}],
- "winningPlan": {"stage":"PROJECTION_SIMPLE","inputStage":{"stage":"SKIP","inputStage":{"stage":"FETCH","filter":true,"inputStage":{"stage":"IXSCAN","indexName":"z_compound_1","indexBounds":{"z_compound":["[MinKey, 6.0)","(6.0, 7.0)","(7.0, MaxKey]"]}}}}},
- "keys" : 98745,
- "docs" : 98743,
+ "winningPlan": {"stage":"PROJECTION_SIMPLE","inputStage":{"stage":"FETCH","inputStage":{"stage":"SKIP","inputStage":{"stage":"IXSCAN","indexName":"i_compound_1_z_compound_1","indexBounds":{"i_compound":["[MinKey, 15.0)","(15.0, MaxKey]"],"z_compound":["[MinKey, 6.0)","(6.0, 7.0)","(7.0, MaxKey]"]}}}}},
+ "keys" : 100000,
+ "docs" : 98730,
 "sorts": 0,
 "plans": 4,
 "rows" : 98730},
...
```
This provides the plan that changed, the pipeline it belongs to, and the execution counters that changed.
The internal feature-extractor repository contains a summarization script that produces a summary of the failed test as well as information on the individual regressions that should be looked into. See `scripts/cbr/README.md` in that repository for more information.
In Evergreen, the diff will most likely show a pipeline below the counters. That is, however, the *following* pipeline in the test, not the one you are looking for; the problematic pipeline is the one that comes before it in the expected_output file.

In local execution, if your environment is configured as described above, the diff will show the actual pipeline of interest, above the counters.
To investigate an individual plan locally, run the test with `--pauseAfterPopulate`:

```shell
buildscripts/resmoke.py run \
    --suites=query_golden_classic \
    --mongodSetParameters='{internalQueryFrameworkControl: forceClassicEngine, featureFlagCostBasedRanker: True, internalQueryCBRCEMode: samplingCE, internalQuerySamplingBySequentialScan: True}' \
    jstests/query_golden/plan_stability.js \
    --pauseAfterPopulate
```

and wait until the script has advanced to the following log line:

```
[js_test:plan_stability] [jsTest] ----
[js_test:plan_stability] [jsTest] TestData.pauseAfterPopulate is set. Pausing indefinitely ...
[js_test:plan_stability] [jsTest] ----
```
Then connect to the server at `mongodb://127.0.0.1:20000` and run the offending pipeline against the `db.plan_stability` collection:

```shell
mongosh mongodb://127.0.0.1:20000
```

```javascript
pipeline = [...];
db.plan_stability.aggregate(pipeline).explain().queryPlanner.winningPlan;
db.plan_stability.aggregate(pipeline).explain().queryPlanner.rejectedPlans.sort((a, b) => b.costEstimate - a.costEstimate)[0];
```
The pipelines in the diff are EJSON-ish, while the mongosh shell expects JavaScript. EJSON-ish and JavaScript are identical when it comes to basic types, such as strings and integers, but if the pipeline contains timestamps or decimals, the JSON string needs to be converted to JavaScript using `EJSON.parse()`:
```javascript
> pipelineStr = '[{"$match":{"field20_Timestamp_idx":{"$gt":{"$timestamp":{"t":1760551205,"i":0}}},"field12_Decimal128_idx":{"$lte":{"$numberDecimal":"35.1"}}}}]';
> pipeline = EJSON.parse(pipelineStr);
[
  {
    '$match': {
      field20_Timestamp_idx: { '$gt': Timestamp({ t: 1760551205, i: 0 }) },
      field12_Decimal128_idx: { '$lte': Decimal128('35.1') }
    }
  }
]
> db.plan_stability.aggregate(pipeline);
```
Note that ISO timestamps need to be handled separately. JSON stores them as strings, losing type information that `EJSON.parse()` cannot recover. This results in a semantic change to the query unless the strings are manually converted to `ISODate` objects:
```javascript
// Manually convert
// [{"$match":{"field19_datetime_idx":{"$gte":"2024-01-27T00:00:00.000Z"}}}]
// to the correct JavaScript:
pipeline = [
    {$match: {field19_datetime_idx: {$gte: ISODate("2024-01-27T00:00:00.000Z")}}},
];
```
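Because the pipelines can be long, converting the date strings by hand is tedious. The following sketch (the `reviveDates` helper and its regex are illustrative, not part of the test suite) walks a parsed pipeline and upgrades ISO-8601 strings to date objects; in mongosh you would construct `ISODate(...)` instead of `new Date(...)`:

```javascript
// Matches strings of the form "2024-01-27T00:00:00.000Z".
const ISO_RE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

// Recursively replace ISO-8601 strings with Date objects in a parsed pipeline.
function reviveDates(value) {
    if (typeof value === "string" && ISO_RE.test(value)) {
        return new Date(value);  // in mongosh: ISODate(value)
    }
    if (Array.isArray(value)) {
        return value.map(reviveDates);
    }
    if (value !== null && typeof value === "object") {
        return Object.fromEntries(
            Object.entries(value).map(([k, v]) => [k, reviveDates(v)]));
    }
    return value;
}

const pipeline = reviveDates(
    JSON.parse('[{"$match":{"field19_datetime_idx":{"$gte":"2024-01-27T00:00:00.000Z"}}}]'));
console.log(pipeline[0].$match.field19_datetime_idx.$gte instanceof Date);  // true
```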
For the majority of the plans, it will be obvious whether the new plan is better or worse, because all the execution counters will have moved in the same direction, without any ambiguity.

Some plans, such as those involving `$sort` or `$limit`, will sometimes change in a way that improves some counters while worsening others. For those queries, consider running them manually multiple times to compare their wallclock execution times:
```javascript
pipeline = [...];
db.adminCommand({setParameter: 1, featureFlagCostBasedRanker: false});
db.plan_stability.aggregate(pipeline).explain('executionStats').executionStats.executionTimeMillis;
db.adminCommand({setParameter: 1, featureFlagCostBasedRanker: true, internalQueryCBRCEMode: "samplingCE"});
db.plan_stability.aggregate(pipeline).explain('executionStats').executionStats.executionTimeMillis;
```
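Wallclock measurements are noisy, so comparing medians over several runs is more reliable than comparing single measurements. A sketch of that approach (the `median` helper is plain JavaScript; the commented-out loop is mongosh-only, since `db.*` exists only in the shell):

```javascript
// Median of timing samples; more robust against a single outlier run than the mean.
function median(samples) {
    const sorted = [...samples].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// In mongosh, collect executionTimeMillis over several runs per setting:
// const times = [];
// for (let i = 0; i < 5; i++) {
//     times.push(db.plan_stability.aggregate(pipeline)
//         .explain('executionStats').executionStats.executionTimeMillis);
// }
// median(times);

console.log(median([12, 15, 11, 30, 14]));  // 14
```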
You can also modify `collSize` in `plan_stability.js` to temporarily use a larger scale factor.
If you want to run a comparison between estimation methods X and Y:

1. If method X is not multi-planning, place the expected files for method X from `jstests/query_golden/expected_files/X` in the root of `expected_files`, so that they are used as the base for the comparison;
2. Temporarily remove the expected files for method Y from `jstests/query_golden/expected_files/Y`, so that they are not considered;
3. Run the test as described above, specifying `featureFlagCostBasedRanker`/`internalQueryCBRCEMode`;
4. Use the summarization script as described above to produce a report.
To accept the new plans, use `buildscripts/golden_test.py accept`, as with any other golden test.

If a given pipeline proves flaky, that is, it flips between one plan and another for no apparent reason, you can comment it out of the test with a note. Re-run the test and then run `buildscripts/golden_test.py accept` to persist the change.