Planner Rule AI Notes

This file records planner rule related PR experience and pitfalls. Update an existing section when the topic overlaps; append a new dated entry only for a genuinely new topic.

2026-01-31 - NOT NOT in Outer Join ON / Double NOT handling

Background:

A reproduction showed LEFT JOIN ... ON NOT NOT (t0.k0 = t2.k0) treated as an other condition and leading to a cartesian-like join behavior.

Key takeaways:

PushDownNot inside applyPredicateSimplificationHelper applies to WHERE predicates; outer join ON conditions are not simplified there.
InnerJoin/SemiJoin fold join conditions into tempCond and go through ApplyPredicateSimplificationForJoin, so NOT NOT on logical ops is already handled.
For outer join ON conditions, apply PushDownNot to OtherConditions to avoid not(not(eq)) becoming an other condition that triggers cartesian joins.

Implementation choice:

In LogicalJoin.PredicatePushDown, before outer join processing, run PushDownNot on OtherConditions only to eliminate double NOT.
Guard the normalization with a UnaryNot presence check to avoid extra overhead.

Test and verification:

Add cases to pkg/planner/core/casetest/rule/testdata/predicate_pushdown_suite_in.json.
Run: go test ./pkg/planner/core/casetest/rule -run TestConstantPropagateWithCollation -record -tags=intest,deadlock.
Confirm left outer join keeps equal:[eq(t0.k0, t2.k0)].

Test data pattern used:

For predicate pushdown cases that need both plan and result validation, keep raw SQL in predicate_pushdown_suite_in.json.
Use the test runner to record EXPLAIN format='brief' and query results in predicate_pushdown_suite_out.json, so the case list remains simple and readable. Additional notes:
pushNotAcrossExpr already eliminates not(not(expr)) when expr is a logical operator, because wrapWithIsTrue returns logical ops unchanged.
An explicit double-NOT special case is optional for logical expressions; non-logical expressions should continue to use IsTruthWithNull semantics.
Plan regression (CARTESIAN + other cond) is easier to trigger than result differences on small datasets.

2026-02-04 - OR-constant in outer join other conditions (issue #65994)

Background:

A rewrite introduced LEFT JOIN ... ON (a = b OR 0) and the OR constant prevented join key extraction, leading to a cartesian-like join behavior and wrong results.

Key takeaways:

Outer join OtherConditions are not simplified by predicate pushdown, so trivial OR/AND constants must be normalized before updateEQCond.
Applying ApplyPredicateSimplificationForJoin on OtherConditions is sufficient to remove OR 0 and keep equality keys.

Implementation choice:

In LogicalJoin.normalizeJoinConditionsForOuterJoin, call ApplyPredicateSimplificationForJoin with propagateConstant=false on OtherConditions.

Test and verification:

Add SQL-only case to predicate_pushdown_suite_in.json; keep DDL in the test setup, otherwise explain will try to run DROP/CREATE.
Record with: go test ./pkg/planner/core/casetest/rule -run TestConstantPropagateWithCollation -tags=intest,deadlock -record.
Add integration test to tests/integrationtest/t/select.test and record via pushd tests/integrationtest && ./run-tests.sh -r select && popd (integration tests use -r, not -record).

2026-02-16 - Index range dimension mismatch with appended handle column (issue #66291)

Background:

A panic was reported under query + DDL interleaving: runtime error: index out of range [2] with length 2.
The DDL worker for ADD/DROP INDEX on the same table is serialized, so the issue is not caused by concurrent index DDL jobs running at the same time.

Root cause:

For non-unique index paths, planner may append a handle column to path.IdxCols for execution-range construction.
In detachCondAndBuildRangeForPath, ranges were built from the extended column set, but row-count estimation used indexCols truncated to index-definition columns.
In the partial-stats branch (index stats invalid + column stats available), cardinality estimation iterated using range dimension and read idxCols[i], which could overflow when ranges had an extra handle dimension.

Why it is not deterministic:

It only triggers when several conditions overlap:
- optimizer chooses the affected index path,
- estimation enters the partial-stats path,
- range construction includes appended handle dimension,
- query/DDL timing hits the same optimization window.

Implementation choice:

Fix at the source invariant instead of adding defensive checks in lower-level cardinality code.
Keep row-count estimation on index-definition columns (indexCols) to preserve existing plan-estimation behavior.
In pkg/planner/core/stats.go, if execution ranges carry appended handle dimensions, prune each range to the first len(indexCols) dimensions and use the pruned ranges for estimation.
Keep execution-range building unchanged and avoid a second DetachCondAndBuildRangeForIndex for estimation only.
This removes range/column dimension mismatch while keeping estimation inputs comparable with previous behavior (same estimation column slice and row-count API).

Test and verification:

Add regression test TestIndexRangeEstimationWithAppendedHandleColumn in pkg/planner/cardinality/selectivity_test.go.
The test uses a non-unique index + PK handle shape and mocked stats (missing index stats + available column stats) and verifies EXPLAIN does not panic.
Reference command:
- go test ./pkg/planner/cardinality -run TestIndexRangeEstimationWithAppendedHandleColumn --tags=intest -count=1

Reusable lessons:

Prefer fixing cross-layer shape mismatch at the producer boundary (planner/stats boundary), not by adding deep defensive guards in estimation internals.
When execution paths append handle columns, do not blindly reuse execution ranges for stats estimation. First align dimensions with estimation columns; prefer range pruning/projection over a second range build when possible.
For bugfixes in estimation paths, keep inputs comparable with previous baselines whenever possible (minimize semantic movement while removing the panic condition).
For flaky panic reports under query/DDL interleaving, first separate:
- DDL job serialization facts,
- metadata version visibility,
- optimizer path/branch conditions. This avoids misattributing the issue to impossible DDL concurrency.
Regression tests should intentionally force the vulnerable branch (here: index stats invalid + partial column stats) instead of relying on random timing.

2026-03-25 - IndexMerge handle columns must come from DataSource schema (PR #65798)

Background:

A planner fix for PK/common-handle tables updated setIndexMergeTableScanHandleCols and overwritePartialTableScanSchema to reuse UnMutableHandleCols instead of manufacturing _tidb_rowid.
Review surfaced one remaining inconsistency in PhysicalIndexScan.InitSchema: when scanning p.Columns for the handle column, the code still created a fresh fallback column if DataSourceSchema lookup failed.

Key takeaways:

For partial index scans built from DataSource, PhysicalIndexScan.DataSourceSchema is the original datasource schema (ds.Schema()), not a pruned KV schema.
PK handle/common handle metadata is established when building DataSource, via HandleCols / UnMutableHandleCols, so the handle column should already be discoverable from DataSourceSchema.
Creating a synthetic handle column in InitSchema hides invariant violations and can silently diverge from the rest of the index-merge handle-column logic.

Implementation choice:

Keep the table-scan side consistent by reusing UnMutableHandleCols in both setIndexMergeTableScanHandleCols and overwritePartialTableScanSchema.
In PhysicalIndexScan.InitSchema, require handle lookup to succeed from DataSourceSchema; use intest.Assert to make violations explicit in test builds instead of silently fabricating a new column.

Reusable lessons:

When fixing planner bugs around handle columns, first identify which representation is the source of truth:
- DataSourceSchema for logical/original columns,
- HandleCols / UnMutableHandleCols for stable handle identity,
- _tidb_rowid only for tables that truly need extra handle.
If two index-merge construction paths both touch handle columns, patch them together. Fixing only the table-scan side or only the partial-index side usually leaves a second latent failure path behind.