docs/design/sql/16-columbia-optimizer-feedback.md
| ℹ️ Since: 5.x |
|---|
| Related Github issues | Pull Request |
| Document Status / Completeness | REJECTED |
| Developer(s) | Sasha Syrotenko |
| Technical Reviewers | Viliam Durina |
This document describes design, efforts and feature rejection reasons of transferring Hazelcast SQL engine to Columbia optimizer.
Apache Calcite introduced Columbia optimizer implementation with 1.24 release. It implements CASCADES optimization algorithm designed by Columbia optimizer creator, described here.
Apache Calcite names its Columbia optimizer implementation as a "Top-Down Rule Driver". We will also name it with that style : Top-Down Optimizer.
To enable the top-down rule driver, the user should enable calcite.planner.topdown.opt option.
Main benefit to replace Volcano optimizer with new Columbia optimizer is to reduce optimization search space for operators
like Index Scan or Merge Join.
Main requirement to achieve CASCADES-style optimization is to implement PhysicalNode interface for each physical rel available for the optimizer.
(!) We need to note that the meaning of Hazelcast physical relation and Top-Down Optimizer's physical node are different.
It contains 4 methods to implement :
passThrough -- propagates the trait request from given rel to its children rels. In most common case it returns a copy of original rel with satisfied propagated traits.
But, in some cases original rel may be replaced with another rel based on propagated trait analysis.
For example, it may impose Full Scan -> Index Scan conversion based on collation trait state and sorted index for collated attribute availability.
In our prototype, the majority of relations uses default implementation, and only a few rels like FullScanPhysicalRel defines custom logic.
passThroughTraits defines trait propagation logic for given rel.
Some nodes just forward traits, but the majority of rels are using their custom trait propagation logic.
derive - bubbles traits up for given rel. Good use case of trait derivation is described here, pt.4.
deriveTraits - defines traits derivation logic. For example: when sorted Index Scan is picked and
that collation was requested by Sort rel, it may be just eliminated due to redundancy.
As it was mentioned above, the meaning of Hazelcast physical relation and Top-Down Optimizer's physical node are different. Hazelcast has 3 optimization phases (state of 5.2.0) :
Project -> Calc or Filter -> Calc conversions.NONE convention rewrites with LOGICAL convention
with a few rel transpositions or merges.LOGICAL convention rewrites with PHYSICAL convention and applies
a lot of implementation-specific optimization rules.The issue here is that Top-Down Optimizer apply the rules immediately for the first convention conversion,
what is NONE -> LOGICAL in our case.
We tried to avoid that limitation approaching these actions:
Based on that, SQL team decided to postpone all research in optimizer upgrade direction.
The transfer to Top-Down Optimizer was partially implemented in pull request mentioned in header of the TDD. This code is ready to be applied in future efforts.