Back to Datafusion

EXPLAIN

docs/source/user-guide/sql/explain.md

53.1.016.7 KB
Original Source
<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

EXPLAIN

The EXPLAIN command shows the logical and physical execution plan for the specified SQL statement.

Syntax

<pre> EXPLAIN [ANALYZE] [VERBOSE] [FORMAT format] statement </pre>

EXPLAIN

Shows the execution plan of a statement. If you need more detailed output, use EXPLAIN VERBOSE. Note that EXPLAIN VERBOSE only supports the indent format.

The optional [FORMAT format] clause controls how the plan is displayed as explained below. If this clause is not specified, the plan is displayed using the format from the configuration value datafusion.explain.format.

tree format (default)

The tree format is modeled after DuckDB plans and is designed to be easier to see the high level structure of the plan

sql
> EXPLAIN FORMAT TREE SELECT SUM(x) FROM t GROUP BY b;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       ProjectionExec      │ |
|               |--------------------   │ |
|               |sum(t.x): sum(t.x)@1|
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       AggregateExec       │ |
|               |--------------------   │ |
|               | │       aggr: sum(t.x)      │ |
|               | │     group_by: b@0 as b    │ |
|               | │                           │ |
|               | │           mode:           │ |
|               | │      FinalPartitioned     │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │    CoalesceBatchesExec    │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      │ |
|               |--------------------   │ |
|               | │   input_partition_count:  │ |
|               |1|
|               | │                           │ |
|               | │    partitioning_scheme:   │ |
|               | │      Hash([b@0], 16)      │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       AggregateExec       │ |
|               |--------------------   │ |
|               | │       aggr: sum(t.x)      │ |
|               | │     group_by: b@1 as b    │ |
|               | │       mode: Partial       │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               |--------------------   │ |
|               | │         bytes: 224|
|               | │       format: memory      │ |
|               |rows: 1|
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.016 seconds.

indent format

The indent format shows both the logical and physical plan, with one line for each operator in the plan. Child plans are indented to show the hierarchy.

See Reading Explain Plans for more information on how to interpret these plans.

sql
> CREATE TABLE t(x int, b int) AS VALUES (1, 2), (2, 3);
0 row(s) fetched.
Elapsed 0.004 seconds.

> EXPLAIN FORMAT INDENT SELECT SUM(x) FROM t GROUP BY b;
+---------------+-------------------------------------------------------------------------------+
| plan_type     | plan                                                                          |
+---------------+-------------------------------------------------------------------------------+
| logical_plan  | Projection: sum(t.x)                                                          |
|               |   Aggregate: groupBy=[[t.b]], aggr=[[sum(CAST(t.x AS Int64))]]                |
|               |     TableScan: t projection=[x, b]                                            |
| physical_plan | ProjectionExec: expr=[sum(t.x)@1 as sum(t.x)]                                 |
|               |   AggregateExec: mode=FinalPartitioned, gby=[b@0 as b], aggr=[sum(t.x)]       |
|               |     CoalesceBatchesExec: target_batch_size=8192                               |
|               |       RepartitionExec: partitioning=Hash([b@0], 16), input_partitions=1       |
|               |         AggregateExec: mode=Partial, gby=[b@1 as b], aggr=[sum(t.x)]          |
|               |           DataSourceExec: partitions=1, partition_sizes=[1]                   |
|               |                                                                               |
+---------------+-------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.004 seconds.

pgjson format

The pgjson format is modeled after Postgres JSON format.

You can use this format to visualize the plan in existing plan visualization tools, such as dalibo

sql
> EXPLAIN FORMAT PGJSON SELECT SUM(x) FROM t GROUP BY b;
+--------------+----------------------------------------------------+
| plan_type    | plan                                               |
+--------------+----------------------------------------------------+
| logical_plan | [                                                  |
|              |   {                                                |
|              |     "Plan": {                                      |
|              |       "Expressions": [                             |
|              |         "sum(t.x)"                                 |
|              |       ],                                           |
|              |       "Node Type": "Projection",                   |
|              |       "Output": [                                  |
|              |         "sum(t.x)"                                 |
|              |       ],                                           |
|              |       "Plans": [                                   |
|              |         {                                          |
|              |           "Aggregates": "sum(CAST(t.x AS Int64))", |
|              |           "Group By": "t.b",                       |
|              |           "Node Type": "Aggregate",                |
|              |           "Output": [                              |
|              |             "b",                                   |
|              |             "sum(t.x)"                             |
|              |           ],                                       |
|              |           "Plans": [                               |
|              |             {                                      |
|              |               "Node Type": "TableScan",            |
|              |               "Output": [                          |
|              |                 "x",                               |
|              |                 "b"                                |
|              |               ],                                   |
|              |               "Plans": [],                         |
|              |               "Relation Name": "t"                 |
|              |             }                                      |
|              |           ]                                        |
|              |         }                                          |
|              |       ]                                            |
|              |     }                                              |
|              |   }                                                |
|              | ]                                                  |
+--------------+----------------------------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

graphviz format

The graphviz format uses the DOT language that can be used with Graphviz to generate a visual representation of the plan.

sql
> EXPLAIN FORMAT GRAPHVIZ SELECT SUM(x) FROM t GROUP BY b;
+--------------+------------------------------------------------------------------------------------------------------------------------------+
| plan_type    | plan                                                                                                                         |
+--------------+------------------------------------------------------------------------------------------------------------------------------+
| logical_plan |                                                                                                                              |
|              | // Begin DataFusion GraphViz Plan,                                                                                           |
|              | // display it online here: https://dreampuf.github.io/GraphvizOnline                                                         |
|              |                                                                                                                              |
|              | digraph {                                                                                                                    |
|              |   subgraph cluster_1                                                                                                         |
|              |   {                                                                                                                          |
|              |     graph[label="LogicalPlan"]                                                                                               |
|              |     2[shape=box label="Projection: sum(t.x)"]                                                                                |
|              |     3[shape=box label="Aggregate: groupBy=[[t.b]], aggr=[[sum(CAST(t.x AS Int64))]]"]                                        |
|              |     2 -> 3 [arrowhead=none, arrowtail=normal, dir=back]                                                                      |
|              |     4[shape=box label="TableScan: t projection=[x, b]"]                                                                      |
|              |     3 -> 4 [arrowhead=none, arrowtail=normal, dir=back]                                                                      |
|              |   }                                                                                                                          |
|              |   subgraph cluster_5                                                                                                         |
|              |   {                                                                                                                          |
|              |     graph[label="Detailed LogicalPlan"]                                                                                      |
|              |     6[shape=box label="Projection: sum(t.x)\nSchema: [sum(t.x):Int64;N]"]                                                    |
|              |     7[shape=box label="Aggregate: groupBy=[[t.b]], aggr=[[sum(CAST(t.x AS Int64))]]\nSchema: [b:Int32;N, sum(t.x):Int64;N]"] |
|              |     6 -> 7 [arrowhead=none, arrowtail=normal, dir=back]                                                                      |
|              |     8[shape=box label="TableScan: t projection=[x, b]\nSchema: [x:Int32;N, b:Int32;N]"]                                      |
|              |     7 -> 8 [arrowhead=none, arrowtail=normal, dir=back]                                                                      |
|              |   }                                                                                                                          |
|              | }                                                                                                                            |
|              | // End DataFusion GraphViz Plan                                                                                              |
|              |                                                                                                                              |
+--------------+------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.010 seconds.

EXPLAIN ANALYZE

Shows the execution plan and metrics of a statement. Note that EXPLAIN ANALYZE only supports the indent format.

sql
EXPLAIN ANALYZE SELECT SUM(x) FROM table GROUP BY b;

+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                      |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalescePartitionsExec, metrics=[]                                                                                                                        |
|                   |   ProjectionExec: expr=[SUM(table.x)@1 as SUM(x)], metrics=[]                                                                                             |
|                   |     HashAggregateExec: mode=FinalPartitioned, gby=[b@0 as b], aggr=[SUM(x)], metrics=[outputRows=2]                                                       |
|                   |       CoalesceBatchesExec: target_batch_size=4096, metrics=[]                                                                                             |
|                   |         RepartitionExec: partitioning=Hash([Column { name: "b", index: 0 }], 16), metrics=[sendTime=839560, fetchTime=122528525, repartitionTime=5327877] |
|                   |           HashAggregateExec: mode=Partial, gby=[b@1 as b], aggr=[SUM(x)], metrics=[outputRows=2]                                                          |
|                   |             RepartitionExec: partitioning=RoundRobinBatch(16), metrics=[fetchTime=5660489, repartitionTime=0, sendTime=8012]                              |
|                   |               DataSourceExec: file_groups={1 group: [[/tmp/table.csv]]}, has_header=false, metrics=[]                                                        |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+

By default EXPLAIN ANALYZE shows the aggregated metrics from all partitions for each operator. If you need to display per-partition metrics, use EXPLAIN ANALYZE VERBOSE.

You can also set datafusion.explain.analyze_level from the configuration value to control the detail level for the metrics displayed.