Back to Materialize

EXPLAIN ANALYZE

doc/user/content/sql/explain-analyze.md

12319.0 KB
Original Source

EXPLAIN ANALYZE:

  • Summarizes cluster status.
  • Reports on the performance of indexes and materialized views.
  • Provide the execution plan annotated with TopK hints. The TopK query pattern groups by some key and return the first K elements within each group according to some ordering.

{{< warning >}} EXPLAIN is not part of Materialize's stable interface and is not subject to our backwards compatibility guarantee. The syntax and output of EXPLAIN may change arbitrarily in future versions of Materialize. {{< /warning >}}

Syntax

mzsql
EXPLAIN ANALYZE
      CPU [, MEMORY] [WITH SKEW]
    | MEMORY [, CPU] [WITH SKEW]
    | HINTS
FOR INDEX <name> | MATERIALIZED VIEW <name>
[ AS SQL ]
;

EXPLAIN ANALYZE CLUSTER
      CPU [, MEMORY] [WITH SKEW]
    | MEMORY [, CPU] [WITH SKEW]
[ AS SQL ]
;

{{< tip >}} If you want to specify both CPU or MEMORY, they may be listed in any order; however, each may appear only once. {{</ tip >}}

ParameterDescription
CPUReports consumed CPU time information total_elapsed for each operator (not inclusive of its child operators; FOR INDEX, FOR MATERIALIZED VIEW) or for each object in the current cluster (CLUSTER).
MEMORYReports consumed memory information total_memory and number of records total_records for each operator (not including child operators; FOR INDEX, FOR MATERIALIZED VIEW) or for each object in the current cluster (CLUSTER).
WITH SKEWOptional. If specified, includes additional information about average and per-worker consumption and ratios (of CPU and/or MEMORY).
HINTSAnnotates the LIR plan with TopK hints (FOR INDEX, FOR MATERIALIZED VIEW).
AS SQLOptional. If specified, returns the SQL associated with the specified EXPLAIN ANALYZE command without executing it. You can modify this SQL as a starting point to create customized queries.

Privileges

The privileges required to execute this statement are:

{{% include-headless "/headless/sql-command-privileges/explain-analyze" %}}

Examples

EXPLAIN ANALYZE attributes runtime metrics to PHYSICAL PLAN operators.

The attribution examples in this section reference the wins_by_item index (and the underlying winning_bids view) from the quickstart guide:

sql
CREATE SOURCE auction_house
FROM LOAD GENERATOR AUCTION
(TICK INTERVAL '1s', AS OF 100000)
FOR ALL TABLES;

CREATE VIEW winning_bids AS
  SELECT DISTINCT ON (a.id) b.*, a.item, a.seller
    FROM auctions AS a
    JOIN bids AS b
      ON a.id = b.auction_id
   WHERE b.bid_time < a.end_time
     AND mz_now() >= a.end_time
   ORDER BY a.id, b.amount DESC, b.bid_time, b.buyer;

CREATE INDEX wins_by_item ON winning_bids (item);

EXPLAIN ANALYZE MEMORY

The following examples reports on the memory usage of the index wins_by_item:

mzsql
EXPLAIN ANALYZE MEMORY FOR INDEX wins_by_item;

For the index, EXPLAIN ANALYZE MEMORY reports on the memory usage and the number of records for each operator in the dataflow:

operatortotal_memorytotal_records
Arrange386 kB15409
  Stream u8
Non-monotonic TopK36 MB731975
  Differential Join %0 » %1
    Arrange2010 kB84622
      Stream u5
    Arrange591 kB15410
      Read u4

The results show the TopK operator is overwhelmingly responsible for memory usage.

EXPLAIN ANALYZE CPU

The following examples reports on the cpu usage of the index wins_by_item:

mzsql
EXPLAIN ANALYZE CPU FOR INDEX wins_by_item;

For the index, EXPLAIN ANALYZE CPU reports on total time spent in each operator (not inclusive of its child operators) in the dataflow:

operatortotal_elapsed
Arrange00:00:00.161341
  Stream u8
Non-monotonic TopK00:00:15.153963
  Differential Join %0 » %100:00:00.978381
    Arrange00:00:00.536282
      Stream u5
    Arrange00:00:00.171586
      Read u4

EXPLAIN ANALYZE CPU, MEMORY

You can report on both CPU and memory usage simultaneously:

mzsql
EXPLAIN ANALYZE CPU, MEMORY FOR INDEX wins_by_item;

You can specify both CPU or MEMORY in any order; however, each may appear only once. The order of CPU and MEMORY in the statement determines the order of the output columns

For example, in the above example where the CPU was listed before MEMORY, the CPU time (total_elasped) column is listed before the MEMORY information total_memory and total_records.

operatortotal_elapsedtotal_memorytotal_records
Arrange00:00:00.190801389 kB15435
  Stream u8
Non-monotonic TopK00:00:16.19338136 MB733457
  Differential Join %0 » %100:00:01.107056
    Arrange00:00:00.5928182017 kB84793
      Stream u5
    Arrange00:00:00.214064595 kB15436
      Read u4

EXPLAIN ANALYZE ... WITH SKEW

In clusters with more than one worker, worker skew can occur when data is unevenly distributed across workers. Extreme cases of skew can seriously impact performance. You can use EXPLAIN ANALYZE ... WITH SKEW to identify this scenario. The WITH SKEW option includes the per worker and average worker performance numbers for each operator, along with each worker's ratio compared to the average.

For the below example, assume there are 2 workers in the cluster.

{{< tip >}}

To determine how many workers a given cluster size has, you can query mz_catalog.mz_cluster_replica_sizes.

{{</ tip >}}

You can explain MEMORY and/or CPU with the WITH SKEW option. For example, the following runs EXPLAIN ANALYZE MEMORY WITH SKEW:

mzsql
EXPLAIN ANALYZE MEMORY WITH SKEW FOR INDEX wins_by_item;

The results include the per worker and average worker performance numbers for each operator, along with each worker's ratio compared to the average:

operatorworker_idmemory_ratioworker_memoryavg_memorytotal_memoryrecords_ratioworker_recordsavg_recordstotal_records
Arrange00.878 kB97 kB389 kB0.83099386215448
Arrange11.59154 kB97 kB389 kB1.586113386215448
Arrange21.61157 kB97 kB389 kB1.616236386215448
Arrange30272 bytes97 kB389 kB00386215448
  Stream u8
Non-monotonic TopK019225 kB9261 kB36 MB1183148183486.75733947
Non-monotonic TopK119222 kB9261 kB36 MB1183319183486.75733947
Non-monotonic TopK219301 kB9261 kB36 MB1183585183486.75733947
Non-monotonic TopK319293 kB9261 kB36 MB1183895183486.75733947
  Differential Join %0 » %1
    Arrange00.97487 kB505 kB2019 kB12116521213.584854
    Arrange10.97489 kB505 kB2019 kB12127421213.584854
    Arrange21.1555 kB505 kB2019 kB12129821213.584854
    Arrange30.96487 kB505 kB2019 kB12111721213.584854
      Stream u5
    Arrange01149 kB149 kB595 kB138623862.515450
    Arrange11148 kB149 kB595 kB138623862.515450
    Arrange21149 kB149 kB595 kB138633862.515450
    Arrange31149 kB149 kB595 kB138633862.515450
      Read u4

The ratio column tells you whether a worker is particularly over- or under-loaded:

  • a ratio below 1 indicates a worker doing a below average amount of work.

  • a ratio above 1 indicates a worker doing an above average amount of work.

While there will always be some amount of variation, very high ratios indicate a skewed workload. Here the memory ratios are mostly close to 1, indicating there is very little worker skew everywhere but at the top level arrangement, where worker 3 has no records.

EXPLAIN ANALYZE HINTS

EXPLAIN ANALYZE HINTS can annotate your plan (specifically, each TopK operator) with suggested TopK hints; i.e., DISTINCT ON INPUT GROUP SIZE= value.

For example, the following runs EXPLAIN ANALYZE HINTS on the wins_by_item index:

mzsql
EXPLAIN ANALYZE HINTS FOR INDEX wins_by_item;

The result shows that the wins_by_item index has only one TopK operator and suggests the hint (i.e, the DISTINCT ON INPUT GROUP SIZE= value) of 255.0.

operatorlevelsto_cuthintsavings
Arrange
  Stream u8
Non-monotonic TopK8625526 MB
  Differential Join %0 » %1
    Arrange
      Stream u5
    Arrange
      Read u4

With the hint information, you can recreate the view and index to improve memory usage:

sql
DROP VIEW winning_bids CASCADE;

CREATE VIEW winning_bids AS
    SELECT DISTINCT ON (a.id) b.*, a.item, a.seller
      FROM auctions AS a
      JOIN bids AS b
        ON a.id = b.auction_id
     WHERE b.bid_time < a.end_time
       AND mz_now() >= a.end_time
   OPTIONS (DISTINCT ON INPUT GROUP SIZE = 255) -- use hint!
  ORDER BY a.id,
    b.amount DESC,
    b.bid_time,
    b.buyer;

CREATE INDEX wins_by_item ON winning_bids (item);

Re-running the TopK-hints query will show only null hints; i.e., there are no hints because our TopK is now appropriately sized.

To see if the indexe's memory usage has improved with the hint, rerun the following EXPLAIN ANALYZE MEMORY command:

mzsql
EXPLAIN ANALYZE MEMORY FOR INDEX wins_by_item;

The results show that the TopK operator uses 11MB of memory, less than a third of the ~36MB of memory it was using before:

operatortotal_memorytotal_records
Arrange391 kB15501
  Stream u10
Non-monotonic TopK11 MB226706
  Differential Join %0 » %1
    Arrange1994 kB85150
      Stream u5
    Arrange601 kB15502
      Read u4

EXPLAIN ANALYZE CLUSTER ...

It is possible to look at overall cluster status, rather than individual indexes or materialized views. This is useful for quickly identifying skewed dataflows as well as which dataflows are taking up the most resources.

Running EXPLAIN ANALYZE CLUSTER MEMORY, CPU will identify which dataflows are using the most resources. Running this statement on a small cluster with 4 workers, we find:

objectglobal_idtotal_elapsedtotal_memorytotal_records
materialize.public.wins_by_itemu800:00:50.73103342 MB861512
materialize.public.wins_by_itemu900:00:00.992696406 kB15950

Note that the output is sorted by total_elapsed---the output is ordered by whichever property is listed first. Here it also happens to be sorted by total_memory and total_records: the dataflows processing the most data took the most time. On a cluster with dozens of indexes and materialized views, EXPLAIN ANALYZE CLUSTER reveals which dataflows are consuming the most resources.

We can quickly find skewed dataflows on a cluster by running EXPLAIN ANALYZE CLUSTER MEMORY WITH SKEW; here is an example on a small cluster with 4 workers:

objectglobal_idworker_idmax_operator_memory_ratioworker_memoryavg_memorytotal_memorymax_operator_records_ratioworker_recordsavg_recordstotal_records
materialize.public.wins_by_itemu921.63164 kB101 kB404 kB1.6264113968.515874
materialize.public.wins_by_itemu911.58159 kB101 kB404 kB1.5862863968.515874
materialize.public.wins_by_itemu811.0610 MB10 MB41 MB1213718214325.5857302
materialize.public.wins_by_itemu801.0110 MB10 MB41 MB1215075214325.5857302
materialize.public.wins_by_itemu831.0110 MB10 MB41 MB1214020214325.5857302
materialize.public.wins_by_itemu82110 MB10 MB41 MB1214489214325.5857302
materialize.public.wins_by_itemu900.7980 kB101 kB404 kB0.831773968.515874
materialize.public.wins_by_itemu930272 bytes101 kB404 kB3968.515874

The u9 and u8 dataflows make up the wins_by_item dataflow (where u8 does the work and u9 arranges it). Both dataflows run on all four workers. We report the max_operator_memory_ratio for each worker on each dataflow: what is the ratio of that dataflow's memory usage to the average memory usage across all workers? Note that the output results are sorted by max_operator_memory_ratio, making it easy to spot skew. Here, workers 1 and 2 hold most of the records; worker 0 has half as many, and worker 3 has none at all.

EXPLAIN ANALYZE ... AS SQL

Under the hood:

You can append AS SQL to any EXPLAIN ANALYZE statement to see the SQL that would be run (without running it). You can then customize this SQL to report finer grained or other information. For example:

mzsql
EXPLAIN ANALYZE HINTS FOR INDEX wins_by_item AS SQL;

The results show the SQL that EXPLAIN ANALYZE would run to get the TopK hints for the wins_by_items index.