docs/content/stable/launch-and-manage/monitor-and-alert/query-tuning/explain-analyze.md
Use EXPLAIN and EXPLAIN ANALYZE to understand how the query planner has decided to execute a query and view actual runtime performance statistics. These commands provide valuable insights into how YugabyteDB processes a query and by analyzing this output, you can identify potential bottlenecks, such as inefficient index usage, excessive sorting or inefficient join strategy, and other performance issues. This information can guide you to optimize queries, create appropriate indexes, or restructure the underlying data to improve query performance.
{{<note>}} The query is executed only when using EXPLAIN ANALYZE. With vanilla EXPLAIN (that is, without ANALYZE), the output displays only estimates. {{</note>}}
The output is effectively the plan tree followed by a summary of timing and counters. Each line represents a step (a.k.a node) involved in processing the query. The top line shows the overall plan and its estimated cost. Subsequent indented lines detail sub-steps involved in the execution. The structure of this output can be complex but contains valuable information to understand how YugabyteDB processes a query. Let's break down the typical structure of the EXPLAIN ANALYZE output:
The following examples show how to improve query performance using output of the EXPLAIN ANALYZE. First, set up a local cluster for testing out the examples.
{{<setup/local status="no" dataplacement="no">}}
Create a key value table as follows:
CREATE TABLE kvstore (
key VARCHAR,
value VARCHAR,
PRIMARY KEY(key)
);
Add 10,000 key-values into the tables as follows:
SELECT setseed(0.5); -- to help generate the same random values
INSERT INTO kvstore(key, value)
SELECT substr(md5(random()::text), 1, 7), substr(md5(random()::text), 1, 10)
FROM generate_series(1,10000);
Run the ANALYZE command on the database to gather statistics for the query planner to use.
ANALYZE kvstore;
Use EXPLAIN on a basic query to fetch a value from the kvstore table.
{{<note>}}
For brevity, all the output for the following examples excludes metrics/counters with value zero. COSTS OFF option has also been added to show just the estimated costs.
{{</note>}}
EXPLAIN (ANALYZE, DIST, COSTS OFF) select value from kvstore where key='cafe32c';
You should see an output similar to the following:
QUERY PLAN
-------------------------------------------------------------------------------------
Index Scan using kvstore_pkey on kvstore (actual time=1.087..1.090 rows=1 loops=1)
Index Cond: ((key)::text = 'cafe32c'::text)
Storage Table Read Requests: 1
Storage Table Read Execution Time: 0.971 ms
Storage Table Rows Scanned: 1
Planning Time: 0.063 ms
Execution Time: 1.150 ms
Storage Read Requests: 1
Storage Read Execution Time: 0.971 ms
Storage Rows Scanned: 1
Storage Execution Time: 0.971 ms
From line 3, you can see that an index scan was done on the table (via Index Scan) using the primary key kvstore_pkey to retrieve one row (via rows=1). The Storage Rows Scanned: 1 indicates that as only one row was looked up, this was an optimal execution.
Fetch the key for a given value:
EXPLAIN (ANALYZE, DIST, COSTS OFF) select * from kvstore where value='85d083991d';
You should see an output similar to the following:
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on kvstore (actual time=3.719..3.722 rows=1 loops=1)
Storage Filter: ((value)::text = '85d083991d'::text)
Storage Table Read Requests: 1
Storage Table Read Execution Time: 3.623 ms
Storage Table Rows Scanned: 10000
Planning Time: 1.176 ms
Execution Time: 3.771 ms
Storage Read Requests: 1
Storage Read Execution Time: 3.623 ms
Storage Rows Scanned: 10000
Catalog Read Requests: 1
Catalog Read Execution Time: 0.963 ms
Storage Execution Time: 4.586 ms
Immediately from line 3, you can see that this was a sequential scan (via Seq Scan). The actual rows=1 attribute indicates that only one row was returned, but Storage Rows Scanned: 10000 on line 12 indicates that all the 10,000 rows in the table were looked up to find one row. So, the execution time 3.771 ms was higher than the previous query. You can use this information to convert the sequential scan into an index scan by creating an index as follows:
CREATE INDEX idx_value_1 ON kvstore(value);
If you run the same fetch by value query, you should see an output similar to the following:
# EXPLAIN (ANALYZE, DIST, COSTS OFF) select * from kvstore where value='85d083991d';
QUERY PLAN
-----------------------------------------------------------------------------------
Index Scan using idx_value_1 on kvstore (actual time=1.884..1.888 rows=1 loops=1)
Index Cond: ((value)::text = '85d083991d'::text)
Storage Table Read Requests: 1
Storage Table Read Execution Time: 0.861 ms
Storage Table Rows Scanned: 1
Storage Index Read Requests: 1
Storage Index Read Execution Time: 0.867 ms
Storage Index Rows Scanned: 1
Planning Time: 0.063 ms
Execution Time: 1.934 ms
Storage Read Requests: 2
Storage Read Execution Time: 1.729 ms
Storage Rows Scanned: 2
Storage Execution Time: 1.729 ms
Peak Memory Usage: 24 kB
In this run, Index scan was used and the execution time has been lowered (1.934 ms - 40% improvement). Only one row was returned (via rows=1), but two rows were scanned (Storage Rows Scanned: 2) as one row was looked up from the index (via Storage Index Rows Scanned: 1) and one row was looked up from the table (via Storage Table Rows Scanned: 1). This is because, the executor looked up the row for the matching value: 85d083991d from the index but had to go to the main table to fetch the key, as we have specified SELECT * in our query. Only the values are present in the index and not the keys.
You can avoid this by including the key column in the index as follows:
CREATE INDEX idx_value_2 ON kvstore(value) INCLUDE(key);
If you run the same fetch by value query, you should see an output similar to the following:
# EXPLAIN (ANALYZE, DIST, COSTS OFF) select * from kvstore where value='85d083991d';
QUERY PLAN
----------------------------------------------------------------------------------------
Index Only Scan using idx_value_2 on kvstore (actual time=1.069..1.072 rows=1 loops=1)
Index Cond: (value = '85d083991d'::text)
Storage Index Read Requests: 1
Storage Index Read Execution Time: 0.965 ms
Storage Index Rows Scanned: 1
Planning Time: 0.074 ms
Execution Time: 1.117 ms
Storage Read Requests: 1
Storage Read Execution Time: 0.965 ms
Storage Rows Scanned: 1
Storage Execution Time: 0.965 ms
Notice that the operation has become an Index Only Scan instead of the Index Scan as before. Only one row has been scanned (via Storage Rows Scanned: 1) as only the index row was scanned (via Storage Index Rows Scanned: 1). The execution time has now been lowered to 1.117 ms, a 40% improvement over previous execution.
Fetch all the values starting with a prefix, say ca.
EXPLAIN (ANALYZE, DIST, COSTS OFF) SELECT * FROM kvstore WHERE value LIKE 'ca%' ORDER BY VALUE;
You should see an output similar to the following:
QUERY PLAN
-----------------------------------------------------------------------------
Sort (actual time=4.007..4.032 rows=41 loops=1)
Sort Key: value
Sort Method: quicksort Memory: 28kB
-> Seq Scan on kvstore (actual time=3.924..3.957 rows=41 loops=1)
Storage Filter: ((value)::text ~~ 'ca%'::text)
Storage Table Read Requests: 1
Storage Table Read Execution Time: 3.846 ms
Storage Table Rows Scanned: 10000
Planning Time: 1.119 ms
Execution Time: 4.101 ms
Storage Read Requests: 1
Storage Read Execution Time: 3.846 ms
Storage Rows Scanned: 10000
Catalog Read Requests: 1
Catalog Read Execution Time: 0.877 ms
Storage Execution Time: 4.723 ms
Only 41 rows were returned (rows=41) but a sequential scan (Seq Scan on kvstore) was performed on all the rows (Storage Rows Scanned: 10000). Also, the data had to be sorted (via Sort Method: quicksort) on value (via Sort Key: value). You can improve this execution by modifying the above created index into a range index, so that you can avoid the sorting operation, and reduce the number of rows scanned.
CREATE INDEX idx_value_3 ON kvstore(value ASC) INCLUDE(key);
If you run the same query, you should see the following output:
# EXPLAIN (ANALYZE, DIST, COSTS OFF) SELECT * FROM kvstore WHERE value LIKE 'ca%' ORDER BY VALUE;
QUERY PLAN
-----------------------------------------------------------------------------------------
Index Only Scan using idx_value_3 on kvstore (actual time=0.884..0.935 rows=41 loops=1)
Index Cond: ((value >= 'ca'::text) AND (value < 'cb'::text))
Storage Filter: ((value)::text ~~ 'ca%'::text)
Storage Index Read Requests: 1
Storage Index Read Execution Time: 0.792 ms
Storage Index Rows Scanned: 41
Planning Time: 4.290 ms
Execution Time: 1.008 ms
Storage Read Requests: 1
Storage Read Execution Time: 0.792 ms
Storage Rows Scanned: 41
Catalog Read Execution Time: 3.824 ms
Storage Execution Time: 4.616 ms
Now, only 41 rows were scanned (via Storage Rows Scanned: 41) to retrieve 41 rows (via rows=41) and there was no sorting involved as the data is stored sorted in the range index in ASC order. Even though YugabyteDB has optimizations for reverse scans, if most of your queries are going to retrieve the data in DESC order, then it would be better to define the ordering as DESC when creating a range sharded table/index.