Back to Cockroach

Statistics Generation

docs/tech-notes/observability/statistics_generation.md

26.1.315.3 KB
Original Source

Statistics Generation

Last Update: February 2024

Original author: maryliag

This document details the statistics collected for both Statements and Transactions, going over all columns on system.statement_statistics and system.transaction_statistics.

Table of Contents:

Statement

The columns stored per statement can be divided in: key values, metadata, statistics e computed columns.

Key Values

These are the values used as keys on the table and their combination will be unique for each new row.

  • aggregated_ts: the timestamp in which the statistics are being grouped. The default period is 1h, meaning all executions on 1:xx are saved with aggregated_ts of 1:00 and so on. Ex: 2024-02-07 15:00:00+00
  • fingerprint_id: the fingerprint ID of the statement, which is generated by hashing the statement fingerprint (the query with the constants redacted), its database and failure status, and if it was part of an implicit transaction. Ex: \x0a5cc4970527253a
  • transaction_fingerprint_id: the transaction fingerprint ID of the transaction to which this statement was part of. Ex: \xa53f79db832692e5
  • plan_hash: the hash of the plan gist (described below). Ex: \x92fb94a0f3ac5a36
  • app_name: the application that ran the statement. Ex: insights
  • node_id: the ID of the gateway node in which the statement was executed. This value will be set to 0 if we disable the collection of gateway node with ths cluster setting sql.metrics.statement_details.gateway_node.enabled. Disabling this value help customers that have a lot of nodes and will generate a lot of unique rows for the same statement fingerprint. Ex: 1

Metadata

Metadata is a JSONB column with some information about the execution. The keys of this JSONB are:

  • db: the database on which the statement was executed on. Ex: insights
  • distsql: if was executed as a distributed SQL. Ex: true
  • failed: if the statement failed to be executed. Ex: false
  • fullScan: if the statement had a FULL SCAN as part of the plan. Ex: true
  • implicitTxn: if the statement was part of an implicit transaction. Ex: true
  • query: the statement itself with the parameters redacted. Ex: SELECT balance FROM insights_workload_table_1 ORDER BY balance DESC LIMIT $1
  • querySummary: the shorter version of the statement that is used on the Console UI, to improve readability on tables with a lot of information, such as the table on SQL Activity pages. Ex: SELECT balance FROM insights_workload_table_1
  • stmtType: the type of the statement1. Ex: TypeDML
  • vec: if was vectorized. Ex: true

Statistics

The statistics column is a JSONB that can be divided in 2 groups: executions and statistics. The majority of the numeric values on this column have its value as an object with mean and a sqDiff (square difference).

The values from execution_statistics are only populated using sampling:

  • cnt: how many times this fingerprint was sampled on the aggregation period. This is helpful to know how many execution these statistics are based on. Ex: 3
  • contentionTime: the time the statement was in contention with other statements. This time does not include the time it takes to stream results back to the client. Ex: {"mean": 0.0035941523333333334, "sqDiff": 0.0000036056329785926688}
  • cpuSQLNanos: the SQL CPU time spent executing. It does not include SQL planning time nor KV execution time. Ex: {"mean": 193173, "sqDiff": 767794898}
  • maxDiskUsage: the maximum disk usage. Ex: {"mean": 0, "sqDiff": 0}
  • maxMemUsage: the maximum memory usage. Ex: {"mean": 2.2528E+5, "sqDiff": 0}
  • networkBytes: the amount of data transferred over the network (e.g., between regions and nodes). If this value is 0, the statement was executed on a single node. Ex: {"mean": 0, "sqDiff": 0}
  • networkMsgs: the amount of network messages over the network. Ex: {"mean": 0, "sqDiff": 0}
  • mvccIteratorStats: statistics by the MVCC iterator. Ex: {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 1006, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 1E+1, "sqDiff": 0}, "seekCountInternal": {"mean": 1E+1, "sqDiff": 0}, "stepCount": {"mean": 1E+3, "sqDiff": 0}, "stepCountInternal": {"mean": 1006, "sqDiff": 0}, "valueBytes": {"mean": 149912, "sqDiff": 0}}

The values from statistics are updated on every execution:

  • bytesRead: the aggregation of all bytes read from disk across all operators for this statement. Ex: {"mean": 173717.41176470587, "sqDiff": 38420.117647061816}
  • cnt: amount of executions of this statement. Notice this value will always be greater or equal the cnt on execution_statistics. Ex: 17
  • firstAttemptCnt: amount of first attempts of this statement. The difference between this value and the cnt will be the amount of retries. Ex: 17
  • idleLat: the time between statement executions in a transaction. Ex: {"mean": 0, "sqDiff": 0}
  • indexes: the list of indexes used by this statement. The format will be tableID@indexID. Ex: ["115@1"]
  • lastErrorCode: the last error code if the statement failed. Ex: XXUUU
  • lastExecAt: the timestamp of the last execution withing this aggregated timestamp of the statement. Ex: 2024-02-07T15:37:58.201843Z
  • latencyInfo: information about latency. The values for max and min are based on all execution of the statement within the aggregated timestamp, and the percentiles are sampled and can include a period longer than the aggregated timestamp of the statement. Ex: {"max": 2.82403375, "min": 0.002160625, "p50": 0.007772875, "p90": 0.007772875, "p99": 0.007772875}
  • maxRetries: amount of retries. Ex: 0
  • nodes: a list of all nodes that were part of the statement execution. Note that even if we disable the gateway node column, it will still show the value here. Ex: [1]
  • numRows: number of rows affected by this statement. Ex: {"mean": 365.70588235294116, "sqDiff": 1104041.5294117648}
  • ovhLat: overhead latency, includes transaction and retry management, error checking, etc. Ex: {"mean": 0.000003968117647044402, "sqDiff": 2.116522637650619E-10}
  • parseLat: the time to transform the SQL string into an abstract syntax tree (AST). Ex: {"mean": 0, "sqDiff": 0}
  • planGists: the plan gist used by the statement. Plan gist is a fingerprint for a query plan, representing the logical query plan with items like constants removed and table names replaced with table IDs. A plan gist can be decoded to a plan. Ex: ["AgHmAQIAAgAAABgGAg=="]
  • planLat: the time to transform the AST into a logical query plan. Ex: {"mean": 0.0005751862941176471, "sqDiff": 0.00003654328361436952}
  • regions: all regions in which the statement was executed. Ex: ["us-east1"]
  • rowsRead: the amount of rows read by the statement. Ex: {"mean": 1E+3, "sqDiff": 0}
  • rowsWritten: the amount of rows written by the statement. Ex: {"mean": 0, "sqDiff": 0}
  • runLat: the time to run the statement and fetch or compute the result rows. Ex: {"mean": 0.37770100005882346, "sqDiff": 9.522893593797042}
  • svcLat: the time to service the query, from start of parse to end of execute. Ex: {"mean": 0.37828015447058816, "sqDiff": 9.518556061599973}

There is also a value for index_recommendations as a array of strings, that exists both as part of the statistics column and as its own column to improve on selects by this value. Ex: {"creation : CREATE INDEX ON insights.public.insights_workload_table_1 (balance DESC);"}

Computed columns

These are values that exist on the statistics column, but are computed again here to be their own columns. This is helpful so indexes can be created directly on these columns, which is helpful when running selects with filters on these values. All the indexes filter out app_name that start with $ internal. The definition of those values (when not described) are the same from the items with the same name listed above.

  • indexes_usage: this column have a INVERTED INDEX. Ex: ["115@1"]
  • execution_count. Ex: 17
  • service_latency. Ex: 0.37828015447058816
  • cpu_sql_nanos. Ex: 193173
  • contention_time. Ex: 0.0035941523333333334
  • p99_latency. Ex: 0.007772875
  • total_estimated_execution_time: the value os the count multiplied by the mean service latency. Ex: 6.430762625999999

Extra

There are a few other columns that don't qualify as any of the categories above.

  • agg_interval: the interval used to aggregate the executions. Ex: 01:00:00
  • plan: a sampled plan. As of 23.1+ this column is no longer populated to improve performance. The value can be populated again by enabling the cluster setting sql.metrics.statement_details.plan_collection.enabled. Ex: {"Children": [], "Name": ""} (with sampling disabled)

Transaction

The columns stored per transaction can be divided in: key values, metadata, statistics e computed columns.

Key Values

These are the values used as keys on the table and their combination will be unique for each new row.

  • aggregated_ts: the timestamp in which the statistics are being grouped. The default period is 1h, meaning all executions on 1:xx are saved with aggregated_ts of 1:00 and so on. Ex: 2024-02-06 18:00:00+00
  • fingerprint_id: the fingerprint ID of the transaction, which is generated by hashing the individual statement fingerprint IDs that comprised the transaction. Ex: \xab97426ea38ee52d
  • app_name: the application the ran the transaction. Ex: $ cockroach demo
  • node_id: the ID of the gateway node in which the statement was executed. This value will be set to 0 if we disable the collection of gateway node with ths cluster setting sql.metrics.statement_details.gateway_node.enabled. Disabling this value help customers that have a lot of nodes and will generate a lot of unique rows for the same statement fingerprint. Ex: 1

Metadata

Metadata is a JSONB column with some information about the execution.

  • stmtFingerprintIDs: a list of all statement fingerprint IDs that were part of this transaction. We limit the list size with the cluster setting sql.metrics.transaction_details.max_statement_ids (default 1000). Ex: ["ab75c017ed028514"]

Statistics

The statistics column is a JSONB that can be divided in 2 groups: executions and statistics. The majority of the numeric values on this column have its value as an object with mean and a sqDiff (square difference).

The values from execution_statistics are only populated using sampling:

  • cnt: how many times this fingerprint was sampled on the aggregation period. This is helpful to know how many execution these statistics are based on. Ex: 1
  • contentionTime: the total time that all the statements of this transaction were in contention with other statements. This time does not include the time it takes to stream results back to the client. Ex: {"mean": 0.0035941523333333334, "sqDiff": 0.0000036056329785926688}
  • cpuSQLNanos: the SQL CPU time spent by all statements of this transaction executing. It does not include SQL planning time nor KV execution time. Ex: {"mean": 193173, "sqDiff": 767794898}
  • maxDiskUsage: the maximum disk usage. Ex: {"mean": 0, "sqDiff": 0}
  • maxMemUsage: the maximum memory usage. Ex: {"mean": 2.2528E+5, "sqDiff": 0}
  • networkBytes: the amount of data transferred over the network (e.g., between regions and nodes). If this value is 0, the transaction was executed on a single node. Ex: {"mean": 0, "sqDiff": 0}
  • networkMsgs: the amount of network messages over the network. Ex: {"mean": 0, "sqDiff": 0}
  • mvccIteratorStats: statistics by the MVCC iterator. Ex: {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 1006, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 1E+1, "sqDiff": 0}, "seekCountInternal": {"mean": 1E+1, "sqDiff": 0}, "stepCount": {"mean": 1E+3, "sqDiff": 0}, "stepCountInternal": {"mean": 1006, "sqDiff": 0}, "valueBytes": {"mean": 149912, "sqDiff": 0}}

The values from statistics are updated on every execution:

  • bytesRead: the aggregation of all bytes read from disk across all operators for this transaction. Ex: {"mean": 173717.41176470587, "sqDiff": 38420.117647061816}
  • cnt: amount of executions of this transaction. Notice this value will always be greater or equal the cnt on execution_statistics. Ex: 17
  • commitLat: the time to commit the transaction. Ex: {"mean": 0.0000020357142857142854, "sqDiff": 1.8496728571428566E-12}
  • idleLat: the time between statement executions of the transaction. Ex: {"mean": 0, "sqDiff": 0}
  • maxRetries: amount of retries. Ex: 0
  • numRows: number of rows affected by this transaction. Ex: {"mean": 365.70588235294116, "sqDiff": 1104041.5294117648}
  • retryLat: the time spent retrying the transaction. Ex: {"mean": 0, "sqDiff": 0}
  • rowsRead: the amount of rows read by the transaction. Ex: {"mean": 1E+3, "sqDiff": 0}
  • rowsWritten: the amount of rows written by the transaction. Ex: {"mean": 0, "sqDiff": 0}
  • svcLat: the time to service the transaction, from start of parse to end of execute. Ex: {"mean": 0.37828015447058816, "sqDiff": 9.518556061599973}

Computed columns

These are values that exist on the statistics column, but are computed again here to be their own columns. This is helpful so indexes can be created directly on these columns, which is helpful when running selects with filters on these values. All the indexes filter out app_name that start with $ internal. The definition of those values (when not described) are the same from the items with the same name listed above.

  • execution_count. Ex: 6
  • service_latency. Ex: 0.049141375
  • cpu_sql_nanos. Ex: 577084
  • contention_time. Ex: 0
  • p99_latency. Ex: 0
  • total_estimated_execution_time: the value os the count multiplied by the mean service latency. Ex: 0.049141375

Extra

There is another column that don't qualify as any of the categories above.

  • agg_interval: the interval used to aggregate the executions. Ex: 01:00:00

Footnotes

  1. A statement type can be:

    1. DDL (Data Definition Language): deals with database schemas and descriptions.
    2. DML (Data Manipulation Language) deals with data manipulation and it is used to store, modify, retrieve, delete and update data in a database.
    3. DCL (Data Control Language) deals with commands such as GRANT and mostly concerned with rights, permissions and other controls of the database system.
    4. TCL (Transaction Control Language) deals with a transaction within a database.