docs/RFCS/20170505_monitoring_queries_and_sessions.md
This feature would add a mechanism to list currently active sessions and currently executing queries
in the admin UI and the SQL shell, along with their start timestamps/durations. In this document,
a query is defined as any executable statement, not just a SELECT: everything that is converted
into a planNode is a query.
This RFC addresses these use cases:
The following concerns are out of scope for this RFC:
Currently, there is no visibility into which queries and sessions are running on the cluster at any given point in time. Adding this visibility has been a common customer request for a while.
Although cancellation is out of scope for this RFC, being able to view sessions and queries along with information like application names and durations would be important for DBAs, in order to make an appropriate cancellation decision (when cancellation is implemented).
Since some DB deployments are shared between multiple client applications, some level of functionality to filter sessions and queries by client application name, both in the admin UI and in the CLI, would also be useful. And since not all DBAs are familiar with SQL, a non-SQL interface to monitor queries and sessions would be an asset.
The inspiration behind some of this design comes from this PR from Raphael, which implemented node-local session registries exposed via a virtual table.
The Session struct is currently not thread-safe, since all operations on it
currently happen in one thread. Making it thread-safe involves adding a nested struct
inside it (called mu), similar to how the Txn struct is structured. An RWMutex
would guard this nested mu struct.
These fields would be moved from the main Session struct, to the mu sub-struct:
These two new fields will be added to mu:
This active-query field will be a map of queryMeta pointers (type map[*queryMeta]struct{}).
We already have parallelizeQueue, a map keyed by planNode, but because of the way
the parallelization code is currently written, adding non-parallel queries to parallelizeQueue
would be non-trivial. A new map containing all in-flight queries and their metadata is
a conceptually simpler solution.
Write locks on this struct will be acquired whenever any of these fields is updated (for example, via resetApplicationName()).

Furthermore, the txnState struct within Session will be updated to have a mu
locked struct within it. txnState.txn will be moved into it, to become txnState.mu.txn,
and both updates to that variable in Executor.execRequest and reads from SHOW SESSIONS will
require acquiring that mutex.
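The locked sub-struct pattern described above can be sketched as follows. This is a minimal, hypothetical sketch: the field names (ApplicationName, ActiveQueries) and method names are illustrative, not the final CockroachDB code.

```go
package main

import (
	"fmt"
	"sync"
)

// queryMeta stands in for the per-query metadata struct described later.
type queryMeta struct {
	query string
}

// Session demonstrates the "mu" sub-struct pattern, similar to Txn: fields
// that can be read from other goroutines (e.g. by SHOW SESSIONS) live under
// an embedded RWMutex.
type Session struct {
	// Fields accessed only by the session's own goroutine stay out here.

	mu struct {
		sync.RWMutex
		ApplicationName string
		// Set of queries currently executing under this session.
		ActiveQueries map[*queryMeta]struct{}
	}
}

func NewSession() *Session {
	s := &Session{}
	s.mu.ActiveQueries = make(map[*queryMeta]struct{})
	return s
}

// addActiveQuery takes the write lock, since a concurrent SHOW SESSIONS
// reader may be iterating over the map.
func (s *Session) addActiveQuery(q *queryMeta) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.mu.ActiveQueries[q] = struct{}{}
}

func (s *Session) removeActiveQuery(q *queryMeta) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.mu.ActiveQueries, q)
}

// activeQueryCount only needs the read lock.
func (s *Session) activeQueryCount() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.mu.ActiveQueries)
}

func main() {
	s := NewSession()
	q := &queryMeta{query: "SELECT 1"}
	s.addActiveQuery(q)
	fmt.Println(s.activeQueryCount())
	s.removeActiveQuery(q)
	fmt.Println(s.activeQueryCount())
}
```

Writers (the session's own goroutine) take the write lock only for the brief moments a field changes, so readers like SHOW SESSIONS see a consistent snapshot without blocking query execution for long.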
The session registry will be an in-memory set (type map[*session]struct{}) that stores
references to all session objects on that node. Adding sessions to and removing them from
this set will require acquiring a mutex lock.
This addition/removal can be done in NewSession(...) and Session.Finish(...)
in pkg/sql/session.go.
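A sketch of such a registry, assuming hypothetical register/deregister helpers invoked from NewSession(...) and Session.Finish(...) (names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// Session is a stand-in for the real pkg/sql session struct.
type Session struct{ user string }

// sessionRegistry is the node-local in-memory set of live sessions.
// All mutations and iterations must hold the mutex.
type sessionRegistry struct {
	sync.Mutex
	store map[*Session]struct{}
}

func newSessionRegistry() *sessionRegistry {
	return &sessionRegistry{store: make(map[*Session]struct{})}
}

// register would be called from NewSession(...).
func (r *sessionRegistry) register(s *Session) {
	r.Lock()
	defer r.Unlock()
	r.store[s] = struct{}{}
}

// deregister would be called from Session.Finish(...).
func (r *sessionRegistry) deregister(s *Session) {
	r.Lock()
	defer r.Unlock()
	delete(r.store, s)
}

func (r *sessionRegistry) count() int {
	r.Lock()
	defer r.Unlock()
	return len(r.store)
}

func main() {
	reg := newSessionRegistry()
	s := &Session{user: "root"}
	reg.register(s)
	fmt.Println(reg.count())
	reg.deregister(s)
	fmt.Println(reg.count())
}
```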
A new struct, queryMeta, would be added, containing the fields below:
- phase, denoting whether the query is being prepared for execution or is executing.

queryMetas would be stored in a map in Session.mu, with the map having type map[*queryMeta]struct{}.
The planner struct will also have a reference to the corresponding queryMeta.
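One plausible layout for queryMeta, based on what the SHOW QUERIES example later in this RFC needs to display (start time, SQL text, phase, distribution flag); the exact field set and names here are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// queryPhase denotes where a query is in its lifecycle.
type queryPhase int

const (
	preparing queryPhase = iota
	executing
)

func (p queryPhase) String() string {
	if p == preparing {
		return "preparing"
	}
	return "executing"
}

// queryMeta is a hypothetical sketch of the per-query metadata struct.
type queryMeta struct {
	start         time.Time  // when the query arrived at the session
	rawStmt       string     // SQL text, for display in SHOW QUERIES
	phase         queryPhase // preparing or executing
	isDistributed bool       // whether the query runs under DistSQL
}

func main() {
	q := queryMeta{start: time.Now(), rawStmt: "SELECT * FROM users", phase: executing}
	fmt.Println(q.rawStmt, q.phase)
}
```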
A new RPC endpoint, LocalSessions, would be added to the status RPC server. It would return
the node's local list of sessions and any nested lists of queries as necessary. If a username
is provided in the request, only sessions belonging to that user will be returned.
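Conceptually, the handler just walks the node-local registry and applies the optional username filter. The sketch below is an assumption about shape only; the real endpoint would use protobuf request/response types and the actual registry:

```go
package main

import "fmt"

// Session is a stand-in for the real session object.
type Session struct {
	user    string
	appName string
}

// localSessionsRequest mirrors the request described above: an optional
// username filter.
type localSessionsRequest struct {
	// If non-empty, only this user's sessions are returned.
	Username string
}

// localSessions returns the sessions in the node-local registry, filtered
// by username when one is provided.
func localSessions(registry map[*Session]struct{}, req localSessionsRequest) []*Session {
	var out []*Session
	for s := range registry {
		if req.Username != "" && s.user != req.Username {
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	registry := map[*Session]struct{}{
		&Session{user: "root", appName: "test"}:   {},
		&Session{user: "bilal", appName: "pg.js"}: {},
	}
	fmt.Println(len(localSessions(registry, localSessionsRequest{})))
	fmt.Println(len(localSessions(registry, localSessionsRequest{Username: "bilal"})))
}
```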
For sessions, a new SQL statement will be added to the grammar: SHOW [LOCAL|CLUSTER] SESSIONS.
SHOW LOCAL SESSIONS will return only the gateway node's local list of sessions, while
SHOW CLUSTER SESSIONS will return the entire cluster's list of sessions. The default is
CLUSTER if unspecified.
The CLUSTER call will involve looping over all nodes and issuing RPC calls to LocalSessions
with timeouts. The results of all the calls that succeed will be returned as a table.
RaftDebug in pkg/server/status.go is an example of a procedure that performs a similar aggregation
of results across all known nodes in the cluster.
Example usage:
```
root@:26257/> SHOW SESSIONS
+---------+----------+------------------+------------------+------------------------------------------------------+---------------+--------------------+-----------+
| node_id | username | client_address   | application_name | active_queries                                       | session_start | oldest_query_start | kv_txn_id |
+---------+----------+------------------+------------------+------------------------------------------------------+---------------+--------------------+-----------+
| 1       | root     | 192.168.1.5:8080 | test             | SELECT * FROM users; INSERT INTO test VALUES (...);  | 1234567890    | 1234567891         | 1e5f6bac  |
| 2       | bilal    | 192.168.1.7:8080 | pg.js            |                                                      | 1494524202    |                    |           |
+---------+----------+------------------+------------------+------------------------------------------------------+---------------+--------------------+-----------+
(2 rows)
```
Note that the active_queries field is a delimiter-separated list of all queries currently
executing under that session, with SQL parallelization taken into account.
Additions to the SQL grammar will be made to support filtering using SELECT statements and nesting of SHOWs as tables:
```
root@:26257/> SELECT * FROM (SHOW SESSIONS) AS a WHERE a.user_name = 'bilal';
+---------+-----------+------------------+------------------+------------------------------------------------------+---------------+--------------------+-----------+
| node_id | user_name | client_address   | application_name | active_queries                                       | session_start | oldest_query_start | kv_txn_id |
+---------+-----------+------------------+------------------+------------------------------------------------------+---------------+--------------------+-----------+
| 2       | bilal     | 192.168.1.7:8080 | pg.js            |                                                      | 1494524202    |                    |           |
+---------+-----------+------------------+------------------+------------------------------------------------------+---------------+--------------------+-----------+
(1 row)
```
Similarly, SHOW [CLUSTER|LOCAL] QUERIES will return currently executing queries.
Example usage:
```
root@:26257/> SHOW QUERIES
+---------+----------+------------+--------------------------------+------------------+------------------+------------------+------------+
| node_id | username | start      | query                          | client_address   | application_name | distributed      | phase      |
+---------+----------+------------+--------------------------------+------------------+------------------+------------------+------------+
| 1       | root     | 1234567894 | SELECT * FROM users;           | 192.168.1.5:8080 | test             | NULL             | preparing  |
| 1       | root     | 1234567891 | INSERT INTO test VALUES (...); | 192.168.1.5:8080 | test             | true             | executing  |
+---------+----------+------------+--------------------------------+------------------+------------------+------------------+------------+
(2 rows)
```
Currently, only the root user may run SET CLUSTER SETTING. Following that convention,
SHOW SESSIONS/QUERIES will show only the current user's own sessions/queries if the user
is not root; the root user sees those of all users.
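The privilege check above amounts to a simple filter on the aggregated results; a hedged sketch (the function name and Session shape are illustrative):

```go
package main

import "fmt"

// Session is a stand-in carrying just the owning username.
type Session struct{ user string }

// visibleSessions implements the rule described above: root sees all
// sessions, any other user sees only their own.
func visibleSessions(requester string, sessions []Session) []Session {
	if requester == "root" {
		return sessions
	}
	var out []Session
	for _, s := range sessions {
		if s.user == requester {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	all := []Session{{user: "root"}, {user: "bilal"}}
	fmt.Println(len(visibleSessions("root", all)))
	fmt.Println(len(visibleSessions("bilal", all)))
}
```

The same filter would apply to SHOW QUERIES, keyed on the username of the session owning each query.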
The RWMutex inside the Session object could see contention if a SHOW SESSIONS or SHOW QUERIES
runs in parallel with an operation within the session that updates any of the fields inside
session.mu. However, there should be little to no performance penalty in the common case of
queries and sessions running without being inspected by a SHOW command.
To ensure that we don't unintentionally cause a performance regression, the included benchmarks
in pkg/sql/bench_test.go will be run before and after the change, and the results posted in the PR
for this feature.
In a production-scale cluster, running SHOW SESSIONS will return a large number of rows, all of
which will be aggregated in memory on the gateway node before any filtering is done. This can be
mitigated by issuing RPCs to other nodes in batches (so the gateway node doesn't receive too many
results at once), and then sending the results off to the client connection or parent planNode as
each LocalSessions call returns. This way, only a subset of all sessions would be in memory
on the gateway node at any one time; however, the gateway node won't be able to enforce a default
sort order. This solution also won't make a significant difference if there's a parent
filter planNode (e.g. SELECT * FROM (SHOW SESSIONS) WHERE ...): the gateway node
will have to buffer all results in memory anyway.
The memory cost of these SHOW commands will be assessed in a large cluster with generated load before this feature lands, to gauge the value of implementing streaming.
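The batching idea above can be illustrated with a small helper that splits the node list into fixed-size waves; only one wave's results would be buffered on the gateway at a time. This is a sketch of the concept, not a proposed API:

```go
package main

import "fmt"

// batches splits nodeIDs into consecutive groups of at most size elements.
// The gateway would fan out RPCs one batch at a time, streaming each
// batch's results to the client before contacting the next batch.
func batches(nodeIDs []int, size int) [][]int {
	var out [][]int
	for start := 0; start < len(nodeIDs); start += size {
		end := start + size
		if end > len(nodeIDs) {
			end = len(nodeIDs)
		}
		out = append(out, nodeIDs[start:end])
	}
	return out
}

func main() {
	// With a batch size of 3, a 7-node cluster is contacted in 3 waves.
	for _, b := range batches([]int{1, 2, 3, 4, 5, 6, 7}, 3) {
		fmt.Println(b) // each wave is fanned out, streamed, then released
	}
}
```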
We considered using virtual tables in the crdb_internal database to expose sessions and queries;
however, that idea was dropped in favour of a SHOW statement for a couple of reasons.
The generator functions for virtual tables (which run every time the table is queried)
are not aware of filtering expressions in the query, so even a query with a WHERE nodeid=self
clause would necessitate aggregating info from the entire cluster. A full refactor of the vtable
code was deemed out of scope for this project.
In addition, virtual tables give the user the intuition that the data is persistent and cached/stored locally, which it isn't. If a node in the cluster dies, the virtual table would contain partial info. A SHOW statement sets a more accurate expectation: that the sessions/queries are fetched upon request.
See the list of unanswered questions that are out of scope for this RFC in the Summary section above.
Any unanswered questions within this RFC's scope that come up will be added here.