doc/developer/design/20230720_single_statement_explitic_transaction.md
Many kinds of statements (importantly all DDL CREATE statements) can only be executed in an implied single-statement transaction.
These are transactions without a BEGIN (hence implied) and known to be exactly one statement long.
DDLs are disallowed because we cannot prospectively execute them until a commit.
Many GUI and other tools however always wrap their statements with BEGIN and COMMIT,
thus creating an explicit transaction, and are thus prevented from ever executing DDL or other prohibited statements.
BEGIN; <stmt>; COMMIT.Add a new transaction op that holds onto a single Statement but does not execute it.
The statements tag is generated and returned as if it did execute.
We can thus only execute statements that can generate their tag from a Statement or Plan without execution.
If any statement attempts to execute after that, the op will fail to add it, thus enforcing a single statement in the transaction.
On COMMIT, the statement executes, returning any errors as normal.
Many statements cannot be executed without observable side effects to other sessions, for example all DDLs.
However, these and many other statements do have a known ExecuteResponse that can be determined without execution (as opposed to statements that return a row count).
We will add a new transaction operation mode called SingleStatement.
Currently those modes are Peeks, Writes, and Subscribe.
SingleStatement will record the statement, then generate and return to the user an ExecuteResponse that assummes success.
The SingleStatement operation can only be entered in explicit transactions (must start with BEGIN).
On COMMIT, the coordinator clears the current transaction, fetching and processing its inner operation (this already occurs before this document).
If the inner operation is SingleStatement, the transaction status is set to Started (a status that can execute any statement), the recorded statement is executed, and an implicit COMMIT is executed.
The implicit COMMIT puts the transaction into the same final state (Default) as the explicit COMMIT.
Here we describe the state machine and invariants for a single session's transaction before this design was implemented. We will then describe how this design modifies that state machine but still meets all invariants.
Each Session has a tranasction property of type enum TransactionStatus.
It is the responsibility of the various protocol handlers (pgwire, http, ws) to correctly manage this.
A TransactionStatus is an enum with variants:
DefaultStartedInTransactionInTransactionImplicitFailedDefault is the initial state, and also the state when there is no in-progress transaction.
In order for a session to execute a statement, the protocol handler must set this to one of the other states then end the transaction with a commit or rollback.
Started is used for a single statement, so an immediate and implied commit follows.
InTransaction is used when a user explicitly types BEGIN, and can only be exited when a user sends COMMIT or ROLLBACK.
InTransactionImplicit is used when a user sends multiple statements in the same query (SELECT 1; SELECT 2), and an implicit BEGIN and COMMIT wrap it.
Failed can only be entered when a statement in a transaction has error'd from the InTransaction state.
If failures occur in the other in-progress states, an implicit ROLLBACK must be sent by the handler.
All status except Default have an inner Transaction object that tracks what operations have happened in a transaction determined by the first statement.
These operations are: Peeks, Subscribe, Writes, None.
None is the default and can transition to any other operation.
The other operations can be merged with like operations, except Subscribe which supports only a single SUBSCRIBE.
When a transaction is being COMMITed, the transaction is cleared (set to Default) and its inner operation is able to do arbitrary post processing.
Peeks can enqueue the session back into the Coordinator in order to wait for a strict serializable verification.
Writes enqueues the writes along with their session in order to wait for writing to the storage controller.
The ReadyForQuery message is sent by the server when it is ready to receive new queries, and it includes the current transaction status code: Idle, InTransaction, or Failed.
InTransaction and Failed correspond to the similarly named variants above and can only appear in explicit transactions.
Idle is return in the Default state, because all other transaction states implicitly close and should never be active when deciding the status code.
The pgwire, http, and ws handlers enforce these requirements and state changes.
This design adds a new operation to the inner transaction: SingleStatement.
This operation records a single statement.
If a second statement is added to the transaction with this operation, the transaction fails.
The post processing for this operation enqueues to the Coordinator a message containing the statement and session, which is processed by a new sequence_execute_single_statement_transaction method.
This method:
Default, which should be a side effect of running it through sequence_end_transaction from the COMMIT.Started (single-statement, implicit transaction).Default.The sequence_execute_single_statement_transaction function is acting as a handler and must ensure the correct state transitions and invariants as the other handlers.
Default.BEGIN from the query of the previous step. Transaction status changes to InTransaction.SingleStatement.COMMIT. The transaction status is set to Default and its inner operation saved.SingleStatement enqueues a Coordinator message with the statement and session.Started.COMMIT is executed. The transaction status is set to Default and its inner operation saved.COMMIT statement from step 5, and control of the session is returned to the user.The simple query protocol allows passing multiple statements in a single query string.
We could parse that and examine it for the BEGIN; <stmt>; COMMIT form.
We could then execute any statement instead of only ones whose tag we can generate without execution.
This is limited because it would not work for: