Garnet supports two types of transactions: custom server-side transactions and client-issued transactions (Redis-style MULTI/EXEC).
Custom transactions allow adding a new transaction and registering it with Garnet on the server side. The registered transaction can then be invoked from any Garnet client to run on the Garnet server. Read more on developing custom server-side transactions in the Transactions page under the Extensions section.
Client-issued transactions follow the Redis model; you can read more here: Redis Transactions. In this design, transaction operations come in a MULTI/EXEC scope. Every operation in this scope is part of the transaction. The model does not allow you to use the result of reads inside the MULTI/EXEC scope, but it allows you to read and monitor keys beforehand (i.e., WATCH), and if they are unchanged at the time of execution, the transaction will commit.
WATCH mykey
val = GET mykey
val = val + 1 # not a Redis command; this happens outside the server, on the client
MULTI
SET mykey $val
EXEC
In the above example, if mykey changes before the EXEC command, the transaction will abort, since the calculation of val is invalidated.
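The abort-and-retry behavior above can be illustrated with a small in-memory model. This is only a sketch: `ToyStore`, `increment`, and `other_client` are hypothetical names invented for the example, and the store is a stand-in that tracks one modification counter per key, not Garnet or Redis.

```python
class ToyStore:
    """In-memory stand-in for a server (hypothetical, not Garnet or Redis)."""

    def __init__(self):
        self.data = {}
        self.versions = {}  # key -> modification counter

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def watch(self, key):
        # Remember the version seen at WATCH time.
        return (key, self.versions.get(key, 0))

    def exec(self, watched, queued):
        # Abort (return None) if any watched key changed since WATCH.
        for key, seen in watched:
            if self.versions.get(key, 0) != seen:
                return None
        for key, value in queued:
            self.set(key, value)
        return ["OK"] * len(queued)


def increment(store, key, interfere=None, max_retries=5):
    """WATCH, GET, compute on the client, then MULTI/SET/EXEC; retry on abort."""
    for attempt in range(1, max_retries + 1):
        watched = [store.watch(key)]
        val = int(store.get(key) or 0)
        if interfere is not None:
            interfere(attempt)  # simulates another client acting before EXEC
        if store.exec(watched, [(key, str(val + 1))]) is not None:
            return attempt
    raise RuntimeError("transaction kept aborting")


store = ToyStore()
store.set("mykey", "10")


def other_client(attempt):
    # Another client overwrites mykey during the first attempt only.
    if attempt == 1:
        store.set("mykey", "100")


attempts = increment(store, "mykey", interfere=other_client)
print(attempts, store.get("mykey"))  # prints: 2 101
```

The first attempt aborts because mykey changed between WATCH and EXEC; the second attempt re-reads the new value and commits.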
Transactions in Garnet are implemented using the following classes:
- TransactionManager
- WatchVersionMap
- WatchedKeyContainer
- RespCommandsInfo
After a MULTI command, TxnManager enters the Started state and queues any command it receives in this state except EXEC.
When TxnManager goes to the Started state, it will (1) queue any command afterward and (2) save any key that is used in those commands, so that the keys can be locked at execution time using two-phase locking (2PL).
To queue commands, we leave them in the network buffer and skip over them using the TrySkip function in RespServerSession. To lock the keys at execution time, we save pointers to the actual memory locations of the keys in the network buffer, using an array of TxnKeyEntry, each of which has an ArgSlice and the lock type (Shared or Exclusive).
The TrySkip function uses the RespCommandsInfo class to skip the correct number of tokens and to detect syntax errors. RespCommandsInfo stores the arity (number of arguments) of each command. For example, the GET command's arity is two: the command token GET and one key. For commands that accept a variable number of arguments, we store the minimum number of arguments as a negative value. For example, the SET command's arity of -3 means that it requires at least three arguments (including the command token).
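The arity convention can be sketched as follows. The `ARITY` table and `arity_ok` function are hypothetical and contain only the two commands mentioned above; they are not the RespCommandsInfo implementation.

```python
# Hypothetical arity table holding only the two examples from the text.
# Positive: exact token count. Negative: minimum token count.
ARITY = {"GET": 2, "SET": -3}


def arity_ok(tokens):
    """Return True if the token count (command token included) fits the arity."""
    if not tokens:
        return False
    arity = ARITY.get(tokens[0].upper())
    if arity is None:
        return False  # unknown command
    if arity > 0:
        return len(tokens) == arity
    return len(tokens) >= -arity


print(arity_ok(["GET", "mykey"]))                      # True
print(arity_ok(["SET", "mykey"]))                      # False
print(arity_ok(["SET", "mykey", "val", "EX", "10"]))   # True
```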
During TrySkip, we call TransactionManager.GetKeys, which goes over the arguments and stores a TxnKeyEntry for each key in the arguments.
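A rough sketch of the idea of recording keys by position in the buffer rather than copying them is shown below. Everything here is an assumption made for illustration: the `KeyEntry` type, the space-separated command format (the real buffer holds RESP-framed data), the key being the first argument, and the mapping of GET to a shared lock and SET to an exclusive lock.

```python
from dataclasses import dataclass
from enum import Enum


class LockType(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


@dataclass(frozen=True)
class KeyEntry:
    offset: int   # start of the key inside the buffer
    length: int   # key length in bytes
    lock: LockType


# Assumption: reads take a shared lock, writes an exclusive one.
LOCK_FOR = {b"GET": LockType.SHARED, b"SET": LockType.EXCLUSIVE}


def token_spans(buffer: bytes):
    """Return (offset, length) for each single-space-separated token."""
    spans, pos = [], 0
    for token in buffer.split(b" "):
        spans.append((pos, len(token)))
        pos += len(token) + 1
    return spans


def get_keys(buffer: bytes):
    """Record where the command's key lives in the buffer, without copying it."""
    spans = token_spans(buffer)
    cmd_off, cmd_len = spans[0]
    lock = LOCK_FOR[buffer[cmd_off:cmd_off + cmd_len].upper()]
    key_off, key_len = spans[1]  # assumption: the key is the first argument
    return [KeyEntry(key_off, key_len, lock)]


buf = b"SET x3 v3"
entry = get_keys(buf)[0]
key = buf[entry.offset:entry.offset + entry.length]
print(entry.lock.value, key)  # prints: exclusive b'x3'
```

Because each entry only points into the buffer, the queued command must stay in place until the transaction has executed.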
When the TxnState is Started and we encounter EXEC, we call TransactionManager.Run(). This function does the following:
- Acquires the LockableContext for the main store and/or object store, based on the store type.
- Goes over the TxnKeyEntrys and locks all the needed keys.
- Calls WatchedKeyContainer.ValidateWatchVersion() to check the watched keys.
- If the validation fails, calls TransactionManager.Reset(true) to reset the transaction manager. The true argument passed to Reset says that it also needs to unlock the keys.
After that, the TxnState is set to Running and the network readHead is set to the first command after MULTI; this time we actually run those commands. When execution reaches EXEC again and we are in the Running state, it calls TransactionManager.Commit(). This function does the following:
- Unlocks the keys that were locked in Run.
- Resets the TransactionManager and WatchedKeyContainer.
Garnet does regular checkpoints and changes its version between those checkpoints. To get checkpoint consistency, we require all operations of a transaction to have the same version, or in other words, to be in the same checkpoint window.
To enforce this right now, we do the following:
- If a checkpoint is in the Prepare phase, we do not let a transaction start execution, so that the checkpoint can finish.
- If a transaction is running when the checkpoint enters Prepare, we don't let the version change happen until the transaction finishes execution.
- This is implemented using session.IsInPreparePhase and two while loops at the beginning of the Run function.
WATCH is used to implement optimistic locking.
- It is implemented using the Modified bit in TsavoriteKV and a VersionMap in Garnet.
The version map monitors modifications on the keys. Every time a watched key gets modified, we increment its version in the version map.
- Hash Index
For in-memory records, we only increment the version of watched keys. Watched keys in Garnet use the Modified bit in Tsavorite to track modification (more on the Modified bit below).
For records on disk, we increment the version for copy-update RMWs and Upserts. We intentionally accept this overhead because copy updates are less frequent, so the overhead is not significant.
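The interaction between a version map and watched keys can be sketched as below. This is a simplified model under stated assumptions: the fixed-size array indexed by a CRC32 hash of the key, and the class and method names, are hypothetical and not Garnet's actual data structures.

```python
import zlib


class VersionMap:
    """Fixed-size array of counters indexed by a hash of the key (assumed layout)."""

    def __init__(self, size=1024):
        self.versions = [0] * size

    def _slot(self, key: bytes) -> int:
        return zlib.crc32(key) % len(self.versions)

    def read(self, key: bytes) -> int:
        return self.versions[self._slot(key)]

    def increment(self, key: bytes) -> None:
        self.versions[self._slot(key)] += 1


class WatchedKeys:
    """Stores each watched key with the version observed at watch time."""

    def __init__(self, vmap: VersionMap):
        self.vmap = vmap
        self.entries = []

    def watch(self, key: bytes) -> None:
        self.entries.append((key, self.vmap.read(key)))

    def validate(self) -> bool:
        # The transaction may proceed only if no watched version has moved.
        return all(self.vmap.read(k) == v for k, v in self.entries)

    def reset(self) -> None:
        self.entries.clear()


vmap = VersionMap()
watched = WatchedKeys(vmap)
watched.watch(b"x1")
before = watched.validate()   # no modification yet
vmap.increment(b"x1")         # a write to the watched key bumps its version
after = watched.validate()
print(before, after)          # prints: True False
```

In a hashed layout like this one, two keys that share a slot can cause a spurious abort, but a modification to a watched key is never missed.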
The version is incremented in MainStoreFunctions and ObjectStoreFunctions:
- InPlaceUpdater, if it is watched
- ConcurrentWriter, if it is watched
- ConcurrentDeleter, if it is watched
- PostSingleWriter
- PostInitialUpdater
- PostCopyUpdater
- PostSingleDeleter
The Modified bit tracks modifications to records in Tsavorite. The Modified bit of a record is set to "1" when the record is modified and remains "1" until it is reset to zero using the ResetModified API.
- We added the ClientSession.ResetModified(ref Key key) API.
- It replaces the RecordInfo word with the same word, but with the Modified bit reset.
When a key is watched:
- We call the ResetModified API and store that key in WatchedKeyContainer.
- We record the key's version alongside it in the WatchedKeyContainer.
- At execution time, we go over the keys in WatchedKeyContainer, and if their versions are still the same, we proceed with the transaction.
- For the Unwatch API in Garnet, we simply reset the WatchedKeyContainer.
- After each DISCARD, EXEC, or UNWATCH command, we unwatch everything.
We have written a micro-benchmark, TxnPerfBench, to test client transactions. The benchmark contains four different workloads:
- READ_TXN
- WRITE_TXN
- READ_WRITE_TXN
- WATCH_TXN
It resembles the online benchmark and can mix workloads in different percentages:
dotnet run -c Release -t 2 -b 1 --dbsize 1024 -x --client SERedis --op-workload WATCH_TXN --op-percent 100
dotnet run -c Release -t 2 -b 1 --dbsize 1024 -x --client SERedis --op-workload READ_TXN,WRITE_TXN --op-percent 50,50
dotnet run -c Release -t 2 -b 1 --dbsize 1024 -x --client SERedis --op-workload READ_WRITE_TXN --op-percent 100
Before running, the benchmark loads opts.DbSize records. It also accepts the number of reads and writes per transaction:
TxnPerfBench(..., int readPerTxn = 4, int writePerTxn = 4)
- READ_TXN: runs a transaction with readPerTxn GET requests.
- WRITE_TXN: runs a transaction with writePerTxn SET requests.
- READ_WRITE_TXN: runs a transaction with a mix of GET and SET requests (readPerTxn reads, writePerTxn writes).
- WATCH_TXN: watches readPerTxn keys, then starts a transaction, reads the watched keys, and writes to writePerTxn keys. For example:
readPerTxn = 2
writePerTxn = 2
WATCH x1
WATCH x2
MULTI
GET x1
GET x2
SET x3 v3
SET x4 v4
EXEC
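The shape of a WATCH_TXN transaction can be generated with a few lines of code. This is a sketch only: `watch_txn_commands` is a hypothetical helper, and the sequential key names x1, x2, ... simply follow the example above; how TxnPerfBench actually selects keys is not described here.

```python
def watch_txn_commands(read_per_txn: int, write_per_txn: int):
    """Build the command sequence of one WATCH_TXN transaction."""
    # Keys x1..xR are watched and read; keys x(R+1)..x(R+W) are written.
    reads = [f"x{i}" for i in range(1, read_per_txn + 1)]
    writes = [
        f"x{i}"
        for i in range(read_per_txn + 1, read_per_txn + write_per_txn + 1)
    ]
    commands = [f"WATCH {k}" for k in reads]
    commands.append("MULTI")
    commands += [f"GET {k}" for k in reads]
    commands += [f"SET {k} v{k[1:]}" for k in writes]
    commands.append("EXEC")
    return commands


commands = watch_txn_commands(read_per_txn=2, write_per_txn=2)
print("\n".join(commands))
```

With readPerTxn = 2 and writePerTxn = 2, this reproduces the eight commands listed in the example.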