How a commit is done in FDB

design/Commit/How a commit is done in FDB.md


This doc describes how a commit is done in FDB 6.3+. The commit path in FDB 6.3 and earlier is documented in documentation/sphinx/source/read-write-path.rst.

Overall description

Legend:

  • alt means alternative paths
    • The text in [] is a condition.
    • The text above an arrow is a message.

The diagrams are generated using https://sequencediagram.org. The source code for the diagrams is in the *.sequence files.

Description of each section

Before each of the RPCs mentioned below, the client first checks whether the commit proxies and GRV proxies have changed, by comparing the client information ID it holds with the ID held by the cluster coordinator. If the IDs differ, the proxies have changed and the client refreshes its proxy list.
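The check above can be sketched in a few lines. This is only an illustration of the ID comparison, not FDB client code: `ClientInfo`, `ClientSketch`, and `refresh_if_changed` are hypothetical names, and the coordinator's view is passed in directly instead of being fetched over the network.

```python
from dataclasses import dataclass, field


@dataclass
class ClientInfo:
    """Hypothetical snapshot of the proxy lists, identified by an ID."""
    info_id: int
    commit_proxies: list = field(default_factory=list)
    grv_proxies: list = field(default_factory=list)


class ClientSketch:
    """Hypothetical client that caches the last ClientInfo it has seen."""

    def __init__(self, info):
        self.info = info

    def refresh_if_changed(self, coordinator_info):
        """Compare IDs; adopt the coordinator's proxy lists if they differ.

        Returns True when the cached proxy lists were refreshed.
        """
        if coordinator_info.info_id == self.info.info_id:
            return False  # same ID: proxies are unchanged
        self.info = coordinator_info
        return True
```

Comparing a single ID keeps the check cheap, since the client does not need to compare the proxy lists element by element before every RPC.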

GetReadVersion Section

  • The GRV proxy sends a request to the master to retrieve the current commit version. This version is the read version of the request.

Preresolution Section

  • The commit proxy sends a request for a commit version, with a request number.

    • The request number is a monotonically increasing number per commit proxy.
    • This ensures that, for each proxy, the master processes the requests in order.
  • The master server waits until the request number is current.

    When the current request number is larger than the incoming request number:

    • If a commit version is already assigned to the incoming request number, return that commit version and the previous commit version (i.e. prevVersion).

    • Otherwise, return Never.

    When the incoming request number is current:

    • Increase the current commit version and return it to the commit proxy.

      • Only one process serves as master, so the commit version is unique within each cluster.

      • The monotonically increasing commit version ensures that each transaction is processed in a strict serial order.
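The request-number handling described above can be modelled as a small state machine. The sketch below is a simplified, single-threaded illustration, not the master's actual code: `MasterSketch`, `NEVER`, and `forget` are hypothetical names, versions advance by 1 per request (an assumption made for readability), and a request that arrives too early raises an error where a real master would wait.

```python
class _Never:
    """Marker meaning 'no reply is ever sent' (stands in for Never)."""

    def __repr__(self):
        return "NEVER"


NEVER = _Never()


class MasterSketch:
    """Hypothetical model of per-proxy commit version assignment."""

    def __init__(self, start_version=0):
        self.version = start_version
        self.latest_request = {}  # proxy id -> latest request number served
        self.replies = {}         # proxy id -> {request number: reply}

    def get_commit_version(self, proxy_id, request_num):
        latest = self.latest_request.get(proxy_id, 0)
        cached = self.replies.setdefault(proxy_id, {})
        if request_num <= latest:
            # Old request: resend the cached reply if one is still held,
            # otherwise never answer.
            return cached.get(request_num, NEVER)
        if request_num != latest + 1:
            # Assumption: a real master would wait here until the
            # preceding request from this proxy has been served.
            raise RuntimeError("request number is not current yet")
        prev_version = self.version
        self.version += 1  # assumption: advance by 1 for illustration
        self.latest_request[proxy_id] = request_num
        cached[request_num] = (self.version, prev_version)
        return cached[request_num]

    def forget(self, proxy_id, up_to):
        """Hypothetical helper: drop cached replies numbered <= up_to."""
        cached = self.replies.get(proxy_id, {})
        for num in [n for n in cached if n <= up_to]:
            del cached[num]
```

Each reply carries both the new commit version and prevVersion, which is what lets the resolver in the next section process batches from different proxies in one global order.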

Resolution section

  • The commit proxy sends the transaction to the resolver.
  • The resolver waits until its version reaches prevVersion.
    • This ensures all transactions with a version smaller than this transaction's are resolved.
    • It then detects conflicts for the given transaction:
      • If there is no conflict, return TransactionCommitted as the status.
      • If there is any conflict, return TransactionConflict status.
      • If the read snapshot is not within the MVCC window, return TransactionTooOld status.
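The three outcomes above can be illustrated with a toy resolver. This is a sketch under simplifying assumptions, not the resolver's implementation: it tracks single keys instead of key ranges, the `mvcc_window` value is illustrative, and `ResolverSketch` and its fields are hypothetical names. A batch arriving out of order raises an error where a real resolver would wait.

```python
from enum import Enum


class Status(Enum):
    COMMITTED = "TransactionCommitted"
    CONFLICT = "TransactionConflict"
    TOO_OLD = "TransactionTooOld"


class ResolverSketch:
    """Hypothetical single-key conflict detector."""

    def __init__(self, mvcc_window=100):
        self.mvcc_window = mvcc_window  # illustrative window, in versions
        self.version = 0                # last commit version resolved
        self.last_write = {}            # key -> commit version of last write

    def resolve(self, prev_version, commit_version, read_version,
                read_keys, write_keys):
        if self.version != prev_version:
            # Assumption: a real resolver would wait until it reaches
            # prevVersion instead of failing.
            raise RuntimeError("resolver has not reached prevVersion")
        self.version = commit_version
        if read_version < commit_version - self.mvcc_window:
            return Status.TOO_OLD  # read snapshot fell out of the window
        for key in read_keys:
            # A conflict: someone wrote this key after our read snapshot.
            if self.last_write.get(key, -1) > read_version:
                return Status.CONFLICT
        for key in write_keys:
            self.last_write[key] = commit_version
        return Status.COMMITTED
```

Only committed transactions record their writes, so a rejected transaction cannot cause later transactions to conflict.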

Post Resolution section

  • The proxy waits until the local batch number is current.
  • The proxy updates the metadata keys and attaches the corresponding storage servers' tags to all mutations.
  • The proxy then waits until the commit version is current, i.e. until the proxy's committed version has caught up with the batch's commit version closely enough that the two versions are within the MVCC window.
  • The proxy pushes the commit data to TLogs.
  • Each TLog waits for the commit version to be current, then persists the commit.
  • The proxy waits until all TLogs return the transaction result.
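The tagging step above can be pictured as a lookup from each mutation's key to the tags of the storage servers responsible for it. The sketch below is only an illustration of that idea: `ShardMap`, `tags_for`, and `tag_mutations` are hypothetical names, mutations are reduced to (key, value) pairs, and the shard boundaries are invented for the example.

```python
import bisect


class ShardMap:
    """Hypothetical map from key ranges to storage server tags."""

    def __init__(self, boundaries, tags):
        # boundaries[i] is the first key of shard i; boundaries[0] must be
        # "" so that every key falls into some shard.
        assert boundaries and boundaries[0] == ""
        assert len(boundaries) == len(tags)
        self.boundaries = boundaries
        self.tags = tags

    def tags_for(self, key):
        # Find the last shard whose first key is <= key.
        index = bisect.bisect_right(self.boundaries, key) - 1
        return self.tags[index]


def tag_mutations(shard_map, mutations):
    """Attach the responsible storage servers' tags to each mutation."""
    return [(key, value, shard_map.tags_for(key)) for key, value in mutations]
```

With the tags attached, each storage server can later pick out only the mutations addressed to it.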

Reply section

  • The proxy updates the master with the committed version, for use by the next GRV request at the master.
  • The proxy replies to the client with the result, based on the result from the resolver.

Tracking the process using g_traceBatch

g_traceBatch can be used for querying the transactions and commits. A typical query in the trace logs is:

Type=type Location=location

The format of location is, in general, <source_file_name>.<function/actor name>.<log information>, e.g.

NativeAPI.getConsistentReadVersion.Before

means the event is logged in NativeAPI.actor.cpp, in ACTOR getConsistentReadVersion, before the read version is requested from the GRV proxy.
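As a small illustration of the general format, a location string can be split into its three parts. The helper below is a hypothetical sketch, not part of FDB or its tooling. It returns None for locations that do not have three dot-separated parts, such as getValueQ.DoRead.

```python
from collections import namedtuple

# Field names are illustrative; they mirror the three parts of the format.
Location = namedtuple("Location", ["source_file", "actor", "info"])


def parse_location(location):
    """Split '<source_file_name>.<function/actor name>.<log information>'.

    Returns None when the string does not have exactly three non-empty
    dot-separated parts.
    """
    parts = location.split(".")
    if len(parts) != 3 or not all(parts):
        return None
    return Location(*parts)
```

For example, `parse_location("NativeAPI.getConsistentReadVersion.Before")` yields the source file `NativeAPI`, the actor `getConsistentReadVersion`, and the log information `Before`.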

Some example queries are:

Type=TransactionDebug Location=NativeAPI*
LogGroup=loggroup Type=CommitDebug Location=Resolver.resolveBatch.*
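Queries like these amount to matching fields of each trace event against a pattern. The sketch below shows that matching in Python, assuming the trace events have already been parsed into dictionaries keyed by field name. The `query` helper and the dictionary shape are assumptions for illustration and are not part of FDB's tooling.

```python
from fnmatch import fnmatchcase


def matches(event, **criteria):
    """True if every named field matches its glob pattern (case-sensitive).

    A missing field is treated as an empty string.
    """
    return all(
        fnmatchcase(str(event.get(field, "")), pattern)
        for field, pattern in criteria.items()
    )


def query(events, **criteria):
    """Return the events matching all criteria, in their original order."""
    return [event for event in events if matches(event, **criteria)]
```

Under these assumptions, the two example queries would be written as `query(events, Type="TransactionDebug", Location="NativeAPI*")` and `query(events, LogGroup="loggroup", Type="CommitDebug", Location="Resolver.resolveBatch.*")`.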

In the following sections, a <span style="color:green">green</span> tag indicates an attach; a <span style="color:blue">blue</span> tag indicates an event whose location follows the format described above, so only the <log information> is shown; a <span style="color:lightblue">light-blue</span> tag indicates an event whose location does not follow the format, so the full location is shown. All the g_traceBatch events are tabulated after the diagram.

contrib/commit_debug.py can be used to visualize the commit process.

Get Read Version

| Role | File name | Function/Actor | Trace | Type | Location |
| --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | Transaction::getReadVersion | | | |
| Client | NativeAPI | readVersionBatcher | | TransactionAttachID | |
| Client | NativeAPI | getConsistentReadVersion | Before | TransactionDebug | NativeAPI.getConsistentReadVersion.Before |
| GRVProxy | GrvProxyServer | queueGetReadVersionRequests | Before | TransactionDebug | GrvProxyServer.queueTransactionStartRequests.Before |
| GRVProxy | GrvProxyServer | transactionStarter | | TransactionAttachID | |
| GRVProxy | GrvProxyServer | transactionStarter | AskLiveCommittedVersionFromMaster | TransactionDebug | GrvProxyServer.transactionStarter.AskLiveCommittedVersionFromMaster |
| GRVProxy | GrvProxyServer | getLiveCommittedVersion | confirmEpochLive | TransactionDebug | GrvProxyServer.getLiveCommittedVersion.confirmEpochLive |
| Master | MasterServer | serveLiveCommittedVersion | GetRawCommittedVersion | TransactionDebug | MasterServer.serveLiveCommittedVersion.GetRawCommittedVersion |
| GRVProxy | GrvProxyServer | getLiveCommittedVersion | After | TransactionDebug | GrvProxyServer.getLiveCommittedVersion.After |
| Client | NativeAPI | getConsistentReadVersion | After | TransactionDebug | NativeAPI.getConsistentReadVersion.After |

Get

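| Role | File name | Function/Actor | Trace | Type | Location | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | Transaction::get | | | | |
| Client | NativeAPI | Transaction::getReadVersion | | | | (Refer to GetReadVersion) |
| Client | NativeAPI | getKeyLocation | Before | TransactionDebug | NativeAPI.getKeyLocation.Before | getKeyLocation is called by getValue, getKeyLocation actually calls getKeyLocation_internal |
| Client | NativeAPI | getKeyLocation | After | TransactionDebug | NativeAPI.getKeyLocation.After | |
| Client | NativeAPI | getValue | | GetValueAttachID | | |
| Client | NativeAPI | getValue | Before | GetValueDebug | NativeAPI.getValue.Before | |
| Storage Server | StorageServer | serveGetValueRequests | received | GetValueDebug | StorageServer.received | |
| Storage Server | StorageServer | getValueQ | DoRead | GetValueDebug | getValueQ.DoRead | |
| Storage Server | StorageServer | getValueQ | AfterVersion | GetValueDebug | getValueQ.AfterVersion | |
| Storage Server | KeyValueStoreSQLite | KeyValueStoreSQLite::Reader::action | Before | GetValueDebug | Reader.Before | |
| Storage Server | KeyValueStoreSQLite | KeyValueStoreSQLite::Reader::action | After | GetValueDebug | Reader.After | |
| Storage Server | StorageServer | getValueQ | AfterRead | GetValueDebug | getValueQ.AfterRead | |
| Client | NativeAPI | getValue | After | GetValueDebug | NativeAPI.getValue.After | (When successful) |
| Client | NativeAPI | getValue | Error | GetValueDebug | NativeAPI.getValue.Error | (When failure) |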

Get Range

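| Role | File name | Function/Actor | Trace | Type | Location | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | Transaction::getRange | | | | |
| Client | NativeAPI | Transaction::getReadVersion | | | | (Refer to GetReadVersion) |
| Client | NativeAPI | getKeyLocation | Before | TransactionDebug | NativeAPI.getKeyLocation.Before | getKeyLocation is called by getRange |
| Client | NativeAPI | getKeyLocation | After | TransactionDebug | NativeAPI.getKeyLocation.After | |
| Client | NativeAPI | getRange | Before | TransactionDebug | NativeAPI.getRange.Before | |
| Storage Server | storageserver | getKeyValuesQ | Before | TransactionDebug | storageserver.getKeyValues.Before | |
| Storage Server | storageserver | getKeyValuesQ | AfterVersion | TransactionDebug | storageserver.getKeyValues.AfterVersion | |
| Storage Server | storageserver | getKeyValuesQ | AfterKeys | TransactionDebug | storageserver.getKeyValues.AfterKeys | |
| Storage Server | storageserver | getKeyValuesQ | Send | TransactionDebug | storageserver.getKeyValues.Send | (When no keys found) |
| Storage Server | storageserver | getKeyValuesQ | AfterReadRange | TransactionDebug | storageserver.getKeyValues.AfterReadRange | (When found keys in this SS) |
| Client | NativeAPI | getRange | After | TransactionDebug | NativeAPI.getRange.After | (When successful) |
| Client | NativeAPI | getRange | Error | TransactionDebug | NativeAPI.getRange.Error | (When failure) |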

GetRange Fallback

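| Role | File name | Function/Actor | Trace | Type | Location | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | getRangeFallback | | | | |
| Client | NativeAPI | getKey | | GetKeyAttachID | | |
| Client | NativeAPI | getKey | AfterVersion | GetKeyDebug | NativeAPI.getKey.AfterVersion | |
| Client | NativeAPI | getKey | Before | GetKeyDebug | NativeAPI.getKey.Before | |
| Client | NativeAPI | getKey | After | GetKeyDebug | NativeAPI.getKey.After | Success |
| Client | NativeAPI | getKey | Error | GetKeyDebug | NativeAPI.getKey.Error | Error |
| Client | NativeAPI | getReadVersion | | | | (Refer to GetReadVersion) |
| Client | NativeAPI | getKeyRangeLocations | Before | TransactionDebug | NativeAPI.getKeyLocations.Before | |
| Client | NativeAPI | getKeyRangeLocations | After | TransactionDebug | NativeAPI.getKeyLocations.After | |
| Client | NativeAPI | getExactRange | Before | TransactionDebug | NativeAPI.getExactRange.Before | getKeyRangeLocations is called by getExactRange |
| Client | NativeAPI | getExactRange | After | TransactionDebug | NativeAPI.getExactRange.After | |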

Commit

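| Role | File name | Function/Actor | Trace | Type | Location | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Client | NativeAPI | Transaction::commit | | | | |
| Client | NativeAPI | commitAndWatch | | | | |
| Client | NativeAPI | tryCommit | | commitAttachID | | |
| Client | NativeAPI | tryCommit | Before | CommitDebug | NativeAPI.commit.Before | |
| Commit Proxy | CommitProxyServer | commitBatcher | batcher | CommitDebug | CommitProxyServer.batcher | |
| Commit Proxy | CommitProxyServer | commitBatch | | | | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::setupTraceBatch | | CommitAttachID | | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::setupTraceBatch | Before | CommitDebug | CommitProxyServer.commitBatch.Before | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::preresolutionProcessing | GettingCommitVersion | CommitDebug | CommitProxyServer.commitBatch.GettingCommitVersion | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::preresolutionProcessing | GotCommitVersion | CommitDebug | CommitProxyServer.commitBatch.GotCommitVersion | |
| Resolver | Resolver | resolveBatch | | CommitAttachID | | |
| Resolver | Resolver | resolveBatch | Before | CommitDebug | Resolver.resolveBatch.Before | |
| Resolver | Resolver | resolveBatch | AfterQueueSizeCheck | CommitDebug | Resolver.resolveBatch.AfterQueueSizeCheck | |
| Resolver | Resolver | resolveBatch | AfterOrderer | CommitDebug | Resolver.resolveBatch.AfterOrderer | |
| Resolver | Resolver | resolveBatch | After | CommitDebug | Resolver.resolveBatch.After | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::postResolution | ProcessingMutations | CommitDebug | CommitProxyServer.CommitBatch.ProcessingMutations | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::postResolution | AfterStoreCommits | CommitDebug | CommitProxyServer.CommitBatch.AfterStoreCommits | |
| TLog | TLogServer | tLogCommit | | commitAttachID | | |
| TLog | TLogServer | tLogCommit | BeforeWaitForVersion | CommitDebug | TLogServer.tLogCommit.BeforeWaitForVersion | |
| TLog | TLogServer | tLogCommit | Before | CommitDebug | TLog.tLogCommit.Before | |
| TLog | TLogServer | tLogCommit | AfterTLogCommit | CommitDebug | TLog.tLogCommit.AfterTLogCommit | |
| TLog | TLogServer | tLogCommit | After | CommitDebug | TLog.tLogCommit.After | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::reply | AfterLogPush | CommitDebug | CommitProxyServer.CommitBatch.AfterLogPush | |
| Client | NativeAPI | tryCommit | After | CommitDebug | NativeAPI.commit.After | |
| Client | NativeAPI | commitAndWatch | | | | |
| Client | NativeAPI | watchValue | | WatchValueAttachID | | |
| Client | NativeAPI | watchValue | Before | WatchValueDebug | NativeAPI.watchValue.Before | |
| Client | NativeAPI | watchValue | After | WatchValueDebug | NativeAPI.watchValue.After | |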