design/Commit/How a commit is done in FDB.md
This doc describes how commit is done in FDB 6.3+. The commit path in FDB 6.3 and before is documented in documentation/sphinx/source/read-write-path.rst.
Legend:
alt means alternative paths
[] are conditions

The diagrams are generated using https://sequencediagram.org. The source of each diagram is in the corresponding *.sequence file.
Before each RPC mentioned below, the client first checks whether the commit proxies and GRV proxies have changed, by comparing the client information ID it holds with the ID held by the cluster coordinator. If the two differ, the proxies have changed and the client refreshes its proxy list.
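
The check described above can be sketched as follows. This is only an illustration: `ClientInfo`, `Client`, and `ensure_fresh_proxies` are hypothetical names, not FDB's actual types, and the coordinator lookup is reduced to a value passed in by the caller.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClientInfo:
    """Proxy lists tagged with an ID (hypothetical shape, for illustration)."""
    info_id: int
    commit_proxies: List[str] = field(default_factory=list)
    grv_proxies: List[str] = field(default_factory=list)


class Client:
    def __init__(self, info: ClientInfo):
        self.info = info  # the client's cached view of the proxies

    def ensure_fresh_proxies(self, coordinator_info: ClientInfo) -> bool:
        """Compare IDs; replace the cached proxy lists when they differ.

        Returns True when a refresh happened.
        """
        if self.info.info_id == coordinator_info.info_id:
            return False  # same ID: the proxies have not changed
        self.info = coordinator_info
        return True
```

A caller would run `ensure_fresh_proxies` before issuing a request, then pick a proxy from `client.info`.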
The commit proxy sends a request for commit version, with a request number.
The master server waits until the request number is current.
When the current request number is larger than the incoming request number:
If a commit version is already assigned to the incoming request number, return that commit version and the previous commit version (i.e. prevVersion).
Otherwise, return Never.
Increase the current commit version and return it to the commit proxy.
Only one process serves as master, so the commit version is unique within each cluster.
The monotonically increasing commit version ensures that transactions are processed in a strict serial order.
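
The request-number handling above can be modelled with a small synchronous sketch. Several things here are assumptions made for illustration: the `Master` and `ProxyState` classes, the fixed `step` increment, caching only the most recent reply per proxy, treating a retry of the most recent request as one that already has a version assigned, and the `WAIT` / `NEVER` sentinels that stand in for blocking and for a reply that is never sent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

NEVER = "Never"  # stand-in for a reply that is never sent
WAIT = "Wait"    # stand-in for blocking until the request number is current


@dataclass
class ProxyState:
    latest_request_num: int = 0  # highest request number served so far
    # (request_num, version, prev_version) of the most recent reply
    cached: Optional[Tuple[int, int, int]] = None


class Master:
    def __init__(self, start_version: int = 0, step: int = 1):
        self.version = start_version
        self.step = step  # hypothetical fixed increment
        self.proxies: Dict[str, ProxyState] = {}

    def get_commit_version(self, proxy_id: str, request_num: int):
        state = self.proxies.setdefault(proxy_id, ProxyState())
        if request_num <= state.latest_request_num:
            # Old or repeated request: replay the assigned version if we
            # still have it, otherwise never answer.
            if state.cached is not None and state.cached[0] == request_num:
                return state.cached[1], state.cached[2]
            return NEVER
        if request_num > state.latest_request_num + 1:
            return WAIT  # an earlier request has not been served yet
        # The request is current: advance the commit version.
        prev_version = self.version
        self.version += self.step
        state.latest_request_num = request_num
        state.cached = (request_num, self.version, prev_version)
        return self.version, prev_version
```

Because a single `Master` object owns `version`, every successful call returns a strictly larger commit version than the one before it, regardless of which proxy asked.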
prevVersion
TransactionCommitted as the status
TransactionConflict status
TransactionTooOld status

g_traceBatch

g_traceBatch can be used for querying the transactions and commits. A typical query in the trace logs is:
Type=type Location=location
The format of location is, in general, <source_file_name>.<function/actor name>.<log information>, e.g.
NativeAPI.getConsistentReadVersion.Before
means the location is at NativeAPI.actor.cpp, ACTOR getConsistentReadVersion, Before requesting the read version from GRV Proxy.
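
As a small illustration of the location format, the sketch below splits a location string into its three parts. `Location` and `parse_location` are hypothetical helpers, not part of FDB. A string that does not have exactly three dot-separated parts is kept whole as the log information.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    source_file: Optional[str]
    function: Optional[str]
    log_info: str


def parse_location(location: str) -> Location:
    """Split '<source_file_name>.<function/actor name>.<log information>'."""
    parts = location.split(".")
    if len(parts) == 3:
        return Location(parts[0], parts[1], parts[2])
    # Does not follow the general format: keep the full string.
    return Location(None, None, location)
```

For example, `parse_location("NativeAPI.getConsistentReadVersion.Before")` yields the source file `NativeAPI`, the actor `getConsistentReadVersion`, and the log information `Before`.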
Some example queries are:
Type=TransactionDebug Location=NativeAPI*
LogGroup=loggroup Type=CommitDebug Location=Resolver.resolveBatch.*
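
To show how such queries select events, here is a minimal sketch that applies them to events already loaded as dictionaries. It assumes the `*` wildcard behaves like a shell glob and uses Python's `fnmatch` for that; `matches` and `SAMPLE_EVENTS` are made up for the example, and the actual query tool may treat patterns differently.

```python
from fnmatch import fnmatchcase

# Hypothetical events; real trace events carry more fields.
SAMPLE_EVENTS = [
    {"Type": "TransactionDebug", "Location": "NativeAPI.getConsistentReadVersion.Before"},
    {"Type": "CommitDebug", "Location": "Resolver.resolveBatch.Before"},
    {"Type": "CommitDebug", "Location": "Resolver.resolveBatch.After"},
    {"Type": "CommitDebug", "Location": "NativeAPI.commit.Before"},
]


def matches(event: dict, **query: str) -> bool:
    """True when every queried field matches its glob pattern."""
    return all(
        fnmatchcase(str(event.get(name, "")), pattern)
        for name, pattern in query.items()
    )


resolver_events = [
    e for e in SAMPLE_EVENTS
    if matches(e, Type="CommitDebug", Location="Resolver.resolveBatch.*")
]
```

With the sample data, `resolver_events` holds the two `Resolver.resolveBatch` entries.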
In the following sections, a <span style="color:green">green</span> tag indicates an attach; a <span style="color:blue">blue</span> tag indicates an event whose location follows the format described above, with only the <log information> part shown; a <span style="color:lightblue">light-blue</span> tag indicates an event whose location does not follow that format, with the full location shown. All g_traceBatch events are tabulated after the diagram.
contrib/commit_debug.py can be used to visualize the commit process.

| Role | File name | Function/Actor | Trace | Type | Location |
|---|---|---|---|---|---|
| Client | NativeAPI | Transaction::getReadVersion | | | |
| | | readVersionBatcher | | TransactionAttachID | |
| | | getConsistentReadVersion | Before | TransactionDebug | NativeAPI.getConsistentReadVersion.Before |
| GRVProxy | GrvProxyServer | queueGetReadVersionRequests | Before | TransactionDebug | GrvProxyServer.queueTransactionStartRequests.Before |
| | | transactionStarter | | TransactionAttachID | |
| | | | AskLiveCommittedVersionFromMaster | TransactionDebug | GrvProxyServer.transactionStarter.AskLiveCommittedVersionFromMaster |
| | | getLiveCommittedVersion | confirmEpochLive | TransactionDebug | GrvProxyServer.getLiveCommittedVersion.confirmEpochLive |
| Master | MasterServer | serveLiveCommittedVersion | GetRawCommittedVersion | TransactionDebug | MasterServer.serveLiveCommittedVersion.GetRawCommittedVersion |
| GRVProxy | GrvProxyServer | getLiveCommittedVersion | After | TransactionDebug | GrvProxyServer.getLiveCommittedVersion.After |
| Client | NativeAPI | getConsistentReadVersion | After | TransactionDebug | NativeAPI.getConsistentReadVersion.After |

| Role | File name | Function/Actor | Trace | Type | Location | Notes |
|---|---|---|---|---|---|---|
| Client | NativeAPI | Transaction::get | | | | |
| | | Transaction::getReadVersion | | | | (Refer to GetReadVersion) |
| | | getKeyLocation | Before | TransactionDebug | NativeAPI.getKeyLocation.Before | getKeyLocation is called by getValue; getKeyLocation actually calls getKeyLocation_internal |
| | | | After | TransactionDebug | NativeAPI.getKeyLocation.After | |
| | | getValue | | GetValueAttachID | | |
| | | | Before | GetValueDebug | NativeAPI.getValue.Before | |
| Storage Server | StorageServer | serveGetValueRequests | received | GetValueDebug | StorageServer.received | |
| | | getValueQ | DoRead | GetValueDebug | getValueQ.DoRead | |
| | | | AfterVersion | GetValueDebug | getValueQ.AfterVersion | |
| | KeyValueStoreSQLite | KeyValueStoreSQLite::Reader::action | Before | GetValueDebug | Reader.Before | |
| | | | After | GetValueDebug | Reader.After | |
| | StorageServer | getValueQ | AfterRead | GetValueDebug | getValueQ.AfterRead | |
| Client | NativeAPI | getValue | After | GetValueDebug | NativeAPI.getValue.After | (When successful) |
| | | | Error | GetValueDebug | NativeAPI.getValue.Error | (When failure) |

| Role | File name | Function/Actor | Trace | Type | Location | Notes |
|---|---|---|---|---|---|---|
| Client | NativeAPI | Transaction::getRange | | | | |
| | | Transaction::getReadVersion | | | | (Refer to GetReadVersion) |
| | | getKeyLocation | Before | TransactionDebug | NativeAPI.getKeyLocation.Before | getKeyLocation is called by getRange |
| | | | After | TransactionDebug | NativeAPI.getKeyLocation.After | |
| | | getRange | Before | TransactionDebug | NativeAPI.getRange.Before | |
| Storage Server | storageserver | getKeyValuesQ | Before | TransactionDebug | storageserver.getKeyValues.Before | |
| | | | AfterVersion | TransactionDebug | storageserver.getKeyValues.AfterVersion | |
| | | | AfterKeys | TransactionDebug | storageserver.getKeyValues.AfterKeys | |
| | | | Send | TransactionDebug | storageserver.getKeyValues.Send | (When no keys found) |
| | | | AfterReadRange | TransactionDebug | storageserver.getKeyValues.AfterReadRange | (When found keys in this SS) |
| Client | NativeAPI | getRange | After | TransactionDebug | NativeAPI.getRange.After | (When successful) |
| | | | Error | TransactionDebug | NativeAPI.getRange.Error | (When failure) |

| Role | File name | Function/Actor | Trace | Type | Location | Notes |
|---|---|---|---|---|---|---|
| Client | NativeAPI | getRangeFallback | | | | |
| | | getKey | | GetKeyAttachID | | |
| | | | AfterVersion | GetKeyDebug | NativeAPI.getKey.AfterVersion | |
| | | | Before | GetKeyDebug | NativeAPI.getKey.Before | |
| | | | After | GetKeyDebug | NativeAPI.getKey.After | Success |
| | | | Error | GetKeyDebug | NativeAPI.getKey.Error | Error |
| | | getReadVersion | | | | (Refer to GetReadVersion) |
| | | getKeyRangeLocations | Before | TransactionDebug | NativeAPI.getKeyLocations.Before | |
| | | | After | TransactionDebug | NativeAPI.getKeyLocations.After | |
| | | getExactRange | Before | TransactionDebug | NativeAPI.getExactRange.Before | getKeyRangeLocations is called by getExactRange |
| | | | After | TransactionDebug | NativeAPI.getExactRange.After | |

| Role | File name | Function/Actor | Trace | Type | Location | Notes |
|---|---|---|---|---|---|---|
| Client | NativeAPI | Transaction::commit | | | | |
| | | commitAndWatch | | | | |
| | | tryCommit | | commitAttachID | | |
| | | | Before | CommitDebug | NativeAPI.commit.Before | |
| Commit Proxy | CommitProxyServer | commitBatcher | batcher | CommitDebug | CommitProxyServer.batcher | |
| | | commitBatch | | | | |
| | | CommitBatchContext::setupTraceBatch | | CommitAttachID | | |
| | | | Before | CommitDebug | CommitProxyServer.commitBatch.Before | |
| | | CommitBatchContext::preresolutionProcessing | GettingCommitVersion | CommitDebug | CommitProxyServer.commitBatch.GettingCommitVersion | |
| | | | GotCommitVersion | CommitDebug | CommitProxyServer.commitBatch.GotCommitVersion | |
| Resolver | Resolver | resolveBatch | | CommitAttachID | | |
| | | | Before | CommitDebug | Resolver.resolveBatch.Before | |
| | | | AfterQueueSizeCheck | CommitDebug | Resolver.resolveBatch.AfterQueueSizeCheck | |
| | | | AfterOrderer | CommitDebug | Resolver.resolveBatch.AfterOrderer | |
| | | | After | CommitDebug | Resolver.resolveBatch.After | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::postResolution | ProcessingMutations | CommitDebug | CommitProxyServer.CommitBatch.ProcessingMutations | |
| | | | AfterStoreCommits | CommitDebug | CommitProxyServer.CommitBatch.AfterStoreCommits | |
| TLog | TLogServer | tLogCommit | | commitAttachID | | |
| | | | BeforeWaitForVersion | CommitDebug | TLogServer.tLogCommit.BeforeWaitForVersion | |
| | | | Before | CommitDebug | TLog.tLogCommit.Before | |
| | | | AfterTLogCommit | CommitDebug | TLog.tLogCommit.AfterTLogCommit | |
| | | | After | CommitDebug | TLog.tLogCommit.After | |
| Commit Proxy | CommitProxyServer | CommitBatchContext::reply | AfterLogPush | CommitDebug | CommitProxyServer.CommitBatch.AfterLogPush | |
| Client | NativeAPI | tryCommit | After | CommitDebug | NativeAPI.commit.After | |
| | | commitAndWatch | | | | |
| | | watchValue | | WatchValueAttachID | | |
| | | | Before | WatchValueDebug | NativeAPI.watchValue.Before | |
| | | | After | WatchValueDebug | NativeAPI.watchValue.After | |
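
One way to use the locations in the table above is to measure how long a commit spends between consecutive stages. The sketch below assumes the CommitDebug events for a single commit have already been collected as dictionaries with a numeric `Time` field and a `Location` field. Those field names and the `stage_latencies` helper are assumptions for illustration, and this is not how contrib/commit_debug.py is implemented.

```python
from typing import Dict, List, Tuple


def stage_latencies(events: List[Dict]) -> List[Tuple[str, str, float]]:
    """Return (from_location, to_location, seconds) for consecutive events."""
    ordered = sorted(events, key=lambda e: e["Time"])
    return [
        (a["Location"], b["Location"], b["Time"] - a["Time"])
        for a, b in zip(ordered, ordered[1:])
    ]


# Hypothetical events for one commit, deliberately out of order.
commit_events = [
    {"Time": 1.75, "Location": "NativeAPI.commit.After"},
    {"Time": 1.0, "Location": "NativeAPI.commit.Before"},
    {"Time": 1.5, "Location": "Resolver.resolveBatch.After"},
]
latencies = stage_latencies(commit_events)
```

Each tuple in `latencies` names two adjacent locations and the time elapsed between them, which makes it easy to spot the stage where a slow commit spent most of its time.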