Feature Name: txn-gc
Status: completed
Start Date: 2015-11-12
RFC PR: #3100
Cockroach Issue: #2062

Summary

rename {Response->Sequence}Cache, Transaction{Table->Cache}.
store client's intents in Txn record on (successful) EndTransaction.
GC transaction records asynchronously following the client's EndTransaction call after successful resolution of (possibly) outstanding intents.
GC sequence cache entries with ResolveIntent{,Range} which are carried out as a consequence of client's EndTransaction (successful or not).
Check the sequence cache on non-writes as well and store the epoch along with the sequence (to allow for the next point):
Poison the sequence cache when aborting an intent after Push during ResolveIntent{,Range}, preventing the anomaly in #2231 (own writes vanishing).
Keep the "slow" gc queue which cleans up in case of abandoned txns or node crashes.

Motivation

Both transaction and sequence cache records should be deleted when they're no longer useful, ideally without introducing extra work. The procedure outlined here accomplishes that in the vast majority of cases (including all non-abandoned transactions).

Detailed design

Refresher:

there is exactly one txn record on a single range (the one that the transaction is initiated on via BeginTransaction).
there is exactly one sequence cache entry for each Range mutated by the txn with a sequence counter (mutation with non-increasing sequence number triggers a txn restart).
in the absence of a transaction record, a Push always aborts the transaction.

GCing Txn Records

The above means that we can always garbage collect aborted transactions with only a best-effort attempt to clean up their intents (but we'll do it only after the client's EndTransaction or, if that never happens, the "slow way" via the GC queue; see below). For committed transactions, we must guarantee that no open intents exist before deleting the entry (we already synchronously resolve all intents local to the transaction record and GC the record right away if no external intents exist). The straightforward solution is to have EndTransaction persist the external intents on the transaction record and let the goroutine which resolves them asynchronously do a little more work: after successfully carrying out the batch worth of ResolveIntent, it can delete the corresponding txn record. All of this is best effort: we're still going to have a gc queue which walks over old transaction entries, poking old transactions and retrying their intent resolution for the .0001% of transactions which are left hanging.

GCing Sequence Cache Entries

it's safe to remove a sequence cache entry when we know the transaction isn't running any more. That's after the client's EndTransaction gets executed (regardless of its outcome), but not when a concurrent transaction manages to abort it by means of a Push.
there are sequence cache entries on any range mutated by the transaction, and we'll be sending ResolveIntent there anyway, so we simply make clearing (idempotently) the sequence cache entry a side effect of a ResolveIntent{,Range} (when it's carried out as part of EndTransaction).

Intentional Sequence Cache Poisoning

on ResolveIntent triggered through an aborting Push, we can actually deal with #2231 nicely. The issue there is that a running transaction may not know that it's been aborted already, which leads to anomalies related to the fact that its intents may be gone (so it may not read what it wrote). The key, again, is ResolveIntent{,Range}:

store not only the sequence number, but also the minimal expected epoch.
(epoch, seq) < (epoch', seq') iff epoch < epoch' || (epoch == epoch' && seq < seq').
upon aborting an intent after a Push, we simply poison the sequence cache on that range (setting sequence=math.MaxInt64). Assuming that we check the sequence cache on every batch (not only for writes), we trigger a transaction restart should the transaction come back to the Range. If checking the sequence cache on reads shows up in performance considerations, there are going to be ways to avoid disk I/O in most cases. The retry increases the epoch, so when the txn comes back, it will be able to perform normally.

Interaction with Splits and Merges

On both Split and Merge we'll copy the entry (keeping the larger one on collision).

"Slow" GC Path

The slow path to sequence cache GC takes place in the following situations:

a transaction is abandoned (so a list of intents is never sent by the client)
a sequence cache entry is duplicated during Split to a Range not touched by ResolveIntent{,Range} for its transaction.

In the same queue which grooms the transaction cache, we'll also groom the local sequence cache with the goal of finding "inactive" entries, pinging their transaction and removing according to the outcome. To be able to do that, we need to persist more information into the response cache key:

Txn.Key (to get the range which holds the transaction entry, but we only have Txn.ID from the sequence cache entry), and
Txn.Timestamp (to figure out whether this is "probably" an inactive transaction)

Some of the additional overhead could be avoided if transaction IDs encoded some of that information. For example, instead of UUID4 transaction IDs we could adopt the scheme <hlc_wall=64bit,hlc_logical=32bit><random=32bit>, but the entropy is considerably lower. This is out of scope for this RFC.

Drawbacks

Possibly checking the sequence cache on reads can show up in performance tuning (not necessarily expected though), in which case some extra caching should do the trick to avoid I/O.

Likewise, deleting the txn entries may need some batching up for performance (to save Raft proposals; again straightforward to do).

Alternatives

The original design proposed keeping track of the cluster-wide oldest intent's timestamp, which would allow all txn entries older than that timestamp to be GC'ed. The mechanism with its global characteristics doesn't seem preferable to the one outlined above (especially since little complexity and no significant performance hits or new RPCs are introduced there) and does not immediately solve #2231.

Summary

Summary

Motivation

Detailed design

GCing Txn Records

GCing Sequence Cache Entries

Intentional Sequence Cache Poisoning

Interaction with Splits and Merges

"Slow" GC Path

Drawbacks

Alternatives

Unresolved questions