Lance implements Multi-Version Concurrency Control (MVCC) to provide ACID transaction guarantees for concurrent readers and writers. Each commit creates a new immutable table version through atomic storage operations. All table versions form a serializable history, enabling features such as time travel and schema evolution.
Transactions are the fundamental unit of change in Lance. A transaction describes a set of modifications to be applied atomically to create a new table version. The transaction model supports concurrent writes through optimistic concurrency control with automatic conflict resolution.
Lance commits rely on atomic write operations provided by the underlying object store:
These primitives guarantee that exactly one writer succeeds when multiple writers attempt to create the same manifest file concurrently.
Lance supports two manifest naming schemes:
- V1 (`{version}.manifest`): Monotonically increasing version numbers (e.g., `1.manifest`, `2.manifest`)
- V2 (`{u64::MAX - version:020}.manifest`): Reverse-sorted lexicographic ordering (e.g., `18446744073709551614.manifest` for version 1)

The V2 scheme enables efficient discovery of the latest version through lexicographic object listing.
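The following minimal sketch computes both naming schemes and shows why V2 makes the latest version easy to find. It assumes the object store lists keys in lexicographic order. The helper names (`v1_name`, `v2_name`, `latest_v2`) are hypothetical and are not part of Lance's API.

```python
U64_MAX = 2**64 - 1


def v1_name(version: int) -> str:
    # V1: the plain version number.
    return f"{version}.manifest"


def v2_name(version: int) -> str:
    # V2: u64::MAX - version, zero-padded to 20 digits so names have a fixed width.
    return f"{U64_MAX - version:020}.manifest"


def v2_version(name: str) -> int:
    # Invert v2_name to recover the version number.
    return U64_MAX - int(name.removesuffix(".manifest"))


def latest_v2(names: list) -> int:
    # Newer versions produce smaller names, so the first key in sorted order is the latest.
    return v2_version(sorted(names)[0])


print(v2_name(1))                                  # 18446744073709551614.manifest
print(latest_v2([v2_name(v) for v in (1, 2, 10)]))  # 10
print(sorted(v1_name(v) for v in (1, 2, 10)))       # ['1.manifest', '10.manifest', '2.manifest']
```

The last line shows the problem V2 avoids. Under V1, lexicographic order does not match version order, so a reader has to list every manifest and parse the numbers to find the latest one.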
Transaction files store the serialized transaction protobuf message for each commit attempt. These files serve two purposes:
The commit process attempts to atomically write a new manifest file using the storage primitives described above. When concurrent writers conflict, the system loads transaction files to detect conflicts and attempts to rebase the transaction if possible. If the atomic commit fails, the process retries with updated transaction state. For detailed conflict detection and resolution mechanisms, see the Conflict Resolution section.
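The commit loop can be sketched as optimistic concurrency over a put-if-not-exists primitive. The sketch makes several simplifications, and all of them are assumptions:

- The store is an in-memory dictionary.
- The manifest slot holds the transaction itself. Lance keeps transactions in separate transaction files.
- The conflict rule only compares deleted rows.
- "Rebasing" just advances `read_version`.

```python
from dataclasses import dataclass, field


class PutIfNotExistsStore:
    """Hypothetical in-memory object store offering only put-if-not-exists."""

    def __init__(self) -> None:
        self.objects: dict = {}

    def put_if_not_exists(self, key: str, value: object) -> bool:
        if key in self.objects:
            return False  # another writer already created this key
        self.objects[key] = value
        return True


@dataclass
class Transaction:
    read_version: int
    uuid: str
    deleted_rows: set = field(default_factory=set)  # hypothetical payload


class CommitConflict(Exception):
    pass


def conflicts(txn: Transaction, winner: Transaction) -> bool:
    # Hypothetical rule: two deletes conflict only if they touch the same rows.
    return bool(txn.deleted_rows & winner.deleted_rows)


def commit(store: PutIfNotExistsStore, txn: Transaction, max_retries: int = 5) -> int:
    version = txn.read_version + 1
    for _ in range(max_retries):
        key = f"_versions/{version}.manifest"
        # Simplification: the manifest slot holds the transaction itself.
        if store.put_if_not_exists(key, txn):
            return version
        # Lost the race: inspect the transaction that won this version.
        if conflicts(txn, store.objects[key]):
            raise CommitConflict(f"conflicts with transaction at version {version}")
        # Rebase onto the winner and try the next version.
        txn.read_version = version
        version += 1
    raise CommitConflict("gave up after too many retries")


store = PutIfNotExistsStore()
print(commit(store, Transaction(0, "a", {1, 2})))  # 1
print(commit(store, Transaction(0, "b", {3})))     # 2 (lost version 1, rebased)
```

Here, writer `b` reads version 0 but loses the race for version 1. Its rows do not overlap with writer `a`'s rows, so it moves on and commits version 2.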
The authoritative specification for transaction types is defined in protos/transaction.proto.
Each transaction contains a read_version field indicating the table version from which the transaction was built,
a uuid field uniquely identifying the transaction, and an operation field specifying one of the following transaction types:
In the following sections, we describe each transaction type and its compatibility with other transaction types. Compatibility is not always symmetric; we describe it from the perspective of the operation being committed. For example, we say that Append is not compatible with Overwrite: if we are trying to commit an Append and an Overwrite has been committed since the Append started, the Append fails. On the other hand, we say that Overwrite does not conflict with Append: if we are trying to commit an Overwrite and an Append has been committed in the meantime, the Overwrite is still allowed to proceed.
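Because compatibility is directional, a lookup has to be keyed by the ordered pair (operation being committed, operation committed concurrently). The following minimal sketch fills in only the pairs stated in this document. Unknown pairs raise `KeyError` rather than guessing. The `COMPATIBLE` table and `can_commit` are hypothetical names.

```python
# Keyed by (operation being committed, operation committed since read_version).
# Only pairs stated in this document are listed; everything else is unknown here.
COMPATIBLE = {
    ("Append", "Overwrite"): False,  # an Overwrite landed first: the Append fails
    ("Overwrite", "Append"): True,   # an Append landed first: the Overwrite proceeds
    ("Append", "Append"): True,      # concurrent appends are compatible
}


def can_commit(committing: str, concurrent: list) -> bool:
    """True if `committing` is compatible with every concurrently committed operation."""
    return all(COMPATIBLE[(committing, other)] for other in concurrent)


print(can_commit("Append", ["Overwrite"]))  # False
print(can_commit("Overwrite", ["Append"]))  # True
```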
Adds new fragments to the table without modifying existing data. Fragment IDs are not assigned at transaction creation time; they are assigned during manifest construction.
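Deferring ID assignment is what lets concurrent appends coexist. Neither append picks IDs up front, so they cannot collide. The following sketch assumes, as a hypothesis, that IDs are handed out sequentially after the table's current max fragment id while the manifest is built. `Fragment` and `build_manifest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    path: str
    id: Optional[int] = None  # left unset by the Append transaction


def build_manifest(existing: list, appended: list, max_fragment_id: int):
    """Assign IDs to appended fragments while constructing the next manifest (sketch)."""
    new_frags = []
    for frag in appended:
        max_fragment_id += 1  # assumption: sequential IDs after the current max
        new_frags.append(Fragment(frag.path, max_fragment_id))
    return existing + new_frags, max_fragment_id


# Two appends built on the same read version; IDs are only fixed when each commits.
frags, max_id = build_manifest([Fragment("a.lance", 0)], [Fragment("b.lance")], 0)
frags, max_id = build_manifest(frags, [Fragment("c.lance")], max_id)
print([f.id for f in frags])  # [0, 1, 2]
```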
<details> <summary>Append protobuf message</summary>%%% proto.message.Append %%%
The append operation is one of the most common operations and is designed to be compatible with most other operations, even itself. This ensures that multiple writers can append without worrying about conflicts. These are the operations that conflict with append:
Marks rows as deleted using deletion vectors.
May update fragments (adding deletion vectors) or delete entire fragments.
The predicate field stores the deletion condition, enabling conflict detection with concurrent transactions.
%%% proto.message.Delete %%%
Delete modifies an existing fragment, so there may be conflicts with other operations on overlapping fragments. Generally these conflicts are rebaseable or retryable.
These are the operations that conflict with delete:
These operations conflict with delete but can be retried:
These operations conflict with delete but can potentially be rebased. The deletion masks from the two operations will be merged. However, if both operations modified the same rows, then the conflict becomes a retryable conflict.
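The mask-merging rule for a rebaseable Delete can be sketched with sets of row offsets per fragment. The representation (`dict` of fragment ID to a set of rows) and the names `rebase_deletions` and `RetryableConflict` are hypothetical. Lance stores deletion vectors in its own file format.

```python
class RetryableConflict(Exception):
    """Signals that the application should re-read the data and retry."""


def rebase_deletions(mine: dict, theirs: dict) -> dict:
    """Merge two deletion masks keyed by fragment ID; overlapping rows are retryable."""
    merged = {frag: set(rows) for frag, rows in theirs.items()}
    for frag, rows in mine.items():
        overlap = merged.get(frag, set()) & rows
        if overlap:
            # Both operations touched the same rows: rebase is not possible.
            raise RetryableConflict(f"fragment {frag}: {len(overlap)} rows modified by both")
        merged.setdefault(frag, set()).update(rows)
    return merged


# Writer A deleted rows 100-199 and writer B rows 500-599, both in fragment 0.
merged = rebase_deletions({0: set(range(500, 600))}, {0: set(range(100, 200))})
print(len(merged[0]))  # 200
```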
Creates or completely overwrites the table with new data, schema, and configuration.
<details> <summary>Overwrite protobuf message</summary>%%% proto.message.Overwrite %%%
An overwrite operation completely overwrites the table. Generally, we do not care what has happened since the read version.
However, the overwrite does not necessarily rewrite the table config. As a result, we consider the following to be retryable conflicts:
Adds, replaces, or removes secondary indices (vector indices, scalar indices, full-text search indices).
<details> <summary>CreateIndex protobuf message</summary>%%% proto.message.CreateIndex %%%
Indexes record which fragments are covered by the index and we don't require all fragments be covered. As a result, it is typically ok for an index to be created concurrently with the addition of new fragments. These new fragments will simply be unindexed.
Updates and deletes are also compatible with index creation. This is because it is ok for an index to refer to deleted rows. Those results will be filtered out after the index search. If an update occurs then the old value will be filtered out and the new value will be considered part of the unindexed set.
Two CreateIndex operations may be committed concurrently. If the indexes have different names, there is no conflict. If the indexes have the same name, the second operation wins and replaces the first.
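The three rules above can be summarized in a short sketch:

- An index records the fragments it covers, so later fragments are simply unindexed.
- Deleted rows are filtered out after the search.
- A same-name CreateIndex replaces the earlier index.

The `Index` class and helper functions are hypothetical, and row identifiers are treated as plain integers.

```python
from dataclasses import dataclass


@dataclass
class Index:
    name: str
    fragment_ids: frozenset  # fragments covered when the index was built


def unindexed_fragments(index: Index, all_fragments: set) -> set:
    # Fragments added after the index was built are not covered; they are searched without it.
    return all_fragments - index.fragment_ids


def filter_deleted(hits: list, deleted_rows: set) -> list:
    # An index may still point at deleted rows; drop them after the index search.
    return [row for row in hits if row not in deleted_rows]


def apply_create_index(indices: dict, new: Index) -> None:
    # A later CreateIndex with the same name replaces the earlier one.
    indices[new.name] = new


idx = Index("vec_idx", frozenset({0, 1}))
print(unindexed_fragments(idx, {0, 1, 2}))  # {2}
print(filter_deleted([10, 11, 12], {11}))   # [10, 12]
```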
These operations conflict with index creation:
Data replacement operations conflict with index creation if the column being replaced is the one being indexed. Rewrite operations conflict with index creation if the rewritten fragments are covered by the index, because an index refers to row addresses and a rewrite changes them. However, if a fragment reuse index is in use, or if the stable row ids feature is enabled, the rewrite operation is compatible with index creation. As a result, these are the operations that are retryable conflicts with index creation:
Some indices are special singleton indices. For example, the fragment reuse index and the mem wal index. If a conflict occurs between two operations that are modifying the same singleton index, then we must rebase the operation and merge the indexes. As a result, these are the operations that are rebaseable conflicts with index creation:
Reorganizes data without semantic modification.
This includes operations such as compaction, defragmentation, and re-ordering.
Rewrite operations change row addresses, requiring index updates.
New fragment IDs must be reserved via ReserveFragments before executing a Rewrite transaction.
%%% proto.message.Rewrite %%%
Rewrite operations do not change data but they can materialize deletions and they do replace fragments. As a result, they can potentially conflict with other operations that modify the fragments being rewritten.
These are the operations that conflict with rewrite:
Rewrite is not compatible with CreateIndex by default because the operation will change the row addresses that the CreateIndex refers to. However, a fragment reuse index or the stable row ids feature can allow these operations to be compatible.
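The Rewrite-versus-CreateIndex rule reduces to a small predicate. This is a sketch of the rule as stated in this document, not Lance's implementation. The function and parameter names are hypothetical.

```python
def rewrite_conflicts_with_index(
    rewritten_fragments: set,
    index_fragments: set,
    has_fragment_reuse_index: bool,
    stable_row_ids: bool,
) -> bool:
    """True when a Rewrite changes row addresses that a concurrent index refers to."""
    if has_fragment_reuse_index or stable_row_ids:
        return False  # either feature makes Rewrite compatible with CreateIndex
    # Otherwise, conflict only if the rewrite touched fragments the index covers.
    return bool(rewritten_fragments & index_fragments)


print(rewrite_conflicts_with_index({1, 2}, {2, 3}, False, False))  # True
print(rewrite_conflicts_with_index({1, 2}, {2, 3}, True, False))   # False
```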
Several operations modify existing fragments. As a result, they can potentially conflict with Rewrite if they modify the same fragments. However, Merge is overly general and so no conflict detection is possible. As a result, here are the operations that are retryable conflicts with Rewrite:
There is one case where a Rewrite will rebase. This is when the Rewrite operation has a fragment reuse index and there is a CreateIndex operation that is writing the fragment reuse index. In this case the Rewrite will rebase and update its fragment reuse index to include the conflicting fragment reuse index.
As a result, these are the operations that are rebaseable conflicts with Rewrite:
Adds new columns to the table, modifying the schema. All fragments must be updated to include the new columns.
<details> <summary>Merge protobuf message</summary>%%% proto.message.Merge %%%
The Merge operation is a very generic operation. The set of fragments provided in the operation will be the final set of fragments in the resulting dataset. As a result, it has a high potential for conflicts with other operations. If possible, more restrictive operations such as Rewrite, DataReplacement, or Append should be preferred over Merge.
Because of this generality, the following operations conflict with Merge:
These operations are retryable conflicts with Merge:
Removes columns from the table, modifying the schema. This is a metadata-only operation; data files are not modified.
<details> <summary>Project protobuf message</summary>%%% proto.message.Project %%%
Since project only modifies the schema, it is compatible with most other operations. However, it is not compatible with Merge because the Merge operation modifies the schema (can potentially add columns) and the logic to rebase those changes does not currently exist (project is cheap and easy enough to retry).
These are the operations that conflict with Project:
The following operations are retryable conflicts with Project:
Reverts the table to a previous version.
<details> <summary>Restore protobuf message</summary>%%% proto.message.Restore %%%
The Restore operation reverts the table to a previous version and is generally assumed to take precedence over any other operation. Here are the operations that conflict with Restore:
Pre-allocates fragment IDs for use in future Rewrite operations.
This allows rewrite operations to reference fragment IDs before the rewrite transaction is committed.
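Since ReserveFragments only moves the max fragment id, its effect can be sketched in a few lines. The `Manifest` class below is a hypothetical stand-in holding only that counter. The returned range is the block of IDs a later Rewrite could use.

```python
class Manifest:
    """Hypothetical manifest holding only the max fragment id."""

    def __init__(self, max_fragment_id: int = 0) -> None:
        self.max_fragment_id = max_fragment_id


def reserve_fragments(manifest: Manifest, num_fragments: int) -> range:
    """Bump max_fragment_id and return the newly reserved IDs."""
    start = manifest.max_fragment_id + 1
    manifest.max_fragment_id += num_fragments
    return range(start, manifest.max_fragment_id + 1)


m = Manifest(max_fragment_id=10)
print(list(reserve_fragments(m, 3)))  # [11, 12, 13]
```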
%%% proto.message.ReserveFragments %%%
The ReserveFragments operation is fairly trivial: the only thing it changes is the max fragment id, so it only conflicts with operations that modify the max fragment id. Here are the operations that conflict with ReserveFragments:
Creates a shallow or deep copy of the table.
Shallow clones are metadata-only copies that reference original data files through base_paths.
Deep clones are full copies using object storage native copy operations (e.g., S3 CopyObject).
%%% proto.message.Clone %%%
The Clone operation can only be the first operation in a dataset. If there is an existing dataset, then the Clone operation will fail. As a result, there is no such thing as a conflict with Clone.
Modifies row values without adding or removing rows. Supports two execution modes:

- `REWRITE_ROWS`: deletes rows in current fragments and rewrites them in new fragments. This is optimal when the majority of columns are modified or only a small number of rows are affected.
- `REWRITE_COLUMNS`: fully rewrites affected columns within fragments by tombstoning old column versions. This is optimal when most rows are affected but only a subset of columns are modified.
<details> <summary>Update protobuf message</summary>%%% proto.message.Update %%%
Here are the operations that conflict with Update:
An update operation is both a delete and an append operation. Like a Delete operation, it will modify fragments to change the deletion mask. As a result, there will be a retryable conflict with other operations that modify the same fragments. Here are the operations that are retryable conflicts with Update:
Similar to Delete, the Update operation can rebase other modifications to the deletion mask. Here are the operations that are rebaseable conflicts with Update:
Modifies table configuration, table metadata, schema metadata, or field metadata without changing data.
<details> <summary>UpdateConfig protobuf message</summary>%%% proto.message.UpdateConfig %%%
An UpdateConfig operation only modifies table config and tends to be compatible with other operations. Here are the operations that conflict with UpdateConfig:
Replaces data in specific column regions with new data files.
<details> <summary>DataReplacement protobuf message</summary>%%% proto.message.DataReplacement %%%
A DataReplacement operation only replaces a single column's worth of data. As a result, it can be safer and simpler than Merge or Update operations. Here are the operations that conflict with DataReplacement:
The following operations are retryable conflicts with DataReplacement:
Updates the state of MemWal indices (write-ahead log based indices).
<details> <summary>UpdateMemWalState protobuf message</summary>%%% proto.message.UpdateMemWalState %%%
Adds new base paths to the table, enabling reference to data files in additional locations.
<details> <summary>UpdateBases protobuf message</summary>%%% proto.message.UpdateBases %%%
An UpdateBases operation only modifies the base paths. As a result, it only conflicts with other UpdateBases operations and even then only conflicts if the two operations have base paths with the same id, name, or path.
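The UpdateBases conflict rule can be expressed directly as a predicate. The `BasePath` fields mirror the three attributes named above. The class, the function and the example paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePath:
    id: int
    name: str
    path: str


def update_bases_conflict(mine: list, theirs: list) -> bool:
    """Two UpdateBases operations conflict only if any base shares an id, name, or path."""
    return any(
        a.id == b.id or a.name == b.name or a.path == b.path
        for a in mine
        for b in theirs
    )


hot = [BasePath(1, "hot", "/data/hot")]
print(update_bases_conflict(hot, [BasePath(2, "cold", "/data/cold")]))  # False
print(update_bases_conflict(hot, [BasePath(2, "hot", "/data/other")]))  # True
```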
When concurrent transactions attempt to commit against the same read version, Lance employs conflict resolution to determine whether the transactions can coexist. Three outcomes are possible:
Rebasable: The transaction can be modified to incorporate concurrent changes while preserving its semantic intent. The transaction is transformed to account for the concurrent modification, then the commit is retried automatically within the commit layer.
Retryable: The transaction cannot be rebased, but the operation can be re-executed at the application level with updated data. The implementation returns a retryable conflict error, signaling that the application should re-read the data and retry the operation. The retried operation is expected to produce semantically equivalent results.
Incompatible: The transactions conflict in a fundamental way where retrying would violate the operation's assumptions or produce semantically different results than expected. The commit fails with a non-retryable error. Callers should proceed with extreme caution if they decide to retry, as the transaction may produce different output than originally intended.
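From the application's point of view, the three outcomes are handled differently:

- Rebasable conflicts are resolved inside the commit layer and never surface.
- Retryable conflicts are handled by re-running the whole operation.
- Incompatible conflicts propagate to the caller.

The following minimal sketch shows an application-level retry loop. The exception classes and `run_with_retries` are hypothetical and do not correspond to Lance's actual error types.

```python
class RetryableConflict(Exception):
    """The transaction could not be rebased; re-read and re-execute."""


class IncompatibleConflict(Exception):
    """Retrying could change the meaning of the operation."""


def run_with_retries(operation, max_attempts: int = 3):
    """Re-execute `operation` only on retryable conflicts.

    `operation` is expected to re-read the latest table version and rebuild
    its transaction on every call. Rebasable conflicts never reach this loop,
    and IncompatibleConflict is deliberately not caught.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableConflict:
            if attempt == max_attempts:
                raise


calls = []


def flaky_update():
    calls.append(1)
    if len(calls) < 2:
        raise RetryableConflict("concurrent Rewrite touched the same fragment")
    return "committed"


print(run_with_retries(flaky_update))  # committed
```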
The TransactionRebase structure tracks the state necessary to rebase a transaction against concurrent commits:
When a concurrent transaction is detected, the rebase process:
- Uses `affected_rows` to detect whether the same rows were modified

The following diagram illustrates a rebasable conflict where two Delete operations modify different rows in the same fragment:
```mermaid
gitGraph
    commit id: "v1"
    commit id: "v2"
    branch writer-a
    branch writer-b
    checkout writer-a
    commit id: "Delete rows 100-199" tag: "read_version=2"
    checkout writer-b
    commit id: "Delete rows 500-599" tag: "read_version=2"
    checkout main
    merge writer-a tag: "v3"
    checkout writer-b
    commit id: "Rebase: merge deletion vectors" type: HIGHLIGHT
    checkout main
    merge writer-b tag: "v4"
```
In this scenario:
- The deletion vectors can be merged because the `affected_rows` do not overlap

The following diagram illustrates a retryable conflict where an Update operation encounters a concurrent Rewrite (compaction) that prevents automatic rebasing:
```mermaid
gitGraph
    commit id: "v1"
    commit id: "v2"
    branch writer-a
    branch writer-b
    checkout writer-a
    commit id: "Compact fragments 1-5" tag: "read_version=2"
    checkout writer-b
    commit id: "Update rows in fragment 3" tag: "read_version=2"
    checkout main
    merge writer-a tag: "v3: fragments compacted"
    checkout writer-b
    commit id: "Detect conflict: cannot rebase" type: REVERSE
```
In this scenario:
The following diagram illustrates an incompatible conflict where a Delete operation encounters a concurrent Restore that fundamentally invalidates the operation:
```mermaid
gitGraph
    commit id: "v1"
    commit id: "v2"
    commit id: "v3"
    branch writer-a
    branch writer-b
    checkout writer-a
    commit id: "Restore to v1" tag: "read_version=3"
    checkout writer-b
    commit id: "Delete rows added in v2-v3" tag: "read_version=3"
    checkout main
    merge writer-a tag: "v4: restored to v1"
    checkout writer-b
    commit id: "Detect conflict: incompatible" type: REVERSE
```
In this scenario:
If the backing object store does not support atomic operations (rename-if-not-exists or put-if-not-exists), an external manifest store can be used to enable concurrent writers.
An external manifest store is a key-value store that supports put-if-not-exists operations. The external manifest store supplements but does not replace the manifests in object storage. A reader unaware of the external manifest store can still read the table, but may observe a version up to one commit behind the true latest version.
The commit process follows a four-step protocol:
1. Stage manifest: `PUT_OBJECT_STORE {dataset}/_versions/{version}.manifest-{uuid}`
2. Commit to external store: `PUT_EXTERNAL_STORE base_uri, version, {dataset}/_versions/{version}.manifest-{uuid}`
3. Finalize in object store: `COPY_OBJECT_STORE {dataset}/_versions/{version}.manifest-{uuid} → {dataset}/_versions/{version}.manifest`
4. Update external store pointer: `PUT_EXTERNAL_STORE base_uri, version, {dataset}/_versions/{version}.manifest`
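The four writer steps can be sketched with dictionaries standing in for both stores. The following assumptions apply to this sketch:

- The object store is a plain `dict` from path to bytes.
- The external store is a hypothetical `ExternalStore` class keyed by `(base_uri, version)`.
- `base_uri` is simply the dataset path.

Step 2 is the commit point. Once the put-if-not-exists succeeds, the version belongs to this writer even if steps 3 and 4 never run.

```python
import uuid


class ExternalStore:
    """Hypothetical key-value store keyed by (base_uri, version)."""

    def __init__(self) -> None:
        self.kv: dict = {}

    def put_if_not_exists(self, base_uri: str, version: int, path: str) -> bool:
        if (base_uri, version) in self.kv:
            return False
        self.kv[(base_uri, version)] = path
        return True

    def put(self, base_uri: str, version: int, path: str) -> None:
        self.kv[(base_uri, version)] = path

    def get(self, base_uri: str, version: int) -> str:
        return self.kv[(base_uri, version)]


def commit_with_external_store(obj: dict, ext: ExternalStore, dataset: str,
                               version: int, manifest: bytes) -> bool:
    staged = f"{dataset}/_versions/{version}.manifest-{uuid.uuid4()}"
    final = f"{dataset}/_versions/{version}.manifest"
    obj[staged] = manifest                                   # 1. stage manifest
    if not ext.put_if_not_exists(dataset, version, staged):  # 2. commit point
        return False                                         #    another writer won
    obj[final] = obj[staged]                                 # 3. copy to final name
    ext.put(dataset, version, final)                         # 4. repoint external store
    return True
```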
Fault Tolerance:
If the writer fails after step 2 but before step 4, the external store and object store are temporarily out of sync. Readers detect this condition and attempt to complete the synchronization. If synchronization fails, the reader refuses to load to ensure dataset portability.
The reader follows a validation and synchronization protocol:
1. Query external store: `GET_EXTERNAL_STORE base_uri, version → path`
2. Synchronize to object store: `COPY_OBJECT_STORE {dataset}/_versions/{version}.manifest-{uuid} → {dataset}/_versions/{version}.manifest`
3. Update external store: `PUT_EXTERNAL_STORE base_uri, version, {dataset}/_versions/{version}.manifest`
4. Return finalized path: return `{dataset}/_versions/{version}.manifest`
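The reader side can be sketched the same way. Both stores are plain dictionaries: the external store maps `(base_uri, version)` to a path. `resolve_manifest` is a hypothetical name. Treating a missing staged manifest as the synchronization failure is an assumption made for illustration.

```python
def resolve_manifest(obj: dict, ext: dict, dataset: str, version: int) -> str:
    """Return the finalized manifest path, completing an interrupted commit if needed."""
    final = f"{dataset}/_versions/{version}.manifest"
    path = ext[(dataset, version)]                # 1. query external store
    if path != final:                             # writer stopped between steps 2 and 4
        if path not in obj:
            # Assumed failure mode: cannot synchronize, so refuse to load.
            raise RuntimeError(f"cannot synchronize {path}; refusing to load")
        obj[final] = obj[path]                    # 2. synchronize to object store
        ext[(dataset, version)] = final           # 3. update external store
    return final                                  # 4. return finalized path


obj = {"ds/_versions/3.manifest-abc": b"m3"}
ext = {("ds", 3): "ds/_versions/3.manifest-abc"}
print(resolve_manifest(obj, ext, "ds", 3))  # ds/_versions/3.manifest
```

After the first call, the finalized manifest exists in the object store. A copy of the dataset directory is therefore readable without the external store.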
This protocol ensures that datasets using external manifest stores remain portable: copying the dataset directory preserves all data without requiring the external store.