docs/src/content/docs/delta-row-tracking.mdx
import { Tabs, TabItem, Aside } from "@astrojs/starlight/components";
Row tracking allows Delta Lake to track row-level lineage in a Delta Lake table. When enabled on a table, row tracking adds two new metadata fields to the table:

- **Row IDs** provide rows with an identifier that is unique within the table. A row keeps the same ID whenever it is modified using a MERGE or UPDATE statement.
- **Row commit versions** record the version of the table at which the row was last modified using a MERGE or UPDATE statement.

You must explicitly enable row tracking using one of the following methods:
- Set the table property `delta.enableRowTracking = true` in the CREATE TABLE command.

```sql
-- Create an empty table
CREATE TABLE student (id INT, name STRING, age INT)
TBLPROPERTIES ('delta.enableRowTracking' = 'true');
-- Using a CTAS statement
CREATE TABLE course_new
TBLPROPERTIES ('delta.enableRowTracking' = 'true')
AS SELECT * FROM course_old;
-- Using a LIKE statement to copy configuration
CREATE TABLE graduate LIKE student;
-- Using a CLONE statement to copy configuration
CREATE TABLE graduate CLONE student;
```
- Set the table property `'delta.enableRowTracking' = 'true'` in the ALTER TABLE command.

```sql
ALTER TABLE grade SET TBLPROPERTIES ('delta.enableRowTracking' = 'true');
```
- Set `spark.databricks.delta.properties.defaults.enableRowTracking = true` for the current session in the SET command. Tables created afterwards in that session then have row tracking enabled by default.

<Tabs syncKey="code-examples">
<TabItem label="SQL">
```sql
SET spark.databricks.delta.properties.defaults.enableRowTracking = true;
```
</TabItem>
<TabItem label="Python">
```python
spark.conf.set("spark.databricks.delta.properties.defaults.enableRowTracking", True)
```
</TabItem>
<TabItem label="Scala">
```scala
spark.conf.set("spark.databricks.delta.properties.defaults.enableRowTracking", true)
```
</TabItem>
</Tabs>
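With the session default set, newly created tables pick up row tracking automatically. A minimal sketch of verifying this (the table name `enrollment` is hypothetical):

```sql
SET spark.databricks.delta.properties.defaults.enableRowTracking = true;

-- Any table created afterwards in this session has row tracking enabled
CREATE TABLE enrollment (student_id INT, course_id INT);

-- Check that the property was applied to the new table
SHOW TBLPROPERTIES enrollment ('delta.enableRowTracking');
```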
Enabling row tracking may increase the size of the table. Delta Lake stores the row tracking metadata fields in hidden columns in the data files. Some operations, such as insert-only operations, do not write these hidden columns and instead track the row ids and row commit versions using metadata in the Delta Lake log. Data reorganization operations such as OPTIMIZE and REORG cause the row ids and row commit versions to be stored in the hidden metadata columns, even when they were previously tracked using log metadata.
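The stability described above can be observed directly: row ids survive OPTIMIZE even though the operation rewrites data files. A hedged sketch, assuming a row-tracking-enabled table named `student`:

```sql
-- Capture row ids before compaction
SELECT _metadata.row_id, id FROM student;

-- Rewrite the data files; row ids are materialized into the hidden columns
OPTIMIZE student;

-- Each row keeps the row id it had before the rewrite
SELECT _metadata.row_id, id FROM student;
```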
Row tracking adds the following metadata fields that can be accessed when reading a table:
| Column name | Type | Values |
|---|---|---|
| `_metadata.row_id` | Long | The unique identifier of the row. |
| `_metadata.row_commit_version` | Long | The table version at which the row was last inserted or updated. |
The row id and row commit version metadata fields are not automatically included when reading the table. Instead, these metadata fields must be manually selected from the hidden `_metadata` column, which is available for all tables in Apache Spark.
<Tabs syncKey="code-examples">
<TabItem label="SQL">
```sql
SELECT _metadata.row_id, _metadata.row_commit_version, * FROM table_name;
```
</TabItem>
<TabItem label="Python">
```python
spark.read.table("table_name") \
  .select("_metadata.row_id", "_metadata.row_commit_version", "*")
```
</TabItem>
<TabItem label="Scala">
```scala
spark.read.table("table_name")
  .select("_metadata.row_id", "_metadata.row_commit_version", "*")
```
</TabItem>
</Tabs>
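Because `_metadata.row_commit_version` records the table version at which a row was last inserted or updated, it can be used to filter for recently changed rows. A sketch (the version threshold `5` is illustrative):

```sql
-- Return only rows inserted or updated after table version 5
SELECT _metadata.row_id, *
FROM table_name
WHERE _metadata.row_commit_version > 5;
```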
Row tracking can be disabled to reduce the storage overhead of the metadata fields. After row tracking is disabled, the metadata fields remain available, but all rows are assigned a new id and commit version whenever they are touched by an operation.
<Tabs syncKey="code-examples">
<TabItem label="SQL">
```sql
ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableRowTracking' = 'false');
```
</TabItem>
<TabItem label="Python">
```python
spark.sql("ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableRowTracking' = 'false')")
```
</TabItem>
<TabItem label="Scala">
```scala
spark.sql("ALTER TABLE table_name SET TBLPROPERTIES ('delta.enableRowTracking' = 'false')")
```
</TabItem>
</Tabs>
The following limitations exist: