# Iceberg Catalog procedures
StarRocks Iceberg Catalog supports a variety of procedures for managing Iceberg tables, including snapshot management, branch management, data maintenance, metadata management, and table management.
You must have the appropriate privileges to execute procedures. For more information about privileges, see Privileges.
## rollback_to_snapshot

Rolls back the table to a specific snapshot. This operation sets the table's current snapshot to the specified snapshot ID.

### Syntax

```SQL
ALTER TABLE [catalog.][database.]table_name
EXECUTE rollback_to_snapshot(<snapshot_id>)
```

### Parameters

- `snapshot_id`: ID of the snapshot to which you want to roll back the table.

### Example

Roll back the table to the snapshot with ID `98765`:

```SQL
ALTER TABLE iceberg.sales.order
EXECUTE rollback_to_snapshot(98765);
```
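To find the ID of the snapshot you want to roll back to, you can first inspect the table's snapshot history. This is a sketch that assumes your StarRocks version supports querying Iceberg metadata tables via the `$snapshots` suffix:

```SQL
-- List the table's snapshots to pick a snapshot ID.
-- Assumes support for Iceberg metadata tables ($snapshots suffix).
SELECT snapshot_id, committed_at
FROM iceberg.sales.order$snapshots
ORDER BY committed_at DESC;
```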
## cherrypick_snapshot

Cherry-picks a specific snapshot and applies it to the current state of the table. This operation creates a new snapshot based on an existing snapshot, while the original snapshot remains unchanged.

### Syntax

```SQL
ALTER TABLE [catalog.][database.]table_name
EXECUTE cherrypick_snapshot(<snapshot_id>)
```

### Parameters

- `snapshot_id`: ID of the snapshot that you want to cherry-pick.

### Example

Cherry-pick the snapshot with ID `54321`:

```SQL
ALTER TABLE iceberg.sales.order
EXECUTE cherrypick_snapshot(54321);
```
## fast_forward

Fast-forwards one branch to the latest snapshot of another branch. This operation updates the source branch's snapshot to match the target branch's snapshot.

### Syntax

```SQL
ALTER TABLE [catalog.][database.]table_name
EXECUTE fast_forward('<from_branch>', '<to_branch>')
```

### Parameters

- `from_branch`: The branch you want to fast-forward. Wrap the branch name in quotes.
- `to_branch`: The branch to which you want to fast-forward the `from_branch`. Wrap the branch name in quotes.

### Example

Fast-forward the `main` branch to the branch `test-branch`:

```SQL
ALTER TABLE iceberg.sales.order
EXECUTE fast_forward('main', 'test-branch');
```
## rewrite_data_files

Rewrites data files to optimize file layout. This procedure merges small files to improve query performance and reduce metadata overhead.

### Syntax

```SQL
ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_data_files
("key" = value [, "key" = value, ...])
[WHERE <predicate>]
```

### Parameters

`"key" = value` pairs that declare the manual compaction behaviors. Note that you need to wrap each key in double quotes. The supported properties are `min_file_size_bytes`, `batch_size`, `rewrite_all`, and `batch_parallelism`. You can also attach an optional `WHERE` clause to restrict the compaction to specific partitions.

### Example

The following example performs manual compaction on specific partitions of the Iceberg table `t1`. The partitions are selected by the clause `WHERE part_col = 'p1'`. In these partitions, data files smaller than 134,217,728 bytes (128 MB) are merged during the compaction.

```SQL
ALTER TABLE t1 EXECUTE rewrite_data_files("min_file_size_bytes" = 134217728) WHERE part_col = 'p1';
```
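To compact every data file regardless of its size, the `rewrite_all` property listed above can be used instead of a size threshold. A minimal sketch (the boolean value format shown here is an assumption):

```SQL
-- Rewrite all data files in the selected partitions, not only the small ones.
ALTER TABLE t1 EXECUTE rewrite_data_files("rewrite_all" = "true") WHERE part_col = 'p1';
```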
## expire_snapshots

Expires snapshots older than a specific timestamp. This operation deletes the data files of the expired snapshots, helping to manage storage usage.

### Syntax

```SQL
ALTER TABLE [catalog.][database.]table_name
EXECUTE expire_snapshots(
    [ [older_than =] '<datetime>' ] [, [retain_last =] <int> ]
)
```

### Parameters

- `older_than`: The timestamp before which snapshots are expired.
- `retain_last`: The number of most recent snapshots to retain.

### Examples

Expire snapshots created before `'2023-12-17 00:14:38'` and retain two snapshots:

```SQL
-- With the parameter keys specified:
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots(older_than = '2023-12-17 00:14:38', retain_last = 2);

-- With the parameter keys unspecified:
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots('2023-12-17 00:14:38', 2);
```
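Because both parameters are optional in the syntax, you can also expire snapshots by count alone. A minimal sketch that keeps only the ten most recent snapshots:

```SQL
-- Keep only the 10 most recent snapshots; older ones are expired.
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots(retain_last = 10);
```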
## remove_orphan_files

Removes orphan files that are not referenced by any valid snapshot of the table and are older than a specified timestamp. This operation helps clean up unused files and reclaim storage space.

### Syntax

```SQL
ALTER TABLE [catalog.][database.]table_name
EXECUTE remove_orphan_files(
    [ [older_than =] '<datetime>' ] [, [location =] '<string>' ]
)
```

### Parameters

- `older_than`: The timestamp before which orphan files are removed.
- `location`: The directory to scan for orphan files.

### Examples

Remove orphan files older than `'2024-01-01 00:00:00'` from the sub-directory `sub_dir` of the table location:

```SQL
-- With the parameter keys specified:
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files(older_than = '2024-01-01 00:00:00', location = 's3://iceberg-bucket/iceberg_db/iceberg_table/sub_dir');

-- With the parameter keys unspecified:
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files('2024-01-01 00:00:00', 's3://bucket-test/iceberg_db/iceberg_table/sub_dir');
```
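Since `location` is optional in the syntax, you can omit it to scan the entire table location rather than a single sub-directory. A minimal sketch:

```SQL
-- Remove all orphan files under the table location older than the timestamp.
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files(older_than = '2024-01-01 00:00:00');
```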
## rewrite_manifests

Rewrites data manifest files and merges them by partition to avoid performance degradation caused by excessive small manifests.

:::note
This operation rewrites the data manifests of the current snapshot only.
:::

### Syntax

```SQL
ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_manifests()
```

### Example

Rewrite the data manifests of the current snapshot:

```SQL
ALTER TABLE iceberg.sales.order
EXECUTE rewrite_manifests();
```
## add_files

Adds data files to an Iceberg table from either a source table or a specific location. This procedure supports the Parquet and ORC file formats.

### Syntax

```SQL
ALTER TABLE [catalog.][database.]table_name
EXECUTE add_files(
    [source_table = '<source_table>' | location = '<location>', file_format = '<format>']
    [, recursive = <boolean>]
)
```

Either `source_table` or `location` must be provided, but not both.

### Parameters

- `source_table`: The source table whose data files are added to the Iceberg table.
- `location`: The path of the file or directory from which files are added.
- `file_format`: The format of the files to add (required when `location` is specified).
- `recursive`: Whether to search the location recursively.

### Examples

Add files from a source table:

```SQL
ALTER TABLE iceberg.sales.order
EXECUTE add_files(source_table = 'hive_catalog.sales.source_order');
```

Add files from a specific location in Parquet format:

```SQL
ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/', file_format = 'parquet', recursive = true);
```

Add a single file:

```SQL
ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/data.parquet', file_format = 'parquet');
```
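The procedure also accepts ORC files, as noted above. The sketch below mirrors the Parquet case; the path and the `'orc'` format string are assumptions:

```SQL
-- Add ORC files from a directory, including nested sub-directories.
ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order_orc/', file_format = 'orc', recursive = true);
```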
## register_table

Registers an Iceberg table using a metadata file. This procedure allows you to add an existing Iceberg table to the catalog without migrating data.

### Syntax

```SQL
CALL [catalog.]system.register_table(
    database_name = '<database_name>',
    table_name = '<table_name>',
    metadata_file = '<metadata_file_path>'
)
```

### Parameters

- `database_name`: The name of the database in which the table is registered.
- `table_name`: The name under which the table is registered.
- `metadata_file`: The path to the Iceberg metadata file of the table.

### Examples

Register a table using a metadata file:

```SQL
CALL iceberg_catalog.system.register_table(
    database_name = 'sales',
    table_name = 'order',
    metadata_file = 's3://bucket/metadata/sales/order/metadata/00001-xxxxx-xxxxx-xxxxx.metadata.json'
);
```

Or use the current catalog:

```SQL
CALL system.register_table(
    database_name = 'sales',
    table_name = 'order',
    metadata_file = 's3://bucket/metadata/sales/order/metadata/00001-xxxxx-xxxxx-xxxxx.metadata.json'
);
```