docs/en/data_source/catalog/iceberg/DML.md
StarRocks Iceberg Catalog supports a variety of Data Manipulation Language (DML) operations, including inserting data into Iceberg tables.
You must have the appropriate privileges to perform DML operations. For more information about privileges, see Privileges.
Inserts data into an Iceberg table. This feature is supported from v3.1 onwards.
Similar to loading data into StarRocks native tables, if you have the INSERT privilege on an Iceberg table, you can use the INSERT statement to sink the data to the Iceberg table. Currently, only Parquet-formatted Iceberg tables are supported.
INSERT {INTO | OVERWRITE} <table_name>
[ (column_name [, ...]) ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
-- If you want to sink data to specified partitions, use the following syntax:
INSERT {INTO | OVERWRITE} <table_name>
PARTITION (par_col1=<value> [, par_col2=<value>...])
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
:::note
NULL values are not allowed in partition columns. Therefore, you must make sure that no empty values are loaded into the partition columns of the Iceberg table.
:::
Appends the data to the Iceberg table.
Overwrites the existing data of the Iceberg table.
The name of the destination column to which you want to load data. You can specify one or more columns. Multiple columns are separated with commas (,).
NULL to the destination column.:::note
You cannot specify the column_name property if you have specified the PARTITION clause.
:::
Expression that assigns values to the destination column.
Assigns a default value to the destination column.
Query statement whose result will be loaded into the Iceberg table. It can be any SQL statement supported by StarRocks.
The partitions into which you want to load data. You must specify all partition columns of the Iceberg table in this property. The partition columns that you specify in this property can be in a different sequence than the partition columns that you have defined in the table creation statement.
:::note
You cannot specify the column_name property if you have specified the PARTITION clause.
:::
Insert three data rows into the partition_tbl_1 table:
INSERT INTO partition_tbl_1
VALUES
("buy", 1, "2023-09-01"),
("sell", 2, "2023-09-02"),
("buy", 3, "2023-09-03");
Insert the result of a SELECT query, which contains simple computations, into the partition_tbl_1 table:
INSERT INTO partition_tbl_1 (id, action, dt) SELECT 1+1, 'buy', '2023-09-03';
Insert the result of a SELECT query, which reads data from the partition_tbl_1 table, into the same table:
INSERT INTO partition_tbl_1 SELECT 'buy', 1, date_add(dt, INTERVAL 2 DAY)
FROM partition_tbl_1
WHERE id=1;
Insert the result of a SELECT query into the partitions that meet two conditions, dt='2023-09-01' and id=1, of the partition_tbl_2 table:
INSERT INTO partition_tbl_2 SELECT 'order', 1, '2023-09-01';
Or
INSERT INTO partition_tbl_2 partition(dt='2023-09-01',id=1) SELECT 'order';
Overwrite all action column values in the partitions that meet two conditions, dt='2023-09-01' and id=1, of the partition_tbl_1 table with close:
INSERT OVERWRITE partition_tbl_1 SELECT 'close', 1, '2023-09-01';
Or
INSERT OVERWRITE partition_tbl_1 partition(dt='2023-09-01',id=1) SELECT 'close';
You can use the DELETE statement to delete data from Iceberg tables based on specified conditions. This feature is supported from v4.1 and later.
DELETE FROM <table_name> WHERE <condition>
table_name: The name of the Iceberg table you want to delete data from. You can use:
catalog_name.database_name.table_namedatabase_name.table_nametable_namecondition: The condition to identify which rows to delete. It can include:
=, !=, >, <, >=, <=, <>AND, OR, NOTIN and NOT IN clausesBETWEEN and LIKE operatorsIS NULL and IS NOT NULLIN or EXISTSDelete rows matching a simple condition:
DELETE FROM iceberg_catalog.db.table1 WHERE id = 3;
Delete multiple rows using IN clause:
DELETE FROM iceberg_catalog.db.table1 WHERE id IN (18, 20, 22);
DELETE FROM iceberg_catalog.db.table1 WHERE id NOT IN (100, 101, 102);
Combine multiple conditions:
DELETE FROM iceberg_catalog.db.table1 WHERE age > 30 AND salary < 70000;
DELETE FROM iceberg_catalog.db.table1 WHERE status = 'inactive' OR last_login < '2023-01-01';
Use LIKE for pattern-based deletion:
DELETE FROM iceberg_catalog.db.table1 WHERE name LIKE 'A%';
DELETE FROM iceberg_catalog.db.table1 WHERE email LIKE '%@example.com';
Use BETWEEN for range-based deletion:
DELETE FROM iceberg_catalog.db.table1 WHERE age BETWEEN 30 AND 40;
DELETE FROM iceberg_catalog.db.table1 WHERE created_date BETWEEN '2023-01-01' AND '2023-12-31';
Delete rows with or without NULL values:
DELETE FROM iceberg_catalog.db.table1 WHERE name IS NULL;
DELETE FROM iceberg_catalog.db.table1 WHERE email IS NULL AND phone IS NULL;
DELETE FROM iceberg_catalog.db.table1 WHERE age IS NOT NULL;
Use sub-queries to identify rows to delete:
-- DELETE with IN sub-query
DELETE FROM iceberg_catalog.db.table1 WHERE id IN (SELECT id FROM temp_table WHERE expired = true);
-- DELETE with EXISTS sub-query
DELETE FROM iceberg_catalog.db.table1 t1 WHERE EXISTS (SELECT user_id FROM inactive_users t2 WHERE t2.user_id = t1.user_id);
You can use the TRUNCATE TABLE statement to quickly delete all data from Iceberg tables.
TRUNCATE TABLE <table_name>
table_name: The name of the Iceberg table that you want to truncate data from. You can use:
catalog_name.database_name.table_namedatabase_name.table_nametable_nameTRUNCATE TABLE iceberg_catalog.my_db.my_table;
SET CATALOG iceberg_catalog;
TRUNCATE TABLE my_db.my_table;
SET CATALOG iceberg_catalog;
USE my_db;
TRUNCATE TABLE my_table;