presto-docs/src/main/sphinx/connector/iceberg.rst
The Iceberg connector allows querying data stored in Iceberg tables.
Iceberg tables store most of the metadata in the metadata files, along with the data on the
filesystem, but it still requires a central place to find the current location of the
current metadata pointer for a table. This central place is called the Iceberg Catalog.
The Presto Iceberg connector supports different types of Iceberg Catalogs : HIVE,
NESSIE, REST, and HADOOP.
To configure the Iceberg connector, create a catalog properties file
etc/catalog/iceberg.properties. To define the catalog type, iceberg.catalog.type property
is required along with the following contents, with the property values replaced as follows:
Hive Metastore catalog ^^^^^^^^^^^^^^^^^^^^^^
The Iceberg connector supports the same configuration for
HMS <https://prestodb.io/docs/current/connector/hive.html#metastore-configuration-properties>_
as a Hive connector.
.. code-block:: none
connector.name=iceberg
hive.metastore.uri=hostname:port
iceberg.catalog.type=hive
File-Based Metastore ^^^^^^^^^^^^^^^^^^^^
For testing or development purposes, this connector can be configured to use a local
filesystem directory as a Hive Metastore. See :ref:installation/deployment:File-Based Metastore.
Glue catalog ^^^^^^^^^^^^
The Iceberg connector supports the same configuration for
Glue <https://prestodb.io/docs/current/connector/hive.html#aws-glue-catalog-configuration-properties>_
as a Hive connector.
.. code-block:: none
connector.name=iceberg
hive.metastore=glue
iceberg.catalog.type=hive
There are additional configurations available when using the Iceberg connector configured with Hive or Glue catalogs.
======================================================== ============================================================= ============
Property Name Description Default
======================================================== ============================================================= ============
hive.metastore.uri The URI(s) of the Hive metastore to connect to using the
Thrift protocol. If multiple URIs are provided, the first
URI is used by default, and the rest of the URIs are
fallback metastores.
Example: ``thrift://192.0.2.3:9083`` or
``thrift://192.0.2.3:9083,thrift://192.0.2.4:9083``.
This property is required if the
``iceberg.catalog.type`` is ``hive`` and ``hive.metastore``
is ``thrift``.
hive.metastore.catalog.name Specifies the catalog name to be passed to the metastore.
iceberg.hive-statistics-merge-strategy Comma separated list of statistics to use from the
Hive Metastore to override Iceberg table statistics.
The available values are NUMBER_OF_DISTINCT_VALUES
and TOTAL_SIZE_IN_BYTES.
**Note**: Only valid when the Iceberg connector is
configured with Hive.
iceberg.hive.table-refresh.backoff-min-sleep-time The minimum amount of time to sleep between retries when 100ms
refreshing table metadata.
iceberg.hive.table-refresh.backoff-max-sleep-time The maximum amount of time to sleep between retries when 5s
refreshing table metadata.
iceberg.hive.table-refresh.max-retry-time The maximum amount of time to take across all retries before 1min
failing a table metadata refresh operation.
iceberg.hive.table-refresh.retries The number of times to retry after errors when refreshing 20
table metadata using the Hive metastore.
iceberg.hive.table-refresh.backoff-scale-factor The multiple used to scale subsequent wait time between 4.0
retries.
iceberg.engine.hive.lock-enabled Whether to use locks to ensure atomicity of commits. true
This will turn off locks but is overridden at a table level
with the table configuration engine.hive.lock-enabled.
======================================================== ============================================================= ============
Nessie catalog ^^^^^^^^^^^^^^
To use a Nessie catalog, configure the catalog type as
iceberg.catalog.type=nessie.
.. code-block:: none
connector.name=iceberg
iceberg.catalog.type=nessie
iceberg.catalog.warehouse=/tmp
iceberg.nessie.uri=https://localhost:19120/api/v1
Additional supported properties for the Nessie catalog:
==================================================== ============================================================
Property Name Description
==================================================== ============================================================
iceberg.nessie.ref The branch/tag to use for Nessie, defaults to main.
iceberg.nessie.uri Nessie API endpoint URI (required).
Example: https://localhost:19120/api/v1
iceberg.nessie.auth.type The authentication type to use.
Available values are BASIC or BEARER.
Example: BEARER
**Note:** Nessie BASIC authentication type is deprecated,
this will be removed in upcoming release
iceberg.nessie.auth.basic.username The username to use with BASIC authentication.
Example: test_user
iceberg.nessie.auth.basic.password The password to use with BASIC authentication.
Example: my$ecretPass
iceberg.nessie.auth.bearer.token The token to use with BEARER authentication.
Example: SXVLUXUhIExFQ0tFUiEK
iceberg.nessie.read-timeout-ms The read timeout in milliseconds for requests
to the Nessie server.
Example: 5000
iceberg.nessie.connect-timeout-ms The connection timeout in milliseconds for the connection
requests to the Nessie server.
Example: 10000
iceberg.nessie.compression-enabled Configuration of whether compression should be enabled or
not for requests to the Nessie server, defaults to true.
iceberg.nessie.client-builder-impl Configuration of the custom ClientBuilder implementation
class to be used.
==================================================== ============================================================
Setting Up Nessie With Docker
To set up a Nessie instance locally using the Docker image, see `Setting up Nessie <https://projectnessie.org/try/docker/>`_. Once the Docker instance is up and running, you should see logs similar to the following example:
.. code-block:: none
2023-09-05 13:11:37,905 INFO [io.quarkus] (main) nessie-quarkus 0.69.0 on JVM (powered by Quarkus 3.2.4.Final) started in 1.921s. Listening on: http://0.0.0.0:19120
2023-09-05 13:11:37,906 INFO [io.quarkus] (main) Profile prod activated.
2023-09-05 13:11:37,906 INFO [io.quarkus] (main) Installed features: [agroal, amazon-dynamodb, cassandra-client, cdi, google-cloud-bigtable, hibernate-validator, jdbc-postgresql, logging-sentry, micrometer, mongodb-client, narayana-jta, oidc, opentelemetry, reactive-routes, resteasy, resteasy-jackson, security, security-properties-file, smallrye-context-propagation, smallrye-health, smallrye-openapi, swagger-ui, vertx]
If log messages related to Nessie's OpenTelemetry collector appear similar to the following example, you can disable OpenTelemetry using the configuration option ``quarkus.otel.sdk.disabled=true``.
.. code-block:: none
2023-08-27 11:10:02,492 INFO [io.qua.htt.access-log] (executor-thread-1) 172.17.0.1 - - [27/Aug/2023:11:10:02 +0000] "GET /api/v1/config HTTP/1.1" 200 62
2023-08-27 11:10:05,007 SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans. The request could not be executed. Full error message: Failed to connect to localhost/127.0.0.1:4317
For example, start the Docker image using the following command:
``docker run -p 19120:19120 -e QUARKUS_OTEL_SDK_DISABLED=true ghcr.io/projectnessie/nessie``
For more information about this configuration option and other related options, see the `OpenTelemetry Configuration Reference <https://quarkus.io/guides/opentelemetry#quarkus-opentelemetry_quarkus.otel.sdk.disabled>`_.
For more information about troubleshooting OpenTelemetry traces, see `Troubleshooting traces <https://projectnessie.org/try/configuration/#troubleshooting-traces>`_.
If an error similar to the following example is displayed, this is probably because you are interacting with an http server, and not an https server. You need to set ``iceberg.nessie.uri`` to ``http://localhost:19120/api/v1``.
.. code-block:: none
Caused by: javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
at sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:448)
at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:174)
at sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1320)
at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1233)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:417)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:389)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:558)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:201)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:167)
at org.projectnessie.client.http.impl.jdk8.UrlConnectionRequest.executeRequest(UrlConnectionRequest.java:71)
... 42 more
REST catalog
^^^^^^^^^^^^
To use a REST catalog, configure the catalog type as
``iceberg.catalog.type=rest``. A minimal configuration includes:
.. code-block:: none
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest.uri=https://localhost:8181
Additional supported properties for the REST catalog:
==================================================== ============================================================
Property Name Description
==================================================== ============================================================
``iceberg.rest.uri`` REST API endpoint URI (required).
Example: ``https://localhost:8181``
``iceberg.rest.auth.type`` The authentication type to use.
Available values are ``NONE`` or ``OAUTH2`` (default: ``NONE``).
``OAUTH2`` requires either a credential or token.
``iceberg.rest.auth.oauth2.uri`` OAUTH2 server endpoint URI.
Example: ``https://localhost:9191``
``iceberg.rest.auth.oauth2.credential`` The credential to use for OAUTH2 authentication.
Example: ``key:secret``
``iceberg.rest.auth.oauth2.token`` The Bearer token to use for OAUTH2 authentication.
Example: ``SXVLUXUhIExFQ0tFUiEK``
``iceberg.rest.auth.oauth2.scope`` The scope to use for OAUTH2 authentication.
This property is only applicable when using
``iceberg.rest.auth.oauth2.credential``.
Example: ``PRINCIPAL_ROLE:ALL``
``iceberg.rest.nested.namespace.enabled`` In REST Catalogs, tables are grouped into namespaces, that can be
nested. But if a large number of recursive namespaces result in
lower performance, querying nested namespaces can be disabled.
Defaults to ``true``.
``iceberg.rest.session.type`` The session type to use when communicating with the REST catalog.
Available values are ``NONE`` or ``USER`` (default: ``NONE``).
``iceberg.catalog.warehouse`` A catalog warehouse root path for Iceberg tables (optional).
Example: ``s3://warehouse/``
==================================================== ============================================================
Hadoop catalog
^^^^^^^^^^^^^^
To use a Hadoop catalog, configure the catalog type as
``iceberg.catalog.type=hadoop``. A minimal configuration includes:
.. code-block:: none
connector.name=iceberg
iceberg.catalog.type=hadoop
iceberg.catalog.warehouse=hdfs://hostname:port
Hadoop catalog configuration properties:
======================================================= ============================================================= ============
Property Name Description Default
======================================================= ============================================================= ============
``iceberg.catalog.warehouse`` The catalog warehouse root path for Iceberg tables.
The Hadoop catalog requires a file system that supports
an atomic rename operation, such as HDFS, to maintain
metadata files in order to implement an atomic transaction
commit.
Example: ``hdfs://nn:8020/warehouse/path``
Do not set ``iceberg.catalog.warehouse`` to a path in object
stores or local file systems in the production environment.
This property is required if the ``iceberg.catalog.type`` is
``hadoop``. Otherwise, it will be ignored.
``iceberg.catalog.hadoop.warehouse.datadir`` The catalog warehouse root data path for Iceberg tables.
It is only supported with the Hadoop catalog.
Example: ``s3://iceberg_bucket/warehouse``.
This optional property can be set to a path in object
stores or HDFS.
If set, all tables in this Hadoop catalog default to saving
their data and delete files in the specified root
data directory.
``iceberg.catalog.cached-catalog-num`` The number of Iceberg catalogs to cache. This property is ``10``
required if the ``iceberg.catalog.type`` is ``hadoop``.
Otherwise, it will be ignored.
======================================================= ============================================================= ============
Configure the `Amazon S3 <https://prestodb.io/docs/current/connector/hive.html#amazon-s3-configuration>`_
properties to specify a S3 location as the warehouse data directory for the Hadoop catalog. This way,
the data and delete files of Iceberg tables are stored in S3. An example configuration includes:
.. code-block:: none
connector.name=iceberg
iceberg.catalog.type=hadoop
iceberg.catalog.warehouse=hdfs://nn:8020/warehouse/path
iceberg.catalog.hadoop.warehouse.datadir=s3://iceberg_bucket/warehouse
hive.s3.use-instance-credentials=false
hive.s3.aws-access-key=accesskey
hive.s3.aws-secret-key=secretkey
hive.s3.endpoint=http://192.168.0.103:9878
hive.s3.path-style-access=true
Presto C++ Support
^^^^^^^^^^^^^^^^^^
``HIVE``, ``NESSIE``, ``REST``, and ``HADOOP`` Iceberg catalogs are supported in Presto C++.
Configuration Properties
------------------------
.. note::
The Iceberg connector supports configuration options for
`Amazon S3 <https://prestodb.io/docs/current/connector/hive.html#amazon-s3-configuration>`_
as a Hive connector.
The following configuration properties are available for all catalog types:
======================================================= ============================================================= ================================== =================== =============================================
Property Name Description Default Presto Java Support Presto C++ Support
======================================================= ============================================================= ================================== =================== =============================================
``iceberg.catalog.type`` The catalog type for Iceberg tables. The available values ``HIVE`` Yes Yes, only needed on coordinator
are ``HIVE``, ``HADOOP``, and ``NESSIE`` and ``REST``.
``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources. Yes Yes, only needed on coordinator
Example: ``/etc/hadoop/conf/core-site.xml.`` This property
is required if the iceberg.catalog.type is ``hadoop``.
Otherwise, it will be ignored.
``iceberg.file-format`` The storage file format for Iceberg tables. The available ``PARQUET`` Yes No, write is not supported yet
values are ``PARQUET`` and ``ORC``.
``iceberg.compression-codec`` The compression codec to use when writing files. The ``ZSTD`` Yes No, write is not supported yet
available values are ``NONE``, ``SNAPPY``, ``GZIP``,
``LZ4``, and ``ZSTD``.
Note: ``LZ4`` is only available when
``iceberg.file-format=ORC``.
``iceberg.max-partitions-per-writer`` The maximum number of partitions handled per writer. ``100`` Yes No, write is not supported yet
``iceberg.minimum-assigned-split-weight`` A decimal value in the range (0, 1] is used as a minimum ``0.05`` Yes Yes
for weights assigned to each split. A low value may improve
performance on tables with small files. A higher value may
improve performance for queries with highly skewed
aggregations or joins.
``iceberg.enable-merge-on-read-mode`` Enable reading base tables that use merge-on-read for ``true`` Yes Yes, only needed on coordinator
updates.
``iceberg.delete-as-join-rewrite-enabled`` When enabled, equality delete row filtering is applied ``true`` Yes No, Equality delete read is not supported
as a join with the data of the equality delete files.
Deprecated: This property is deprecated and will be removed
in a future release. Use the
``iceberg.delete-as-join-rewrite-max-delete-columns``
configuration property instead.
``iceberg.delete-as-join-rewrite-max-delete-columns`` When set to a number greater than 0, this property enables ``400`` Yes No, Equality delete read is not supported
equality delete row filtering as a join with the data of the
equality delete files. The value of this property is the
maximum number of columns that can be used in the equality
delete files. If the number of columns in the equality delete
files exceeds this value, then the optimization is not
applied and the equality delete files are applied directly to
each row in the data files.
This property is only applicable when
``iceberg.delete-as-join-rewrite-enabled`` is set to
``true``.
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` Yes No
``iceberg.statistic-snapshot-record-difference-weight`` The amount that the difference in total record count matters Yes Yes, only needed on coordinator
when calculating the closest snapshot when picking
statistics. A value of 1 means a single record is equivalent
to 1 millisecond of time difference.
``iceberg.pushdown-filter-enabled`` Experimental: Enable filter pushdown for Iceberg. This is ``false`` No Yes
only supported with Presto C++.
``iceberg.rows-for-metadata-optimization-threshold`` The maximum number of partitions in an Iceberg table to ``1000`` Yes Yes
allow optimizing queries of that table using metadata. If
an Iceberg table has more partitions than this threshold,
metadata optimization is skipped.
Set to ``0`` to disable metadata optimization.
``iceberg.split-manager-threads`` Number of threads to use for generating Iceberg splits. ``Number of available processors`` Yes Yes, only needed on coordinator
``iceberg.metadata-previous-versions-max`` The maximum number of old metadata files to keep in ``100`` Yes No, write is not supported yet
current metadata log.
``iceberg.metadata-delete-after-commit`` Set to ``true`` to delete the oldest metadata files after ``false`` Yes No, write is not supported yet
each commit.
``iceberg.metrics-max-inferred-column`` The maximum number of columns for which metrics ``100`` Yes No, write is not supported yet
are collected.
``iceberg.max-statistics-file-cache-size`` Maximum size in bytes that should be consumed by the ``256MB`` Yes Yes, only needed on coordinator
statistics file cache.
``iceberg.aggregate-push-down-enabled`` Controls whether to push down aggregate (MIN/MAX/COUNT) to ``true`` Yes Yes
Iceberg based on data file stats.
======================================================= ============================================================= ================================== =================== =============================================
Table Properties
------------------------
Table properties set metadata for the underlying tables. This is key for
CREATE TABLE/CREATE TABLE AS statements. Table properties are passed to the
connector using a WITH clause:
.. code-block:: sql
CREATE TABLE tablename
WITH (
property_name = property_value,
...
)
The following table properties are available, which are specific to the Presto Iceberg connector:
======================================================== =============================================================== ===================== =================== =============================================
Property Name Description Default Presto Java Support Presto C++ Support
======================================================== =============================================================== ===================== =================== =============================================
``commit.retry.num-retries`` Determines the number of attempts for committing the metadata ``4`` Yes No, write is not supported yet
in case of concurrent upsert requests, before failing.
``format-version`` Optionally specifies the format version of the Iceberg ``2`` Yes No, write is not supported yet
specification to use for new tables, either ``1`` or ``2``.
``location`` Optionally specifies the file system location URI for Yes Yes
the table.
``partitioning`` Optionally specifies table partitioning. If a table Yes Yes
is partitioned by columns ``c1`` and ``c2``, the partitioning
property is ``partitioning = ARRAY['c1', 'c2']``.
``read.split.target-size`` The target size for an individual split when generating splits ``134217728`` (128MB) Yes Yes
for a table scan. Generated splits may still be larger or
smaller than this value. Must be specified in bytes.
``write.data.path`` Optionally specifies the file system location URI for Yes No, write is not supported yet
storing the data and delete files of the table. This only
applies to files written after this property is set. Files
previously written aren't relocated to reflect this
parameter.
``write.delete.mode`` Optionally specifies the write delete mode of the Iceberg ``merge-on-read`` Yes No, write is not supported yet
specification to use for new tables, either ``copy-on-write``
or ``merge-on-read``.
``write.format.default`` Optionally specifies the format of table data files, ``PARQUET`` Yes No, write is not supported yet
either ``PARQUET`` or ``ORC``.
``write.metadata.previous-versions-max`` Optionally specifies the max number of old metadata files to ``100`` Yes No, write is not supported yet
keep in current metadata log.
``write.metadata.delete-after-commit.enabled`` Set to ``true`` to delete the oldest metadata file after ``false`` Yes No, write is not supported yet
each commit.
``write.metadata.metrics.max-inferred-column-defaults`` Optionally specifies the maximum number of columns for which ``100`` Yes No, write is not supported yet
metrics are collected.
``write.update.mode`` Optionally specifies the write update mode of the Iceberg ``merge-on-read`` Yes No, write is not supported yet
specification to use for new tables, either ``copy-on-write``
or ``merge-on-read``.
``engine.hive.lock-enabled`` Whether to use Hive metastore locks when committing to Yes No
a Hive metastore
======================================================== =============================================================== ===================== =================== =============================================
The table definition below specifies format ``ORC``, partitioning by columns ``c1`` and ``c2``,
and a file system location of ``s3://test_bucket/test_schema/test_table``:
.. code-block:: sql
CREATE TABLE test_table (
c1 bigint,
c2 varchar,
c3 double
)
WITH (
format = 'ORC',
partitioning = ARRAY['c1', 'c2'],
location = 's3://test_bucket/test_schema/test_table')
)
Deprecated Table Properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Some table properties have been deprecated or removed. The following table lists the deprecated
properties and their replacements. Update queries to use the new property names as soon as
possible. They will be removed in a future version.
======================================= =======================================================
Deprecated Property Name New Property Name
======================================= =======================================================
``format`` ``write.format.default``
``format_version`` ``format-version``
``commit_retries`` ``commit.retry.num-retries``
``delete_mode`` ``write.delete.mode``
``metadata_previous_versions_max`` ``write.metadata.previous-versions-max``
``metadata_delete_after_commit`` ``write.metadata.delete-after-commit.enabled``
``metrics_max_inferred_column`` ``write.metadata.metrics.max-inferred-column-defaults``
======================================= =======================================================
Session Properties
------------------
Session properties set behavior changes for queries executed within the given session.
.. list-table::
:header-rows: 1
:widths: 30 50 10 10
* - Property Name
- Description
- Presto Java Support
- Presto C++ Support
* - .. _iceberg-sess-delete-as-join-rewrite-enabled:
``iceberg.delete_as_join_rewrite_enabled``
- Overrides the behavior of the connector property
``iceberg.delete-as-join-rewrite-enabled`` in the current session.
Deprecated: This property is deprecated and will be removed. Use
``iceberg.delete_as_join_rewrite_max_delete_columns`` instead.
- Yes
- No, Equality delete read is not supported
* - .. _iceberg-sess-delete-as-join-rewrite-max-delete-columns:
``iceberg.delete_as_join_rewrite_max_delete_columns``
- Overrides the behavior of the connector property
``iceberg.delete-as-join-rewrite-max-delete-columns`` in the current session.
- Yes
- No, Equality delete read is not supported
* - .. _iceberg-sess-hive-statistics-merge-strategy:
``iceberg.hive_statistics_merge_strategy``
- Overrides the behavior of the connector property
``iceberg.hive-statistics-merge-strategy`` in the current session.
- Yes
- Yes
* - .. _iceberg-sess-rows-for-metadata-optimization-threshold:
``iceberg.rows_for_metadata_optimization_threshold``
- Overrides the behavior of the connector property
``iceberg.rows-for-metadata-optimization-threshold`` in the current session.
- Yes
- Yes
* - .. _iceberg-sess-target-split-size-bytes:
``iceberg.target_split_size_bytes``
- Overrides the target split size for all tables in a query in bytes. Set to 0 to
use the value in each Iceberg table's ``read.split.target-size`` property.
- Yes
- Yes
* - .. _iceberg-sess-affinity-scheduling-file-section-size:
``iceberg.affinity_scheduling_file_section_size``
- When the ``node_selection_strategy`` or ``hive.node-selection-strategy`` property
is set to ``SOFT_AFFINITY``, this configuration property will change the size of
a file chunk that is hashed to a particular node when determining which worker
to assign a split to. Splits which read data from the same file within the same
chunk will hash to the same node. A smaller chunk size will result in a higher
probability of splits being distributed evenly across the cluster, but reduce
locality. See :ref:`develop/connectors:Node Selection Strategy`.
- Yes
- Yes
* - .. _iceberg-sess-parquet-dereference-pushdown-enabled:
``iceberg.parquet_dereference_pushdown_enabled``
- Overrides the behavior of the connector property
``iceberg.enable-parquet-dereference-pushdown`` in the current session.
- Yes
- No
* - .. _iceberg-sess-materialized-view-storage-table-name-prefix:
``materialized_view_storage_table_name_prefix``
- Prefix for automatically generated materialized view storage table names.
Default: ``__mv_storage__``
- Yes
- Yes
* - .. _iceberg-sess-materialized-view-missing-base-table-behavior:
``materialized_view_missing_base_table_behavior``
- Behavior when a base table referenced by a materialized view is missing. Valid
values: ``FAIL``, ``IGNORE``. Default: ``FAIL``
- Yes
- Yes
* - .. _iceberg-sess-materialized-view-max-changed-partitions:
``materialized_view_max_changed_partitions``
- Maximum number of changed partitions to track for materialized view staleness
detection. If the number of changed partitions exceeds this threshold, the
materialized view falls back to a full recompute. Default: ``100``
- Yes
- Yes
* - .. _iceberg-sess-materialized-view-default-max-snapshots-per-refresh:
``materialized_view_default_max_snapshots_per_refresh``
- Default upper bound on snapshots consumed per base table per refresh when the
materialized view does not override it with the ``max_snapshots_per_refresh``
table property. ``0`` means unbounded. Default: ``0``
- Yes
- Yes
* - .. _iceberg-sess-materialized-view-stitching-strategy:
``materialized_view_stitching_strategy``
- Controls when query-time stitching applies to partially stale materialized views.
Only takes effect when ``materialized_view_stale_read_behavior`` is ``USE_STITCHING``.
- ``ALWAYS`` (default): stitch whenever possible.
- ``NEVER``: always use the full view query.
- ``AUTOMATIC``: the cost-based optimizer chooses between the stitched union and
the full view query; falls back to row-count comparison when stats are unknown.
- Yes
- Yes
* - .. _iceberg-sess-materialized-view-incremental-refresh-strategy:
``materialized_view_incremental_refresh_strategy``
- Controls when incremental (delta) refresh applies to materialized views with
``refresh_type = 'INCREMENTAL'``.
- ``ALWAYS`` (default): use delta refresh whenever it is buildable.
- ``NEVER``: always perform a full refresh.
- ``AUTOMATIC``: the cost-based optimizer chooses between delta and full refresh;
falls back to row-count comparison when stats are unknown.
- Yes
- Yes
* - .. _iceberg-sess-max-partitions-per-writer:
``max_partitions_per_writer``
- Overrides the behavior of the connector property
``iceberg.max-partitions-per-writer`` in the current session.
- Yes
- No
* - .. _iceberg-sess-aggregate-push-down-enabled:
``aggregate_push_down_enabled``
- Overrides the behavior of the connector property
``iceberg.aggregate-push-down-enabled`` in the current session.
- Yes
- Yes
Caching Support
---------------
Statistics File Caching
^^^^^^^^^^^^^^^^^^^^^^^
Support for Puffin-based statistics caching. It is enabled by default.
JMX query to get the metrics and verify the cache usage::
SELECT * FROM jmx.current."com.facebook.presto.iceberg.statistics:name=iceberg,type=statisticsfilecache";
Manifest File Caching
^^^^^^^^^^^^^^^^^^^^^
As of Iceberg version 1.1.0, Apache Iceberg provides a mechanism to cache the contents of Iceberg manifest files in memory. This feature helps
to reduce repeated reads of small Iceberg manifest files from remote storage.
The following configuration properties are available:
==================================================== ============================================================= ============
Property Name Description Default
==================================================== ============================================================= ============
``iceberg.io.manifest.cache-enabled`` Enable or disable the manifest caching feature. ``true``
``iceberg.io-impl`` Custom FileIO implementation to use in a catalog. It must ``org.apache.iceberg.hadoop.HadoopFileIO``
be set to enable manifest caching. This is only needed for
Hadoop, Nessie and REST catalogs.
``iceberg.io.manifest.cache.max-total-bytes`` Maximum size of cache size in bytes. ``104857600``
``iceberg.io.manifest.cache.expiration-interval-ms`` Maximum time duration in milliseconds for which an entry ``60000``
stays in the manifest cache. Set to 0 to disable entry
expiration.
``iceberg.io.manifest.cache.max-content-length`` Maximum length of a manifest file to be considered for ``8388608``
caching in bytes. Manifest files with a length exceeding
this size will not be cached.
==================================================== ============================================================= ============
JMX query to get the metrics and verify the cache usage::
SELECT * FROM jmx.current."com.facebook.presto.iceberg:name=iceberg,type=manifestfilecache";
.. note::
Manifest file cache statistics are only available through the JMX connector when the Iceberg connector is configured with a HIVE catalog type.
Presto C++ Support
~~~~~~~~~~~~~~~~~~
Manifest file caching is supported in Presto C++.
Alluxio Data Cache
^^^^^^^^^^^^^^^^^^
A Presto worker caches remote storage data in its original form (compressed and possibly encrypted) on local SSD upon read.
The following configuration properties are required to set in the Iceberg catalog file (catalog/iceberg.properties):
.. code-block:: none
cache.enabled=true
cache.base-directory=file:///mnt/flash/data
cache.type=ALLUXIO
cache.alluxio.max-cache-size=1600GB
hive.node-selection-strategy=SOFT_AFFINITY
JMX queries to get the metrics and verify the cache usage::
SELECT * FROM jmx.current."com.facebook.alluxio:name=client.cachehitrate,type=gauges";
SELECT * FROM jmx.current."com.facebook.alluxio:name=client.cachebytesreadcache,type=meters";
SHOW TABLES FROM jmx.current like '%alluxio%';
Presto C++ Support
~~~~~~~~~~~~~~~~~~
Alluxio data caching is applicable for Presto Java. Async data cache is supported in Presto C++. See :ref:`async_data_caching_and_prefetching`.
File And Stripe Footer Cache
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Caches open file descriptors and stripe or file footer information in leaf worker memory. These pieces of data are mostly frequently accessed when reading files.
The following configuration properties are required to set in the Iceberg catalog file (catalog/iceberg.properties):
.. code-block:: none
# scheduling
hive.node-selection-strategy=SOFT_AFFINITY
# orc
iceberg.orc.file-tail-cache-enabled=true
iceberg.orc.file-tail-cache-size=100MB
iceberg.orc.file-tail-cache-ttl-since-last-access=6h
iceberg.orc.stripe-metadata-cache-enabled=true
iceberg.orc.stripe-footer-cache-size=100MB
iceberg.orc.stripe-footer-cache-ttl-since-last-access=6h
iceberg.orc.stripe-stream-cache-size=300MB
iceberg.orc.stripe-stream-cache-ttl-since-last-access=6h
# parquet
iceberg.parquet.metadata-cache-enabled=true
iceberg.parquet.metadata-cache-size=100MB
iceberg.parquet.metadata-cache-ttl-since-last-access=6h
JMX queries to get the metrics and verify the cache usage::
SELECT * FROM jmx.current."com.facebook.presto.hive:name=iceberg_parquetmetadata,type=cachestatsmbean";
Presto C++ Support
~~~~~~~~~~~~~~~~~~
File and stripe footer cache is not applicable for Presto C++.
Metastore Cache
^^^^^^^^^^^^^^^
Iceberg Connector supports Metastore Caching with some exceptions. Iceberg Connector does not allow enabling TABLE cache.
Metastore Caching is only supported when ``iceberg.catalog.type`` is ``HIVE``.
The Iceberg connector supports the same configuration properties for
`Hive Metastore Caching <https://prestodb.io/docs/current/connector/hive.html#metastore-configuration-properties>`_
as a Hive connector.
The following configuration properties are the minimum set of configurations required to be added in the Iceberg catalog file ``catalog/iceberg.properties``:
.. code-block:: none
# Hive Metastore Cache
hive.metastore.cache.disabled-caches=TABLE
hive.metastore.cache.ttl.default=10m
hive.metastore.cache.refresh-interval.default=5m
Extra Hidden Metadata Columns
-----------------------------
The Iceberg connector exposes extra hidden metadata columns. You can query these
as part of a SQL query by including them in your SELECT statement.
``$path`` column
^^^^^^^^^^^^^^^^
The full file system path name of the file for this row.
.. code-block:: sql
SELECT "$path", regionkey FROM "ctas_nation";
.. code-block:: text
$path | regionkey
---------------------------------+-----------
/full/path/to/file/file.parquet | 2
``$data_sequence_number`` column
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Iceberg data sequence number in which this row was added.
.. code-block:: sql
SELECT "$data_sequence_number", regionkey FROM "ctas_nation";
.. code-block:: text
$data_sequence_number | regionkey
----------------------------------+------------
2 | 3
``$deleted`` column
^^^^^^^^^^^^^^^^^^^
Whether this row is a deleted row. When this column is used, deleted rows
from delete files will be marked as ``true`` instead of being filtered out of the results.
.. code-block:: sql
DELETE FROM "ctas_nation" WHERE regionkey = 0;
SELECT "$deleted", regionkey FROM "ctas_nation";
.. code-block:: text
$deleted | regionkey
----------+-----------
true | 0
false | 1
``$delete_file_path`` column
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The path of the delete file corresponding to a deleted row, or NULL if the row was not deleted.
When this column is used, deleted rows will not be filtered out of the results.
.. code-block:: sql
DELETE FROM "ctas_nation" WHERE regionkey = 0;
SELECT "$delete_file_path", regionkey FROM "ctas_nation";
.. code-block:: text
$delete_file_path | regionkey
-----------------------------------------------------------------------------------+-----------
file:/path/to/table/data/delete_file_d8510b3e-510a-4fc2-b2b2-e59ead7fd386.parquet | 0
NULL | 1
Presto C++ Support
^^^^^^^^^^^^^^^^^^
All above metadata columns are supported in Presto C++.
Extra Hidden Metadata Tables
----------------------------
The Iceberg connector exposes extra hidden metadata tables. You can query these
as a part of a SQL query by appending name to the table.
``$properties`` Table
^^^^^^^^^^^^^^^^^^^^^
General properties of the given table.
.. code-block:: sql
SELECT * FROM "ctas_nation$properties";
.. code-block:: text
key | value | is_supported_by_presto
----------------------+----------+------------------------
write.format.default | PARQUET | true
``$history`` Table
^^^^^^^^^^^^^^^^^^
History of table state changes.
.. code-block:: sql
SELECT * FROM "ctas_nation$history";
.. code-block:: text
made_current_at | snapshot_id | parent_id | is_current_ancestor
--------------------------------------+---------------------+-----------+---------------------
2022-11-25 20:56:31.784 Asia/Kolkata | 7606232158543069775 | NULL | true
``$snapshots`` Table
^^^^^^^^^^^^^^^^^^^^
Details about the table snapshots. For more information see `Snapshots <https://iceberg.apache.org/spec/#snapshots>`_ in the Iceberg Table Spec.
.. code-block:: sql
SELECT * FROM "ctas_nation$snapshots";
.. code-block:: text
committed_at | snapshot_id | parent_id | operation | manifest_list | summary
--------------------------------------+---------------------+-----------+-----------+----------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2022-11-25 20:56:31.784 Asia/Kolkata | 7606232158543069775 | NULL | append | s3://my-bucket/ctas_nation/metadata/snap-7606232158543069775-1-395a2cad-b244-409b-b030-cc44949e5a4e.avro | {changed-partition-count=1, added-data-files=1, total-equality-deletes=0, added-records=25, total-position-deletes=0, added-files-size=1648, total-delete-files=0, total-files-size=1648, total-records=25, total-data-files=1}
``$manifests`` Table
^^^^^^^^^^^^^^^^^^^^
Details about the manifests of different table snapshots. For more information see `Manifests <https://iceberg.apache.org/spec/#manifests>`_ in the Iceberg Table Spec.
.. code-block:: sql
SELECT * FROM "ctas_nation$manifests";
.. code-block:: text
path | length | partition_spec_id | added_snapshot_id | added_data_files_count | existing_data_files_count | deleted_data_files_count | partitions
---------------------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+-----------
s3://my-bucket/ctas_nation/metadata/395a2cad-b244-409b-b030-cc44949e5a4e-m0.avro | 5957 | 0 | 7606232158543069775 | 1 | 0 | 0 | []
``$partitions`` Table
^^^^^^^^^^^^^^^^^^^^^
Detailed partition information for the table.
.. code-block:: sql
SELECT * FROM "ctas_nation$partitions";
.. code-block:: text
row_count | file_count | total_size | nationkey | name | regionkey | comment
-----------+------------+------------+-------------------------------+------------------------------------------+------------------------------+------------------------------------------------------------
25 | 1 | 1648 | {min=0, max=24, null_count=0} | {min=ALGERIA, max=VIETNAM, null_count=0} | {min=0, max=4, null_count=0} | {min= haggle. careful, max=y final packaget, null_count=0}
``$files`` Table
^^^^^^^^^^^^^^^^
Overview of data files in the current snapshot of the table.
.. code-block:: sql
SELECT * FROM "ctas_nation$files";
.. code-block:: text
content | file_path | file_format | record_count | file_size_in_bytes | column_sizes | value_counts | null_value_counts | nan_value_counts | lower_bounds | upper_bounds | key_metadata | split_offsets | equality_ids
---------+------------------------------------------------------------------------------+-------------+--------------+--------------------+-----------------------------+--------------------------+----------------------+------------------+-------------------------------------------+--------------------------------------------+--------------+---------------+-------------
0 | s3://my-bucket/ctas_nation/data/9f889274-6f74-4d28-8164-275eef99f660.parquet | PARQUET | 25 | 1648 | {1=52, 2=222, 3=105, 4=757} | {1=25, 2=25, 3=25, 4=25} | {1=0, 2=0, 3=0, 4=0} | NULL | {1=0, 2=ALGERIA, 3=0, 4= haggle. careful} | {1=24, 2=VIETNAM, 3=4, 4=y final packaget} | NULL | NULL | NULL
``$changelog`` Table
^^^^^^^^^^^^^^^^^^^^
This table lets you view which row-level changes have occurred to the table in a
particular order over time. The ``$changelog`` table presents the history of
changes to the table and makes the data available to process through a
query.
The result of a changelog query always returns a static schema with four
columns:
1. ``operation``: (``VARCHAR``) indicating whether the row was inserted,
updated, or deleted.
2. ``ordinal``: (``int``) A number indicating a relative order that a particular
change needs to be applied to the table relative to all other changes.
3. ``snapshotid``: (``bigint``) Represents the snapshot a row-level
change was made in.
4. ``rowdata``: (``row(T)``) which includes the data for the particular row. The
inner values of this type match the schema of the parent table.
The changelog table can be queried with the following name format:
.. code-block:: sql
... FROM "<table>[@<begin snapshot ID>]$changelog[@<end snapshot ID>]"
- ``<table>`` is the name of the table.
- ``<begin snapshot ID>`` is the snapshot of the table you want to begin viewing
changes from. This parameter is optional. If absent, the oldest available
snapshot is used.
- ``<end snapshot ID>`` is the last snapshot for which you want to view changes.
This parameter is optional. If absent, the most current snapshot of the
table is used.
One use for the ``$changelog`` table would be to find when a record was inserted
or removed from the table. To accomplish this, the ``$changelog`` table can be
used in conjunction with the ``$snapshots`` table. First, choose a snapshot ID
from the ``$snapshots`` table to choose the starting point.
.. code-block:: sql
SELECT * FROM "orders$snapshots";
.. code-block:: text
committed_at | snapshot_id | parent_id | operation | manifest_list | summary
---------------------------------------------+---------------------+---------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2023-09-26 08:45:20.930 America/Los_Angeles | 2423571386296047175 | NULL | append | file:/var/folders/g_/6_hxl7r16qdddw7956j_r88h0000gn/T/PrestoTest8140889264166671718/catalog/tpch/ctas_orders/metadata/snap-2423571386296047175-1-3f288b1c-95a9-406b-9e17-9cfe31a11b48.avro | {changed-partition-count=1, added-data-files=4, total-equality-deletes=0, added-records=100, total-position-deletes=0, added-files-size=9580, total-delete-files=0, total-files-size=9580, total-records=100, total-data-files=4}
2023-09-26 08:45:36.942 America/Los_Angeles | 8702997868627997320 | 2423571386296047175 | append | file:/var/folders/g_/6_hxl7r16qdddw7956j_r88h0000gn/T/PrestoTest8140889264166671718/catalog/tpch/ctas_orders/metadata/snap-8702997868627997320-1-a2e1c714-7eed-4e2c-b144-dae4147ebaa4.avro | {changed-partition-count=1, added-data-files=1, total-equality-deletes=0, added-records=1, total-position-deletes=0, added-files-size=1687, total-delete-files=0, total-files-size=11267, total-records=101, total-data-files=5}
2023-09-26 08:45:39.866 America/Los_Angeles | 7615903782581283889 | 8702997868627997320 | append | file:/var/folders/g_/6_hxl7r16qdddw7956j_r88h0000gn/T/PrestoTest8140889264166671718/catalog/tpch/ctas_orders/metadata/snap-7615903782581283889-1-d94c2114-fd22-4de2-9ab5-c0b5bf67282f.avro | {changed-partition-count=1, added-data-files=3, total-equality-deletes=0, added-records=3, total-position-deletes=0, added-files-size=4845, total-delete-files=0, total-files-size=16112, total-records=104, total-data-files=8}
2023-09-26 08:45:48.404 America/Los_Angeles | 677209275408372885 | 7615903782581283889 | append | file:/var/folders/g_/6_hxl7r16qdddw7956j_r88h0000gn/T/PrestoTest8140889264166671718/catalog/tpch/ctas_orders/metadata/snap-677209275408372885-1-ad69e208-1440-459b-93e8-48e61f961758.avro | {changed-partition-count=1, added-data-files=3, total-equality-deletes=0, added-records=5, total-position-deletes=0, added-files-size=4669, total-delete-files=0, total-files-size=20781, total-records=109, total-data-files=11}
Now that we know the snapshots available to query in the changelog, we can see
what changes were made to the table since it was created. Specifically, this
example uses the earliest snapshot ID: ``2423571386296047175``
.. code-block:: sql
SELECT * FROM "ctas_orders@2423571386296047175$changelog" ORDER BY ordinal;
.. code-block:: text
operation | ordinal | snapshotid | rowdata
-----------+---------+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INSERT | 0 | 8702997868627997320 | {orderkey=37504, custkey=1291, orderstatus=O, totalprice=165509.83, orderdate=1996-03-04, orderpriority=5-LOW, clerk=Clerk#000000871, shippriority=0, comment=c theodolites alongside of the fluffily bold requests haggle quickly against }
INSERT | 1 | 7615903782581283889 | {orderkey=12001, custkey=739, orderstatus=F, totalprice=138635.75, orderdate=1994-07-07, orderpriority=2-HIGH, clerk=Clerk#000000863, shippriority=0, comment=old, even theodolites. regular, special theodolites use furio}
INSERT | 1 | 7615903782581283889 | {orderkey=17989, custkey=364, orderstatus=F, totalprice=133669.05, orderdate=1994-01-17, orderpriority=4-NOT SPECIFIED, clerk=Clerk#000000547, shippriority=0, comment=ously express excuses. even theodolit}
INSERT | 1 | 7615903782581283889 | {orderkey=37504, custkey=1291, orderstatus=O, totalprice=165509.83, orderdate=1996-03-04, orderpriority=5-LOW, clerk=Clerk#000000871, shippriority=0, comment=c theodolites alongside of the fluffily bold requests haggle quickly against }
INSERT | 2 | 677209275408372885 | {orderkey=17991, custkey=92, orderstatus=O, totalprice=20732.51, orderdate=1998-07-09, orderpriority=4-NOT SPECIFIED, clerk=Clerk#000000636, shippriority=0, comment= the quickly express accounts. iron}
INSERT | 2 | 677209275408372885 | {orderkey=17989, custkey=364, orderstatus=F, totalprice=133669.05, orderdate=1994-01-17, orderpriority=4-NOT SPECIFIED, clerk=Clerk#000000547, shippriority=0, comment=ously express excuses. even theodolit}
INSERT | 2 | 677209275408372885 | {orderkey=17990, custkey=458, orderstatus=O, totalprice=218031.58, orderdate=1998-03-18, orderpriority=3-MEDIUM, clerk=Clerk#000000340, shippriority=0, comment=ounts wake final foxe}
INSERT | 2 | 677209275408372885 | {orderkey=18016, custkey=403, orderstatus=O, totalprice=174070.99, orderdate=1996-03-19, orderpriority=1-URGENT, clerk=Clerk#000000629, shippriority=0, comment=ly. quickly ironic excuses are furiously. carefully ironic pack}
INSERT | 2 | 677209275408372885 | {orderkey=18017, custkey=958, orderstatus=F, totalprice=203091.02, orderdate=1993-03-26, orderpriority=1-URGENT, clerk=Clerk#000000830, shippriority=0, comment=sleep quickly bold requests. slyly pending pinto beans haggle in pla}
``$refs`` Table
^^^^^^^^^^^^^^^^^^^^
Details about Iceberg references including branches and tags. For more information see `Branching and Tagging <https://iceberg.apache.org/docs/nightly/branching/>`_.
.. code-block:: sql
SELECT * FROM "ctas_nation$refs";
.. code-block:: text
name | type | snapshot_id | max_reference_age_in_ms | min_snapshots_to_keep | max_snapshot_age_in_ms
------------+--------+---------------------+-------------------------+-----------------------+------------------------
main | BRANCH | 3074797416068623476 | NULL | NULL | NULL
testBranch | BRANCH | 3374797416068698476 | NULL | NULL | NULL
testTag | TAG | 4686954189838128572 | 10 | NULL | NULL
``$metadata_log_entries`` Table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Provides metadata log entries for the table.
.. code-block:: sql
SELECT * FROM "region$metadata_log_entries";
.. code-block:: text
timestamp | file | latest_snapshot_id | latest_schema_id | latest_sequence_number
-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+---------------------+------------------+------------------------
2024-12-28 23:41:30.451 Asia/Kolkata | hdfs://localhost:9000/user/hive/warehouse/iceberg_schema.db/region1/metadata/00000-395385ba-3b69-47a7-9c5b-61d056de55c6.metadata.json | 5983271822201743253 | 0 | 1
2024-12-28 23:42:42.207 Asia/Kolkata | hdfs://localhost:9000/user/hive/warehouse/iceberg_schema.db/region1/metadata/00001-61151efc-0e01-4a47-a5e6-7b72749cc4a8.metadata.json | 5841566266546816471 | 0 | 2
2024-12-28 23:42:47.591 Asia/Kolkata | hdfs://localhost:9000/user/hive/warehouse/iceberg_schema.db/region1/metadata/00002-d4a9c326-5053-4a26-9082-d9fbf1d6cd14.metadata.json | 6894018661156805064 | 0 | 3
Presto C++ Support
^^^^^^^^^^^^^^^^^^
All above metadata tables, except `$changelog`, are supported in Presto C++.
Procedures
----------
Use the :doc:`/sql/call` statement to perform data manipulation or administrative tasks. Procedures are available in the ``system`` schema of the catalog.
Register Table
^^^^^^^^^^^^^^
Iceberg tables for which table data and metadata already exist in the
file system can be registered with the catalog. Use the ``register_table``
procedure on the catalog's ``system`` schema to register a table which
already exists but is not known by the catalog.
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name Required Type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to register
``table_name`` Yes string Name of the table to register
``metadata_location`` Yes string The location of the table metadata which is to be registered
``metadata_file`` string An optionally specified metadata file which is to be registered
===================== ========== =============== =======================================================================
Examples:
* Register a table through supplying the target schema, desired table name, and the location of the table metadata::
CALL iceberg.system.register_table('schema_name', 'table_name', 'hdfs://localhost:9000/path/to/iceberg/table/metadata/dir')
CALL iceberg.system.register_table(table_name => 'table_name', schema => 'schema_name', metadata_location => 'hdfs://localhost:9000/path/to/iceberg/table/metadata/dir')
.. note::
If multiple metadata files of the same version exist at the specified location,
the most recently modified one is used.
* Register a table through additionally supplying a specific metadata file::
CALL iceberg.system.register_table('schema_name', 'table_name', 'hdfs://localhost:9000/path/to/iceberg/table/metadata/dir', '00000-35a08aed-f4b0-4010-95d2-9d73ef4be01c.metadata.json')
CALL iceberg.system.register_table(table_name => 'table_name', schema => 'schema_name', metadata_location => 'hdfs://localhost:9000/path/to/iceberg/table/metadata/dir', metadata_file => '00000-35a08aed-f4b0-4010-95d2-9d73ef4be01c.metadata.json')
.. note::
The Iceberg REST catalog may not support table register depending on the
type of the backing catalog.
.. note::
When registering a table with the Hive metastore, the user calling the procedure
is set as the owner of the table and has ``SELECT``, ``INSERT``, ``UPDATE``, and
``DELETE`` privileges for that table. These privileges can be altered using the
``GRANT`` and ``REVOKE`` commands.
.. note::
When using the Hive catalog, attempts to read registered Iceberg tables
using the Hive connector will fail.
Unregister Table
^^^^^^^^^^^^^^^^
Iceberg tables can be unregistered from the catalog using the ``unregister_table``
procedure on the catalog's ``system`` schema.
The following arguments are available:
===================== ========== =============== ===================================
Argument Name Required Type Description
===================== ========== =============== ===================================
``schema`` Yes string Schema of the table to unregister
``table_name`` Yes string Name of the table to unregister
===================== ========== =============== ===================================
Examples::
CALL iceberg.system.unregister_table('schema_name', 'table_name')
CALL iceberg.system.unregister_table(table_name => 'table_name', schema => 'schema_name')
.. note::
Table data and metadata remain in the filesystem after a call to
``unregister_table`` only when using the Hive catalog. This is similar to
the behavior listed for the `DROP TABLE <#id1>`_ command.
Rollback to Snapshot
^^^^^^^^^^^^^^^^^^^^
Rollback a table to a specific snapshot ID. Iceberg can rollback to a specific snapshot ID by using the ``rollback_to_snapshot`` procedure on Iceberg's ``system`` schema::
CALL iceberg.system.rollback_to_snapshot('schema_name', 'table_name', snapshot_id);
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name Required Type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to update
``table_name`` Yes string Name of the table to update
``snapshot_id`` Yes long Snapshot ID to rollback to
===================== ========== =============== =======================================================================
Rollback to Timestamp
^^^^^^^^^^^^^^^^^^^^^
Rollback a table to a given point in time. Iceberg can rollback to a specific point in time by using the ``rollback_to_timestamp`` procedure on Iceberg's ``system`` schema.
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name Required Type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to update
``table_name`` Yes string Name of the table to update
``timestamp`` Yes timestamp Timestamp to rollback to
===================== ========== =============== =======================================================================
Example::
CALL iceberg.system.rollback_to_timestamp('schema_name', 'table_name', TIMESTAMP '1995-04-26 00:00:00.000');
Set Current Snapshot
^^^^^^^^^^^^^^^^^^^^
Set a ``snapshot_id`` or ``ref`` as the current snapshot for a table.
.. note::
Use either ``snapshot_id`` or ``ref``, but do not use both in the same procedure.
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name Required Type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to update
``table_name`` Yes string Name of the table to update
``snapshot_id`` long Snapshot ID to set as current
``ref`` string Snapshot Reference (branch or tag) to set as current
===================== ========== =============== =======================================================================
Examples:
* Set current table snapshot ID for the given table to 10000 ::
CALL iceberg.system.set_current_snapshot('schema_name', 'table_name', 10000);
* Set current table snapshot ID for the given table to snapshot ID of branch1 ::
CALL iceberg.system.set_current_snapshot(schema => 'schema_name', table_name => 'table_name', ref => 'branch1');
Expire Snapshots
^^^^^^^^^^^^^^^^
Each DML (Data Manipulation Language) action in Iceberg produces a new snapshot while keeping the old data and metadata for snapshot isolation and time travel. Use `expire_snapshots` to remove older snapshots and their files.
This procedure removes old snapshots and their corresponding files, and never removes files which are required by a non-expired snapshot.
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name Required Type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to update
``table_name`` Yes string Name of the table to update
``older_than`` timestamp Timestamp before which snapshots will be removed (Default: 5 days ago)
``retain_last`` int Number of ancestor snapshots to preserve regardless of older_than
(defaults to 1)
``snapshot_ids`` array of long Array of snapshot IDs to expire
===================== ========== =============== =======================================================================
Examples:
* Remove snapshots older than a specific day and time, but retain the last 10 snapshots::
CALL iceberg.system.expire_snapshots('schema_name', 'table_name', TIMESTAMP '2023-08-31 00:00:00.000', 10);
* Remove snapshots with snapshot ID 10001 and 10002 (note that these snapshot IDs should not be the current snapshot)::
CALL iceberg.system.expire_snapshots(schema => 'schema_name', table_name => 'table_name', snapshot_ids => ARRAY[10001, 10002]);
Remove Orphan Files
^^^^^^^^^^^^^^^^^^^
Use to remove files which are not referenced in any metadata files of an Iceberg table.
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name Required Type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to clean
``table_name`` Yes string Name of the table to clean
``older_than`` timestamp Remove orphan files created before this timestamp (Default: 3 days ago)
===================== ========== =============== =======================================================================
Examples:
* Remove any files which are not known to the table `db.sample` and older than specified timestamp::
CALL iceberg.system.remove_orphan_files('db', 'sample', TIMESTAMP '2023-08-31 00:00:00.000');
* Remove any files which are not known to the table `db.sample` and created 3 days ago (by default)::
CALL iceberg.system.remove_orphan_files(schema => 'db', table_name => 'sample');
Fast Forward Branch
^^^^^^^^^^^^^^^^^^^
This procedure advances the current snapshot of the specified branch to a more recent snapshot from another branch without replaying any intermediate snapshots.
``branch`` can be fast-forwarded up to the ``to`` snapshot if ``branch`` is an ancestor of ``to``.
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name required type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to update
``table_name`` Yes string Name of the table to update
``branch`` Yes string The branch you want to fast-forward
``to`` Yes string The branch you want to fast-forward to
===================== ========== =============== =======================================================================
Examples:
* Fast-forward the ``dev`` branch to the latest snapshot of the ``main`` branch: ::
CALL iceberg.system.fast_forward('schema_name', 'table_name', 'dev', 'main');
* Given the branch named ``branch1`` does not exist yet, create a new branch named ``branch1`` and set it's current snapshot equal to the latest snapshot of the ``main`` branch: ::
CALL iceberg.system.fast_forward('schema_name', 'table_name', 'branch1', 'main');
Statistics file cache invalidation procedure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Invalidate Statistics file cache: ::
CALL <catalog-name>.system.invalidate_statistics_file_cache();
Manifest file cache invalidation procedure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Invalidate Manifest file cache: ::
CALL <catalog-name>.system.invalidate_manifest_file_cache();
Set Table Property
^^^^^^^^^^^^^^^^^^
Set from the catalog using the ``set_table_property`` procedure on the catalog's ``system`` schema.
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name Required Type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to update
``table_name`` Yes string Name of the table to update
``key`` Yes string Name of the table property
``value`` Yes string Value for the table property
===================== ========== =============== =======================================================================
Examples:
* Set table property ``commit.retry.num-retries`` to ``10`` for a Iceberg table: ::
CALL iceberg.system.set_table_property('schema_name', 'table_name', 'commit.retry.num-retries', '10');
Rewrite Data Files
^^^^^^^^^^^^^^^^^^
Iceberg tracks all data files under different partition specs in a table. More data files require
more metadata to be stored in manifest files, and small data files can cause an unnecessary amount of metadata and
less efficient queries due to file open costs. Also, data files under different partition specs can
prevent metadata level deletion or thorough predicate push down for Presto.
Use ``rewrite_data_files`` to rewrite the data files of a specified table so that they are
merged into fewer but larger files under the newest partition spec. If the table is partitioned, the data
files compaction can act separately on the selected partitions to improve read performance by reducing
metadata overhead and runtime file open cost.
The following arguments are available:
===================== ========== =============== =======================================================================
Argument Name required type Description
===================== ========== =============== =======================================================================
``schema`` Yes string Schema of the table to update.
``table_name`` Yes string Name of the table to update.
``filter`` string Predicate as a string used for filtering the files. Currently
only rewrite of whole partitions is supported. Filter on partition
columns. The default value is ``true``.
``sorted_by`` array of Specify an array of one or more columns to use for sorting. When
strings performing a rewrite, the specified sorting definition must be
compatible with the table's own sorting property, if one exists.
Supports standard column sorting (example, ``'col ASC'``) and
z-order sorting (example, ``'zorder(col1, col2)'``).
``options`` map Options to be used for data files rewrite. See options table below.
===================== ========== =============== =======================================================================
Rewrite Options
~~~~~~~~~~~~~~~
The ``options`` parameter accepts a map of option names to values. The following options are available:
========================== ======== =============== ========================================================================
Option Name Type Default Description
========================== ======== =============== ========================================================================
``min-input-files`` integer 5 Minimum number of files in a partition required for rewriting.
Partitions with fewer files are skipped. This is a **group filter**
that operates at the partition level.
``min-file-size-bytes`` long 0 (disabled) Files smaller than this threshold (in bytes) are selected for
rewriting. This is a **file filter** that operates on individual
files.
``max-file-size-bytes`` long 0 (disabled) Files larger than this threshold (in bytes) are selected for
rewriting. This is a **file filter** that operates on individual
files.
``rewrite-all`` boolean false When set to ``true``, bypasses all filtering (both group and file
level) and rewrites all data files in the table or selected
partitions. Useful for simple, complete table rewrites.
========================== ======== =============== ========================================================================
**Filter Behavior:**
* **File Filters** (``min-file-size-bytes``, ``max-file-size-bytes``): Select individual files using OR logic.
A file is selected if it is too small (< min-file-size-bytes) OR too large (> max-file-size-bytes).
* **Group Filters** (``min-input-files``): Select entire partitions using AND logic. After file filters are
applied, only partitions with at least ``min-input-files`` files are selected for rewriting.
* **Combined Behavior**: File filters are applied first, then group filters. This allows you to target specific
files, like small files, while ensuring you only rewrite partitions with enough files to make the operation
worthwhile.
Examples
~~~~~~~~
* Rewrite all the data files in table ``db.sample`` to the newest partition spec and combine small files to larger ones::
CALL iceberg.system.rewrite_data_files('db', 'sample');
CALL iceberg.system.rewrite_data_files(schema => 'db', table_name => 'sample');
* Rewrite the data files in partitions specified by a filter in table ``db.sample`` to the newest partition spec::
CALL iceberg.system.rewrite_data_files('db', 'sample', 'partition_key = 1');
CALL iceberg.system.rewrite_data_files(schema => 'db', table_name => 'sample', filter => 'partition_key = 1');
* Rewrite the data files in partitions specified by a filter in table ``db.sample`` to the newest partition spec and a sorting definition::
CALL iceberg.system.rewrite_data_files('db', 'sample', 'partition_key = 1', ARRAY['join_date DESC NULLS FIRST', 'emp_id ASC NULLS LAST']);
CALL iceberg.system.rewrite_data_files(schema => 'db', table_name => 'sample', filter => 'partition_key = 1', sorted_by => ARRAY['join_date']);
* Rewrite only small files (less than 100MB) in table ``db.sample``::
CALL iceberg.system.rewrite_data_files(
schema => 'db',
table_name => 'sample',
options => map(array['min-file-size-bytes'], array['104857600'])
);
* Rewrite only partitions with at least 10 files in table ``db.sample``::
CALL iceberg.system.rewrite_data_files(
schema => 'db',
table_name => 'sample',
options => map(array['min-input-files'], array['10'])
);
* Rewrite small files (< 50MB) OR large files (> 1GB) in partitions with at least 3 files::
CALL iceberg.system.rewrite_data_files(
schema => 'db',
table_name => 'sample',
options => map(
array['min-file-size-bytes', 'max-file-size-bytes', 'min-input-files'],
array['52428800', '1073741824', '3']
)
);
* Rewrite all data files without any filtering::
CALL iceberg.system.rewrite_data_files(
schema => 'db',
table_name => 'sample',
options => map(array['rewrite-all'], array['true'])
);
* Combine options with filter and sorting::
CALL iceberg.system.rewrite_data_files(
schema => 'db',
table_name => 'sample',
filter => 'partition_key = 1',
sorted_by => ARRAY['join_date'],
options => map(array['min-input-files'], array['3'])
);
* Use z-order sorting for multi-dimensional data clustering::
CALL iceberg.system.rewrite_data_files(
schema => 'db',
table_name => 'sample',
sorted_by => ARRAY['zorder(customer_id, order_date)']
);
Z-order sorting creates a space-filling curve that interleaves bits from multiple columns,
providing better data locality for queries that filter on multiple dimensions. This is
particularly useful for tables with multiple commonly-queried columns.
Rewrite Manifests
^^^^^^^^^^^^^^^^^
This procedure rewrites the manifest files of an Iceberg table to optimize table metadata.
The procedure is a metadata-only operation and commits a new snapshot with `operation = replace`.
The following arguments are available:
===================== ========== =============== ========================================================================
Argument Name required type Description
===================== ========== =============== ========================================================================
``schema`` Yes string Schema of the table to update
``table_name`` Yes string Name of the table to update
``spec_id`` No integer Partition spec ID to rewrite manifests for.
If not specified, manifests for the curren partition spec are rewritten.
===================== ========== =============== ========================================================================
``rewrite_manifests`` does not modify data files and does not change query results.
The procedure may be a logical no-op if the existing manifests are already optimal.
Delete-only manifests are retained as long as snapshots that reference them are valid.
To allow cleanup of such manifests, old snapshots must first be expired using ``CALL system.expire_snapshots``.
The procedure always commits a snapshot with `operation = replace`, even when no physical rewrite is required.
Examples:
* Rewrite manifests for a table using positional arguments: ::
CALL iceberg.system.rewrite_manifests('schema_name', 'table_name');
* Rewrite manifests for a specific partition spec: ::
CALL iceberg.system.rewrite_manifests('schema_name', 'table_name', 0);
Presto C++ Support
^^^^^^^^^^^^^^^^^^
All above procedures are supported in Presto C++.
Iceberg Functions
-----------------
Z-Order Function
^^^^^^^^^^^^^^^^
The ``zorder`` function computes a z-order value for multi-dimensional data clustering.
Z-order is a space-filling curve that interleaves bits from multiple columns to create
a single sortable value that preserves spatial locality across multiple dimensions.
**Syntax**::
zorder(ROW(column1, column2, ...)) -> varbinary
**Description:**
The ``zorder`` function takes a ROW containing multiple columns and returns a VARBINARY
value representing the z-order curve value. This value can be used for sorting data to
improve query performance when filtering on multiple columns.
**Supported Column Types:**
* ``TINYINT``, ``SMALLINT``, ``INTEGER``, ``BIGINT``
* ``REAL``, ``DOUBLE``
* ``DECIMAL``
* ``DATE``
* ``TIMESTAMP``
**Usage:**
The ``zorder`` function is primarily used with the ``rewrite_data_files`` procedure's
``sorted_by`` parameter to reorganize data files for better multi-dimensional query
performance.
**Examples:**
* Compute z-order value for two columns::
SELECT zorder(ROW(customer_id, order_date)) FROM orders;
* Use with ``rewrite_data_files`` to reorganize table data::
CALL iceberg.system.rewrite_data_files(
schema => 'sales',
table_name => 'orders',
sorted_by => ARRAY['zorder(customer_id, order_date)']
);
* Z-order with three dimensions::
CALL iceberg.system.rewrite_data_files(
schema => 'analytics',
table_name => 'events',
sorted_by => ARRAY['zorder(user_id, event_time, event_type)']
);
**Benefits:**
* Improves query performance for filters on multiple columns
* Better data locality compared to single-column sorting
* Particularly effective for tables with multiple commonly-queried dimensions
* Reduces the amount of data scanned when filtering on any combination of the z-ordered columns
**Note:** Z-order is most effective when the columns have similar cardinality and are
frequently used together in query predicates.
SQL Support
-----------
======================================== ============= ============ ============================================================================
SQL Operation Presto Java Presto C++ Comments
======================================== ============= ============ ============================================================================
``CREATE SCHEMA`` Yes Yes
``CREATE TABLE`` Yes Yes
``CREATE VIEW`` Yes Yes
``INSERT INTO`` Yes No
``CREATE TABLE AS SELECT`` Yes No
``SELECT`` Yes Yes Read is supported in Presto C++ including those with positional delete files.
``ALTER TABLE`` Yes Yes
``ALTER TABLE ADD COLUMN DEFAULT`` Yes Yes
``ALTER TABLE ALTER COLUMN SET DEFAULT`` Yes Yes
``ALTER VIEW`` Yes Yes
``TRUNCATE`` Yes Yes
``DELETE`` Yes No
``DROP TABLE`` Yes Yes
``DROP VIEW`` Yes Yes
``DROP SCHEMA`` Yes Yes
``SHOW CREATE TABLE`` Yes Yes
``SHOW COLUMNS`` Yes Yes
``DESCRIBE`` Yes Yes
``UPDATE`` Yes No
``MERGE`` Yes No
======================================== ============= ============ ============================================================================
The Iceberg connector supports querying and manipulating Iceberg tables and schemas
(databases). Here are some examples of the SQL operations supported by Presto:
CREATE SCHEMA
^^^^^^^^^^^^^
Create a new Iceberg schema named ``web`` that stores tables in an
S3 bucket named ``my-bucket``::
CREATE SCHEMA iceberg.web
WITH (location = 's3://my-bucket/')
CREATE TABLE
^^^^^^^^^^^^
Create a new Iceberg table named ``page_views`` in the ``web`` schema
that is stored using the ORC file format, partitioned by ``ds`` and
``country``::
CREATE TABLE iceberg.web.page_views (
view_time timestamp,
user_id bigint,
page_url varchar,
ds date,
country varchar
)
WITH (
format = 'ORC',
partitioning = ARRAY['ds', 'country']
)
Create an Iceberg table with Iceberg format version 2 and with commit_retries set to 5::
CREATE TABLE iceberg.web.page_views_v2 (
view_time timestamp,
user_id bigint,
page_url varchar,
ds date,
country varchar
)
WITH (
format = 'ORC',
partitioning = ARRAY['ds', 'country'],
format_version = '2',
commit_retries = 5
)
Partition Column Transform
~~~~~~~~~~~~~~~~~~~~~~~~~~
Beyond selecting some particular columns for partitioning, you can use the ``transform`` functions and partition the table
by the transformed value of the column.
Available transforms in the Presto Iceberg connector include:
* ``Bucket`` (partitions data into a specified number of buckets using a hash function)
* ``Truncate`` (partitions the table based on the truncated value of the field and can specify the width of the truncated value)
* ``Identity`` (partitions data using unmodified source value)
* ``Year`` (partitions data using integer value by extracting a date or timestamp year, as years from 1970)
* ``Month`` (partitions data using integer value by extracting a date or timestamp month, as months from 1970-01-01)
* ``Day`` (partitions data using integer value by extracting a date or timestamp day, as days from 1970-01-01)
* ``Hour`` (partitions data using integer value by extracting a timestamp hour, as hours from 1970-01-01 00:00:00)
Create an Iceberg table partitioned into 8 buckets of equal size ranges::
CREATE TABLE players (
id int,
name varchar,
team varchar
)
WITH (
format = 'ORC',
partitioning = ARRAY['bucket(team, 8)']
);
Create an Iceberg table partitioned by the first letter of the ``team`` field::
CREATE TABLE players (
id int,
name varchar,
team varchar
)
WITH (
format = 'ORC',
partitioning = ARRAY['truncate(team, 1)']
);
Create an Iceberg table partitioned by ``ds``::
CREATE TABLE players (
id int,
name varchar,
team varchar,
ds date
)
WITH (
format = 'ORC',
partitioning = ARRAY['year(ds)']
);
Create an Iceberg table partitioned by ``ts``::
CREATE TABLE players (
id int,
name varchar,
team varchar,
ts timestamp
)
WITH (
format = 'ORC',
partitioning = ARRAY['hour(ts)']
);
Types for partition transforms
The list of supported types in Presto for each partition transform is as follows:
===================== =======================================================================
Transform Name Source Types
===================== =======================================================================
Identity boolean, int, bigint, real, double, decimal,
varchar, varbinary, date, time, timestamp
Bucket int, bigint, decimal, varchar, varbinary, date,
time
Truncate int, bigint, decimal, varchar, varbinary
Year date, timestamp
Month date, timestamp
Day date, timestamp
Hour timestamp
===================== =======================================================================
Presto C++ Support
Reads from tables with partition column transforms is supported in Presto C++.
CREATE VIEW
^^^^^^^^^^^
The Iceberg connector supports creating views in Hive, Glue, REST, and Nessie catalogs.
To create a view named ``view_page_views`` for the ``iceberg.web.page_views`` table created in the `CREATE TABLE`_ example::
CREATE VIEW iceberg.web.view_page_views AS SELECT user_id, country FROM iceberg.web.page_views;
.. note::
The Iceberg REST catalog may not support view creation depending on the
type of the backing catalog.
INSERT INTO
^^^^^^^^^^^
Insert data into the ``page_views`` table::
INSERT INTO iceberg.web.page_views VALUES(TIMESTAMP '2023-08-12 03:04:05.321', 1, 'https://example.com', current_date, 'country');
CREATE TABLE AS SELECT
^^^^^^^^^^^^^^^^^^^^^^
Create a new table ``page_views_new`` from an existing table ``page_views``::
CREATE TABLE iceberg.web.page_views_new AS SELECT * FROM iceberg.web.page_views
SELECT
^^^^^^
SELECT table operations are supported for Iceberg format version 1 and version 2 in the connector::
SELECT * FROM iceberg.web.page_views;
SELECT * FROM iceberg.web.page_views_v2;
Table with delete files
Iceberg V2 tables support row-level deletion. For more information see
Row-level deletes <https://iceberg.apache.org/spec/#row-level-deletes>_ in the Iceberg Table Spec.
Presto supports reading delete files, including Position Delete Files and Equality Delete Files.
When reading, Presto merges these delete files to read the latest results.
ALTER TABLE ^^^^^^^^^^^
Alter table operations are supported in the Iceberg connector::
ALTER TABLE iceberg.web.page_views ADD COLUMN zipcode VARCHAR;
ALTER TABLE iceberg.web.page_views RENAME COLUMN zipcode TO location;
ALTER TABLE iceberg.web.page_views DROP COLUMN location;
ALTER TABLE iceberg.web.page_views DROP BRANCH 'branch1';
ALTER TABLE iceberg.web.page_views DROP TAG 'tag1';
ALTER TABLE iceberg.default.mytable CREATE BRANCH 'audit-branch';
ALTER TABLE iceberg.default.mytable CREATE BRANCH IF NOT EXISTS 'audit-branch';
ALTER TABLE iceberg.default.mytable CREATE OR REPLACE BRANCH 'audit-branch';
ALTER TABLE iceberg.default.mytable CREATE BRANCH 'audit-branch-system' FOR SYSTEM_VERSION AS OF 4176642711908913940;
ALTER TABLE iceberg.default.mytable CREATE BRANCH IF NOT EXISTS 'audit-branch-system' FOR SYSTEM_VERSION AS OF 4176642711908913940;
ALTER TABLE iceberg.default.mytable CREATE BRANCH 'audit-branch-retain' FOR SYSTEM_VERSION AS OF 4176642711908913940 RETAIN 7 DAYS;
ALTER TABLE iceberg.default.mytable CREATE BRANCH 'audit-branch-snap-retain' FOR SYSTEM_VERSION AS OF 4176642711908913940 RETAIN 7 DAYS WITH SNAPSHOT RETENTION 2 SNAPSHOTS 2 DAYS;
ALTER TABLE iceberg.default.mytable CREATE OR REPLACE BRANCH 'audit-branch-time' FOR SYSTEM_TIME AS OF TIMESTAMP '2026-01-02 17:30:35.247 Asia/Kolkata';
ALTER TABLE iceberg.default.mytable CREATE TAG 'audit-tag';
ALTER TABLE iceberg.default.mytable CREATE TAG IF NOT EXISTS 'audit-tag';
ALTER TABLE iceberg.default.mytable CREATE OR REPLACE TAG 'audit-tag';
ALTER TABLE iceberg.default.mytable CREATE TAG 'audit-tag-system' FOR SYSTEM_VERSION AS OF 4176642711908913940;
ALTER TABLE iceberg.default.mytable CREATE TAG IF NOT EXISTS 'audit-tag-system' FOR SYSTEM_VERSION AS OF 4176642711908913940;
ALTER TABLE iceberg.default.mytable CREATE TAG 'audit-tag-retain' FOR SYSTEM_VERSION AS OF 4176642711908913940 RETAIN 7 DAYS;
ALTER TABLE iceberg.default.mytable CREATE TAG 'audit-tag-snap-retain' FOR SYSTEM_VERSION AS OF 4176642711908913940 RETAIN 7 DAYS WITH SNAPSHOT RETENTION 2 SNAPSHOTS 2 DAYS;
ALTER TABLE iceberg.default.mytable CREATE OR REPLACE TAG 'audit-tag-time' FOR SYSTEM_TIME AS OF TIMESTAMP '2026-01-02 17:30:35.247 Asia/Kolkata';
Presto C++ Support
Creating and dropping tags and branches with ALTER TABLE statements is fully supported in Presto C++.
To add a new column as a partition column, identify the transform functions for the column. The table is partitioned by the transformed value of the column::
ALTER TABLE iceberg.web.page_views ADD COLUMN zipcode VARCHAR WITH (partitioning = 'identity');
ALTER TABLE iceberg.web.page_views ADD COLUMN location VARCHAR WITH (partitioning = 'truncate(2)');
ALTER TABLE iceberg.web.page_views ADD COLUMN location VARCHAR WITH (partitioning = 'bucket(8)');
ALTER TABLE iceberg.web.page_views ADD COLUMN dt date WITH (partitioning = 'year');
ALTER TABLE iceberg.web.page_views ADD COLUMN ts timestamp WITH (partitioning = 'month');
ALTER TABLE iceberg.web.page_views ADD COLUMN dt date WITH (partitioning = 'day');
ALTER TABLE iceberg.web.page_views ADD COLUMN ts timestamp WITH (partitioning = 'hour');
Use ARRAY[...] instead of a string to specify multiple partition transforms when adding a column. For example::
ALTER TABLE iceberg.web.page_views ADD COLUMN location VARCHAR WITH (partitioning = ARRAY['truncate(2)', 'bucket(8)', 'identity']);
ALTER TABLE iceberg.web.page_views ADD COLUMN dt date WITH (partitioning = ARRAY['year', 'bucket(16)', 'identity']);
Some Iceberg table properties can be modified using an ALTER TABLE SET PROPERTIES statement. The modifiable table properties are:
commit.retry.num-retriesread.split.target-sizewrite.metadata.delete-after-commit.enabledengine.hive.lock-enabledwrite.metadata.previous-versions-maxFor example, to set commit.retry.num-retries to 6 for the table iceberg.web.page_views_v2, use::
ALTER TABLE iceberg.web.page_views_v2 SET PROPERTIES ("commit.retry.num-retries" = 6);
To set write.metadata.delete-after-commit.enabled to true and set write.metadata.previous-versions-max to 5, use::
ALTER TABLE iceberg.web.page_views_v2 SET PROPERTIES ("write.metadata.delete-after-commit.enabled" = true, "write.metadata.previous-versions-max" = 5);
ADD COLUMN with DEFAULT (Iceberg V3)
.. note::
``ADD COLUMN DEFAULT`` read support is available in both **Presto Java** and **Presto C++** (Prestissimo).
Both engines execute the DDL, store the default in Iceberg metadata, and inject the ``initial-default``
value during reads for historical rows. Write-time handling of ``write-default`` is not supported
in either engine.
Iceberg Format Version 3 supports default column values for schema evolution. When a column is
added with a ``DEFAULT`` clause, Presto sets both the ``initial-default`` and ``write-default``
fields in the Iceberg schema metadata. This is a metadata-only change — no data files are rewritten.
During reads, rows written before the column was added return the ``initial-default`` value instead
of ``NULL``. This enables correct interoperability with Iceberg V3 tables created or evolved by
other engines, such as Spark.
Example — Add a ``country`` column with a default value of ``'IN'``::
ALTER TABLE iceberg.web.orders ADD COLUMN country VARCHAR DEFAULT 'IN';
After this statement, a ``SELECT`` on the table returns ``'IN'`` for the ``country`` column in all
rows that were written before the column was added::
SELECT order_id, country FROM iceberg.web.orders;
order_id | country
----------+---------
1 | IN
2 | IN
3 | IN
ALTER COLUMN SET DEFAULT (Iceberg V3)
.. note::
``ALTER COLUMN SET DEFAULT`` currently only updates the Iceberg metadata (``write-default`` field).
**INSERT support for write-default is not implemented** — inserts will not automatically use
the write-default value.
Iceberg Format Version 3 allows updating the write-default value for an existing column without
modifying the initial-default. This is useful for schema evolution and maintaining compatibility
with other Iceberg engines.
The ALTER COLUMN SET DEFAULT statement updates the write-default field in the Iceberg
schema metadata. This is a metadata-only operation — no data files are rewritten.
Behavior:
initial-default and write-default are set to the same valueinitial-default, only write-default is updated (preserving initial-default)Example — Update the write-default for the country column::
ALTER TABLE iceberg.web.orders ALTER COLUMN country SET DEFAULT 'US';
After this statement, the Iceberg metadata is updated, but INSERT operations do not yet use the write-default value automatically. This functionality will be added in a future release.
This feature requires Iceberg Format Version 3. Attempting to use ALTER COLUMN SET DEFAULT on
a table with format version 2 or lower will result in an error.
ALTER VIEW ^^^^^^^^^^
Alter view operations to alter the name of an existing view to a new name is supported in the Iceberg connector.
.. code-block:: sql
ALTER VIEW iceberg.web.page_views RENAME TO iceberg.web.page_new_views;
TRUNCATE ^^^^^^^^
The Iceberg connector can delete all of the data from tables without
dropping the table from the metadata catalog using TRUNCATE TABLE.
.. code-block:: sql
TRUNCATE TABLE nation;
.. code-block:: text
TRUNCATE TABLE;
.. code-block:: sql
SELECT * FROM nation;
.. code-block:: text
nationkey | name | regionkey | comment
-----------+------+-----------+---------
(0 rows)
DELETE ^^^^^^
The Iceberg connector can delete data from tables by using DELETE FROM. For example, to delete from the table lineitem::
DELETE FROM lineitem;
DELETE FROM lineitem WHERE linenumber = 1;
DELETE FROM lineitem WHERE linenumber not in (1, 3, 5, 7) and linestatus in ('O', 'F');
.. note::
Filtered columns only support comparison operators, such as EQUALS, LESS THAN, or LESS THAN EQUALS.
Deletes must only occur on the latest snapshot.
For V1 tables, the Iceberg connector can only delete data in one or more entire
partitions. Columns in the filter must all be identity transformed partition
columns of the target table.
DROP TABLE ^^^^^^^^^^
Drop the table page_views ::
DROP TABLE iceberg.web.page_views
DROP VIEW ^^^^^^^^^
Drop the view view_page_views::
DROP VIEW iceberg.web.view_page_views;
DROP SCHEMA ^^^^^^^^^^^
Drop the schema iceberg.web::
DROP SCHEMA iceberg.web
SHOW CREATE TABLE ^^^^^^^^^^^^^^^^^
Show the SQL statement that creates the specified Iceberg table by using SHOW CREATE TABLE.
For example, SHOW CREATE TABLE from the partitioned Iceberg table customer:
.. code-block:: sql
SHOW CREATE TABLE customer;
.. code-block:: text
CREATE TABLE iceberg.tpch_iceberg.customer (
"custkey" bigint,
"name" varchar,
"address" varchar,
"nationkey" bigint,
"phone" varchar,
"acctbal" double,
"mktsegment" varchar,
"comment" varchar
)
WITH (
delete_mode = 'copy-on-write',
format = 'PARQUET',
format_version = '2',
location = 's3a://tpch-iceberg/customer',
partitioning = ARRAY['mktsegment']
)
(1 row)
SHOW CREATE TABLE from the un-partitioned Iceberg table region:
.. code-block:: sql
SHOW CREATE TABLE region;
.. code-block:: text
CREATE TABLE iceberg.tpch_iceberg.region (
"regionkey" bigint,
"name" varchar,
"comment" varchar
)
WITH (
delete_mode = 'copy-on-write',
format = 'PARQUET',
format_version = '2',
location = 's3a://tpch-iceberg/region'
)
(1 row)
SHOW COLUMNS ^^^^^^^^^^^^
List the columns in table along with their data type and other attributes by using SHOW COLUMNS.
For example, SHOW COLUMNS from the partitioned Iceberg table customer:
.. code-block:: sql
SHOW COLUMNS FROM customer;
.. code-block:: text
Column | Type | Extra | Comment
------------+---------+---------------+---------
custkey | bigint | |
name | varchar | |
address | varchar | |
nationkey | bigint | |
phone | varchar | |
acctbal | double | |
mktsegment | varchar | partition key |
comment | varchar | |
(8 rows)
SHOW COLUMNS from the un-partitioned Iceberg table region:
.. code-block:: sql
SHOW COLUMNS FROM region;
.. code-block:: text
Column | Type | Extra | Comment
-----------+---------+-------+---------
regionkey | bigint | |
name | varchar | |
comment | varchar | |
(3 rows)
DESCRIBE ^^^^^^^^
List the columns in table along with their data type and other attributes by using DESCRIBE.
DESCRIBE is an alias for SHOW COLUMNS.
For example, DESCRIBE from the partitioned Iceberg table customer:
.. code-block:: sql
DESCRIBE customer;
.. code-block:: text
Column | Type | Extra | Comment
------------+---------+---------------+---------
custkey | bigint | |
name | varchar | |
address | varchar | |
nationkey | bigint | |
phone | varchar | |
acctbal | double | |
mktsegment | varchar | partition key |
comment | varchar | |
(8 rows)
DESCRIBE from the un-partitioned Iceberg table region:
.. code-block:: sql
DESCRIBE region;
.. code-block:: text
Column | Type | Extra | Comment
-----------+---------+-------+---------
regionkey | bigint | |
name | varchar | |
comment | varchar | |
(3 rows)
UPDATE and MERGE ^^^^^^^^^^^^^^^^
The Iceberg connector supports :doc:../sql/update and :doc:../sql/merge operations on Iceberg
tables. Only some tables support them. These tables must be at minimum format
version 2, and the write.update.mode must be set to merge-on-read.
.. code-block:: sql
UPDATE region SET name = 'EU', comment = 'Europe' WHERE regionkey = 1;
.. code-block:: text
UPDATE: 1 row
Query 20250204_010341_00021_ymwi5, FINISHED, 2 nodes
The query returns an error if the table does not meet the requirements for updates.
.. code-block:: text
Query 20250204_010445_00022_ymwi5 failed: Iceberg table updates require at least format version 2 and update mode must be merge-on-read
Iceberg tables do not support running multiple :doc:../sql/merge statements on the same table in parallel. If two or more MERGE operations are executed concurrently on the same Iceberg table:
.. code-block:: text
Failed to commit Iceberg update to table: <table name>
Found conflicting files that can contain records matching true
Transaction support ^^^^^^^^^^^^^^^^^^^
The Iceberg connector supports explicit multi-statement transactions with writes
to a single Iceberg table. To run transaction statements, use
:doc:/sql/start-transaction with :doc:/sql/commit or :doc:/sql/rollback.
The Iceberg connector provides snapshot isolation at REPEATABLE READ level.
This also satisfies READ COMMITTED and READ UNCOMMITTED, so these
isolation levels are supported as well. For snapshot semantics, use
REPEATABLE READ.
Within a transaction, reads can access multiple tables, while write operations are restricted to a single Iceberg table. All operations execute under snapshot isolation. The transaction therefore behaves as a multi-table read, single-table write transaction::
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
INSERT INTO iceberg.default.test_table
SELECT id, status
FROM iceberg.source.source_table1
WHERE status = 'pending';
INSERT INTO iceberg.default.test_table
SELECT * FROM iceberg.source.source_table2;
INSERT INTO iceberg.default.test_table (id, status) VALUES (1, 'pending');
UPDATE iceberg.default.test_table
SET status = 'committed'
WHERE id < 100 and status = 'pending';
COMMIT;
Statements executed within the same transaction follow read-your-writes semantics. This behavior is important for standard SQL interactive transactions. Data modifications performed earlier in the transaction are visible to subsequent statements before the transaction is committed::
START TRANSACTION;
INSERT INTO iceberg.default.test_table (id, status) VALUES (1, 'pending'), (2, 'pending');
UPDATE iceberg.default.test_table SET status = 'committed' WHERE id = 1;
SELECT * FROM iceberg.default.test_table; -- (1, 'committed'), (2, 'pending')
DELETE FROM iceberg.default.test_table WHERE status = 'pending';
SELECT * FROM iceberg.default.test_table; -- (1, 'committed')
COMMIT;
Limitations:
Not allowed to open write transactions on multiple tables.SERIALIZABLE isolation is not supported by the Iceberg connector.MERGE INTO, CREATE/DROP/RENAME TABLE,
CREATE/DROP/RENAME SCHEMA, CREATE/DROP/RENAME VIEW,
CREATE/DROP/REFRESH MATERIALIZED VIEW, TRUNCATE TABLE, and
ANALYZE.CALL statements are only supported in autocommit mode.Table metadata refresh is required)... _iceberg_analyze:
The Iceberg connector supports collection of table and column statistics
with the :doc:/sql/analyze statement::
ANALYZE iceberg.tpch.orders;
Iceberg and Presto Iceberg connector support in-place table evolution, also known as schema evolution, such as adding, dropping, and renaming columns. With schema evolution, users can evolve a table schema with SQL after enabling the Presto Iceberg connector.
Presto C++ Support ^^^^^^^^^^^^^^^^^^
Schema evolution is supported in Presto C++.
Presto now supports Parquet writer versions V1 and V2 for the Iceberg catalog.
It can be toggled using the session property parquet_writer_version and the config property hive.parquet.writer.version.
Valid values for these properties are PARQUET_1_0 and PARQUET_2_0. Default is PARQUET_1_0.
Presto C++ Support ^^^^^^^^^^^^^^^^^^
Reading Parquet data written with Parquet writer version V1 is supported in Presto C++.
Example Queries ^^^^^^^^^^^^^^^
Let's create an Iceberg table named ctas_nation, created from the TPCH nation
table. The table has four columns: nationkey, name, regionkey, and comment.
.. code-block:: sql
USE iceberg.tpch;
CREATE TABLE IF NOT EXISTS ctas_nation AS (SELECT * FROM nation);
DESCRIBE ctas_nation;
.. code-block:: text
Column | Type | Extra | Comment
-----------+---------+-------+---------
nationkey | bigint | |
name | varchar | |
regionkey | bigint | |
comment | varchar | |
(4 rows)
We can simply add a new column to the Iceberg table by using ALTER TABLE
statement. The following query adds a new column named zipcode to the table.
.. code-block:: sql
ALTER TABLE ctas_nation ADD COLUMN zipcode VARCHAR;
DESCRIBE ctas_nation;
.. code-block:: text
Column | Type | Extra | Comment
-----------+---------+-------+---------
nationkey | bigint | |
name | varchar | |
regionkey | bigint | |
comment | varchar | |
zipcode | varchar | |
(5 rows)
We can also rename the new column to another name, address:
.. code-block:: sql
ALTER TABLE ctas_nation RENAME COLUMN zipcode TO address;
DESCRIBE ctas_nation;
.. code-block:: text
Column | Type | Extra | Comment
-----------+---------+-------+---------
nationkey | bigint | |
name | varchar | |
regionkey | bigint | |
comment | varchar | |
address | varchar | |
(5 rows)
Finally, we can delete the new column. The table columns will be restored to the original state.
.. code-block:: sql
ALTER TABLE ctas_nation DROP COLUMN address;
DESCRIBE ctas_nation;
.. code-block:: text
Column | Type | Extra | Comment
-----------+---------+-------+---------
nationkey | bigint | |
name | varchar | |
regionkey | bigint | |
comment | varchar | |
(4 rows)
Iceberg and Presto Iceberg connector support time travel via table snapshots
identified by unique snapshot IDs. The snapshot IDs are stored in the $snapshots
metadata table. You can rollback the state of a table to a previous snapshot ID.
It also supports time travel query using SYSTEM_VERSION (VERSION) and SYSTEM_TIME (TIMESTAMP) options.
Example Time Travel Queries ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Similar to the example queries in SCHEMA EVOLUTION_, create an Iceberg
table named ctas_nation from the TPCH nation table:
.. code-block:: sql
USE iceberg.tpch;
CREATE TABLE IF NOT EXISTS ctas_nation AS (SELECT * FROM nation);
DESCRIBE ctas_nation;
.. code-block:: text
Column | Type | Extra | Comment
-----------+---------+-------+---------
nationkey | bigint | |
name | varchar | |
regionkey | bigint | |
comment | varchar | |
(4 rows)
We can find snapshot IDs for the Iceberg table from the $snapshots metadata table.
.. code-block:: sql
SELECT snapshot_id FROM iceberg.tpch."ctas_nation$snapshots" ORDER BY committed_at;
.. code-block:: text
snapshot_id
---------------------
5837462824399906536
(1 row)
For now, as we've just created the table, there's only one snapshot ID. Let's insert one row into the table and see the change in the snapshot IDs.
.. code-block:: sql
INSERT INTO ctas_nation VALUES(25, 'new country', 1, 'comment');
SELECT snapshot_id FROM iceberg.tpch."ctas_nation$snapshots" ORDER BY committed_at;
.. code-block:: text
snapshot_id
---------------------
5837462824399906536
5140039250977437531
(2 rows)
Now there's a new snapshot (5140039250977437531) created as a new row is
inserted into the table. The new row can be verified by running
.. code-block:: sql
SELECT * FROM ctas_nation WHERE name = 'new country';
.. code-block:: text
nationkey | name | regionkey | comment
-----------+-------------+-----------+---------
25 | new country | 1 | comment
(1 row)
With the time travel feature, we can rollback to the previous state without the
new row by calling iceberg.system.rollback_to_snapshot:
.. code-block:: sql
CALL iceberg.system.rollback_to_snapshot('tpch', 'ctas_nation', 5837462824399906536);
Now if we check the table again, we'll find that the newly inserted row no longer exists as we've rolled back to the previous state.
.. code-block:: sql
SELECT * FROM ctas_nation WHERE name = 'new country';
.. code-block:: text
nationkey | name | regionkey | comment
-----------+------+-----------+---------
(0 rows)
Time Travel using VERSION (SYSTEM_VERSION) and TIMESTAMP (SYSTEM_TIME) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Use the Iceberg connector to access the historical data of a table. You can see how the table looked like at a certain point in time, even if the data has changed or been deleted since then.
.. code-block:: sql
// snapshot ID 5300424205832769799
INSERT INTO ctas_nation VALUES(10, 'united states', 1, 'comment');
// snapshot ID 6891257133877048303
INSERT INTO ctas_nation VALUES(20, 'canada', 2, 'comment');
// snapshot ID 705548372863208787
INSERT INTO ctas_nation VALUES(30, 'mexico', 3, 'comment');
// snapshot ID for first record
SELECT * FROM ctas_nation FOR VERSION AS OF 5300424205832769799;
// snapshot ID for first record using SYSTEM_VERSION
SELECT * FROM ctas_nation FOR SYSTEM_VERSION AS OF 5300424205832769799;
.. code-block:: text
nationkey | name | regionkey | comment
-----------+---------------+-----------+---------
10 | united states | 1 | comment
(1 row)
.. code-block:: sql
// snapshot ID for second record using BEFORE clause to retrieve previous state
SELECT * FROM ctas_nation FOR SYSTEM_VERSION BEFORE 6891257133877048303;
.. code-block:: text
nationkey | name | regionkey | comment
-----------+---------------+-----------+---------
10 | united states | 1 | comment
(1 row)
In above example, SYSTEM_VERSION can be used as an alias for VERSION.
You can access the historical data of a table using FOR TIMESTAMP AS OF TIMESTAMP. The query returns the table’s state using the table snapshot that is closest to the specified timestamp. In this example, SYSTEM_TIME can be used as an alias for TIMESTAMP.
.. code-block:: sql
// In following query, timestamp string is matching with second inserted record.
SELECT * FROM ctas_nation FOR TIMESTAMP AS OF TIMESTAMP '2023-10-17 13:29:46.822 America/Los_Angeles';
SELECT * FROM ctas_nation FOR TIMESTAMP AS OF TIMESTAMP '2023-10-17 13:29:46.822';
// Same example using SYSTEM_TIME as an alias for TIMESTAMP
SELECT * FROM ctas_nation FOR SYSTEM_TIME AS OF TIMESTAMP '2023-10-17 13:29:46.822 America/Los_Angeles';
SELECT * FROM ctas_nation FOR SYSTEM_TIME AS OF TIMESTAMP '2023-10-17 13:29:46.822';
.. code-block:: text
nationkey | name | regionkey | comment
-----------+---------------+-----------+---------
10 | united states | 1 | comment
20 | canada | 2 | comment
(2 rows)
.. note::
Timestamp without timezone will be parsed and rendered in the session time zone. See `TIMESTAMP <https://prestodb.io/docs/current/language/types.html#timestamp>`_.
The option following FOR TIMESTAMP AS OF can accept any expression that returns a timestamp or timestamp with time zone value.
For example, TIMESTAMP '2023-10-17 13:29:46.822 America/Los_Angeles' and TIMESTAMP '2023-10-17 13:29:46.822' are both valid timestamps. The first specifies the timestamp within the timezone America/Los_Angeles. The second will use the timestamp based on the user's session timezone.
In the following query, the expression CURRENT_TIMESTAMP returns the current timestamp with time zone value.
.. code-block:: sql
SELECT * FROM ctas_nation FOR TIMESTAMP AS OF CURRENT_TIMESTAMP;
.. code-block:: text
nationkey | name | regionkey | comment
-----------+---------------+-----------+---------
10 | united states | 1 | comment
20 | canada | 2 | comment
30 | mexico | 3 | comment
(3 rows)
.. code-block:: sql
// In following query, timestamp string is matching with second inserted record.
// BEFORE clause returns first record which is less than timestamp of the second record.
SELECT * FROM ctas_nation FOR TIMESTAMP BEFORE TIMESTAMP '2023-10-17 13:29:46.822 America/Los_Angeles';
SELECT * FROM ctas_nation FOR TIMESTAMP BEFORE TIMESTAMP '2023-10-17 13:29:46.822';
.. code-block:: text
nationkey | name | regionkey | comment
-----------+---------------+-----------+---------
10 | united states | 1 | comment
(1 row)
Querying branches and tags ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Iceberg supports branches and tags which are named references to snapshots.
Query Iceberg table by specifying the branch name using FOR SYSTEM_VERSION AS OF:
.. code-block:: sql
SELECT * FROM nation FOR SYSTEM_VERSION AS OF 'testBranch';
.. code-block:: text
nationkey | name | regionkey | comment
-----------+---------------+-----------+---------
10 | united states | 1 | comment
20 | canada | 2 | comment
30 | mexico | 3 | comment
(3 rows)
Alternatively, you can query a branch using the dot notation syntax with quoted identifiers:
.. code-block:: sql
SELECT * FROM "nation.branch_testBranch";
.. code-block:: text
nationkey | name | regionkey | comment
-----------+---------------+-----------+---------
10 | united states | 1 | comment
20 | canada | 2 | comment
30 | mexico | 3 | comment
(3 rows)
Query Iceberg table by specifying the tag name:
.. code-block:: sql
SELECT * FROM nation FOR SYSTEM_VERSION AS OF 'testTag';
.. code-block:: text
nationkey | name | regionkey | comment
-----------+---------------+-----------+---------
10 | united states | 1 | comment
20 | canada | 2 | comment
(3 rows)
Note: The dot notation syntax "<table>.branch_<branch_name>" requires double quotes to prevent the SQL parser from interpreting the dot as a schema.table separator. This syntax works for both querying (SELECT) and mutating (INSERT, UPDATE, DELETE, MERGE) branch data.
Presto C++ Support
Querying tags and branches is fully supported in Presto C++.
Mutating Iceberg Branches ^^^^^^^^^^^^^^^^^^^^^^^^^
Iceberg supports performing INSERT, UPDATE, DELETE, and MERGE operations directly on branches, allowing you to make changes to a branch without affecting the main table or other branches.
To perform mutations on a branch, use the quoted identifier syntax "<table>.branch_<branch_name>" (for example, "orders.branch_audit_branch").
The quotes are required to prevent the SQL parser from interpreting the dot as a schema.table separator.
Insert into a branch:
.. code-block:: sql
-- Create a branch first
ALTER TABLE orders CREATE BRANCH 'audit_branch';
-- Insert data into the branch
INSERT INTO "orders.branch_audit_branch" VALUES (1, 'Product A', 100.00);
INSERT INTO "orders.branch_audit_branch" VALUES (2, 'Product B', 200.00);
Update data in a branch:
.. code-block:: sql
-- Update specific rows in the branch
UPDATE "orders.branch_audit_branch" SET price = 120.00 WHERE id = 1;
-- Update with complex expressions
UPDATE "orders.branch_audit_branch"
SET price = price * 1.1
WHERE category = 'electronics';
Delete from a branch:
.. code-block:: sql
-- Delete specific rows from the branch
DELETE FROM "orders.branch_audit_branch" WHERE id = 2;
-- Delete with complex predicates
DELETE FROM "orders.branch_audit_branch"
WHERE created_date < DATE '2024-01-01';
Merge into a branch:
.. code-block:: sql
-- Merge data from source table into branch
MERGE INTO "orders.branch_audit_branch" t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET price = s.price
WHEN NOT MATCHED THEN INSERT (id, product, price) VALUES (s.id, s.product, s.price);
Verify branch isolation:
After performing mutations on a branch, you can verify that the main table remains unchanged:
.. code-block:: sql
-- Query the branch to see changes
SELECT * FROM orders FOR SYSTEM_VERSION AS OF 'audit_branch';
-- Query the main table (unchanged)
SELECT * FROM orders;
Supported operations:
The following DML operations are supported with branch-specific table names:
INSERT - Add new rows to a branchUPDATE - Modify existing rows in a branchDELETE - Remove rows from a branch (including metadata delete optimization)MERGE - Conditionally insert, update, or delete rows in a branchTRUNCATE TABLE - Remove all rows from a branchSELECT - Query branch data using FOR SYSTEM_VERSION AS OF 'branch_name'Unsupported operations:
The following operations are not supported with branch-specific table names and will result in an error:
ALTER TABLE DDL operations (ADD COLUMN, DROP COLUMN, RENAME COLUMN, SET PROPERTIES) - Schema changes must be applied to the main tableCREATE VIEW / CREATE MATERIALIZED VIEW - Views cannot be created from branch-specific tablesImportant notes:
ALTER TABLE ... CREATE BRANCH)merge-on-readPresto C++ Support
Branch mutations are partially supported in Presto C++.
INSERT, TRUNCATE TABLEPresto C++ Support ^^^^^^^^^^^^^^^^^^
Time travel queries are supported in Presto C++.
PrestoDB and Iceberg have data types not supported by the other. When using Iceberg to read or write data, Presto changes each Iceberg data type to the corresponding Presto data type, and from each Presto data type to the comparable Iceberg data type. The following tables detail the specific type maps between PrestoDB and Iceberg.
Iceberg to PrestoDB type mapping ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Map of Iceberg types to the relevant PrestoDB types:
.. list-table:: Iceberg to PrestoDB type mapping :widths: 50, 50 :header-rows: 1
BOOLEANBOOLEANINTEGERINTEGERLONGBIGINTFLOATREALDOUBLEDOUBLEDECIMALDECIMALSTRINGVARCHARBINARY, FIXEDVARBINARYDATEDATETIMETIMETIMESTAMPTIMESTAMPTIMESTAMP TZTIMESTAMP WITH TIME ZONEUUIDUUIDLISTARRAYMAPMAPSTRUCTROWNo other types are supported.
PrestoDB to Iceberg type mapping ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Map of PrestoDB types to the relevant Iceberg types:
.. list-table:: PrestoDB to Iceberg type mapping :widths: 50, 50 :header-rows: 1
BOOLEANBOOLEANINTEGERINTEGERTINYINTINTEGERSMALLINTINTEGERBIGINTLONGREALFLOATDOUBLEDOUBLEDECIMALDECIMALVARCHARSTRINGVARBINARYBINARYDATEDATETIMETIMETIMESTAMPTIMESTAMPTIMESTAMP WITH TIME ZONETIMESTAMP TZUUIDUUIDARRAYLISTMAPMAPROWSTRUCTNo other types are supported.
The Iceberg connector supports the creation of sorted tables. Data in the Iceberg table is sorted as each file is written.
Sorted Iceberg tables can decrease query execution time in many cases; but query times can also depend on the query shape and cluster configuration. Sorting is particularly beneficial when the sorted columns have a high cardinality and are used as a filter for selective reads.
Configure sort order with the sorted_by table property to specify an array of
one or more columns to use for sorting.
The following example creates the table with the sorted_by property, and sorts the file based
on the field join_date. The default sort direction is ASC, with null values ordered as NULLS FIRST.
.. code-block:: text
CREATE TABLE emp.employees.employee (
emp_id BIGINT,
emp_name VARCHAR,
join_date DATE,
country VARCHAR)
WITH (
sorted_by = ARRAY['join_date']
)
Explicitly configure sort directions or null ordering using the following example::
CREATE TABLE emp.employees.employee (
emp_id BIGINT,
emp_name VARCHAR,
join_date DATE,
country VARCHAR)
WITH (
sorted_by = ARRAY['join_date DESC NULLS FIRST', 'emp_id ASC NULLS LAST']
)
Sorting can be combined with partitioning on the same column. For example::
CREATE TABLE emp.employees.employee (
emp_id BIGINT,
emp_name VARCHAR,
join_date DATE,
country VARCHAR)
WITH (
partitioning = ARRAY['month(join_date)'],
sorted_by = ARRAY['join_date']
)
Sort order does not support transforms. The following transforms are not supported:
.. code-block:: text
bucket(n, column)
truncate(column, n)
year(column)
month(column)
day(column)
hour(column)
For example::
CREATE TABLE emp.employees.employee (
emp_id BIGINT,
emp_name VARCHAR,
join_date DATE,
country VARCHAR)
WITH (
sorted_by = ARRAY['month(join_date)']
)
If a user creates a table externally with non-identity sort columns and then inserts data, the following warning message will be shown.
Iceberg table sort order has sort fields of <X>, <Y>, ... which are not currently supported by Presto
The Iceberg connector supports materialized views. See :doc:/admin/materialized-views for general information and :doc:/sql/create-materialized-view for SQL syntax.
Storage ^^^^^^^
Materialized views use a dedicated Iceberg storage table to persist the pre-computed results. By default, the storage table is created with the prefix __mv_storage__ followed by the materialized view name in the same schema as the view.
Catalog Configuration ^^^^^^^^^^^^^^^^^^^^^
The following catalog properties affect materialized view storage tables and
refresh behavior. Storage naming and location properties apply at materialized
view creation time and can be overridden per-view by using the storage_schema and
storage_table table properties.
.. list-table:: :header-rows: 1 :widths: 35 45 20
.. _mv-cfg-storage-prefix:
iceberg.materialized-view-storage-prefix
Prefix applied to auto-generated storage table names.
__mv_storage__
.. _mv-cfg-default-storage-schema:
iceberg.materialized-view-default-storage-schema
Schema in which storage tables are created when the per-view storage_schema
property is not set. Point at a locked-down schema to keep storage tables out of
users' reach without affecting materialized view reads.
(the view's own schema)
.. _mv-cfg-max-changed-partitions:
iceberg.materialized-view-max-changed-partitions
Maximum number of changed partitions to track for materialized view staleness detection. If the number of changed partitions exceeds this threshold, the materialized view falls back to a full recompute.
100
.. _mv-cfg-default-max-snapshots-per-refresh:
iceberg.materialized-view-default-max-snapshots-per-refresh
Default upper bound on snapshots consumed per base table per
REFRESH MATERIALIZED VIEW when the view does not override it with the
max_snapshots_per_refresh table property. Requires Iceberg V3 row lineage;
V2 tables fall back to unbounded refresh.
0 (unbounded)
.. _iceberg-mv-table-properties:
.. _iceberg-alter-materialized-view:
Materialized View Properties ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following table properties can be specified when creating a materialized view.
Properties marked Yes in the Alterable column can be changed after creation
by using :doc:/sql/alter-materialized-view; properties not specified in the
SET PROPERTIES clause keep their current value, and setting these properties to
NULL is not supported.
.. list-table:: :widths: 25 60 15 :header-rows: 1
storage_schemastorage_tablestale_read_behaviorFAIL (throw an error), USE_VIEW_QUERY (query base tables instead).staleness_window1h, 30m, 0s).
Defaults to 0s if only stale_read_behavior is set.
When set to 0s, any staleness triggers the configured behavior.refresh_typeFULL
(always recompute entire view), INCREMENTAL (recompute only stale
partitions when possible, fall back to full otherwise). When not set,
uses the materialized_view_default_refresh_type session property... _mv-prop-max-snapshots-per-refresh:
max_snapshots_per_refresh
Upper bound on snapshots consumed per base table per REFRESH MATERIALIZED VIEW. 0 means unbounded. Defaults to the
materialized_view_default_max_snapshots_per_refresh session property.
Requires Iceberg V3 row lineage; V2 tables fall back to unbounded refresh.
Yes
The storage table inherits standard Iceberg table properties for partitioning, sorting, and file format.
Example::
ALTER MATERIALIZED VIEW hourly_sales
SET PROPERTIES (staleness_window = '30m', refresh_type = 'INCREMENTAL');
Freshness and Refresh ^^^^^^^^^^^^^^^^^^^^^
After running REFRESH MATERIALIZED VIEW, queries read from the pre-computed storage table.
See :doc:/admin/materialized-views for general information on refresh behavior.
.. _iceberg-incremental-refresh:
Incremental Refresh
The Iceberg connector supports incremental refresh, which atomically replaces only stale
partitions rather than recomputing the entire result set. See :ref:`admin/materialized-views:Incremental Refresh`
for general information.
To enable incremental refresh, set ``refresh_type = 'INCREMENTAL'`` when creating the view::
CREATE MATERIALIZED VIEW my_view
WITH (refresh_type = 'INCREMENTAL', partitioning = ARRAY['region'])
AS SELECT ...
Requirements:
* The storage table must be partitioned on identity-transformed columns
* Stale columns in base tables must map to storage table partition columns through
:ref:`column equivalences <admin/materialized-views:Column Equivalences and Passthrough Columns>`
* Only ``INSERT`` operations on base tables enable partition-level staleness detection;
``DELETE`` or ``UPDATE`` operations cause a full refresh
.. note::
If incremental refresh cannot be applied, the engine falls back to a full refresh
and emits a warning. ``DELETE`` or ``UPDATE`` on a base table also forces a full
refresh — the Iceberg connector only tracks partition-level staleness for
append-only changes. See :ref:`admin/materialized-views:Unsupported Patterns` for
the full list of engine-level conditions.
.. _iceberg-bounded-refresh:
Bounded Refresh
~~~~~~~~~~~~~~~
Bounded refresh caps how far each base table advances per ``REFRESH MATERIALIZED VIEW``,
splitting catch-up into a series of smaller refreshes. Each refresh advances each base's
watermark by at most N snapshots; subsequent refreshes consume the remainder until the view
reaches HEAD. Use it when a single refresh would otherwise be too large to complete, e.g.:
* Initial refresh over a base table with a long history.
* Catch-up after the view has fallen far behind.
* High-throughput bases where each snapshot covers a large partition or wide commit.
Set the :ref:`max_snapshots_per_refresh <mv-prop-max-snapshots-per-refresh>` table property
per view, or the :ref:`materialized_view_default_max_snapshots_per_refresh
<iceberg-sess-materialized-view-default-max-snapshots-per-refresh>` session property as a
default for all views. The table property overrides the session default.
Example::
CREATE MATERIALIZED VIEW events_daily
WITH (max_snapshots_per_refresh = 100)
AS SELECT date(event_time) AS d, COUNT(*) AS n FROM events GROUP BY 1;
REFRESH MATERIALIZED VIEW events_daily;
Requirements:
* Base tables must use Iceberg V3 (``format-version = '3'``); a V2 base in a bounded view
triggers a warning at refresh time and falls back to unbounded.
* Each base table is bounded independently in a multi-base view.
.. _iceberg-stale-data-handling:
Stale Data Handling
^^^^^^^^^^^^^^^^^^^
The Iceberg connector automatically detects staleness by comparing current base table
snapshots against the snapshots recorded at the last refresh. A materialized view is
considered stale if base tables have changed AND the time since the last base table
modification exceeds the configured staleness window.
By default, when no staleness properties are configured, queries against a stale materialized
view will fall back to executing the underlying view query against the base tables. You can
change this default using the ``materialized_view_stale_read_behavior`` session property.
To configure staleness handling per view, set both of these properties together:
- ``stale_read_behavior``: What to do when reading stale data (``FAIL``, ``USE_VIEW_QUERY``, or ``USE_STITCHING``)
- ``staleness_window``: How much staleness to tolerate (e.g., ``1h``, ``30m``, ``0s``)
When ``USE_STITCHING`` is configured, the Iceberg connector tracks staleness at the
partition level, enabling predicate stitching to recompute only affected partitions
rather than the entire view. See :doc:`/admin/materialized-views` for details on how
predicate stitching works.
.. note::
If stitching cannot be applied, the engine falls back to a full recompute and emits
a warning. ``DELETE`` or ``UPDATE`` on a base table also forces a full recompute —
the Iceberg connector only tracks partition-level staleness for append-only changes.
See :ref:`admin/materialized-views:Unsupported Patterns` for the full list of
engine-level conditions.
Example with staleness handling:
.. code-block:: sql
CREATE MATERIALIZED VIEW hourly_sales
WITH (
stale_read_behavior = 'FAIL',
staleness_window = '1h'
)
AS SELECT date_trunc('hour', sale_time) as hour, SUM(amount) as total
FROM sales GROUP BY 1;
Limitations
^^^^^^^^^^^
- REFRESH does not provide snapshot isolation across multiple base tables (each base table's current snapshot is used independently)
- Querying materialized views at specific snapshots or timestamps is not supported
- Incremental refresh requires identity-transformed partition columns; other transforms (bucket, truncate, year, month, day, hour) are not supported as valid refresh columns
Example
^^^^^^^
Create a materialized view with custom storage configuration:
.. code-block:: sql
CREATE MATERIALIZED VIEW regional_sales
WITH (
storage_schema = 'analytics',
storage_table = 'sales_summary'
)
AS SELECT region, SUM(amount) as total FROM orders GROUP BY region;
Authorization
-------------
Enable authorization checks for the :doc:`/connector/iceberg` by setting
the ``iceberg.security`` property in the Iceberg catalog properties file. This
property must be one of the following values:
================================================== ============================================================
Property Value Description
================================================== ============================================================
``allow-all`` (default value) No authorization checks are enforced, thus allowing all
operations.
``file`` Authorization checks are enforced using a config file specified
by the Iceberg configuration property ``security.config-file``.
See :ref:`iceberg-file-based-authorization` for details.
================================================== ============================================================
.. _iceberg-file-based-authorization:
File Based Authorization
^^^^^^^^^^^^^^^^^^^^^^^^
The configuration file is specified using JSON and is composed of four sections,
each of which is a list of rules that are matched in the order specified
in the config file. The user is granted the privileges from the first
matching rule. If no rule is specified with file based authorization,
the user does not have permissions to access any of the tables or schemas, to set
session properties or run any procedures on the catalog.
All regexes default to ``.*`` if not specified provided the required fields from the
corresponding sections are specified. For example, user ``guest`` is the owner of all the schemas,
user ``admin`` has SELECT privilege on all the tables in all the schemas, all the users can set
``force_local_scheduling`` session property, but no user is allowed
to run any procedures as per the following JSON.
.. code-block:: json
{
"schemas": [
{
"user": "guest",
"owner": true
}
],
"tables": [
{
"user": "admin",
"privileges": ["SELECT"]
}
],
"sessionProperties": [
{
"property": "force_local_scheduling",
"allow": true
}
]
}
Schema Rules
~~~~~~~~~~~~
These rules govern who is considered an owner of a schema.
* ``user`` (optional): regex to match against user name.
* ``schema`` (optional): regex to match against schema name.
* ``owner`` (required): boolean indicating ownership.
Table Rules
~~~~~~~~~~~
These rules govern the privileges granted on specific tables.
* ``user`` (optional): regex to match against user name.
* ``schema`` (optional): regex to match against schema name.
* ``table`` (optional): regex to match against table name.
* ``privileges`` (required): zero or more of ``SELECT``, ``INSERT``,
``DELETE``, ``OWNERSHIP``, ``GRANT_SELECT``.
Session Property Rules
These rules govern who may set session properties.
user (optional): regex to match against user name.
property (optional): regex to match against session property name.
allowed (required): boolean indicating whether this session property may be set.
Procedure Rules
These rules govern the privileges granted on specific procedures.
* ``user`` (optional): regex to match against user name.
* ``schema`` (optional): regex to match against schema name.
* ``procedure`` (optional): regex to match against procedure name.
* ``privileges`` (required): a list that is empty or contains ``EXECUTE``.
See below for an example.
.. code-block:: json
{
"schemas": [
{
"user": "admin",
"schema": ".*",
"owner": true
},
{
"user": "guest",
"owner": false
},
{
"schema": "default",
"owner": true
}
],
"tables": [
{
"user": "admin",
"privileges": ["SELECT", "INSERT", "DELETE", "OWNERSHIP"]
},
{
"user": "banned_user",
"privileges": []
},
{
"schema": "default",
"table": ".*",
"privileges": ["SELECT"]
}
],
"sessionProperties": [
{
"property": "force_local_scheduling",
"allow": true
},
{
"user": "admin",
"property": "max_split_size",
"allow": true
}
],
"procedures": [
{
"user": "admin",
"schema": ".*",
"privileges": ["EXECUTE"]
},
{
"user": "alice",
"schema": "alice_schema",
"privileges": ["EXECUTE"]
},
{
"user": "guest",
"schema": "alice_schema",
"procedure": "test_procedure",
"privileges": ["EXECUTE"]
}
]
}