# Metastores
Object storage access is mediated through a metastore. Metastores provide information on directory structure, file format, and metadata about the stored data. Object storage connectors support the use of one or more metastores. A supported metastore is required to use any object storage connector.
Additional configuration is required to access tables with Athena partition projection metadata or to implement first-class support for Avro tables. These requirements are discussed later in this topic.
(general-metastore-properties)=
## General metastore configuration properties

The following table describes general metastore configuration properties,
most of which are used with either metastore.
At a minimum, each Delta Lake, Hive, or Hudi object storage catalog file must
set the `hive.metastore` configuration property to define the type of
metastore to use. Iceberg catalogs instead use the `iceberg.catalog.type`
configuration property to define the type of metastore to use.
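For example, a minimal catalog file for a Delta Lake catalog backed by a
Thrift metastore could look like the following sketch. The connector name,
hostname, and port are placeholders for your environment:

```properties
# Hypothetical catalog file, for example etc/catalog/example.properties
connector.name=delta_lake
hive.metastore=thrift
hive.metastore.uri=thrift://metastore.example.net:9083
```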
Additional configuration properties specific to the Thrift and Glue
metastores are also available. They are discussed later in this topic.
:::{list-table} General metastore configuration properties
:widths: 35, 50, 15
:header-rows: 1

* - Property name
  - Description
  - Default
* - `hive.metastore`
  - The type of Hive metastore to use. Trino currently supports the default
    Hive Thrift metastore (`thrift`), and the AWS Glue Catalog (`glue`) as
    metadata sources. You must use this for all object storage catalogs
    except Iceberg.
  - `thrift`
* - `iceberg.catalog.type`
  - The Iceberg table format manages most metadata in metadata files in the
    object storage itself. A small amount of metadata, however, still
    requires the use of a metastore. In the Iceberg ecosystem, these smaller
    metastores are called Iceberg metadata catalogs, or just catalogs. The
    examples in each subsection depict the contents of a Trino catalog file
    that uses the Iceberg connector to configure different Iceberg metadata
    catalogs.

    You must set this property in all Iceberg catalog property files. Valid
    values are `hive_metastore`, `glue`, `jdbc`, `rest`, `nessie`, and
    `snowflake`.
  - `hive_metastore`
* - `hive.metastore-cache.cache-partitions`
  - Enable caching for partition metadata. You can disable caching to avoid
    inconsistent behavior that results from it.
  - `true`
* - `hive.metastore-cache.cache-missing`
  - Cache the fact that a table is missing from the metastore.
  - `true`
* - `hive.metastore-cache.cache-missing-partitions`
  - Cache the fact that a partition is missing from the metastore.
  - `false`
* - `hive.metastore-cache.cache-missing-stats`
  - Cache the fact that statistics are missing for a table.
  - `false`
* - `hive.metastore-cache-ttl`
  - Duration of how long cached metastore data is considered valid.
  - `0s`
* - `hive.metastore-stats-cache-ttl`
  - Duration of how long cached metastore statistics are considered valid.
  - `5m`
* - `hive.metastore-cache-maximum-size`
  - Maximum number of metastore data objects in the Hive metastore cache.
  - `20000`
* - `hive.metastore-refresh-interval`
  - Asynchronously refresh cached metastore data after access if it is older
    than this duration and not yet expired, allowing subsequent accesses to
    see fresh data.
  -
* - `hive.metastore-refresh-max-threads`
  - Maximum number of threads used to refresh cached metastore data.
  - `10`
* - `hive.user-metastore-cache-ttl`
  - Duration of how long cached metastore data, cached per user, is
    considered valid.
  - `0s`
* - `hive.user-metastore-cache-maximum-size`
  - Maximum number of metastore data objects in the per-user Hive metastore
    cache.
  - `1000`
* - `hive.hide-delta-lake-tables`
  - Controls whether to hide Delta Lake tables in table listings.
  - `false`
:::
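As an illustration of the caching properties, the following sketch keeps
metastore data cached for five minutes and refreshes entries in the
background after one minute. The values are examples only, not recommended
settings:

```properties
hive.metastore-cache-ttl=5m
hive.metastore-cache-maximum-size=20000
hive.metastore-refresh-interval=1m
```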
(hive-thrift-metastore)=
## Thrift metastore configuration

In order to use a Hive Thrift metastore, you must configure the metastore
with `hive.metastore=thrift` and provide further details with the following
properties:
:::{list-table} Thrift metastore configuration properties
:widths: 35, 50, 15
:header-rows: 1

* - Property name
  - Description
  - Default
* - `hive.metastore.uri`
  - The URIs of the Hive metastore to connect to using the Thrift protocol.
    If a comma-separated list of URIs is provided, the first URI is used by
    default, and the rest of the URIs are fallback metastores. This property
    is required. Example: `thrift://192.0.2.3:9083` or
    `thrift://192.0.2.3:9083,thrift://192.0.2.4:9083`
  -
* - `hive.metastore.username`
  - The username Trino uses to access the Hive metastore.
  -
* - `hive.metastore.authentication.type`
  - Hive metastore authentication type. Possible values are `NONE` or
    `KERBEROS`.
  - `NONE`
* - `hive.metastore.thrift.client.connect-timeout`
  - Socket connect timeout for the metastore client.
  - `10s`
* - `hive.metastore.thrift.client.read-timeout`
  - Socket read timeout for the metastore client.
  - `10s`
* - `hive.metastore.thrift.impersonation.enabled`
  - Enable Hive metastore end user impersonation.
  -
* - `hive.metastore.thrift.use-spark-table-statistics-fallback`
  - Enable usage of table statistics generated by Apache Spark when Hive
    table statistics are not available.
  - `true`
* - `hive.metastore.thrift.delegation-token.cache-ttl`
  - Time to live for the metastore delegation token cache.
  - `1h`
* - `hive.metastore.thrift.delegation-token.cache-maximum-size`
  - Maximum size of the delegation token cache.
  - `1000`
* - `hive.metastore.thrift.client.ssl.enabled`
  - Use SSL when connecting to the metastore.
  - `false`
* - `hive.metastore.thrift.client.ssl.key`
  - Path to the private key and client certificate (key store).
  -
* - `hive.metastore.thrift.client.ssl.key-password`
  - Password for the private key.
  -
* - `hive.metastore.thrift.client.ssl.trust-certificate`
  - Path to the server certificate chain (trust store). Required when SSL is
    enabled.
  -
* - `hive.metastore.thrift.client.ssl.trust-certificate-password`
  - Password for the trust store.
  -
* - `hive.metastore.service.principal`
  - Hive metastore service principal.
  -
* - `hive.metastore.client.principal`
  - Hive metastore client principal.
  -
* - `hive.metastore.client.keytab`
  - Hive metastore client keytab location.
  -
* - `hive.metastore.thrift.delete-files-on-drop`
  - Actively delete the files for managed tables when performing drop table
    or partition operations, for cases when the metastore does not delete
    the files.
  - `false`
* - `hive.metastore.thrift.assume-canonical-partition-keys`
  - Allow the metastore to assume that the values of partition columns can
    be converted to string values. This can lead to performance improvements
    in queries which apply filters on partition columns. Partition keys with
    a `TIMESTAMP` type do not get canonicalized.
  - `false`
* - `hive.metastore.thrift.client.socks-proxy`
  - SOCKS proxy to use for the Thrift Hive metastore.
  -
* - `hive.metastore.thrift.client.max-retries`
  - Maximum number of retry attempts for metastore requests.
  - `9`
* - `hive.metastore.thrift.client.backoff-scale-factor`
  - Scale factor for metastore request retry delay.
  - `2.0`
* - `hive.metastore.thrift.client.max-retry-time`
  - Total allowed time limit for a metastore request to be retried.
  - `30s`
* - `hive.metastore.thrift.client.min-backoff-delay`
  - Minimum delay between metastore request retries.
  - `1s`
* - `hive.metastore.thrift.client.max-backoff-delay`
  - Maximum delay between metastore request retries.
  - `1s`
* - `hive.metastore.thrift.txn-lock-max-wait`
  - Maximum time to wait to acquire a Hive transaction lock.
  - `10m`
* - `hive.metastore.thrift.catalog-name`
  - The name of the Hive metastore catalog to use.
  -
:::
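For example, a Hive catalog that uses a primary and a fallback Thrift
metastore could be configured as in the following sketch. The URIs are
placeholders:

```properties
connector.name=hive
hive.metastore=thrift
hive.metastore.uri=thrift://192.0.2.3:9083,thrift://192.0.2.4:9083
hive.metastore.thrift.client.connect-timeout=10s
```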
(iceberg-hive-catalog)=
### Iceberg-specific Hive catalog configuration properties

When using the Hive catalog, the Iceberg connector supports the same
{ref}`general Thrift metastore configuration properties <hive-thrift-metastore>`
as previously described, with the following additional property:
:::{list-table} Iceberg Hive catalog configuration property
:widths: 35, 50, 15
:header-rows: 1

* - Property name
  - Description
  - Default
* - `iceberg.hive-catalog.locking-enabled`
  - Commit to tables using Hive locks.
  - `true`
:::
:::{warning}
Setting `iceberg.hive-catalog.locking-enabled=false` will cause the catalog
to commit to tables without using Hive locks. This should only be set to
`false` if all following conditions are met:

- [HIVE-26882](https://issues.apache.org/jira/browse/HIVE-26882) is
  available on the Hive metastore server.
- All other clients committing to the same tables use Iceberg 1.3 or later,
  and also commit without using Hive locks.
:::
(hive-thrift-metastore-authentication)=
### Thrift metastore authentication

In a Kerberized Hadoop cluster, Trino connects to the Hive metastore Thrift
service using {abbr}`SASL (Simple Authentication and Security Layer)` and
authenticates using Kerberos. Kerberos authentication for the metastore is
configured in the connector's properties file using the following optional
properties:
:::{list-table} Hive metastore Thrift service authentication properties
:widths: 30, 55, 15
:header-rows: 1

* - Property name
  - Description
  - Default
* - `hive.metastore.authentication.type`
  - Hive metastore authentication type. One of `NONE` or `KERBEROS`. When
    using the default value of `NONE`, Kerberos authentication is disabled,
    and no other properties must be configured.

    When set to `KERBEROS`, the Hive connector connects to the Hive
    metastore Thrift service using SASL and authenticates using Kerberos.
  - `NONE`
* - `hive.metastore.service.principal`
  - The Kerberos principal of the Hive metastore service. The coordinator
    uses this to authenticate the Hive metastore.

    The `_HOST` placeholder can be used in this property value. When
    connecting to the Hive metastore, the Hive connector substitutes in the
    hostname of the metastore server it is connecting to. This is useful if
    the metastore runs on multiple hosts.

    Example: `hive/hive-server-host@EXAMPLE.COM` or
    `hive/_HOST@EXAMPLE.COM`.
  -
* - `hive.metastore.client.principal`
  - The Kerberos principal that Trino uses when connecting to the Hive
    metastore service.

    Example: `trino/trino-server-node@EXAMPLE.COM` or
    `trino/_HOST@EXAMPLE.COM`.

    The `_HOST` placeholder can be used in this property value. When
    connecting to the Hive metastore, the Hive connector substitutes in the
    hostname of the worker node Trino is running on. This is useful if each
    worker node has its own Kerberos principal.

    Unless {ref}`metastore impersonation <hive-security-metastore-impersonation>`
    is enabled, the principal specified by `hive.metastore.client.principal`
    must have sufficient privileges to remove files and directories within
    the `hive/warehouse` directory.

    **Warning:** If the principal does not have sufficient permissions, only
    the metadata is removed, and the data continues to consume disk space.
    This occurs because the Hive metastore is responsible for deleting the
    internal table data. When the metastore is configured to use Kerberos
    authentication, all of the HDFS operations performed by the metastore
    are impersonated. Errors deleting data are silently ignored.
  -
* - `hive.metastore.client.keytab`
  - The path to the keytab file that contains a key for the principal
    specified by `hive.metastore.client.principal`. This file must be
    readable by the operating system user running Trino.
  -
:::
The following sections describe the configuration properties and values
needed for the various authentication configurations used with the Hive
metastore Thrift service and the Hive connector.
#### `NONE` authentication without impersonation

```properties
hive.metastore.authentication.type=NONE
```

The default authentication type for the Hive metastore is `NONE`. When the
authentication type is `NONE`, Trino connects to an unsecured Hive
metastore. Kerberos is not used.
(hive-security-metastore-impersonation)=
#### `KERBEROS` authentication with impersonation

```properties
hive.metastore.authentication.type=KERBEROS
hive.metastore.thrift.impersonation.enabled=true
hive.metastore.service.principal=hive/hive-metastore-host.example.com@EXAMPLE.COM
hive.metastore.client.principal=trino@EXAMPLE.COM
hive.metastore.client.keytab=/etc/trino/hive.keytab
```
When the authentication type for the Hive metastore Thrift service is
`KERBEROS`, Trino connects as the Kerberos principal specified by the
property `hive.metastore.client.principal`. Trino authenticates this
principal using the keytab specified by the `hive.metastore.client.keytab`
property, and verifies that the identity of the metastore matches
`hive.metastore.service.principal`.
When using `KERBEROS` metastore authentication with impersonation, the
principal specified by the `hive.metastore.client.principal` property must
be allowed to impersonate the current Trino user.
Keytab files must be distributed to every node in the Trino cluster.
(hive-glue-metastore)=
## AWS Glue catalog configuration properties

In order to use an AWS Glue catalog, you must configure your catalog file
with `hive.metastore=glue` and provide further details with the following
properties:
:::{list-table} AWS Glue catalog configuration properties
:widths: 35, 50, 15
:header-rows: 1

* - Property name
  - Description
  - Default
* - `hive.metastore.glue.region`
  - AWS region of the Glue Catalog. This is required when not running in
    EC2, or when the catalog is in a different region. Example: `us-east-1`
  -
* - `hive.metastore.glue.endpoint-url`
  - Glue API endpoint URL (optional). Example:
    `https://glue.us-east-1.amazonaws.com`
  -
* - `hive.metastore.glue.sts.region`
  - AWS region of the STS service to authenticate with. This is required
    when running in a GovCloud region. Example: `us-gov-east-1`
  -
* - `hive.metastore.glue.sts.endpoint`
  - STS endpoint URL to use when authenticating to Glue (optional). Example:
    `https://sts.us-gov-east-1.amazonaws.com`
  -
* - `hive.metastore.glue.pin-client-to-current-region`
  - Pin Glue requests to the same region as the EC2 instance where Trino is
    running.
  - `false`
* - `hive.metastore.glue.max-connections`
  - Max number of concurrent connections to Glue.
  - `30`
* - `hive.metastore.glue.max-error-retries`
  - Maximum number of error retries for the Glue client.
  - `10`
* - `hive.metastore.glue.default-warehouse-dir`
  - Default warehouse directory for schemas created without an explicit
    `location` property.
  -
* - `hive.metastore.glue.use-web-identity-token-credentials-provider`
  - If you are running Trino on Amazon EKS, and authenticate using a
    Kubernetes service account, you can set this property to `true`. Setting
    to `true` forces Trino to not try using different credential providers
    from the default credential provider chain, and instead directly use
    credentials from the service account.
  - `false`
* - `hive.metastore.glue.aws-access-key`
  - AWS access key to use to connect to the Glue Catalog. If specified along
    with `hive.metastore.glue.aws-secret-key`, this parameter takes
    precedence over `hive.metastore.glue.iam-role`.
  -
* - `hive.metastore.glue.aws-secret-key`
  - AWS secret key to use to connect to the Glue Catalog. If specified along
    with `hive.metastore.glue.aws-access-key`, this parameter takes
    precedence over `hive.metastore.glue.iam-role`.
  -
* - `hive.metastore.glue.catalogid`
  - The ID of the Glue Catalog in which the metadata database resides.
  -
* - `hive.metastore.glue.iam-role`
  - ARN of an IAM role to assume when connecting to the Glue Catalog.
  -
* - `hive.metastore.glue.external-id`
  - External ID for the IAM role trust policy when connecting to the Glue
    Catalog.
  -
* - `hive.metastore.glue.partitions-segments`
  - Number of segments for partitioned Glue tables.
  - `5`
* - `hive.metastore.glue.skip-archive`
  - Skip archiving an old table version when creating a new version in a
    Glue table.
  - `false`
:::
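For example, a Hive catalog backed by AWS Glue and authenticating with an
assumed IAM role could be configured as in the following sketch. The region
and role ARN are placeholders:

```properties
connector.name=hive
hive.metastore=glue
hive.metastore.glue.region=us-east-1
hive.metastore.glue.iam-role=arn:aws:iam::123456789012:role/example-glue-access
```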
(iceberg-glue-catalog)=
### Iceberg-specific Glue catalog configuration properties

When using the Glue catalog, the Iceberg connector supports the same
{ref}`general Glue configuration properties <hive-glue-metastore>` as
previously described, with the following additional property:
:::{list-table} Iceberg Glue catalog configuration property
:widths: 35, 50, 15
:header-rows: 1

* - Property name
  - Description
  - Default
* - `iceberg.glue.cache-table-metadata`
  - While updating the table metadata in the Glue catalog, store the table
    comment and column comments in Glue table parameters to speed up
    `information_schema.columns` and `system.metadata.table_comments`
    queries.
  - `true`
:::
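For example, an Iceberg catalog using Glue as its metadata catalog could be
configured as in the following sketch. The region is a placeholder:

```properties
connector.name=iceberg
iceberg.catalog.type=glue
hive.metastore.glue.region=us-east-1
```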
## Iceberg-specific metastores

The Iceberg table format manages most metadata in metadata files in the
object storage itself. A small amount of metadata, however, still requires
the use of a metastore. In the Iceberg ecosystem, these smaller metastores
are called Iceberg metadata catalogs, or just catalogs.

You can use a general metastore such as an HMS or AWS Glue, or you can use
the Iceberg-specific REST, Nessie, or JDBC metadata catalogs, as discussed
in this section.
(iceberg-rest-catalog)=
### REST catalog

In order to use the Iceberg REST catalog, configure the catalog type with
`iceberg.catalog.type=rest`, and provide further details with the following
properties:
:::{list-table} Iceberg REST catalog configuration properties
:widths: 40, 60
:header-rows: 1

* - Property name
  - Description
* - `iceberg.rest-catalog.uri`
  - REST server API endpoint URI (required). Example:
    `http://iceberg-with-rest:8181`
* - `iceberg.rest-catalog.prefix`
  - The prefix for the resource path to use with the REST catalog server
    (optional). Example: `dev`
* - `iceberg.rest-catalog.warehouse`
  - Warehouse identifier or location for the catalog (optional). Example:
    `s3://my_bucket/warehouse_location`
* - `iceberg.rest-catalog.security`
  - The type of security to use (default: `NONE`). Possible values are
    `NONE`, `SIGV4`, `GOOGLE` or `OAUTH2`. `OAUTH2` requires either a token
    or a credential.
* - `iceberg.rest-catalog.session`
  - Session information included when communicating with the REST catalog.
    Options are `NONE` or `USER` (default: `NONE`).
* - `iceberg.rest-catalog.session-timeout`
  - Duration to keep the authentication session in cache. Defaults to `1h`.
* - `iceberg.rest-catalog.oauth2.token`
  - The bearer token used for `OAUTH2` security. Either `token` or
    `credential` is required for `OAUTH2` security. Example: `AbCdEf123456`
* - `iceberg.rest-catalog.oauth2.credential`
  - The credential to exchange for a token in the OAuth2 client credentials
    flow with the server. Either `token` or `credential` is required for
    `OAUTH2` security. Example: `AbCdEf123456`
* - `iceberg.rest-catalog.oauth2.scope`
  - Scope to be used when communicating with the REST catalog. Applicable
    only when using `credential`.
* - `iceberg.rest-catalog.oauth2.server-uri`
  - The endpoint to retrieve the access token from the OAuth2 server.
* - `iceberg.rest-catalog.oauth2.token-refresh-enabled`
  - Controls whether a token should be refreshed if information about its
    expiration time is available. Defaults to `true`.
* - `iceberg.rest-catalog.oauth2.token-exchange-enabled`
  - Controls whether the OAuth2 token exchange flow is enabled. Defaults to
    `true`.
* - `iceberg.rest-catalog.vended-credentials-enabled`
  - Use credentials provided by the REST backend for file system access.
    Defaults to `false`.
* - `iceberg.rest-catalog.nested-namespace-enabled`
  - Support querying objects under a nested namespace. Defaults to `false`.
* - `iceberg.rest-catalog.view-endpoints-enabled`
  - Enable view endpoints. Defaults to `true`.
* - `iceberg.rest-catalog.signing-name`
  - Amazon service name to use when configured with `SIGV4` security.
    Defaults to `execute-api`.
* - `iceberg.rest-catalog.google-project-id`
  - The Google Cloud project ID, required when the
    `iceberg.rest-catalog.security` config property is set to `GOOGLE`.
    Example: `development-123456`.
* - `iceberg.rest-catalog.case-insensitive-name-matching`
  - Match namespace, table, and view names case insensitively. Defaults to
    `false`.
* - `iceberg.rest-catalog.case-insensitive-name-matching.cache-ttl`
  - Duration for which case-insensitive name matching information is cached.
    Defaults to `1m`.
:::

The following example shows a minimal catalog configuration using an Iceberg
REST metadata catalog:

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://iceberg-with-rest:8181
```
The `iceberg.security` property must be `read_only` when connecting to the
Databricks Unity catalog using an Iceberg REST catalog:

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://dbc-12345678-9999.cloud.databricks.com/api/2.1/unity-catalog/iceberg
iceberg.security=read_only
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.token=***
```
The `iceberg.rest-catalog.security` property must be `GOOGLE` when
connecting to the BigLake metastore using an Iceberg REST catalog:

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.unique-table-location=false
iceberg.rest-catalog.warehouse=gs://example-bucket
iceberg.rest-catalog.uri=https://biglake.googleapis.com/iceberg/v1beta/restcatalog
iceberg.rest-catalog.security=GOOGLE
iceberg.rest-catalog.google-project-id=example-project-id
iceberg.rest-catalog.view-endpoints-enabled=false
fs.native-gcs.enabled=true
gcs.json-key-file-path=/path/to/gcs_keyfile.json
```
The REST catalog supports view management using the Iceberg View specification.
The REST catalog does not support materialized view management.
(iceberg-jdbc-catalog)=
### JDBC catalog

The Iceberg JDBC catalog is supported for the Iceberg connector. At a
minimum, `iceberg.jdbc-catalog.driver-class`,
`iceberg.jdbc-catalog.connection-url`,
`iceberg.jdbc-catalog.default-warehouse-dir`, and
`iceberg.jdbc-catalog.catalog-name` must be configured. When using any
database besides PostgreSQL, a JDBC driver JAR file must be placed in the
plugin directory.
:::{list-table} JDBC catalog configuration properties
:widths: 40, 60
:header-rows: 1

* - Property name
  - Description
* - `iceberg.jdbc-catalog.driver-class`
  - JDBC driver class name.
* - `iceberg.jdbc-catalog.connection-url`
  - The URI to connect to the JDBC server.
* - `iceberg.jdbc-catalog.connection-user`
  - Username for the JDBC client.
* - `iceberg.jdbc-catalog.connection-password`
  - Password for the JDBC client.
* - `iceberg.jdbc-catalog.catalog-name`
  - Iceberg JDBC metastore catalog name.
* - `iceberg.jdbc-catalog.default-warehouse-dir`
  - The default warehouse directory to use for JDBC.
* - `iceberg.jdbc-catalog.schema-version`
  - JDBC catalog schema version. Valid values are `V0` or `V1`. Defaults to
    `V1`.
* - `iceberg.jdbc-catalog.retryable-status-codes`
  - On a connection error to the JDBC server, retry if the error has one of
    these JDBC status codes. Retries happen by default on
    `08000,08003,08006,08007,40001`. Specify only additional codes (such as
    `57000,57P03,57P04` if using the PostgreSQL driver) here.
:::

:::{warning}
The JDBC catalog may have compatibility issues if Iceberg introduces
breaking changes in the future. Consider the
{ref}`REST catalog <iceberg-rest-catalog>` as an alternative solution.

The JDBC catalog requires the metadata tables to already exist. Refer to the
Iceberg repository for creating those tables.
:::
The following example shows a minimal catalog configuration using an Iceberg
JDBC metadata catalog:

```properties
connector.name=iceberg
iceberg.catalog.type=jdbc
iceberg.jdbc-catalog.catalog-name=test
iceberg.jdbc-catalog.driver-class=org.postgresql.Driver
iceberg.jdbc-catalog.connection-url=jdbc:postgresql://example.net:5432/database
iceberg.jdbc-catalog.connection-user=admin
iceberg.jdbc-catalog.connection-password=test
iceberg.jdbc-catalog.default-warehouse-dir=s3://bucket
```
The JDBC catalog does not support materialized view management.
(iceberg-nessie-catalog)=
### Nessie catalog

In order to use a Nessie catalog, configure the catalog type with
`iceberg.catalog.type=nessie` and provide further details with the following
properties:
:::{list-table} Nessie catalog configuration properties
:widths: 40, 60
:header-rows: 1

* - Property name
  - Description
* - `iceberg.nessie-catalog.uri`
  - Nessie API endpoint URI (required). Example:
    `https://localhost:19120/api/v2`
* - `iceberg.nessie-catalog.ref`
  - The branch or tag to use for Nessie. Defaults to `main`.
* - `iceberg.nessie-catalog.default-warehouse-dir`
  - Default warehouse directory for schemas created without an explicit
    `location` property. Example: `/tmp`
* - `iceberg.nessie-catalog.read-timeout`
  - The read timeout duration for requests to the Nessie server. Defaults
    to `25s`.
* - `iceberg.nessie-catalog.connection-timeout`
  - The connection timeout duration for connections to the Nessie server.
    Defaults to `5s`.
* - `iceberg.nessie-catalog.enable-compression`
  - Configure whether compression is enabled for requests to the Nessie
    server. Defaults to `true`.
* - `iceberg.nessie-catalog.authentication.type`
  - The authentication type to use. Available value is `BEARER`. Defaults
    to no authentication.
* - `iceberg.nessie-catalog.authentication.token`
  - The token to use with `BEARER` authentication. Example:
    `SXVLUXUhIExFQ0tFUiEK`
* - `iceberg.nessie-catalog.client-api-version`
  - Optional client API version to use. By default, it is inferred from the
    `iceberg.nessie-catalog.uri` value. Valid values are `V1` or `V2`.
:::

The following example shows a minimal catalog configuration using a Nessie
metadata catalog:

```properties
connector.name=iceberg
iceberg.catalog.type=nessie
iceberg.nessie-catalog.uri=https://localhost:19120/api/v2
iceberg.nessie-catalog.default-warehouse-dir=/tmp
```
The Nessie catalog does not support view management or materialized view management.
(iceberg-snowflake-catalog)=
### Snowflake catalog

In order to use a Snowflake catalog, configure the catalog type with
`iceberg.catalog.type=snowflake` and provide further details with the
following properties:
:::{list-table} Snowflake catalog configuration properties
:widths: 40, 60
:header-rows: 1

* - Property name
  - Description
* - `iceberg.snowflake-catalog.account-uri`
  - Snowflake JDBC account URI (required). Example:
    `jdbc:snowflake://example123456789.snowflakecomputing.com`
* - `iceberg.snowflake-catalog.user`
  - Snowflake user (required).
* - `iceberg.snowflake-catalog.password`
  - Snowflake password (required).
* - `iceberg.snowflake-catalog.database`
  - Snowflake database name (required).
* - `iceberg.snowflake-catalog.role`
  - Snowflake role name (optional).
:::

The following example shows a minimal catalog configuration using a
Snowflake metadata catalog:

```properties
connector.name=iceberg
iceberg.catalog.type=snowflake
iceberg.snowflake-catalog.account-uri=jdbc:snowflake://example1234567890.snowflakecomputing.com
iceberg.snowflake-catalog.user=user
iceberg.snowflake-catalog.password=secret
iceberg.snowflake-catalog.database=db
```
When using the Snowflake catalog, data management tasks such as creating
tables must be performed in Snowflake, because using the catalog from
external systems like Trino only supports `SELECT` queries and other read
operations.
Additionally, the Snowflake-created Iceberg tables do not expose partitioning information, which prevents efficient parallel reads and therefore can have significant negative performance implications.
The Snowflake catalog does not support view management or materialized view management.
Further information is available in the Snowflake catalog documentation.
(partition-projection)=
## Partition projection

Partition projection is a feature of AWS Athena often used to speed up query
processing with highly partitioned tables when using the Hive connector.
Trino supports partition projection table properties stored in the Hive
metastore or Glue catalog, and it reimplements this functionality.
Currently, there is a limitation in comparison to AWS Athena for date
projection, as it only supports intervals of `DAYS`, `HOURS`, `MINUTES`, and
`SECONDS`.
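Partition projection support is toggled per catalog. As a sketch, a Hive
catalog backed by Glue could enable it with the
`hive.partition-projection-enabled` property, which is documented with the
Hive connector configuration properties rather than in this topic:

```properties
connector.name=hive
hive.metastore=glue
hive.metastore.glue.region=us-east-1
hive.partition-projection-enabled=true
```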
If there are any compatibility issues blocking access to a requested table
when partition projection is enabled, set the `partition_projection_ignore`
table property to `true` for a table to bypass any errors.
Refer to {ref}`hive-table-properties` and {ref}`hive-column-properties` for
configuration of partition projection.
## Configure metastore for Avro

For catalogs using the Hive connector, you must add the following property
definition to the Hive metastore configuration file `hive-site.xml` and
restart the metastore service to enable first-class support for Avro tables
when using Hive 3.x:

```xml
<property>
  <!-- https://community.hortonworks.com/content/supportkb/247055/errorjavalangunsupportedoperationexception-storage.html -->
  <name>metastore.storage.schema.reader.impl</name>
  <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
</property>
```