docs/src/main/sphinx/connector/elasticsearch.md
The Elasticsearch connector allows access to Elasticsearch data from Trino. This document describes how to configure a catalog with the Elasticsearch connector to run SQL queries against Elasticsearch.
To configure the Elasticsearch connector, create a catalog properties file
etc/catalog/example.properties with the following contents, replacing the
properties as appropriate for your setup:
connector.name=elasticsearch
elasticsearch.host=localhost
elasticsearch.port=9200
elasticsearch.default-schema-name=default
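Once Trino has loaded the catalog, a quick way to confirm that the connector can reach the cluster is to list what it exposes. The following is a minimal sketch, assuming the catalog is named `example` (after the properties file above) and the default schema is `default`; the tables returned depend on the indexes in your cluster.

```sql
-- List the schemas exposed by the catalog defined in example.properties.
SHOW SCHEMAS FROM example;

-- List the Elasticsearch indexes that are visible as tables in the default schema.
SHOW TABLES FROM example.default;
```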
The following table details all general configuration properties:
:::{list-table} Elasticsearch configuration properties
:widths: 70, 30
:header-rows: 1

* - Property name
  - Default
* - `elasticsearch.host`
  -
* - `elasticsearch.port`
  - `9200`
* - `elasticsearch.default-schema-name`
  - `default`
* - `elasticsearch.scroll-size`
  - `1000`
* - `elasticsearch.scroll-timeout`
  - `1m`
* - `elasticsearch.request-timeout`
  - `10s`
* - `elasticsearch.connect-timeout`
  - `1s`
* - `elasticsearch.backoff-init-delay`
  - `500ms`
* - `elasticsearch.backoff-max-delay`
  - `20s`
* - `elasticsearch.max-retry-time`
  - `30s`
* - `elasticsearch.node-refresh-interval`
  - `1m`
* - `elasticsearch.ignore-publish-address`
  - `false`
:::

The connection to Elasticsearch can use AWS or password authentication.
To enable AWS authentication and authorization using IAM policies, the
elasticsearch.security option must be set to AWS. Additionally, the
following options must be configured:
:::{list-table}
:header-rows: 1

* - Property name
* - `elasticsearch.aws.region`
* - `elasticsearch.aws.access-key`
* - `elasticsearch.aws.secret-key`
* - `elasticsearch.aws.iam-role`
* - `elasticsearch.aws.external-id`
:::

To enable password authentication, the elasticsearch.security option must be set
to PASSWORD. Additionally, the following options must be configured:
:::{list-table}
:header-rows: 1

* - Property name
* - `elasticsearch.auth.user`
* - `elasticsearch.auth.password`
:::

The connector provides additional security options to connect to Elasticsearch clusters with TLS enabled.
If your cluster has globally trusted certificates, you only need to enable TLS. If you require custom certificate configuration, the connector supports key stores and trust stores in PKCS #12 (P12) or Java Key Store (JKS) format.
The available configuration values are listed in the following table:
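:::{list-table} TLS Security Properties
:widths: 40, 60
:header-rows: 1

* - Property name
  - Description
* - `elasticsearch.tls.enabled`
  -
* - `elasticsearch.tls.keystore-path`
  -
* - `elasticsearch.tls.truststore-path`
  -
* - `elasticsearch.tls.keystore-password`
  - Password for the key store specified by `elasticsearch.tls.keystore-path`.
* - `elasticsearch.tls.truststore-password`
  - Password for the trust store specified by `elasticsearch.tls.truststore-path`.
* - `elasticsearch.tls.verify-hostnames`
  - Defaults to `true`.
:::

(elasticsearch-type-mapping)=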
Because Trino and Elasticsearch each support types that the other does not, this
connector {ref}`maps some types <type-mapping-overview>` when reading data.
The connector maps Elasticsearch types to the corresponding Trino types according to the following table:
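:::{list-table} Elasticsearch type to Trino type mapping
:widths: 50, 50
:header-rows: 1

* - Elasticsearch type
  - Trino type
* - `BOOLEAN`
  - `BOOLEAN`
* - `DOUBLE`
  - `DOUBLE`
* - `FLOAT`
  - `REAL`
* - `BYTE`
  - `TINYINT`
* - `SHORT`
  - `SMALLINT`
* - `INTEGER`
  - `INTEGER`
* - `LONG`
  - `BIGINT`
* - `KEYWORD`
  - `VARCHAR`
* - `TEXT`
  - `VARCHAR`
* - `IP`
  - `IPADDRESS`
:::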
No other types are supported.
(elasticsearch-array-types)=
Fields in Elasticsearch can contain zero or more values, but there is no dedicated array type. To indicate a field contains an array, it can be annotated in a Trino-specific structure in the _meta section of the index mapping.
For example, you can have an Elasticsearch index that contains documents with the following structure:
{
  "array_string_field": ["trino", "the", "lean", "machine-ohs"],
  "long_field": 314159265359,
  "id_field": "564e6982-88ee-4498-aa98-df9e3f6b6109",
  "timestamp_field": "2025-09-17T06:22:48.000Z",
  "object_field": {
    "array_int_field": [86, 75, 309],
    "int_field": 2
  }
}
The array fields of this structure can be defined by using the following command to add the field
property definition to the _meta.trino property of the target index mapping with Elasticsearch available at search.example.com:9200:
curl --request PUT \
  --url search.example.com:9200/doc/_mapping \
  --header 'content-type: application/json' \
  --data '
{
  "_meta": {
    "trino": {
      "array_string_field": {
        "isArray": true
      },
      "object_field": {
        "array_int_field": {
          "isArray": true
        }
      }
    }
  }
}'
:::{note}
It is not allowed to use asRawJson and isArray flags simultaneously for the same column.
:::
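With the `isArray` annotations in place, the annotated fields can be queried as Trino arrays. The following is a minimal sketch, assuming the `doc` index is exposed through a catalog named `example` in the `default` schema (both names are illustrative), and assuming `object_field` is exposed as a row-typed column so that its nested field can be reached with dot notation:

```sql
-- Filter on array membership and read the nested array inside object_field.
SELECT
    array_string_field,
    cardinality(array_string_field) AS string_count,
    object_field.array_int_field
FROM example.default.doc
WHERE contains(array_string_field, 'trino');
```

`cardinality` and `contains` are standard Trino array functions; they only apply once the field is annotated as an array.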
(elasticsearch-date-types)=
The Elasticsearch connector supports only the default date type. All other
date formats including built-in date formats and custom date formats are
not supported. Dates with the format property are ignored.
Documents in Elasticsearch can include more complex structures that are not
represented in the mapping. For example, a single keyword field can have
widely different content including a single keyword value, an array, or a
multidimensional keyword array with any level of nesting.
The following command configures array_string_field mapping with Elasticsearch
available at search.example.com:9200:
curl --request PUT \
  --url search.example.com:9200/doc/_mapping \
  --header 'content-type: application/json' \
  --data '
{
  "properties": {
    "array_string_field": {
      "type": "keyword"
    }
  }
}'
All the following documents are legal for Elasticsearch with
array_string_field mapping:
[
  {
    "array_string_field": "trino"
  },
  {
    "array_string_field": ["trino", "is", "the", "best"]
  },
  {
    "array_string_field": ["trino", ["is", "the", "best"]]
  },
  {
    "array_string_field": ["trino", ["is", ["the", "best"]]]
  }
]
See the Elasticsearch array documentation for more details.
Further, Elasticsearch supports types, such as
dense_vector,
that are not supported in Trino. These and other types can cause parsing
exceptions for users who use these types in Elasticsearch. To manage all of
these scenarios, you can transform fields to raw JSON by annotating them in a
Trino-specific structure in the
_meta
section of the index mapping. This indicates to Trino that the field, and all
nested fields beneath it, must be cast to a VARCHAR field that contains the
raw JSON content. These fields can be defined by using the following command to
add the field property definition to the _meta.trino property of the target
index mapping:
curl --request PUT \
  --url search.example.com:9200/doc/_mapping \
  --header 'content-type: application/json' \
  --data '
{
  "_meta": {
    "trino": {
      "array_string_field": {
        "asRawJson": true
      }
    }
  }
}'
The preceding configuration causes Trino to return the array_string_field
field as a VARCHAR containing raw JSON. You can parse these fields with the
built-in JSON functions.
:::{note}
It is not allowed to use asRawJson and isArray flags simultaneously for the same column.
:::
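As an illustration of parsing a field that is returned as raw JSON, the following sketch applies Trino's built-in JSON functions to `array_string_field`. The catalog `example` and schema `default` are assumptions for this example. Because the documents can hold a single value, a flat array, or nested arrays, the results differ per row and some rows may need separate handling:

```sql
-- array_string_field arrives as VARCHAR holding raw JSON text.
SELECT
    array_string_field AS raw_json,
    -- First top-level element, when it is a scalar value.
    json_extract_scalar(array_string_field, '$[0]') AS first_element,
    -- Number of top-level elements, when the value is a JSON array.
    json_array_length(array_string_field) AS top_level_length
FROM example.default.doc;
```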
The following hidden columns are available:
:::{list-table}
:header-rows: 1

* - Column
* - `_id`
* - `_score`
* - `_source`
:::

(elasticsearch-full-text-queries)=
Trino SQL queries can be combined with Elasticsearch queries by providing the full text query as part of the table name, separated by a colon. For example:
SELECT * FROM "tweets: +trino SQL^2"
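A full text query combines naturally with the hidden `_id` and `_score` columns, which must be selected by name because `SELECT *` does not return hidden columns. The following is a minimal sketch, assuming the `tweets` index is reachable through a catalog named `example` in the `default` schema (illustrative names):

```sql
-- Return the ten best-scoring documents for the full text query.
SELECT _id, _score
FROM example.default."tweets: +trino SQL^2"
ORDER BY _score DESC
LIMIT 10;
```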
(elasticsearch-sql-support)=
The connector provides globally available and read operation statements to access data and metadata in the Elasticsearch catalog.
The connector provides support to query multiple tables using a concise wildcard table notation.
SELECT *
FROM example.web."page_views_*";
The connector provides specific {doc}`table functions </functions/table>` to
access Elasticsearch.
(elasticsearch-raw-query-function)=
raw_query(varchar) -> table

The raw_query function allows you to query the underlying database directly.
This function requires Elastic Query
DSL
syntax. The full DSL query is pushed down and processed in Elasticsearch. This
can be useful for accessing native features which are not available in Trino or
for improving query performance in situations where running a query natively may
be faster.
The raw_query function requires three parameters:

- `schema`: The schema in the catalog that the query is to be executed on.
- `index`: The index in Elasticsearch to be searched.
- `query`: The query to execute, written in Elastic Query DSL.

Once executed, the query returns a single row containing the resulting JSON payload returned by Elasticsearch.
For example, the following query uses the raw_query table function in the
example catalog to search the orders index for documents where the country
name is ALGERIA. The match is expressed as a JSON-formatted Query DSL document
and passed to the function in the query parameter:
SELECT
  *
FROM
  TABLE(
    example.system.raw_query(
      schema => 'sales',
      index => 'orders',
      query => '{
        "query": {
          "match": {
            "name": "ALGERIA"
          }
        }
      }'
    )
  );
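Because the function returns the whole response as one JSON payload, a follow-up step is usually needed to pull out the matching documents. The sketch below assumes the single output column is named `result`; that name is an assumption here and should be checked with `DESCRIBE` or `SELECT *` on your version. The `$.hits.hits` path follows the standard Elasticsearch search response layout.

```sql
-- Extract the array of matching documents from the raw response payload.
-- The column name "result" is an assumption for this sketch.
SELECT
    json_extract(result, '$.hits.hits') AS hits
FROM
    TABLE(
        example.system.raw_query(
            schema => 'sales',
            index => 'orders',
            query => '{"query": {"match": {"name": "ALGERIA"}}}'
        )
    );
```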
The connector includes a number of performance improvements, detailed in the following sections.
The connector requests data from multiple nodes of the Elasticsearch cluster for query processing in parallel.
The connector supports predicate push down for the following data types:
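:::{list-table}
:widths: 50, 50
:header-rows: 1

* - Elasticsearch
  - Trino
* - `boolean`
  - `BOOLEAN`
* - `double`
  - `DOUBLE`
* - `float`
  - `REAL`
* - `byte`
  - `TINYINT`
* - `short`
  - `SMALLINT`
* - `integer`
  - `INTEGER`
* - `long`
  - `BIGINT`
* - `keyword`
  - `VARCHAR`
* - `date`
  - `TIMESTAMP`
:::

No other data types are supported for predicate push down.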
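One way to see whether a filter is eligible for push down is to inspect the query plan. The following is a minimal sketch, assuming a `doc` index in the `example` catalog and `default` schema where `long_field` is mapped as `long` and `timestamp_field` as a default-format `date` (all illustrative). The exact plan output varies by Trino version, so treat it as a diagnostic aid rather than a guarantee:

```sql
-- Both predicates use types from the table above (long and date).
EXPLAIN
SELECT id_field, long_field
FROM example.default.doc
WHERE long_field > 1000
  AND timestamp_field >= TIMESTAMP '2025-09-01 00:00:00';
```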