Back to Trino

Pinot connector

docs/src/main/sphinx/connector/pinot.md

48014.4 KB
Original Source

Pinot connector

{raw}

The Pinot connector allows Trino to query data stored in Apache Pinot™.

Requirements

To connect to Pinot, you need:

  • Pinot 1.1.0 or higher.
  • Network access from the Trino coordinator and workers to the Pinot controller nodes. Port 8098 is the default port.

Configuration

To configure the Pinot connector, create a catalog properties file e.g. etc/catalog/example.properties with at least the following contents:

text
connector.name=pinot
pinot.controller-urls=host1:8098,host2:8098

Replace host1:8098,host2:8098 with a comma-separated list of Pinot controller nodes. This can be the ip or the FQDN, the url scheme (http://) is optional.

Configuration properties

General configuration properties

Property nameRequiredDescription
pinot.controller-urlsYesA comma separated list of controller hosts. If Pinot is deployed via Kubernetes this needs to point to the controller service endpoint. The Pinot broker and server must be accessible via DNS as Pinot returns hostnames and not IP addresses.
pinot.broker-urlNoA host and port of broker. If broker URL exposed by Pinot controller API is not accessible, this property can be used to specify the broker endpoint. Enabling this property will disable broker discovery.
pinot.connection-timeoutNoPinot connection timeout, default is 1m.
pinot.metadata-expiryNoPinot metadata expiration time, default is 2m.
pinot.controller.authentication.typeNoPinot authentication method for controller requests. Allowed values are NONE and PASSWORD - defaults to NONE which is no authentication.
pinot.controller.authentication.userNoController username for basic authentication method.
pinot.controller.authentication.passwordNoController password for basic authentication method.
pinot.broker.authentication.typeNoPinot authentication method for broker requests. Allowed values are NONE and PASSWORD - defaults to NONE which is no authentication.
pinot.broker.authentication.userNoBroker username for basic authentication method.
pinot.broker.authentication.passwordNoBroker password for basic authentication method.
pinot.max-rows-per-split-for-segment-queriesNoFail query if Pinot server split returns more rows than configured, default to 2,147,483,646.
pinot.prefer-broker-queriesNoPinot query plan prefers to query Pinot broker, default is false.
pinot.forbid-segment-queriesNoForbid parallel querying and force all querying to happen via the broker, default is false.
pinot.segments-per-splitNoThe number of segments processed in a split. Setting this higher reduces the number of requests made to Pinot. This is useful for smaller Pinot clusters, default is 1.
pinot.fetch-retry-countNoRetry count for retriable Pinot data fetch calls, default is 2.
pinot.non-aggregate-limit-for-broker-queriesNoMax limit for non aggregate queries to the Pinot broker, default is 25,000.
pinot.max-rows-for-broker-queriesNoMax rows for a broker query can return, default is 50,000.
pinot.aggregation-pushdown.enabledNoPush down aggregation queries, default is true.
pinot.count-distinct-pushdown.enabledNoPush down count distinct queries to Pinot, default is true.
pinot.target-segment-page-sizeNoMax allowed page size for segment query, default is 1MB.
pinot.proxy.enabledNoUse Pinot Proxy for controller and broker requests, default is false.

If pinot.controller.authentication.type is set to PASSWORD then both pinot.controller.authentication.user and pinot.controller.authentication.password are required.

If pinot.broker.authentication.type is set to PASSWORD then both pinot.broker.authentication.user and pinot.broker.authentication.password are required.

If pinot.controller-urls uses https scheme then TLS is enabled for all connections including brokers.

gRPC configuration properties

Property nameRequiredDescription
pinot.grpc.portNoPinot gRPC port, default to 8090.
pinot.grpc.max-inbound-message-sizeNoMax inbound message bytes when init gRPC client, default is 128MB.
pinot.grpc.use-plain-textNoUse plain text for gRPC communication, default to true.
pinot.grpc.tls.keystore-typeNoTLS keystore type for gRPC connection, default is JKS.
pinot.grpc.tls.keystore-pathNoTLS keystore file location for gRPC connection, default is empty.
pinot.grpc.tls.keystore-passwordNoTLS keystore password, default is empty.
pinot.grpc.tls.truststore-typeNoTLS truststore type for gRPC connection, default is JKS.
pinot.grpc.tls.truststore-pathNoTLS truststore file location for gRPC connection, default is empty.
pinot.grpc.tls.truststore-passwordNoTLS truststore password, default is empty.
pinot.grpc.tls.ssl-providerNoSSL provider, default is JDK.
pinot.grpc.proxy-uriNoPinot Rest Proxy gRPC endpoint URI, default is null.

For more Apache Pinot TLS configurations, please also refer to Configuring TLS/SSL.

You can use {doc}secrets </security/secrets> to avoid actual values in the catalog properties files.

Querying Pinot tables

The Pinot connector automatically exposes all tables in the default schema of the catalog. You can list all tables in the pinot catalog with the following query:

SHOW TABLES FROM example.default;

You can list columns in the flight_status table:

DESCRIBE example.default.flight_status;
SHOW COLUMNS FROM example.default.flight_status;

Queries written with SQL are fully supported and can include filters and limits:

SELECT foo
FROM pinot_table
WHERE bar = 3 AND baz IN ('ONE', 'TWO', 'THREE')
LIMIT 25000;

(pinot-dynamic-tables)=

Dynamic tables

To leverage Pinot's fast aggregation, a Pinot query written in PQL can be used as the table name. Filters and limits in the outer query are pushed down to Pinot. Let's look at an example query:

SELECT *
FROM example.default."SELECT MAX(col1), COUNT(col2) FROM pinot_table GROUP BY col3, col4"
WHERE col3 IN ('FOO', 'BAR') AND col4 > 50
LIMIT 30000

Filtering and limit processing is pushed down to Pinot.

The queries are routed to the broker and are more suitable to aggregate queries.

For SELECT queries without aggregates it is more performant to issue a regular SQL query. Processing is routed directly to the servers that store the data.

The above query is translated to the following Pinot PQL query:

SELECT MAX(col1), COUNT(col2)
FROM pinot_table
WHERE col3 IN('FOO', 'BAR') and col4 > 50
TOP 30000

(pinot-type-mapping)=

Type mapping

Because Trino and Pinot each support types that the other does not, this connector {ref}maps some types <type-mapping-overview> when reading data.

Pinot type to Trino type mapping

The connector maps Pinot types to the corresponding Trino types according to the following table:

:::{list-table} Pinot type to Trino type mapping :widths: 75,60 :header-rows: 1

    • Pinot type
    • Trino type
    • INT
    • INTEGER
    • LONG
    • BIGINT
    • FLOAT
    • REAL
    • DOUBLE
    • DOUBLE
    • STRING
    • VARCHAR
    • BYTES
    • VARBINARY
    • JSON
    • JSON
    • TIMESTAMP
    • TIMESTAMP
    • INT_ARRAY
    • VARCHAR
    • LONG_ARRAY
    • VARCHAR
    • FLOAT_ARRAY
    • VARCHAR
    • DOUBLE_ARRAY
    • VARCHAR
    • STRING_ARRAY
    • VARCHAR :::

No other types are supported.

Date Type

For Pinot DateTimeFields, if the FormatSpec is in days, then it is converted to a Trino DATE type. Pinot allows for LONG fields to have a FormatSpec of days as well, if the value is larger than Integer.MAX_VALUE then the conversion to Trino DATE fails.

Null Handling

If a Pinot TableSpec has nullHandlingEnabled set to true, then for numeric types the null value is encoded as MIN_VALUE for that type. For Pinot STRING type, the value null is interpreted as a NULL value.

(pinot-sql-support)=

SQL support

The connector provides {ref}globally available <sql-globally-available> and {ref}read operation <sql-read-operations> statements to access data and metadata in Pinot.

(pinot-pushdown)=

Pushdown

The connector supports pushdown for a number of operations:

  • {ref}limit-pushdown

{ref}Aggregate pushdown <aggregation-pushdown> for the following functions:

  • {func}avg
  • {func}approx_distinct
  • count(*) and count(distinct) variations of {func}count
  • {func}max
  • {func}min
  • {func}sum

Aggregate function pushdown is enabled by default, but can be disabled with the catalog property pinot.aggregation-pushdown.enabled or the catalog session property aggregation_pushdown_enabled.

A count(distinct) pushdown may cause Pinot to run a full table scan with significant performance impact. If you encounter this problem, you can disable it with the catalog property pinot.count-distinct-pushdown.enabled or the catalog session property count_distinct_pushdown_enabled.

{include}