docs/development/extensions-contrib/thrift.md
To use this Apache Druid extension, include druid-thrift-extensions in the extensions load list.
This extension enables Druid to ingest Thrift-encoded data from streaming sources such as Kafka and Kinesis.
You may want to use another version of thrift, change the dependency in pom and compile yourself.
Thrift-encoded data for streaming ingestion (Kafka, Kinesis) can be ingested using the Thrift input format. It supports flattenSpec for extracting fields from nested Thrift structs using JSONPath expressions.
| Field | Type | Description | Required |
|---|---|---|---|
| type | String | Must be thrift | yes |
| thriftClass | String | Fully qualified class name of the Thrift-generated TBase class to deserialize into. | yes |
| thriftJar | String | Path to a JAR file containing the Thrift class. If not provided, the class is looked up from the classpath. | no |
| flattenSpec | JSON Object | Specifies flattening of nested Thrift structs. See Flattening nested data for details. | no |
Consider the following Thrift schema definition:
namespace java com.example.druid
struct Author {
1: string firstName;
2: string lastName;
}
struct Book {
1: string date;
2: double price;
3: string title;
4: Author author;
}
Compile it to produce com.example.druid.Book (and com.example.druid.Author) and make the resulting JAR available on the classpath of your Druid processes, or reference it via thriftJar.
The following Kafka supervisor spec ingests compact-encoded Book messages, using a flattenSpec to extract the nested author.lastName field:
{
"type": "kafka",
"spec": {
"dataSchema": {
"dataSource": "books",
"timestampSpec": {
"column": "date",
"format": "auto"
},
"dimensionsSpec": {
"dimensions": [
"title",
"lastName"
]
},
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "DAY",
"queryGranularity": "NONE"
}
},
"tuningConfig": {
"type": "kafka"
},
"ioConfig": {
"type": "kafka",
"consumerProperties": {
"bootstrap.servers": "localhost:9092"
},
"topic": "books",
"inputFormat": {
"type": "thrift",
"thriftClass": "com.example.druid.Book",
"flattenSpec": {
"useFieldDiscovery": true,
"fields": [
{
"type": "path",
"name": "lastName",
"expr": "$.author.lastName"
}
]
}
},
"taskCount": 1,
"replicas": 1,
"taskDuration": "PT1H"
}
}
}