# Spark-Pinot Connector

Spark-Pinot connector to read data from Pinot.
Detailed read model documentation is available in the Spark-Pinot Connector Read Model documentation.
```scala
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession
  .builder()
  .appName("spark-pinot-connector-test")
  .master("local")
  .getOrCreate()

import spark.implicits._

val data = spark.read
  .format("pinot")
  .option("table", "airlineStats")
  .option("tableType", "offline")
  .load()
  .filter($"DestStateName" === "Florida")

data.show(100)
```
More examples are included in `src/test/scala/.../ExampleSparkPinotConnectorTest.scala`.
You can run the examples locally (e.g. using your IDE) in standalone mode by starting a local Pinot cluster. See: https://docs.pinot.apache.org/basics/getting-started/running-pinot-locally
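As a sketch of the local setup, the batch QuickStart from a Pinot binary distribution starts a single-node cluster with sample tables (including `airlineStats`) preloaded; the path below assumes you run it from the root of the extracted distribution:

```shell
# Start a local single-node Pinot cluster and load the sample batch tables,
# including airlineStats used in the example above.
./bin/pinot-admin.sh QuickStart -type batch
```

Once the cluster is up, the Scala example above can be run as-is against `localhost`.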
You can also run the tests in cluster mode using the following commands:
```shell
export SPARK_CLUSTER=<YOUR_YARN_OR_SPARK_CLUSTER>

# Edit the ExampleSparkPinotConnectorTest to get rid of `.master("local")` and rebuild the jar before running this command
spark-submit \
  --class org.apache.pinot.connector.spark.datasource.ExampleSparkPinotConnectorTest \
  --jars ./target/pinot-spark-2-connector-0.13.0-SNAPSHOT-shaded.jar \
  --master $SPARK_CLUSTER \
  --deploy-mode cluster \
  ./target/pinot-spark-2-connector-0.13.0-SNAPSHOT-tests.jar
```
The Spark-Pinot connector uses the Spark DataSourceV2 API. For background on the DataSourceV2 API, please check the Databricks presentation on the topic.