<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

DataFusion Examples

This crate includes end-to-end, heavily commented examples of how to use various DataFusion APIs to help you get started.

Prerequisites

Run `git submodule update --init` to initialize the test files.

Running Examples

To run an example, use the `cargo run` command, for example:

```bash
git clone https://github.com/apache/datafusion
cd datafusion
# Download test data
git submodule update --init

# Change to the examples directory
cd datafusion-examples/examples

# Run all examples in a group
cargo run --example <group> -- all

# Run a specific example within a group
cargo run --example <group> -- <subcommand>

# Run all examples in the `dataframe` group
cargo run --example dataframe -- all

# Run a single example from the `dataframe` group
# (apply the same pattern for any other group)
cargo run --example dataframe -- dataframe
```
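
The `cargo run --example <group> -- <subcommand>` pattern implies that each group is a single example binary that picks what to run from its first argument. The following is a minimal standard-library sketch of that kind of dispatch, not the crate's actual implementation: the names `SUBCOMMANDS` and `select` are hypothetical, the subcommand list is copied from the `dataframe` group for illustration, and the real examples may handle a missing or unknown argument differently.

```rust
use std::env;
use std::process::ExitCode;

/// Illustrative subcommand list (hypothetical constant; the names mirror the `dataframe` group).
const SUBCOMMANDS: &[&str] = &["cache_factory", "dataframe", "deserialize_to_struct"];

/// Resolve the argument that follows `--` into the list of examples to run.
/// `all` selects every subcommand; any other value must match one name exactly.
fn select(arg: &str) -> Result<Vec<&'static str>, String> {
    if arg == "all" {
        return Ok(SUBCOMMANDS.to_vec());
    }
    SUBCOMMANDS
        .iter()
        .copied()
        .find(|name| *name == arg)
        .map(|name| vec![name])
        .ok_or_else(|| {
            format!(
                "unknown subcommand `{}`; expected `all` or one of {:?}",
                arg, SUBCOMMANDS
            )
        })
}

fn main() -> ExitCode {
    // Everything after `--` on the cargo command line is passed to the program,
    // so the subcommand is the first program argument.
    let Some(arg) = env::args().nth(1) else {
        eprintln!("usage: cargo run --example dataframe -- <all|subcommand>");
        return ExitCode::FAILURE;
    };

    match select(&arg) {
        Ok(names) => {
            for name in names {
                // A real group would call the example's entry point here.
                println!("running example: {name}");
            }
            ExitCode::SUCCESS
        }
        Err(message) => {
            eprintln!("{message}");
            ExitCode::FAILURE
        }
    }
}
```

Keeping the selection logic in a pure function such as `select` separates argument parsing from execution, which makes the `all` case and the error path easy to check in isolation.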

Builtin Functions Examples

Group: builtin_functions

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `date_time` | `builtin_functions/date_time.rs` | Examples of date-time related functions and queries |
| `function_factory` | `builtin_functions/function_factory.rs` | Register CREATE FUNCTION handler to implement SQL macros |
| `regexp` | `builtin_functions/regexp.rs` | Examples of using regular expression functions |

Custom Data Source Examples

Group: custom_data_source

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `adapter_serialization` | `custom_data_source/adapter_serialization.rs` | Preserve custom PhysicalExprAdapter information during plan serialization using PhysicalExtensionCodec interception |
| `csv_json_opener` | `custom_data_source/csv_json_opener.rs` | Use low-level FileOpener APIs for CSV/JSON |
| `csv_sql_streaming` | `custom_data_source/csv_sql_streaming.rs` | Run a streaming SQL query against CSV data |
| `custom_datasource` | `custom_data_source/custom_datasource.rs` | Query a custom TableProvider |
| `custom_file_casts` | `custom_data_source/custom_file_casts.rs` | Implement custom casting rules |
| `custom_file_format` | `custom_data_source/custom_file_format.rs` | Write to a custom file format |
| `default_column_values` | `custom_data_source/default_column_values.rs` | Custom default values using metadata |
| `file_stream_provider` | `custom_data_source/file_stream_provider.rs` | Read/write via FileStreamProvider for streams |

Data IO Examples

Group: data_io

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `catalog` | `data_io/catalog.rs` | Register tables into a custom catalog |
| `json_shredding` | `data_io/json_shredding.rs` | Implement filter rewriting for JSON shredding |
| `parquet_adv_idx` | `data_io/parquet_advanced_index.rs` | Create a secondary index across multiple parquet files |
| `parquet_emb_idx` | `data_io/parquet_embedded_index.rs` | Store a custom index inside Parquet files |
| `parquet_enc` | `data_io/parquet_encrypted.rs` | Read & write encrypted Parquet files |
| `parquet_enc_with_kms` | `data_io/parquet_encrypted_with_kms.rs` | Encrypted Parquet I/O using a KMS-backed factory |
| `parquet_exec_visitor` | `data_io/parquet_exec_visitor.rs` | Extract statistics by visiting an ExecutionPlan |
| `parquet_idx` | `data_io/parquet_index.rs` | Create a secondary index |
| `query_http_csv` | `data_io/query_http_csv.rs` | Query CSV files via HTTP |
| `remote_catalog` | `data_io/remote_catalog.rs` | Interact with a remote catalog |

DataFrame Examples

Group: dataframe

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `cache_factory` | `dataframe/cache_factory.rs` | Custom lazy caching for DataFrames using CacheFactory |
| `dataframe` | `dataframe/dataframe.rs` | Query DataFrames from various sources and write output |
| `deserialize_to_struct` | `dataframe/deserialize_to_struct.rs` | Convert Arrow arrays into Rust structs |

Execution Monitoring Examples

Group: execution_monitoring

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `mem_pool_exec_plan` | `execution_monitoring/memory_pool_execution_plan.rs` | Memory-aware ExecutionPlan with spilling |
| `mem_pool_tracking` | `execution_monitoring/memory_pool_tracking.rs` | Demonstrates memory tracking |
| `tracing` | `execution_monitoring/tracing.rs` | Demonstrates tracing integration |

External Dependency Examples

Group: external_dependency

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `dataframe_to_s3` | `external_dependency/dataframe_to_s3.rs` | Query DataFrames and write results to S3 |
| `query_aws_s3` | `external_dependency/query_aws_s3.rs` | Query S3-backed data using object_store |

Flight Examples

Group: flight

Category: Distributed

| Subcommand | File Path | Description |
| --- | --- | --- |
| `client` | `flight/client.rs` | Execute SQL queries via Arrow Flight protocol |
| `server` | `flight/server.rs` | Run DataFusion server accepting FlightSQL/JDBC queries |
| `sql_server` | `flight/sql_server.rs` | Standalone SQL server for JDBC clients |

Proto Examples

Group: proto

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `composed_extension_codec` | `proto/composed_extension_codec.rs` | Use multiple extension codecs for serialization/deserialization |
| `expression_deduplication` | `proto/expression_deduplication.rs` | Example of expression caching/deduplication using the codec decorator pattern |

Query Planning Examples

Group: query_planning

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `analyzer_rule` | `query_planning/analyzer_rule.rs` | Custom AnalyzerRule to change query semantics |
| `expr_api` | `query_planning/expr_api.rs` | Create, execute, analyze, and coerce Exprs |
| `optimizer_rule` | `query_planning/optimizer_rule.rs` | Replace predicates via a custom OptimizerRule |
| `parse_sql_expr` | `query_planning/parse_sql_expr.rs` | Parse SQL into DataFusion Expr |
| `plan_to_sql` | `query_planning/plan_to_sql.rs` | Generate SQL from expressions or plans |
| `planner_api` | `query_planning/planner_api.rs` | APIs for logical and physical plan manipulation |
| `pruning` | `query_planning/pruning.rs` | Use pruning to skip irrelevant files |
| `thread_pools` | `query_planning/thread_pools.rs` | Configure custom thread pools for DataFusion execution |

Relation Planner Examples

Group: relation_planner

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `match_recognize` | `relation_planner/match_recognize.rs` | Implement MATCH_RECOGNIZE pattern matching |
| `pivot_unpivot` | `relation_planner/pivot_unpivot.rs` | Implement PIVOT / UNPIVOT |
| `table_sample` | `relation_planner/table_sample.rs` | Implement TABLESAMPLE |

SQL Ops Examples

Group: sql_ops

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `analysis` | `sql_ops/analysis.rs` | Analyze SQL queries |
| `custom_sql_parser` | `sql_ops/custom_sql_parser.rs` | Implement a custom SQL parser to extend DataFusion |
| `frontend` | `sql_ops/frontend.rs` | Build LogicalPlans from SQL |
| `query` | `sql_ops/query.rs` | Query data using SQL |

UDF Examples

Group: udf

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| `adv_udaf` | `udf/advanced_udaf.rs` | Advanced User Defined Aggregate Function (UDAF) |
| `adv_udf` | `udf/advanced_udf.rs` | Advanced User Defined Scalar Function (UDF) |
| `adv_udwf` | `udf/advanced_udwf.rs` | Advanced User Defined Window Function (UDWF) |
| `async_udf` | `udf/async_udf.rs` | Asynchronous User Defined Scalar Function |
| `udaf` | `udf/simple_udaf.rs` | Simple UDAF example |
| `udf` | `udf/simple_udf.rs` | Simple UDF example |
| `udtf` | `udf/simple_udtf.rs` | Simple UDTF example |
| `udwf` | `udf/simple_udwf.rs` | Simple UDWF example |