## DataFusion `46.0.0`
### Use `invoke_with_args` instead of `invoke()` and `invoke_batch()`

DataFusion is moving to a consistent API for invoking `ScalarUDF`s,
`ScalarUDFImpl::invoke_with_args()`, and is deprecating
`ScalarUDFImpl::invoke()`, `ScalarUDFImpl::invoke_batch()`, and `ScalarUDFImpl::invoke_no_args()`.
If you see errors such as the following, the older APIs are being used:

```text
This feature is not implemented: Function concat does not implement invoke but called
```

To fix this error, use `ScalarUDFImpl::invoke_with_args()` instead, as shown
below. See PR 14876 for an example.

Given existing code like this:
```rust
# /* comment to avoid running
impl ScalarUDFImpl for SparkConcat {
    ...
    fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) -> Result<ColumnarValue> {
        if args
            .iter()
            .any(|arg| matches!(arg.data_type(), DataType::List(_)))
        {
            ArrayConcat::new().invoke_batch(args, number_rows)
        } else {
            ConcatFunc::new().invoke_batch(args, number_rows)
        }
    }
}
# */
```
Change it to:
```rust
# /* comment to avoid running
impl ScalarUDFImpl for SparkConcat {
    ...
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        if args
            .args
            .iter()
            .any(|arg| matches!(arg.data_type(), DataType::List(_)))
        {
            ArrayConcat::new().invoke_with_args(args)
        } else {
            ConcatFunc::new().invoke_with_args(args)
        }
    }
}
# */
```
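A practical benefit of the new signature is that the argument values and the row count travel together in one owned `ScalarFunctionArgs` value, so a wrapper UDF such as `SparkConcat` can inspect `args.args` by reference and then forward the whole struct unchanged. The std-only sketch below models that forwarding pattern. `DataType`, `ColumnarValue`, `ScalarFunctionArgs` and the `ScalarUdfModel` trait here are simplified, hypothetical stand-ins, not the real DataFusion definitions (the real `ScalarFunctionArgs` may carry additional fields), and each "function" returns a description string instead of a result.

```rust
/// Toy data types; the real `DataType::List` carries a field, omitted here.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    List,
}

/// Hypothetical stand-in for `ColumnarValue`: only its type is modeled.
#[derive(Debug, Clone)]
pub struct ColumnarValue {
    pub data_type: DataType,
}

impl ColumnarValue {
    pub fn data_type(&self) -> DataType {
        self.data_type.clone()
    }
}

/// Hypothetical stand-in for `ScalarFunctionArgs`: values and row count together.
pub struct ScalarFunctionArgs {
    pub args: Vec<ColumnarValue>,
    pub number_rows: usize,
}

/// Hypothetical trait modeling `ScalarUDFImpl::invoke_with_args`.
pub trait ScalarUdfModel {
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<String, String>;
}

pub struct ArrayConcat;
pub struct ConcatFunc;
pub struct SparkConcat;

impl ScalarUdfModel for ArrayConcat {
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<String, String> {
        Ok(format!("array_concat: {} args, {} rows", args.args.len(), args.number_rows))
    }
}

impl ScalarUdfModel for ConcatFunc {
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<String, String> {
        Ok(format!("concat: {} args, {} rows", args.args.len(), args.number_rows))
    }
}

impl ScalarUdfModel for SparkConcat {
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<String, String> {
        // Inspect the arguments by reference; the borrow ends before the move.
        if args.args.iter().any(|arg| matches!(arg.data_type(), DataType::List)) {
            // Forward the whole struct (values and row count) unchanged.
            ArrayConcat.invoke_with_args(args)
        } else {
            ConcatFunc.invoke_with_args(args)
        }
    }
}
```

Because the delegate takes the struct by value, there is no separate slice and row count to thread through by hand.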
### `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` deprecated

DataFusion 46 has a major change to how the built-in data sources are organized.
Instead of individual `ExecutionPlan`s for the different file formats, they now
all use `DataSourceExec`, and the format-specific information is embodied in the new
traits `DataSource` and `FileSource`.
Here is more information about

#### `ParquetExecBuilder`

Code that looks for `ParquetExec` like this will no longer work:
```rust
# /* comment to avoid running
if let Some(parquet_exec) = plan.as_any().downcast_ref::<ParquetExec>() {
    // Do something with ParquetExec here
}
# */
```
Instead, with `DataSourceExec`, the same information is now on `FileScanConfig` and
`ParquetSource`. The equivalent code is:
```rust
# /* comment to avoid running
if let Some(datasource_exec) = plan.as_any().downcast_ref::<DataSourceExec>() {
    if let Some(scan_config) = datasource_exec.data_source().as_any().downcast_ref::<FileScanConfig>() {
        // File groups and other format-independent information are on the FileScanConfig;
        // parquet-specific information is on its ParquetSource
        if let Some(parquet_source) = scan_config.file_source.as_any().downcast_ref::<ParquetSource>()
        {
            // Information on PruningPredicates and parquet options are here
        }
    }
}
# */
```
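Each step in the chain above relies on an `as_any()` method returning `&dyn Any`, which lets `downcast_ref` recover the concrete type or return `None`. The std-only sketch below models the same three-level lookup. The traits and structs are simplified, hypothetical stand-ins that borrow DataFusion's names (only `as_any` is modeled), and `find_parquet_source` is an illustrative helper, not a DataFusion function. Using `?` on each `Option` keeps the nesting flat.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-ins named after DataFusion's traits; only `as_any`
// is modeled, which is all the downcast chain needs.
pub trait ExecutionPlan {
    fn as_any(&self) -> &dyn Any;
}
pub trait DataSource {
    fn as_any(&self) -> &dyn Any;
}
pub trait FileSource {
    fn as_any(&self) -> &dyn Any;
}

/// Format-specific state (stand-in for predicates and parquet options).
pub struct ParquetSource {
    pub pushdown_filters: bool,
}
impl FileSource for ParquetSource {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Format-independent state (file groups) plus the format-specific source.
pub struct FileScanConfig {
    pub file_groups: Vec<Vec<String>>,
    pub file_source: Arc<dyn FileSource>,
}
impl DataSource for FileScanConfig {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// One generic exec node shared by every file format.
pub struct DataSourceExec {
    data_source: Arc<dyn DataSource>,
}
impl DataSourceExec {
    pub fn new(data_source: Arc<dyn DataSource>) -> Self {
        Self { data_source }
    }
    pub fn data_source(&self) -> &Arc<dyn DataSource> {
        &self.data_source
    }
}
impl ExecutionPlan for DataSourceExec {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical helper: plan -> DataSourceExec -> FileScanConfig -> ParquetSource.
/// Returns `None` as soon as any downcast fails (e.g. a non-parquet source).
pub fn find_parquet_source(plan: &dyn ExecutionPlan) -> Option<(&FileScanConfig, &ParquetSource)> {
    let exec = plan.as_any().downcast_ref::<DataSourceExec>()?;
    let scan_config = exec.data_source().as_any().downcast_ref::<FileScanConfig>()?;
    let parquet_source = scan_config.file_source.as_any().downcast_ref::<ParquetSource>()?;
    Some((scan_config, parquet_source))
}
```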
#### `ParquetExecBuilder`

Likewise, code that builds `ParquetExec` using `ParquetExecBuilder`, such as
the following, must be changed:
```rust
# /* comment to avoid running
let mut exec_plan_builder = ParquetExecBuilder::new(
    FileScanConfig::new(self.log_store.object_store_url(), file_schema)
        .with_projection(self.projection.cloned())
        .with_limit(self.limit)
        .with_table_partition_cols(table_partition_cols),
)
.with_schema_adapter_factory(Arc::new(DeltaSchemaAdapterFactory {}))
.with_table_parquet_options(parquet_options);

// Add filter
if let Some(predicate) = logical_filter {
    if config.enable_parquet_pushdown {
        exec_plan_builder = exec_plan_builder.with_predicate(predicate);
    }
};
# */
```
New code should use `FileScanConfig` to build the appropriate `DataSourceExec`:

```rust
# /* comment to avoid running
let mut file_source = ParquetSource::new(parquet_options)
    .with_schema_adapter_factory(Arc::new(DeltaSchemaAdapterFactory {}));

// Add filter
if let Some(predicate) = logical_filter {
    if config.enable_parquet_pushdown {
        file_source = file_source.with_predicate(predicate);
    }
};

let file_scan_config = FileScanConfig::new(
    self.log_store.object_store_url(),
    file_schema,
    Arc::new(file_source),
)
.with_statistics(stats)
.with_projection(self.projection.cloned())
.with_limit(self.limit)
.with_table_partition_cols(table_partition_cols);

// Build the actual scan like this
let parquet_scan = file_scan_config.build();
# */
```
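The key change is where each setting lives: format-specific options (predicate, schema adapter, parquet options) move onto the `ParquetSource`, while format-independent ones (projection, limit, partition columns) stay on `FileScanConfig`. The std-only sketch below models that layering. All types are hypothetical stand-ins that mirror the shape of the code above, not the real DataFusion API, and `plan_scan` is an illustrative helper. Because the source is shared behind an `Arc` once it is handed to the config, it is fully configured first, which is why the filter is added before `FileScanConfig::new` in the code above.

```rust
use std::sync::Arc;

/// Hypothetical stand-in: format-specific settings live on the source.
#[derive(Debug, Default)]
pub struct ParquetSource {
    pub predicate: Option<String>,
}

impl ParquetSource {
    pub fn with_predicate(mut self, predicate: &str) -> Self {
        self.predicate = Some(predicate.to_string());
        self
    }
}

/// Hypothetical stand-in: format-independent settings live on the config,
/// which owns the source behind an `Arc`.
#[derive(Debug)]
pub struct FileScanConfig {
    pub file_source: Arc<ParquetSource>,
    pub projection: Option<Vec<usize>>,
    pub limit: Option<usize>,
}

impl FileScanConfig {
    pub fn new(file_source: Arc<ParquetSource>) -> Self {
        Self { file_source, projection: None, limit: None }
    }
    pub fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self {
        self.projection = projection;
        self
    }
    pub fn with_limit(mut self, limit: Option<usize>) -> Self {
        self.limit = limit;
        self
    }
    /// Consumes the config and wraps it in a single generic exec node.
    pub fn build(self) -> Arc<DataSourceExec> {
        Arc::new(DataSourceExec { config: self })
    }
}

#[derive(Debug)]
pub struct DataSourceExec {
    pub config: FileScanConfig,
}

/// Configure the source first (it cannot be mutated through a shared `Arc`),
/// then hand it to the config and build the plan.
pub fn plan_scan(filter: Option<&str>, pushdown: bool, limit: Option<usize>) -> Arc<DataSourceExec> {
    let mut file_source = ParquetSource::default();
    if let Some(predicate) = filter {
        if pushdown {
            file_source = file_source.with_predicate(predicate);
        }
    }
    FileScanConfig::new(Arc::new(file_source))
        .with_projection(Some(vec![0, 2]))
        .with_limit(limit)
        .build()
}
```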
### `datafusion-cli` no longer automatically unescapes strings

`datafusion-cli` previously would incorrectly unescape string literals (see the ticket for more details).
To escape `'` in SQL literals, use `''`:

```sql
> select 'it''s escaped';
+----------------------+
| Utf8("it's escaped") |
+----------------------+
| it's escaped         |
+----------------------+
1 row(s) fetched.
```
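When SQL text is generated programmatically, doubling each `'` is the escape rule for a plain literal described above. A minimal sketch, assuming hypothetical `quote_sql_literal` and `select_literal` helpers (neither is part of DataFusion):

```rust
/// Hypothetical helper: quotes `value` as a plain SQL string literal by
/// doubling each embedded single quote. Backslashes are left untouched.
pub fn quote_sql_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Hypothetical helper: builds a query like the example above.
pub fn select_literal(value: &str) -> String {
    format!("select {};", quote_sql_literal(value))
}
```

Where your client supports parameter binding, it is usually preferable to building literals by hand.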
To include special characters (such as newlines via `\n`), use an `E` literal
string, for example `E'foo\nbar'`. Without the `E` prefix, the backslash
sequence is kept exactly as written:

```sql
> select 'foo\nbar';
+------------------+
| Utf8("foo\nbar") |
+------------------+
| foo\nbar         |
+------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.
```
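To illustrate what the `E` prefix changes, the sketch below models the unescaping an `E'...'` literal applies to a few common sequences (`\n`, `\t`, `\r`, `\\`, `\'`). It is a hypothetical illustration, not DataFusion's parser: the exact set of supported escapes is determined by the SQL parser, and keeping unknown sequences verbatim is an assumption of this sketch.

```rust
/// Hypothetical model of `E'...'` unescaping applied to the literal's body.
/// Plain '...' literals keep these sequences verbatim.
pub fn unescape_e_string(body: &str) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // Interpret the character following the backslash.
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('\\') => out.push('\\'),
            Some('\'') => out.push('\''),
            // Assumption: unknown escapes are kept as written.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // A trailing lone backslash is kept as-is.
            None => out.push('\\'),
        }
    }
    out
}
```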
### Changes to array scalar function signatures

DataFusion 46 has changed the way scalar array function signatures are
declared. Previously, functions had to select from a list of predefined
signatures in the `ArrayFunctionSignature` enum. Now the signatures
can be defined via a `Vec` of pseudo-types, each of which corresponds to a
single argument. These pseudo-types are the variants of the
`ArrayFunctionArgument` enum:
- `Array`: An argument of type `List`/`LargeList`/`FixedSizeList`. All `Array`
  arguments must be coercible to the same type.
- `Element`: An argument that is coercible to the inner type of the `Array`
  arguments.
- `Index`: An `Int64` argument.

Each of the old variants can be converted to the new format as follows:
`TypeSignature::ArraySignature(ArrayFunctionSignature::ArrayAndElement)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};
TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
    arguments: vec![ArrayFunctionArgument::Array, ArrayFunctionArgument::Element],
    array_coercion: Some(ListCoercion::FixedSizedListToList),
});
```

`TypeSignature::ArraySignature(ArrayFunctionSignature::ElementAndArray)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};
TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
    arguments: vec![ArrayFunctionArgument::Element, ArrayFunctionArgument::Array],
    array_coercion: Some(ListCoercion::FixedSizedListToList),
});
```

`TypeSignature::ArraySignature(ArrayFunctionSignature::ArrayAndIndex)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};
TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
    arguments: vec![ArrayFunctionArgument::Array, ArrayFunctionArgument::Index],
    array_coercion: None,
});
```

`TypeSignature::ArraySignature(ArrayFunctionSignature::ArrayAndElementAndOptionalIndex)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};
TypeSignature::OneOf(vec![
    TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
        arguments: vec![ArrayFunctionArgument::Array, ArrayFunctionArgument::Element],
        array_coercion: None,
    }),
    TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
        arguments: vec![
            ArrayFunctionArgument::Array,
            ArrayFunctionArgument::Element,
            ArrayFunctionArgument::Index,
        ],
        array_coercion: None,
    }),
]);
```

`TypeSignature::ArraySignature(ArrayFunctionSignature::Array)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};
TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
    arguments: vec![ArrayFunctionArgument::Array],
    array_coercion: None,
});
```
Alternatively, you can switch to using one of the following functions, which
take care of constructing the `TypeSignature` for you:

- `Signature::array_and_element`
- `Signature::array_and_element_and_optional_index`
- `Signature::array_and_index`
- `Signature::array`
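To make the `Array`/`Element`/`Index` rules listed earlier concrete, here is a std-only sketch of how a list of pseudo-types could be checked against argument types. `Ty` is a hypothetical toy type system, the local `ArrayFunctionArgument` enum only mirrors the three variants described above, and "coercible" is simplified to exact equality; DataFusion's real coercion logic is more permissive and is not reproduced here.

```rust
/// Hypothetical toy type system for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Int64,
    Utf8,
    List(Box<Ty>),
}

/// Local mirror of the three pseudo-types (not the real enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArrayFunctionArgument {
    Array,
    Element,
    Index,
}

/// Returns true if `args` fits `sig`, one pseudo-type per argument.
/// Simplification: "coercible" is modeled as exact type equality.
pub fn matches_signature(sig: &[ArrayFunctionArgument], args: &[Ty]) -> bool {
    // Each pseudo-type describes exactly one argument.
    if sig.len() != args.len() {
        return false;
    }
    // Pass 1: every Array argument must be a list, and all lists must
    // share one inner (element) type.
    let mut element_ty: Option<&Ty> = None;
    for (kind, ty) in sig.iter().zip(args) {
        if *kind != ArrayFunctionArgument::Array {
            continue;
        }
        match ty {
            Ty::List(inner) => match element_ty {
                None => element_ty = Some(&**inner),
                Some(prev) if *prev == **inner => {}
                Some(_) => return false,
            },
            _ => return false,
        }
    }
    // Pass 2: Element must match the inner type; Index must be Int64.
    sig.iter().zip(args).all(|(kind, ty)| match kind {
        ArrayFunctionArgument::Array => true, // already checked in pass 1
        ArrayFunctionArgument::Element => element_ty.map_or(true, |e| e == ty),
        ArrayFunctionArgument::Index => *ty == Ty::Int64,
    })
}
```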