dev/changelog/13.0.0.md

<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

13.0.0 (2022-10-06)

Full Changelog

Breaking changes:

  • Make ObjectStoreProvider fallible (return Result rather than Option) #3584 (tustvold)
  • Make OptimizerConfig a builder style API #3525 (alamb)

Implemented enhancements:

  • remove type coercion for ScalarUDF in the physical phase #3734
  • Allow with statements to specify their columns alongside their expression names #3716
  • Support SQLDataType::Timestamp(TimezoneInfo) #3693
  • support type coercion for case when expr #3673
  • Add simplification rules for the Modulo operator #3664
  • Add TIMESTAMPTZ #3659
  • Simplify A * 0 and A * null. #3626
  • change rule of PreCastLitInComparisonExpressions to unwrap cast rule after #3582 #3622
  • Optimize regex_replace with a known pattern / replacement #3613
  • Simplify CONCAT_WS(NULL, ..) to NULL #3607
  • Add OctoSQL to list of systems powered by DataFusion #3605
  • Prevent over-allocation (and spills) on TopK queries #3596
  • Allow ObjectStoreProvider to return None (return Result<Option> rather than Result) #3594
  • simplify between expr should consider the data type #3587
  • make type coercion simple and remove the evaluate logic #3585
  • ReduceOuterJoin optimizer support cast or try_cast expr. #3565
  • Support type coercion for subquery #3557
  • Make ParquetScanOptions public and expose a reference to the scan options from ParquetExec #3550
  • Use fetch limit in get_sorted_iter #3544
  • Push limit to sort #3528
  • Execute sorts in parallel when limit is used after sort #3526
  • Consolidate optimizer passes in optimizer module for better testing #3524
  • Support Top-K query optimization for `ORDER BY <EXPR> [ASC #3515
  • support the type coercion for like unlike istrue isfalse isunknown #3509
  • Automate the pushing of releases to Homebrew #3506
  • Add extra DATE_PART units that are already supported in arrow-rs #3502
  • Release datafusion-cli 12.0.0 on Homebrew #3501
  • Make from_proto_binary_op public #3489
  • coercion between decimal and other types lacking, compared to other numeric types #3479
  • move type coercion for inlist from physical phase to logical phase #3468
  • Make datafusion::physical_plan::file_format::file_stream::FileStream public #3466
  • Support using offset index in ParquetRecordBatchStream when pushing down RowFilter #3456
  • Support timestamp data type in In_list node #3449
  • Evaluate expressions after type coercion #3431
  • Make a convenience function to register a single RecordBatch as a table from SessionContext #3426
  • add datafusion-cli support of external table locations that object_store supports #3424
  • pruning support cast/try_cast expr #3414
  • Add documentation on querying against files in object store such as S3 #3399
  • Remove type-coercion from physical planner #3388
  • support Statement::ShowVariable to show session configs #3364
  • Support RowFilter in ParquetExec #3360
  • Apply TypeCoercion rule before FilterPushDown #3289
  • Add support for get / show timezone #3255
  • Consider adding DataFusion to ClickBench benchmarks #2902
  • filter_push_down panics on semi/anti join with join filters #2888
  • Migrate the cross join -> inner join optimization from the planner to the optimizer #2859
  • ObjectStore write support #2185
  • DataFusion should scan Parquet statistics once per query #871
  • Extend & generalize constant folding / evaluation in logical optimizer #237

Fixed bugs:

  • projection_push_down produces invalid aggregate plans in some cases #3738
  • Time With Time Zone should raise error until DataType::Time64 support tz #3715
  • SQL Planner doesn't distinguish normal CTEs from the recursive ones. #3713
  • Fix inconsistency between column name formats #3711
  • Optimizer rule 'projection_push_down' failed due to unexpected error: Error during planning: Aggregate schema has wrong number of fields. Expected 3 got 8 #3704
  • Optimizer regressions in unwrap_cast_in_comparison #3690
  • Internal error when evaluating a predicate = "The type of Dictionary(Int16, Utf8) = Int64 of binary physical should be same" #3685
  • Specialized regexp_replace should early-abort when the input arrays are empty #3647
  • Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3646
  • Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3645
  • Type coercion error: The type of Boolean AND Decimal128(10, 2) of binary physical should be same #3644
  • LEFT JOIN not working as expected, error message is confusing #3639
  • INTERSECT and EXCEPT don't return an error when 2 sets have the different number of columns #3632
  • datafusion-cli panics on a union of 2 tables with different numbers of columns. #3630
  • The expression col(a) / null is not optimized. #3624
  • s3_build_error test may fail in some environments #3601
  • New clippy errors appear to break the CI on master #3597
  • StringConcat gives inconsistent result with concat when containing null #3569
  • simplify_expressions don't support different data type for binary #3556
  • Broken logical plan serialization for aggregation queries #3555
  • Aggregate filters do not get pushed down to table scan #3546
  • docs.rs cannot build datafusion-proto crate #3538
  • DataFusion serialization doesn't handle ScalarValue::Dictionary, Binary, LargeBinary, Time64, IntervalMonthDayNano, Struct #3531
  • What should be returned when trying to get a config in invalid format? #3505
  • Dividing decimal type gives wrong error: "170141183460469231731687303715884105727 is too large to store in a Decimal128 #3498
  • Add BitwiseXor in function from_proto_binary_op #3495
  • comparison operations with a scalar null and decimal array panics #3487
  • Union columns with different types #3467
  • Can't get the right logical plan after optimizer #3421
  • Fix conflict between simplify_expression rule and CAST expressions #3409
  • Empty array giving error #2439
  • Internal error: Unsupported data type in hasher: FixedSizeBinary(16) #1516
  • Predicates on to_timestamp do not work as expected with "naive" timestamp strings #765
  • Address performance/execution plan of TPCH query 19 #78
  • Bug fix: expr_visitor was not visiting aggregate filter expressions #3548 (andygrove)

Documentation updates:

  • Publish 8.0.0 user guide #2558
  • MINOR: Add Dask SQL to list of projects powered by DataFusion #3581 (andygrove)
  • Add Parseable as Datafusion user #3471 (nitisht)

Closed issues:

  • Upgrade to Arrow 24.0.0 #3689
  • what's the best practice to get a single value from arrow array? #3497
  • The data type of predicate in the row filter should be same in the binary expr #3469
  • Extend constant folding and parquet filtering support #188
  • Add FORMAT to explain plan and an easy to visualize format #96

Merged pull requests: