<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

24.0.0 (2023-05-06)

Full Changelog

Breaking changes:

  • Extract LogicalPlan::Create* DDL related plan structures into LogicalPlan::Ddl #6121 (alamb)
  • Complete Extracting LogicalPlan::Drop* DDL related plan structures into LogicalPlan::Ddl #6144 (alamb)
  • Remove JIT based on cranelift, and jit feature #6164 (yjshen)
  • Remove Rayon-based Scheduler #6169 (tustvold)
  • Cleanup join memory accounting in CrossJoin and NestedLoopsJoin #6188 (tustvold)

Implemented enhancements:

  • feat: date_bin output type same with input type #6053 (jiacai2050)
  • feat: add version field to ExecutionContext #6052 (Weijun-H)
  • Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered (V2) #6124 (mustafasrepo)
  • Make decimal multiplication allow precision-loss in DataFusion #6103 (viirya)
  • Add support for scientific notation (remove old unimplemented() for scientific notation) #6142 (2010YOUY01)
  • Support DROP SCHEMA statements #6096 (jaylmiller)
  • feat: make threshold for using scalar update in aggregate configurable #6166 (yjshen)
  • Adaptive in-memory sort (~2x faster) (#5879) #6163 (tustvold)
  • Add Support for Ordering Equivalence #6160 (mustafasrepo)
  • Sum and Avg should be able to take inputs of type Dictionary #6197 (viirya)
  • CLI: Add support for AWS named profiles #6161 (mr-brobot)
  • feat: concurrent physical planning #6138 (crepererum)
  • feat: allow scalars to appear as the LHS arg in arithmetic operations on temporal values #6196 (kyle-mccarthy)
  • Select Into support #6219 (berkaysynnada)

Fixed bugs:

  • fix: Use arrow bitwise_op as binary_kernels #6093 (RTEnzyme)
  • fix: allow group by same expr in Aggregate #6091 (jackwener)
  • fix: null handling of ScalarValue::Struct #6085 (crepererum)
  • fix: make simplify_expressions use a single schema for resolution #6077 (wolffcm)
  • fix: incorrect column pruning in sql with window operations #6039 (yukkit)
  • fix: allow group by same expr in Aggregate #6097 (jackwener)
  • fix: reassign_predicate_columns w/ in-list expr #6114 (crepererum)
  • fix: common_subexpr_eliminate and aggregates #6129 (crepererum)
  • fix: type coercion ignore prevent Interval - Date #6177 (jackwener)
  • fix: common_subexpr_eliminate w/ aggregates and relations #6199 (crepererum)
  • fix: slt fail due to bors problem #6242 (jackwener)

Documentation updates:

  • chore: Update api docs for SessionContext, TaskContext, etc #6106 (alamb)
  • docs: consolidate datafusion-cli docs #6218 (alamb)

Merged pull requests:

  • fix: Use arrow bitwise_op as binary_kernels #6093 (RTEnzyme)
  • fix: allow group by same expr in Aggregate #6091 (jackwener)
  • Update DataFusion architecture documentation #6056 (alamb)
  • feat: date_bin output type same with input type #6053 (jiacai2050)
  • fix: null handling of ScalarValue::Struct #6085 (crepererum)
  • fix: make simplify_expressions use a single schema for resolution #6077 (wolffcm)
  • fix: incorrect column pruning in sql with window operations #6039 (yukkit)
  • Substrait: Handle multiple select with the same table/columns in INTERSECT #6059 (nseekhao)
  • fix: allow group by same expr in Aggregate #6097 (jackwener)
  • chore(deps): update regex and regex-syntax requirement from 0.6.28 to 0.7.1 #6095 (alamb)
  • test: more sqllogicaltest for GROUPBY. #6100 (jackwener)
  • minor: replace unwrap() with Err #6101 (jackwener)
  • Remove temporal to kernels_arrow #6069 (berkaysynnada)
  • feat: add version field to ExecutionContext #6052 (Weijun-H)
  • chore(deps): update substrait requirement from 0.7.1 to 0.8.0 #6105 (dependabot[bot])
  • refactor(sqllogictests): port group by test to sqllogic #6088 (aprimadi)
  • Use arrow kernels for bitwise operations #6098 (alamb)
  • Unordered PARTITION BY column implementation (to prevent pipeline breaking) #6036 (mustafasrepo)
  • Add from_string_hash_map() method for SessionConfig #6111 (yahoNanJing)
  • Parallelise collect_partitioned #6109 (tustvold)
  • Fix column mapping in output_ordering() and output_partitioning() for ProjectionExec and AggregateExec #6113 (mingmwang)
  • fix: reassign_predicate_columns w/ in-list expr #6114 (crepererum)
  • Update to arrow 38 #6115 (tustvold)
  • Minor: Split DmlStatement into its own module #6120 (alamb)
  • Remove input schema from PhysicalExpr, move the validation logic to physical expression planner #6122 (mingmwang)
  • Move Scalar Subquery validation logic to the Analyzer #6084 (mingmwang)
  • chore: Update api docs for SessionContext, TaskContext, etc #6106 (alamb)
  • Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered (V2) #6124 (mustafasrepo)
  • Minor: Mark default_session_builder deprecated, add since markings #6123 (alamb)
  • [MINOR]: Add GroupByOrderMode to the AggregateExec display #6141 (mustafasrepo)
  • Extract LogicalPlan::Create* DDL related plan structures into LogicalPlan::Ddl #6121 (alamb)
  • Fix broken build due to merge conflict #6143 (alamb)
  • [Bug fix] Source projection output ordering #6136 (mustafasrepo)
  • Make decimal multiplication allow precision-loss in DataFusion #6103 (viirya)
  • MemoryExec INSERT INTO refactor to use ExecutionPlan #6049 (metesynnada)
  • Complete Extracting LogicalPlan::Drop* DDL related plan structures into LogicalPlan::Ddl #6144 (alamb)
  • fix: common_subexpr_eliminate and aggregates #6129 (crepererum)
  • Add support for scientific notation (remove old unimplemented() for scientific notation) #6142 (2010YOUY01)
  • Use ParquetObjectReader #6130 (tustvold)
  • Add bench.sh script to automate benchmarking DataFusion against itself #6131 (alamb)
  • Port test in select.rs to sqllogic #6158 (aprimadi)
  • Support uint64 literals #6146 (jackkleeman)
  • Remove JIT based on cranelift, and jit feature #6164 (yjshen)
  • minor: add decimal roundtrip tests for the row format #6165 (yjshen)
  • Support DROP SCHEMA statements #6096 (jaylmiller)
  • Improve compare.py output to use min times and better column titles #6134 (alamb)
  • Fix GroupByOrderMode documentation #6168 (aprimadi)
  • Remove Rayon-based Scheduler #6169 (tustvold)
  • [Minor] Fix typo in SQL data types doc #6178 (alamb)
  • feat: make threshold for using scalar update in aggregate configurable #6166 (yjshen)
  • Handle ScalarValue::Dictionary in add_to_row and update_avg_to_row #6175 (viirya)
  • chore(datetime_expression): support uppercase granularity for DATE_TRUNC func #6185 (appletreeisyellow)
  • Simplify HashJoin memory accounting #6170 (tustvold)
  • Adaptive in-memory sort (~2x faster) (#5879) #6163 (tustvold)
  • refactor: optimizer shouldn't assume failed rules are internal error #6184 (wolffcm)
  • fix: type coercion ignore prevent Interval - Date #6177 (jackwener)
  • minor: remove default match for function signature #6186 (alamb)
  • Add Support for Ordering Equivalence #6160 (mustafasrepo)
  • Remove db-benchmark (and add them to arrow-datafusion-python repo instead) #6204 (andygrove)
  • fix: common_subexpr_eliminate w/ aggregates and relations #6199 (crepererum)
  • Add parquet filter and sort to bench.sh #6172 (alamb)
  • minor: remove unused code in binary.rs #6203 (alamb)
  • Sum and Avg should be able to take inputs of type Dictionary #6197 (viirya)
  • minor: fix tiny typo in table.rs #6220 (okue)
  • Lower some log levels from debug to trace during plan execution #6193 (alamb)
  • minor: fix typo in comments. #6225 (jackwener)
  • CLI: Add support for AWS named profiles #6161 (mr-brobot)
  • feat: concurrent physical planning #6138 (crepererum)
  • Cleanup join memory accounting in CrossJoin and NestedLoopsJoin #6188 (tustvold)
  • minor: Move shift_date tests to be with code #6200 (alamb)
  • Minor: rename variable for clarity #6229 (alamb)
  • feat: allow scalars to appear as the LHS arg in arithmetic operations on temporal values #6196 (kyle-mccarthy)
  • Supply consistent format output for FileScanConfig params #6202 (tz70s)
  • minor: refactor code to avoid same code. #6222 (jackwener)
  • minor: change error type for window expressions #6231 (comphead)
  • Add sqllogic test coverage for interval arithmetic #6201 (alamb)
  • Simplify MemoryWriteExec #6154 (tustvold)
  • refactor: separate get_result_type from coerce_type #6221 (jackwener)
  • fix: slt fail due to bors problem #6242 (jackwener)
  • Port tests in errors.rs to sqllogictest #6239 (parkma99)
  • Select Into support #6219 (berkaysynnada)
  • refactor: use get_result_type to replace binary_operator_data_type. #6241 (jackwener)
  • Port test in union.rs to sqllogic #6224 (parkma99)
  • Port tests in expr.rs to sqllogictest #6240 (parkma99)
  • Port tests in cast.rs to sqllogictest #6244 (parkma99)
  • Port tests in identifiers.rs to sqllogictest #6245 (parkma99)
  • Minor: allow creating infinite external tables via SQL (for testing) #6235 (alamb)
  • Minor: consolidate test data #6217 (alamb)
  • docs: consolidate datafusion-cli docs #6218 (alamb)
  • Port tests in wildcard.rs to sqllogictest #6249 (masanobbb)
  • Simplify and speed up MemoryExec insert #6236 (alamb)
  • minor: fix typo #6253 (okue)
  • refactor: separate get_common_type / get_result_type for temporal type #6250 (jackwener)
  • Port tests in set_variable.rs to sqllogictest #6255 (parkma99)