<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

34.0.0 (2023-12-11)

Full Changelog

Breaking changes:

  • Implement DISTINCT ON from Postgres #7981 (gruuya)
  • Encapsulate EquivalenceClass into a struct #8034 (alamb)
  • Make fields of ScalarUDF, AggregateUDF and WindowUDF non pub #8079 (alamb)
  • Implement StreamTable and StreamTableProvider (#7994) #8021 (tustvold)
  • feat: make FixedSizeList scalar also an ArrayRef #8221 (wjones127)
  • Remove FileWriterMode and ListingTableInsertMode (#7994) #8017 (tustvold)
  • Refactor: Unify Expr::ScalarFunction and Expr::ScalarUDF, introduce unresolved functions by name #8258 (2010YOUY01)
  • Refactor aggregate function handling #8358 (Weijun-H)
  • Move PartitionSearchMode into datafusion_physical_plan, rename to InputOrderMode #8364 (alamb)
  • Split EmptyExec into PlaceholderRowExec #8446 (razeghi71)

Implemented enhancements:

  • feat: show statistics in explain verbose #8113 (NGA-TRAN)
  • feat: implement postgres style 'overlay' string function #8117 (Syleechan)
  • feat: fill missing values with NULLs while inserting #8146 (jonahgao)
  • feat: to_array_of_size for ScalarValue::FixedSizeList #8225 (wjones127)
  • feat: implement calcite style 'levenshtein' string function #8168 (Syleechan)
  • feat: roundtrip FixedSizeList Scalar to protobuf #8239 (wjones127)
  • feat: impl the basic string_agg function #8148 (haohuaijin)
  • feat: support simplifying BinaryExpr with arbitrary guarantees in GuaranteeRewriter #8256 (wjones127)
  • feat: support customizing column default values for inserting #8283 (jonahgao)
  • feat: implement sql style 'substr_index' string function #8272 (Syleechan)
  • feat: implement sql style 'find_in_set' string function #8328 (Syleechan)
  • feat: support LargeList in array_empty #8321 (Weijun-H)
  • feat: support LargeList in make_array and array_length #8121 (Weijun-H)
  • feat: ScalarValue from String #8411 (QuenKar)
  • feat: support LargeList for array_has, array_has_all and array_has_any #8322 (Weijun-H)
  • feat: customize column default values for external tables #8415 (jonahgao)
  • feat: Support array_sort(list_sort) #8279 (Asura7969)
  • feat: support InterleaveExecNode in the proto #8460 (liukun4515)
  • feat: improve string statistics display in datafusion-cli parquet_metadata function #8535 (asimsedhain)

Fixed bugs:

  • fix: Timestamp with timezone not considered join on #8150 (ACking-you)
  • fix: wrong result of range function #8313 (smallzhongfeng)
  • fix: make ntile work in some corner cases #8371 (haohuaijin)
  • fix: Changed labeler.yml to latest format #8431 (viirya)
  • fix: Literal in ORDER BY window definition should not be an ordinal referring to relation column #8419 (viirya)
  • fix: ORDER BY window definition should work on null literal #8444 (viirya)
  • fix: RANGE frame for corner cases with empty ORDER BY clause should be treated as constant sort #8445 (viirya)
  • fix: don't unify projection if expr is non-trivial #8454 (haohuaijin)
  • fix: support uppercase when parsing Interval #8478 (QuenKar)
  • fix: incorrect set preserve_partitioning in SortExec #8485 (haohuaijin)
  • fix: Pull stats in IdentVisitor/GraphvizVisitor only when requested #8514 (vrongmeal)
  • fix: volatile expressions should not be target of common subexpression elimination #8520 (viirya)

Documentation updates:

  • Library Guide: Add Using the DataFrame API #8319 (Veeupup)
  • Minor: Add installation link to README.md #8389 (Weijun-H)
  • Prepare version 34.0.0 #8508 (andygrove)

Merged pull requests:

  • Fix typo in partitioning.rs #8134 (lewiszlw)
  • Implement DISTINCT ON from Postgres #7981 (gruuya)
  • Prepare 33.0.0-rc2 #8144 (andygrove)
  • Avoid concat in array_append #8137 (jayzhan211)
  • Replace macro with function for array_remove #8106 (jayzhan211)
  • Implement array_union #7897 (edmondop)
  • Minor: Document ExecutionPlan::equivalence_properties more thoroughly #8128 (alamb)
  • feat: show statistics in explain verbose #8113 (NGA-TRAN)
  • feat: implement postgres style 'overlay' string function #8117 (Syleechan)
  • Minor: Encapsulate LeftJoinData into a struct (rather than anonymous enum) and add comments #8153 (alamb)
  • Update sqllogictest requirement from 0.18.0 to 0.19.0 #8163 (dependabot[bot])
  • feat: fill missing values with NULLs while inserting #8146 (jonahgao)
  • Introduce return type for aggregate sum #8141 (jayzhan211)
  • implement range/generate_series func #8140 (Veeupup)
  • Encapsulate EquivalenceClass into a struct #8034 (alamb)
  • Revert "Minor: remove unnecessary projection in `single_distinct_to_g… #8176 (NGA-TRAN)
  • Preserve all of the valid orderings during merging. #8169 (mustafasrepo)
  • Make fields of ScalarUDF, AggregateUDF and WindowUDF non pub #8079 (alamb)
  • Fix logical conflicts #8187 (tustvold)
  • Minor: Update JoinHashMap comment example to make it clearer #8154 (alamb)
  • Implement StreamTable and StreamTableProvider (#7994) #8021 (tustvold)
  • [MINOR]: Remove unused Results #8189 (mustafasrepo)
  • Minor: clean up the code based on clippy #8179 (Weijun-H)
  • Minor: simplify filter statistics code #8174 (alamb)
  • Replace macro with function for array_position and array_positions #8170 (jayzhan211)
  • Add Library Guide for User Defined Functions: Window/Aggregate #8171 (Veeupup)
  • Add more stream docs #8192 (tustvold)
  • Implement func array_pop_front #8142 (Veeupup)
  • Moving arrow_files SQL tests to sqllogictest #8217 (edmondop)
  • fix regression in the use of name in ProjectionPushdown #8219 (alamb)
  • [MINOR]: Fix column indices in the planning tests #8191 (mustafasrepo)
  • Remove unnecessary reassignment #8232 (qrilka)
  • Update itertools requirement from 0.11 to 0.12 #8233 (crepererum)
  • Port tests in subqueries.rs to sqllogictest #8231 (PsiACE)
  • feat: make FixedSizeList scalar also an ArrayRef #8221 (wjones127)
  • Add versions to datafusion dependencies #8238 (andygrove)
  • feat: to_array_of_size for ScalarValue::FixedSizeList #8225 (wjones127)
  • feat: implement calcite style 'levenshtein' string function #8168 (Syleechan)
  • feat: roundtrip FixedSizeList Scalar to protobuf #8239 (wjones127)
  • Update prost-build requirement from =0.12.1 to =0.12.2 #8244 (dependabot[bot])
  • Minor: Port tests in displayable.rs to sqllogictest #8246 (Weijun-H)
  • Minor: add with_estimated_selectivity to Precision #8177 (alamb)
  • fix: Timestamp with timezone not considered join on #8150 (ACking-you)
  • Replace macro in array_array to remove duplicate codes #8252 (Veeupup)
  • Port tests in projection.rs to sqllogictest #8240 (PsiACE)
  • Introduce array_except function #8135 (jayzhan211)
  • Port tests in describe.rs to sqllogictest #8242 (Asura7969)
  • Remove FileWriterMode and ListingTableInsertMode (#7994) #8017 (tustvold)
  • Minor: clean up the code based on Clippy #8257 (Weijun-H)
  • Update arrow 49.0.0 and object_store 0.8.0 #8029 (tustvold)
  • feat: impl the basic string_agg function #8148 (haohuaijin)
  • Minor: Make schema of grouping set columns nullable #8248 (markusa380)
  • feat: support simplifying BinaryExpr with arbitrary guarantees in GuaranteeRewriter #8256 (wjones127)
  • Making stream joins extensible: A new Trait implementation for SHJ #8234 (metesynnada)
  • Don't Canonicalize Filesystem Paths in ListingTableUrl / support new external tables for files that do not (yet) exist #8014 (tustvold)
  • Minor: Add sql level test for inserting into non-existent directory #8278 (alamb)
  • Replace array_has/array_has_all/array_has_any macro to remove duplicate code #8263 (Veeupup)
  • Fix bug in field level metadata matching code #8286 (alamb)
  • Refactor Interval Arithmetic Updates #8276 (berkaysynnada)
  • [MINOR]: Remove unnecessary orderings from the final plan #8289 (mustafasrepo)
  • consistent logical & physical NTILE return types #8270 (korowa)
  • make array_union/array_except/array_intersect handle empty/null arrays rightly #8269 (Veeupup)
  • improve file path validation when reading parquet #8267 (Weijun-H)
  • [Benchmarks] Make partitions default to number of cores instead of 2 #8292 (andygrove)
  • Update prost-build requirement from =0.12.2 to =0.12.3 #8298 (dependabot[bot])
  • Fix Display for List #8261 (jayzhan211)
  • feat: support customizing column default values for inserting #8283 (jonahgao)
  • support LargeList for arrow_cast, support ScalarValue::LargeList #8290 (Weijun-H)
  • Minor: remove useless clone based on Clippy #8300 (Weijun-H)
  • Calculate ordering equivalence for expressions (rather than just columns) #8281 (mustafasrepo)
  • Fix sqllogictests link in contributor-guide/index.md #8314 (qrilka)
  • Refactor: Unify Expr::ScalarFunction and Expr::ScalarUDF, introduce unresolved functions by name #8258 (2010YOUY01)
  • Support no distinct aggregate sum/min/max in single_distinct_to_group_by rule #8266 (haohuaijin)
  • feat: implement sql style 'substr_index' string function #8272 (Syleechan)
  • Fixing issues with timestamp literals #8193 (comphead)
  • Projection Pushdown over StreamingTableExec #8299 (berkaysynnada)
  • minor: fix documentation #8323 (comphead)
  • fix: wrong result of range function #8313 (smallzhongfeng)
  • Minor: rename parquet.rs to parquet/mod.rs #8301 (alamb)
  • refactor: output ordering #8304 (QuenKar)
  • Update substrait requirement from 0.19.0 to 0.20.0 #8339 (dependabot[bot])
  • Port tests in aggregates.rs to sqllogictest #8316 (edmondop)
  • Library Guide: Add Using the DataFrame API #8319 (Veeupup)
  • Port tests in limit.rs to sqllogictest #8315 (zhangxffff)
  • move array function unit_tests to sqllogictest #8332 (Veeupup)
  • NTH_VALUE reverse support #8327 (mustafasrepo)
  • Optimize Projections during Logical Plan #8340 (mustafasrepo)
  • [MINOR]: Move merge projections tests to under optimize projections #8352 (mustafasrepo)
  • Add quote and escape attributes to create csv external table #8351 (Asura7969)
  • Minor: Add DataFrame test #8341 (alamb)
  • Minor: clean up the code based on Clippy #8359 (Weijun-H)
  • Minor: Make it easier to work with Expr::ScalarFunction #8350 (alamb)
  • Minor: Move some datafusion-optimizer::utils down to datafusion-expr::utils #8354 (Jesse-Bakker)
  • Minor: Make BuiltInScalarFunction::alias a method #8349 (alamb)
  • Extract parquet statistics to its own module, add tests #8294 (alamb)
  • feat: implement sql style 'find_in_set' string function #8328 (Syleechan)
  • Support LargeUtf8 to Temporal Coercion #8357 (jayzhan211)
  • Refactor aggregate function handling #8358 (Weijun-H)
  • Implement Aliases for ScalarUDF #8360 (Veeupup)
  • Minor: Remove unnecessary name field in ScalarFunctionDefinition #8365 (alamb)
  • feat: support LargeList in array_empty #8321 (Weijun-H)
  • Double type argument for to_timestamp function #8159 (spaydar)
  • Support User Defined Table Function #8306 (Veeupup)
  • Document timestamp input limits #8369 (comphead)
  • fix: make ntile work in some corner cases #8371 (haohuaijin)
  • Minor: Refactor array_union function to use a generic union_arrays function #8381 (Weijun-H)
  • Minor: Refactor function argument handling in ScalarFunctionDefinition #8387 (Weijun-H)
  • Materialize dictionaries in group keys #8291 (qrilka)
  • Rewrite array_ndims to fix List(Null) handling #8320 (jayzhan211)
  • Docs: Improve the documentation on ScalarValue #8378 (alamb)
  • Avoid concat for array_replace #8337 (jayzhan211)
  • add a summary table to benchmark compare output #8399 (razeghi71)
  • Refactors on TreeNode Implementations #8395 (berkaysynnada)
  • feat: support LargeList in make_array and array_length #8121 (Weijun-H)
  • remove unalias TableScan filters when create Physical Filter #8404 (jackwener)
  • Update custom-table-providers.md #8409 (nickpoorman)
  • fix transforming LogicalPlan::Explain use TreeNode::transform fails #8400 (haohuaijin)
  • Docs: Fix array_except documentation example error #8407 (Asura7969)
  • Support named query parameters #8384 (Asura7969)
  • Minor: Add installation link to README.md #8389 (Weijun-H)
  • Update code comment for the cases of regularized RANGE frame and add tests for ORDER BY cases with RANGE frame #8410 (viirya)
  • Minor: Add example with parameters to LogicalPlan #8418 (alamb)
  • Minor: Improve PruningPredicate documentation #8394 (alamb)
  • feat: ScalarValue from String #8411 (QuenKar)
  • Bump actions/labeler from 4.3.0 to 5.0.0 #8422 (dependabot[bot])
  • Update sqlparser requirement from 0.39.0 to 0.40.0 #8338 (dependabot[bot])
  • feat: support LargeList for array_has, array_has_all and array_has_any #8322 (Weijun-H)
  • Union schema can't be a subset of the child schema #8408 (jackwener)
  • Move PartitionSearchMode into datafusion_physical_plan, rename to InputOrderMode #8364 (alamb)
  • Make filter selectivity for statistics configurable #8243 (edmondop)
  • fix: Changed labeler.yml to latest format #8431 (viirya)
  • Minor: Use ScalarValue::from impl for strings #8429 (alamb)
  • Support crossjoin in substrait. #8427 (my-vegetable-has-exploded)
  • Fix ambiguous reference when aliasing in combination with ORDER BY #8425 (Asura7969)
  • Minor: convert macro list-slice and slice to function #8424 (Weijun-H)
  • Remove macro in iter_to_array for List #8414 (jayzhan211)
  • fix: Literal in ORDER BY window definition should not be an ordinal referring to relation column #8419 (viirya)
  • feat: customize column default values for external tables #8415 (jonahgao)
  • feat: Support array_sort(list_sort) #8279 (Asura7969)
  • Bugfix: Remove df-cli specific SQL statement options before executing with DataFusion #8426 (devinjdangelo)
  • Detect when filters on unique constraints make subqueries scalar #8312 (Jesse-Bakker)
  • Add alias check to optimize projections merge #8438 (mustafasrepo)
  • Fix PartialOrd for ScalarValue::List/FixedSizeList/LargeList #8253 (jayzhan211)
  • Support parquet_metadata for datafusion-cli #8413 (Veeupup)
  • Fix bug in optimizing a nested count #8459 (Dandandan)
  • Bump actions/setup-python from 4 to 5 #8449 (dependabot[bot])
  • fix: ORDER BY window definition should work on null literal #8444 (viirya)
  • fix clippy warnings #8455 (waynexia)
  • fix: RANGE frame for corner cases with empty ORDER BY clause should be treated as constant sort #8445 (viirya)
  • Preserve dict_id on Field during serde roundtrip #8457 (avantgardnerio)
  • feat: support InterleaveExecNode in the proto #8460 (liukun4515)
  • [BUG FIX]: Proper Empty Batch handling in window execution #8466 (mustafasrepo)
  • Minor: update cast #8458 (Weijun-H)
  • fix: don't unify projection if expr is non-trivial #8454 (haohuaijin)
  • Minor: Add new bloom filter predicate tests #8433 (alamb)
  • Add PRIMARY KEY Aggregate support to dataframe API #8356 (mustafasrepo)
  • Minor: refactor date_trunc to reduce duplicated code #8430 (Weijun-H)
  • Support array_distinct function. #8268 (my-vegetable-has-exploded)
  • Add primary key support to stream table #8467 (mustafasrepo)
  • Add evaluate_demo and range_analysis_demo to Expr examples #8377 (alamb)
  • Minor: fix function name typo #8473 (Weijun-H)
  • Minor: Fix comment typo in table.rs: s/indentical/identical/ #8469 (KeunwooLee-at)
  • Remove define_array_slice and reuse array_slice for array_pop_front/back #8401 (jayzhan211)
  • Minor: refactor trim to clean up duplicated code #8434 (Weijun-H)
  • Split EmptyExec into PlaceholderRowExec #8446 (razeghi71)
  • Enable non-uniform field type for structs created in DataFusion #8463 (dlovell)
  • Minor: Add multi ordering test for array agg order #8439 (jayzhan211)
  • Sort filenames when reading parquet to ensure consistent schema #6629 (thomas-k-cameron)
  • Minor: Improve comments in EnforceDistribution tests #8474 (alamb)
  • fix: support uppercase when parsing Interval #8478 (QuenKar)
  • Better Equivalence (ordering and exact equivalence) Propagation through ProjectionExec #8484 (mustafasrepo)
  • Add today alias for current_date #8423 (smallzhongfeng)
  • Minor: remove useless clone in array_expression #8495 (Weijun-H)
  • fix: incorrect set preserve_partitioning in SortExec #8485 (haohuaijin)
  • Explicitly mark parquet for tests in datafusion-common #8497 (Dennis40816)
  • Minor/Doc: Clarify DataFrame::write_table Documentation #8519 (devinjdangelo)
  • fix: Pull stats in IdentVisitor/GraphvizVisitor only when requested #8514 (vrongmeal)
  • Change display of RepartitionExec from SortPreservingRepartitionExec to RepartitionExec preserve_order=true #8521 (JacobOgle)
  • Fix DataFrame::cache errors with Plan("Mismatch between schema and batches") #8510 (Asura7969)
  • Minor: update pbjson_dependency #8470 (alamb)
  • Minor: Update prost-derive dependency #8471 (alamb)
  • Minor/Doc: Add DataFrame::write_table to DataFrame user guide #8527 (devinjdangelo)
  • Minor: Add repartition_file.slt end to end test for repartitioning files, and supporting tweaks #8505 (alamb)
  • Prepare version 34.0.0 #8508 (andygrove)
  • refactor: use ExprBuilder to consume substrait expr and use macro to generate error #8515 (waynexia)
  • [MINOR]: Make some slt tests deterministic #8525 (mustafasrepo)
  • fix: volatile expressions should not be target of common subexpression elimination #8520 (viirya)
  • Minor: Add LakeSoul to the list of Known Users #8536 (xuchen-plus)
  • Fix regression with Incorrect results when reading parquet files with different schemas and statistics #8533 (alamb)
  • feat: improve string statistics display in datafusion-cli parquet_metadata function #8535 (asimsedhain)
  • Defer file creation to write #8539 (tustvold)
  • Minor: Improve error handling in sqllogictest runner #8544 (alamb)