<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

35.0.0 (2024-01-20)

Full Changelog

Breaking changes:

  • Minor: make SubqueryAlias::try_new take Arc<LogicalPlan> #8542 (sadboy)
  • Remove ListingTable and FileScanConfig Unbounded (#8540) #8573 (tustvold)
  • Rename ParamValues::{LIST -> List,MAP -> Map} #8611 (kawadakk)
  • Rename expr::window_function::WindowFunction to WindowFunctionDefinition, make structure consistent with ScalarFunction #8382 (edmondop)
  • Implement ScalarUDF in terms of ScalarUDFImpl trait #8713 (alamb)
  • Change ScalarValue::{List, LargeList, FixedSizeList} to take specific types rather than ArrayRef #8562 (rspears74)
  • Remove unused array_expression.rs and SUPPORTED_ARRAY_TYPES #8807 (alamb)
  • Simplify physical expression creation API (not require schema) #8823 (comphead)
  • Determine causal window frames to produce early results. #8842 (mustafasrepo)

Implemented enhancements:

  • feat: implement Unary Expr in substrait #8534 (waynexia)
  • feat: implement Repartition plan in substrait #8526 (waynexia)
  • feat: support LargeList in array_slice #8561 (Weijun-H)
  • feat: support LargeList in array_positions #8571 (Weijun-H)
  • feat: support LargeList in array_element #8570 (Weijun-H)
  • feat: support LargeList in array_dims #8592 (Weijun-H)
  • feat: support LargeList in array_remove #8595 (Weijun-H)
  • feat: support inlist in LiteralGuarantee for pruning #8654 (my-vegetable-has-exploded)
  • feat: support 'LargeList' in array_pop_front and array_pop_back #8569 (Weijun-H)
  • feat: support LargeList in array_position #8714 (Weijun-H)
  • feat: support LargeList in array_ndims #8716 (Weijun-H)
  • feat: remove filters with null constants #8700 (asimsedhain)
  • feat: support LargeList in array_repeat #8725 (Weijun-H)
  • feat: native types in DistinctCountAccumulator for primitive types #8721 (korowa)
  • feat: support LargeList in cardinality #8726 (Weijun-H)
  • feat: support LargeList in array_to_string #8729 (Weijun-H)
  • feat: Add bloom filter metric to ParquetExec #8772 (my-vegetable-has-exploded)
  • feat: support array_resize #8744 (Weijun-H)
  • feat: add more components to the wasm-pack compatible list #8843 (waynexia)

Fixed bugs:

  • fix: make sure CASE WHEN picks the first true branch when a WHEN clause is true #8477 (haohuaijin)
  • fix: Antarctica/Vostok tz offset changed in chrono-tz 0.8.5 #8677 (korowa)
  • fix: don't push struct fields down to TableScan #8774 (haohuaijin)
  • fix: failed to create ValuesExec with non-nullable schema #8776 (jonahgao)
  • fix: fix markdown table in docs #8812 (tshauck)
  • fix: don't extract common sub expr in CASE WHEN clause #8833 (haohuaijin)

Documentation updates:

  • docs: update udf docs for udtf #8546 (tshauck)
  • Doc: Clarify When Limit is Pushed Down to TableProvider::Scan #8686 (devinjdangelo)
  • Minor: Improve PruningPredicate docstrings #8748 (alamb)
  • Minor: Add documentation about stream cancellation #8747 (alamb)
  • docs: add sudo for install commands #8804 (caicancai)
  • docs: document SessionConfig #8771 (wjones127)
  • Upgrade to object_store 0.9.0 and arrow 50.0.0 #8758 (tustvold)
  • docs: fix wrong pushdown name & a typo #8875 (SteveLauC)
  • docs: Update contributor guide with installation instructions #8876 (caicancai)
  • docs: fix wrong name in sub-crates' README #8889 (SteveLauC)
  • docs: add an example for RecordBatchReceiverStreamBuilder #8888 (SteveLauC)

Merged pull requests:

  • Remove order_bys from AggregateExec state #8537 (mustafasrepo)
  • Fix count(null) and count(distinct null) #8511 (joroKr21)
  • Minor: reduce code duplication in date_bin_impl #8528 (Weijun-H)
  • Add metrics for UnnestExec #8482 (simonvandel)
  • Prepare 34.0.0-rc3 #8549 (andygrove)
  • fix: make sure CASE WHEN picks the first true branch when a WHEN clause is true #8477 (haohuaijin)
  • Minor: make SubqueryAlias::try_new take Arc<LogicalPlan> #8542 (sadboy)
  • Fallback on null empty value in ExprBoundaries::try_from_column #8501 (razeghi71)
  • Add test for DataFrame::write_table #8531 (devinjdangelo)
  • [MINOR]: Generate empty column at placeholder exec #8553 (mustafasrepo)
  • Minor: Remove now dead SUPPORTED_STRUCT_TYPES #8480 (alamb)
  • [MINOR]: Add getter methods to first and last value #8555 (mustafasrepo)
  • [MINOR]: Some code changes and a new empty batch guard for SHJ #8557 (metesynnada)
  • docs: update udf docs for udtf #8546 (tshauck)
  • feat: implement Unary Expr in substrait #8534 (waynexia)
  • Fix compute_record_batch_statistics wrong with projection #8489 (Asura7969)
  • Minor: Cleanup warning in scalar.rs test #8563 (jayzhan211)
  • Minor: move some invariants out of the loop #8564 (haohuaijin)
  • feat: implement Repartition plan in substrait #8526 (waynexia)
  • Fix sort order aware file group parallelization #8517 (alamb)
  • feat: support LargeList in array_slice #8561 (Weijun-H)
  • minor: fix to support scalars #8559 (comphead)
  • refactor: HashJoinStream state machine #8538 (korowa)
  • Remove ListingTable and FileScanConfig Unbounded (#8540) #8573 (tustvold)
  • Update substrait requirement from 0.20.0 to 0.21.0 #8574 (dependabot[bot])
  • [minor]: Fix rank calculation bug when empty order by is seen #8567 (mustafasrepo)
  • Add LiteralGuarantee on columns to extract conditions required for PhysicalExpr expressions to evaluate to true #8437 (alamb)
  • [MINOR]: Parametrize sort-preservation tests to exercise all situations (unbounded/bounded sources and flag behavior) #8575 (mustafasrepo)
  • Minor: Add some comments to scalar_udf example #8576 (alamb)
  • Move Coercion for MakeArray to coerce_arguments_for_signature and introduce another one for ArrayAppend #8317 (jayzhan211)
  • feat: support LargeList in array_positions #8571 (Weijun-H)
  • feat: support LargeList in array_element #8570 (Weijun-H)
  • Increase test coverage for unbounded and bounded cases #8581 (mustafasrepo)
  • Port tests in parquet.rs to sqllogictest #8560 (hiltontj)
  • Minor: avoid a copy in Expr::unalias #8588 (alamb)
  • Minor: support complex expr as the arg in the ApproxPercentileCont function #8580 (liukun4515)
  • Bugfix: Add functional dependency check and aggregate try_new schema #8584 (mustafasrepo)
  • Remove GroupByOrderMode #8593 (ozankabak)
  • Minor: replace not-impl-err in array_expression #8589 (Weijun-H)
  • Substrait insubquery #8363 (tgujar)
  • Minor: port last test from parquet.rs #8587 (alamb)
  • Minor: consolidate map sqllogictest tests #8550 (alamb)
  • feat: support LargeList in array_dims #8592 (Weijun-H)
  • Fix regression in regenerating protobuf source #8603 (andygrove)
  • Remove unbounded_input from FileSinkOptions #8605 (devinjdangelo)
  • Add arrow_err! macros, optional backtrace to ArrowError #8586 (comphead)
  • Add examples of DataFrame::write* methods without S3 dependency #8606 (devinjdangelo)
  • Implement logical plan serde for CopyTo #8618 (andygrove)
  • Fix InListExpr to return the correct number of rows #8601 (alamb)
  • Remove ListingTable single_file option #8604 (devinjdangelo)
  • feat: support LargeList in array_remove #8595 (Weijun-H)
  • Rename ParamValues::{LIST -> List,MAP -> Map} #8611 (kawadakk)
  • Support binary temporal coercion for Date64 and Timestamp types #8616 (Asura7969)
  • Add new configuration item listing_table_ignore_subdirectory #8565 (Asura7969)
  • Optimize the parameter types of ParamValues's methods #8613 (kawadakk)
  • Do not panic on zero placeholders in ParamValues::get_placeholders_with_values #8615 (kawadakk)
  • Fix #8507: Non-null sub-field on nullable struct-field has wrong nullity #8623 (marvinlanhenke)
  • Implement contained API in PruningPredicate #8440 (alamb)
  • Add partial serde support for ParquetWriterOptions #8627 (andygrove)
  • Minor: add arguments length check in array_expressions #8622 (Weijun-H)
  • Minor: improve dataframe functional dependency tests #8630 (alamb)
  • Improve regexp_match performance by avoiding cloning Regex #8631 (viirya)
  • Minor: improve listing_table_ignore_subdirectory config documentation #8634 (alamb)
  • Support Writing Arrow files #8608 (devinjdangelo)
  • Filter pushdown into cross join #8626 (mustafasrepo)
  • [MINOR] Remove duplicate test utility and move one utility function for better organization #8652 (metesynnada)
  • [MINOR]: Add new test for filter pushdown into cross join #8648 (mustafasrepo)
  • Rewrite bloom filters to use contains API #8442 (alamb)
  • Split equivalence code into smaller modules. #8649 (tushushu)
  • Move parquet_schema.rs from sql to parquet tests #8644 (alamb)
  • Fix group by aliased expression in LogicalPlanBuilder::aggregate #8629 (alamb)
  • Refactor array_union and array_intersect functions to one general function #8516 (Weijun-H)
  • Minor: avoid extra clone in datafusion-proto::physical_plan #8650 (ongchi)
  • Minor: name some constant values in arrow writer, parquet writer #8642 (alamb)
  • TreeNode Refactor Part 2 #8653 (berkaysynnada)
  • feat: support inlist in LiteralGuarantee for pruning #8654 (my-vegetable-has-exploded)
  • Streaming CLI support #8651 (berkaysynnada)
  • Add serde support for CSV FileTypeWriterOptions #8641 (andygrove)
  • Add trait based ScalarUDF API #8578 (alamb)
  • Handle ordering of first last aggregation inside aggregator #8662 (mustafasrepo)
  • feat: support 'LargeList' in array_pop_front and array_pop_back #8569 (Weijun-H)
  • chore: rename ceresdb to apache horaedb #8674 (tanruixiang)
  • Minor: clean up code #8671 (Weijun-H)
  • fix: Antarctica/Vostok tz offset changed in chrono-tz 0.8.5 #8677 (korowa)
  • Make the BatchSerializer behind Arc to avoid unnecessary struct creation #8666 (metesynnada)
  • Implement serde for CSV and Parquet FileSinkExec #8646 (andygrove)
  • [pruning] Add shortcut when all units have been pruned #8675 (Ted-Jiang)
  • Change first/last implementation to prevent redundant comparisons when data is already sorted #8678 (mustafasrepo)
  • minor: remove useless conversion #8684 (comphead)
  • refactor: modified JoinHashMap build order for HashJoinStream #8658 (korowa)
  • Start setting up tpch planning benchmarks #8665 (matthewmturner)
  • Doc: Clarify When Limit is Pushed Down to TableProvider::Scan #8686 (devinjdangelo)
  • Closes #8502: Parallel NDJSON file reading #8659 (marvinlanhenke)
  • Improve array_prepend signature for null and empty array #8625 (jayzhan211)
  • Cleanup TreeNode implementations #8672 (viirya)
  • Update sqlparser requirement from 0.40.0 to 0.41.0 #8647 (dependabot[bot])
  • Update scalar functions doc for extract/datepart #8682 (Jefffrey)
  • Remove DescribeTableStmt in parser in favour of existing functionality from sqlparser-rs #8703 (Jefffrey)
  • Simplify NULL [NOT] IN (..) expressions #8691 (asimsedhain)
  • Rename expr::window_function::WindowFunction to WindowFunctionDefinition, make structure consistent with ScalarFunction #8382 (edmondop)
  • Deprecate duplicate function LogicalPlan::with_new_inputs #8707 (viirya)
  • Minor: refactor bloom filter tests to reduce duplication #8435 (alamb)
  • Minor: clean up code based on Clippy #8715 (Weijun-H)
  • Minor: Unbounded Output of AnalyzeExec #8717 (berkaysynnada)
  • feat: support LargeList in array_position #8714 (Weijun-H)
  • feat: support LargeList in array_ndims #8716 (Weijun-H)
  • feat: remove filters with null constants #8700 (asimsedhain)
  • support LargeList in array_prepend and array_append #8679 (Weijun-H)
  • Support for extract(epoch from date) for Date32 and Date64 #8695 (Jefffrey)
  • Implement trait based API for defining WindowUDF #8719 (guojidan)
  • Minor: Introduce utils::hash for StructArray #8552 (jayzhan211)
  • [CI] Improve windows machine CI test time #8730 (comphead)
  • Fix guarantees in always_true of PruningPredicate #8732 (my-vegetable-has-exploded)
  • Minor: Avoid memory copy in construct window exprs #8718 (Ted-Jiang)
  • feat: support LargeList in array_repeat #8725 (Weijun-H)
  • Minor: Ctrl+C Termination in CLI #8739 (berkaysynnada)
  • Add support for functional dependency for ROW_NUMBER window function. #8737 (mustafasrepo)
  • Minor: reduce code duplication in PruningPredicate test #8441 (alamb)
  • feat: native types in DistinctCountAccumulator for primitive types #8721 (korowa)
  • [MINOR]: Add a test case for when target partition is 1, no hash repartition is added to the plan. #8757 (mustafasrepo)
  • Minor: Improve PruningPredicate docstrings #8748 (alamb)
  • feat: support LargeList in cardinality #8726 (Weijun-H)
  • Add reproducer for #8738 #8750 (alamb)
  • Minor: Use faster check for column name in schema merge #8765 (matthewmturner)
  • Minor: Add documentation about stream cancellation #8747 (alamb)
  • Move repartition_file_scans out of enable_round_robin check in EnforceDistribution rule #8731 (viirya)
  • Clean internal implementation of WindowUDF #8746 (guojidan)
  • feat: support LargeList in array_to_string #8729 (Weijun-H)
  • [MINOR] CLI error handling on streaming use cases #8761 (metesynnada)
  • Convert Binary Operator StringConcat to Function for array_concat, array_append and array_prepend #8636 (jayzhan211)
  • Minor: Fix incorrect indices for hashing struct #8775 (jayzhan211)
  • Minor: Improve library docs to mention TreeNode, ExprSimplifier, PruningPredicate and cp_solver #8749 (alamb)
  • [MINOR] Add logo source files #8762 (andygrove)
  • Add Apache attribution to site footer #8760 (alamb)
  • ci: speed up win64 test #8728 (Jefffrey)
  • Add schema_err! error macros with optional backtrace #8620 (comphead)
  • Fix regression by reverting Materialize dictionaries in group keys #8740 (alamb)
  • fix: don't push struct fields down to TableScan #8774 (haohuaijin)
  • Implement ScalarUDF in terms of ScalarUDFImpl trait #8713 (alamb)
  • Minor: Fix error messages in array expressions #8781 (Weijun-H)
  • Move tests from expr.rs to sqllogictests. Part1 #8773 (comphead)
  • Permit running sqllogictest as a Rust test in IDEs (+ use clap for sqllogictest parsing, accept (and ignore) Rust test harness arguments) #8288 (alamb)
  • Minor: Use standard tree walk in Projection Pushdown #8787 (alamb)
  • Implement trait based API for defining AggregateUDF #8733 (guojidan)
  • Minor: Improve DataFusionError documentation #8792 (alamb)
  • fix: failed to create ValuesExec with non-nullable schema #8776 (jonahgao)
  • Update substrait requirement from 0.21.0 to 0.22.1 #8796 (dependabot[bot])
  • Bump follow-redirects from 1.15.3 to 1.15.4 in /datafusion/wasmtest/datafusion-wasm-app #8798 (dependabot[bot])
  • Minor: array_pop_first should be array_pop_front in documentation #8797 (ongchi)
  • feat: Add bloom filter metric to ParquetExec #8772 (my-vegetable-has-exploded)
  • Add note on using larger row group size #8745 (twitu)
  • Change ScalarValue::{List, LargeList, FixedSizeList} to take specific types rather than ArrayRef #8562 (rspears74)
  • fix: fix markdown table in docs #8812 (tshauck)
  • docs: add sudo for install commands #8804 (caicancai)
  • Standardize CompressionTypeVariant encoding in protobuf #8785 (tushushu)
  • Make benefits_from_input_partitioning Default in SHJ #8801 (metesynnada)
  • refactor: standardize exec_from funcs arg order #8809 (tshauck)
  • [Minor] extract const and add doc and more tests for in_list pruning #8815 (Ted-Jiang)
  • [MINOR]: Add size check for aggregate #8813 (mustafasrepo)
  • Minor: chores: Update clippy in pre-commit.sh #8810 (my-vegetable-has-exploded)
  • Cleanup the usage of round-robin repartitioning #8794 (viirya)
  • Implement monotonicity for ScalarUDF #8799 (guojidan)
  • Remove unused array_expression.rs and SUPPORTED_ARRAY_TYPES #8807 (alamb)
  • feat: support array_resize #8744 (Weijun-H)
  • Minor: typo in arrays.slt #8831 (Weijun-H)
  • docs: document SessionConfig #8771 (wjones127)
  • Minor: Improve datafusion-proto documentation #8822 (alamb)
  • [CI] Refactor CI builders #8826 (comphead)
  • Serialize function signature simplifications #8802 (metesynnada)
  • Port tests in group_by.rs to sqllogictest #8834 (hiltontj)
  • Simplify physical expression creation API (not require schema) #8823 (comphead)
  • feat: add more components to the wasm-pack compatible list #8843 (waynexia)
  • Port tests in timestamp.rs to sqllogictest. Part 1 #8818 (caicancai)
  • Upgrade to object_store 0.9.0 and arrow 50.0.0 #8758 (tustvold)
  • Fix ApproxPercentileCont signature #8825 (joroKr21)
  • Minor: Update with_column_rename method doc #8858 (comphead)
  • Minor: Document parquet_metadata function #8852 (alamb)
  • Speedup new_with_metadata by removing sort #8855 (simonvandel)
  • Minor: fix wrong function call #8847 (Weijun-H)
  • Add options of parquet bloom filter and page index in Session config #8869 (Ted-Jiang)
  • Port tests in timestamp.rs to sqllogictest #8859 (caicancai)
  • test: Port order.rs tests to sqllogictest #8857 (simicd)
  • Determine causal window frames to produce early results. #8842 (mustafasrepo)
  • docs: fix wrong pushdown name & a typo #8875 (SteveLauC)
  • fix: don't extract common sub expr in CASE WHEN clause #8833 (haohuaijin)
  • Add "Extended" clickbench queries #8861 (alamb)
  • Change cli to propagate error to exit code #8856 (tshauck)
  • test: Port tests in predicates.rs to sqllogictest #8879 (simicd)
  • docs: Update contributor guide with installation instructions #8876 (caicancai)
  • Minor: add tests for casts between nested List and LargeList #8882 (Weijun-H)
  • Disable Parallel Parquet Writer by Default, Improve Writing Test Coverage #8854 (devinjdangelo)
  • Support for order sensitive NTH_VALUE aggregation, make reverse ARRAY_AGG more efficient #8841 (mustafasrepo)
  • test: Port tests in csv_files.rs to sqllogictest #8885 (simicd)
  • test: Port tests in references.rs to sqllogictest #8877 (simicd)
  • Fix bug with to_timestamp and InitCap logical serialization, add roundtrip test between expression and proto #8868 (Weijun-H)
  • Support LargeListArray scalar values and align_array_dimensions #8881 (Weijun-H)
  • refactor: rename FileStream.file_reader to file_opener & update doc #8883 (SteveLauC)
  • docs: fix wrong name in sub-crates' README #8889 (SteveLauC)
  • Recursive CTEs: Stage 1 - add config flag #8828 (matthewgapp)
  • Support array literal with scalar function #8884 (jayzhan211)
  • Bump actions/cache from 3 to 4 #8903 (dependabot[bot])
  • Fix datafusion-cli print output #8895 (alamb)
  • docs: add an example for RecordBatchReceiverStreamBuilder #8888 (SteveLauC)
  • Fix "Projection references non-aggregate values" by updating rebase_expr to use transform_down #8890 (wizardxz)
  • Add serde support for Arrow FileTypeWriterOptions #8850 (tushushu)
  • Improve datafusion-cli print format tests #8896 (alamb)
  • Recursive CTEs: Stage 2 - add support for sql -> logical plan generation #8839 (matthewgapp)
  • Minor: remove null in array-append and array-prepend #8901 (Weijun-H)
  • Add support for FixedSizeList type in arrow_cast, hashing #8344 (Weijun-H)
  • aggregate_statistics should only optimize MIN/MAX when relation is not empty #8914 (viirya)
  • support to_timestamp with optional chrono formats #8886 (Omega359)
  • Minor: Document third argument of date_bin as optional and default value #8912 (alamb)
  • Minor: distinguish parquet row group pruning type in unit test #8921 (Ted-Jiang)