Back to Datafusion

33.0.0

dev/changelog/33.0.0.md

53.1.031.4 KB
Original Source
<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

33.0.0 (2023-11-12)

Full Changelog

Breaking changes:

  • Refactor Statistics, introduce precision estimates (Exact, Inexact, Absent) #7793 (berkaysynnada)
  • Remove redundant unwrap in ScalarValue::new_primitive, return a Result #7830 (maruschin)
  • Add parquet feature flag, enabled by default, and make parquet conditional #7745 (ongchi)
  • Change input for to_timestamp function to be seconds rather than nanoseconds, add to_timestamp_nanos #7844 (comphead)
  • Percent Decode URL Paths (#8009) #8012 (tustvold)
  • chore: remove panics in datafusion-common::scalar by making more operations return Result #7901 (junjunjd)
  • Combine Expr::Wildcard and Wxpr::QualifiedWildcard, add wildcard() expr fn #8105 (alamb)

Performance related:

  • Add distinct union optimization #7788 (maruschin)
  • Fix join order for TPCH Q17 & Q18 by improving FilterExec statistics #8126 (andygrove)
  • feat: add column statistics into explain #8112 (NGA-TRAN)

Implemented enhancements:

  • Support InsertInto Sorted ListingTable #7743 (devinjdangelo)
  • External Table Primary key support #7755 (mustafasrepo)
  • add interval arithmetic for timestamp types #7758 (mhilton)
  • Interval Arithmetic NegativeExpr Support #7804 (berkaysynnada)
  • Exactness Indicator of Parameters: Precision #7809 (berkaysynnada)
  • Implement GetIndexedField for map-typed columns #7825 (swgillespie)
  • Fix precision loss when coercing date_part utf8 argument #7846 (Dandandan)
  • Support Binary/LargeBinary --> Utf8/LargeUtf8 in ilike and string functions #7840 (alamb)
  • Support Decimal256 on AVG aggregate expression #7853 (viirya)
  • Support Decimal256 column in create external table #7866 (viirya)
  • Support Decimal256 in Min/Max aggregate expressions #7881 (viirya)
  • Implement Hive-Style Partitioned Write Support #7801 (devinjdangelo)
  • feat: support Decimal256 for the abs function #7904 (jonahgao)
  • Parallelize Serialization of Columns within Parquet RowGroups #7655 (devinjdangelo)
  • feat: Use bloom filter when reading parquet to skip row groups #7821 (hengfeiyang)
  • Support Partitioning Data by Dictionary Encoded String Array Types #7896 (devinjdangelo)
  • Read only enough bytes to infer Arrow IPC file schema via stream #7962 (Jefffrey)
  • feat: Support determining extensions from names like foo.parquet.snappy as well as foo.parquet #7972 (Weijun-H)
  • feat: Protobuf serde for Json file sink #8062 (Jefffrey)
  • feat: support target table alias in update statement #8080 (jonahgao)
  • feat: support UDAF in substrait producer/consumer #8119 (waynexia)

Fixed bugs:

  • fix: preserve column qualifier for DataFrame::with_column #7792 (jonahgao)
  • fix: don't push down volatile predicates in projection #7909 (haohuaijin)
  • fix: generate logical plan for UPDATE SET FROM statement #7984 (jonahgao)
  • fix: single_distinct_aggretation_to_group_by fail #7997 (haohuaijin)
  • fix: clippy warnings from nightly rust 1.75 #8025 (waynexia)
  • fix: DataFusion suggests invalid functions #8083 (jonahgao)
  • fix: add encode/decode to protobuf encoding #8089 (Syleechan)

Documentation updates:

  • Minor: Improve TableProvider document, and add ascii art #7759 (alamb)
  • Expose arrow-schema serde crate feature flag #7829 (lewiszlw)
  • doc: fix ExecutionContext to SessionContext in custom-table-providers.md #7903 (ZENOTME)
  • Minor: Document parquet crate feature #7927 (alamb)
  • Add some initial content about creating logical plans #7952 (andygrove)
  • Minor: Add implementation examples to ExecutionPlan::execute #8013 (tustvold)
  • Minor: Improve documentation for Filter Pushdown #8023 (alamb)
  • Minor: Improve ExecutionPlan documentation #8019 (alamb)
  • Improve comments for PartitionSearchMode struct #8047 (ozankabak)
  • Prepare 33.0.0 Release #8057 (andygrove)
  • Improve documentation for calculate_prune_length method in SymmetricHashJoin #8125 (Asura7969)
  • docs: show creation of DFSchema #8132 (wjones127)
  • Improve documentation site to make it easier to find communication on Slack/Discord #8138 (alamb)

Merged pull requests:

  • Minor: Improve TableProvider document, and add ascii art #7759 (alamb)
  • Prepare 32.0.0 Release #7769 (andygrove)
  • Minor: Change all file links to GitHub in document #7768 (ongchi)
  • Minor: Improve PruningPredicate documentation #7738 (alamb)
  • Support InsertInto Sorted ListingTable #7743 (devinjdangelo)
  • Minor: improve documentation to stagger_batch #7754 (alamb)
  • External Table Primary key support #7755 (mustafasrepo)
  • Minor: Build array_array() with ListArray construction instead of ArrayData #7780 (jayzhan211)
  • Minor: Remove unnecessary #[cfg(feature = "avro")] #7773 (sarutak)
  • add interval arithmetic for timestamp types #7758 (mhilton)
  • Minor: make tests deterministic #7771 (Weijun-H)
  • Minor: Improve Interval Docs #7782 (alamb)
  • DataSink additions #7778 (Dandandan)
  • Update substrait requirement from 0.15.0 to 0.16.0 #7783 (dependabot[bot])
  • Move nested union optimization from plan builder to logical optimizer #7695 (maruschin)
  • Minor: comments that explain the schema used in simply_expressions #7747 (alamb)
  • Update regex-syntax requirement from 0.7.1 to 0.8.0 #7784 (dependabot[bot])
  • Minor: Add sql test for UNION / UNION ALL + plans #7787 (alamb)
  • fix: preserve column qualifier for DataFrame::with_column #7792 (jonahgao)
  • Interval Arithmetic NegativeExpr Support #7804 (berkaysynnada)
  • Exactness Indicator of Parameters: Precision #7809 (berkaysynnada)
  • add LogicalPlanBuilder::join_on #7805 (haohuaijin)
  • Fix SortPreservingRepartition with no existing ordering. #7811 (mustafasrepo)
  • Update zstd requirement from 0.12 to 0.13 #7806 (dependabot[bot])
  • [Minor]: Remove input_schema field from window executor #7810 (mustafasrepo)
  • refactor(7181): move streaming_merge() into separate mod from the merge node #7799 (wiedld)
  • Improve update error #7777 (lewiszlw)
  • Minor: Update LogicalPlan::join_on API, use it more #7814 (alamb)
  • Add distinct union optimization #7788 (maruschin)
  • Make CI fail on any occurrence of rust-tomlfmt failed #7774 (ongchi)
  • Encode all join conditions in a single expression field #7612 (nseekhao)
  • Update substrait requirement from 0.16.0 to 0.17.0 #7808 (dependabot[bot])
  • Minor: include sort expressions in SortPreservingRepartitionExec explain plan #7796 (alamb)
  • minor: add more document to Wildcard expr #7822 (waynexia)
  • Minor: Move Monotonicity to expr crate #7820 (2010YOUY01)
  • Use code block for better formatting of rustdoc for PhysicalGroupBy #7823 (qrilka)
  • Update explain plan to show TopK operator #7826 (haohuaijin)
  • Extract ReceiverStreamBuilder #7817 (tustvold)
  • Extend backtrace coverage for DatafusionError::Plan errors errors #7803 (comphead)
  • Add documentation and usability for prepared parameters #7785 (alamb)
  • Implement GetIndexedField for map-typed columns #7825 (swgillespie)
  • Minor: Assert streaming_merge has non empty sort exprs #7795 (alamb)
  • Minor: Upgrade docs for PhysicalExpr::{propagate_constraints, evaluate_bounds} #7812 (alamb)
  • Change ScalarValue::List to store ArrayRef #7629 (jayzhan211)
  • [MINOR]:Do not introduce unnecessary repartition when row count is 1. #7832 (mustafasrepo)
  • Minor: Add tests for binary / utf8 coercion #7839 (alamb)
  • Avoid panics on error while encoding/decoding ListValue::Array as protobuf #7837 (alamb)
  • Refactor Statistics, introduce precision estimates (Exact, Inexact, Absent) #7793 (berkaysynnada)
  • Remove redundant unwrap in ScalarValue::new_primitive, return a Result #7830 (maruschin)
  • Fix precision loss when coercing date_part utf8 argument #7846 (Dandandan)
  • Add operator section to user guide, Add std::ops operations to prelude, and add not() expr_fn #7732 (ongchi)
  • Expose arrow-schema serde crate feature flag #7829 (lewiszlw)
  • Improve ContextProvider naming: rename get_table_provider --> get_table_source, deprecate get_table_provider #7831 (lewiszlw)
  • DataSink Dynamic Execution Time Demux #7791 (devinjdangelo)
  • Add small column on empty projection #7833 (ch-sc)
  • feat(7849): coerce TIMESTAMP to TIMESTAMPTZ #7850 (mhilton)
  • Support Binary/LargeBinary --> Utf8/LargeUtf8 in ilike and string functions #7840 (alamb)
  • Minor: fix typo in comments #7856 (haohuaijin)
  • Minor: improve join / join_on docs #7813 (alamb)
  • Support Decimal256 on AVG aggregate expression #7853 (viirya)
  • Minor: fix typo in comments #7861 (alamb)
  • Minor: fix typo in GreedyMemoryPool documentation #7864 (avh4)
  • Minor: fix multiple typos #7863 (Smoothieewastaken)
  • Minor: Fix docstring typos #7873 (alamb)
  • Add CursorValues Decoupling Cursor Data from Cursor Position #7855 (tustvold)
  • Support Decimal256 column in create external table #7866 (viirya)
  • Support Decimal256 in Min/Max aggregate expressions #7881 (viirya)
  • Implement Hive-Style Partitioned Write Support #7801 (devinjdangelo)
  • Minor: fix config typo #7874 (alamb)
  • Add Decimal256 sqllogictests for SUM, MEDIAN and COUNT aggregate expressions #7889 (viirya)
  • [test] add fuzz test for topk #7772 (Tangruilin)
  • Allow Setting Minimum Parallelism with RowCount Based Demuxer #7841 (devinjdangelo)
  • Drop single quotes to make warnings for parquet options not confusing #7902 (qrilka)
  • Add multi-column topk fuzz tests #7898 (alamb)
  • Change FileScanConfig.table_partition_cols from (String, DataType) to Fields #7890 (NGA-TRAN)
  • Maintain time zone in ScalarValue::new_list #7899 (Dandandan)
  • [MINOR]: Move joinside struct to common #7908 (mustafasrepo)
  • doc: fix ExecutionContext to SessionContext in custom-table-providers.md #7903 (ZENOTME)
  • Update arrow 48.0.0 #7854 (tustvold)
  • feat: support Decimal256 for the abs function #7904 (jonahgao)
  • [MINOR] Simplify Aggregate, and Projection output_partitioning implementation #7907 (mustafasrepo)
  • Bump actions/setup-node from 3 to 4 #7915 (dependabot[bot])
  • [Bug Fix]: Fix bug, first last reverse #7914 (mustafasrepo)
  • Minor: provide default implementation for ExecutionPlan::statistics #7911 (alamb)
  • Update substrait requirement from 0.17.0 to 0.18.0 #7916 (dependabot[bot])
  • Minor: Remove unnecessary clone in datafusion_proto #7921 (ongchi)
  • [MINOR]: Simplify code, change requirement from PhysicalSortExpr to PhysicalSortRequirement #7913 (mustafasrepo)
  • [Minor] Move combine_join util to under equivalence.rs #7917 (mustafasrepo)
  • support scan empty projection #7920 (haohuaijin)
  • Cleanup logical optimizer rules. #7919 (mustafasrepo)
  • Parallelize Serialization of Columns within Parquet RowGroups #7655 (devinjdangelo)
  • feat: Use bloom filter when reading parquet to skip row groups #7821 (hengfeiyang)
  • fix: don't push down volatile predicates in projection #7909 (haohuaijin)
  • Add parquet feature flag, enabled by default, and make parquet conditional #7745 (ongchi)
  • [MINOR]: Simplify enforce_distribution, minor changes #7924 (mustafasrepo)
  • Add simple window query to sqllogictest #7928 (Jefffrey)
  • ci: upgrade node to version 20 #7918 (crepererum)
  • Change input for to_timestamp function to be seconds rather than nanoseconds, add to_timestamp_nanos #7844 (comphead)
  • Minor: Document parquet crate feature #7927 (alamb)
  • Minor: reduce some #cfg(feature = "parquet") #7929 (alamb)
  • Minor: reduce use of #cfg(feature = "parquet") in tests #7930 (alamb)
  • Fix CI failures on to_timestamp() calls #7941 (comphead)
  • minor: add a datatype casting for the updated value #7922 (jonahgao)
  • Minor:add avro feature in datafusion-examples to make avro_sql run #7946 (haohuaijin)
  • Add simple exclude all columns test to sqllogictest #7945 (Jefffrey)
  • Support Partitioning Data by Dictionary Encoded String Array Types #7896 (devinjdangelo)
  • Minor: Remove array() in array_expression #7961 (jayzhan211)
  • Minor: simplify update code #7943 (alamb)
  • Add some initial content about creating logical plans #7952 (andygrove)
  • Minor: Change from &mut SessionContext to &SessionContext in substrait #7965 (my-vegetable-has-exploded)
  • Fix crate READMEs #7964 (Jefffrey)
  • Minor: Improve HashJoinExec documentation #7953 (alamb)
  • chore: clean useless clone baesd on clippy #7973 (Weijun-H)
  • Add README.md to core, execution and physical-plan crates #7970 (alamb)
  • Move source repartitioning into ExecutionPlan::repartition #7936 (alamb)
  • minor: fix broken links in README.md #7986 (jonahgao)
  • Minor: Upate the sqllogictest crate README #7971 (alamb)
  • Improve MemoryCatalogProvider default impl block placement #7975 (lewiszlw)
  • Fix ScalarValue handling of NULL values for ListArray #7969 (viirya)
  • Refactor of Ordering and Prunability Traversals and States #7985 (berkaysynnada)
  • Keep output as scalar for scalar function if all inputs are scalar #7967 (viirya)
  • Fix crate READMEs for core, execution, physical-plan #7990 (Jefffrey)
  • Update sqlparser requirement from 0.38.0 to 0.39.0 #7983 (jackwener)
  • Fix panic in multiple distinct aggregates by fixing ScalarValue::new_list #7989 (alamb)
  • Minor: Add MemoryReservation::consumer getter #8000 (milenkovicm)
  • fix: generate logical plan for UPDATE SET FROM statement #7984 (jonahgao)
  • Create temporary files for reading or writing #8005 (smallzhongfeng)
  • Minor: fix comment on SortExec::with_fetch method #8011 (westonpace)
  • Fix: dataframe_subquery example Optimizer rule common_sub_expression_eliminate failed #8016 (smallzhongfeng)
  • Percent Decode URL Paths (#8009) #8012 (tustvold)
  • Minor: Extract common deps into workspace #7982 (lewiszlw)
  • minor: change some plan_err to exec_err #7996 (waynexia)
  • Minor: error on unsupported RESPECT NULLs syntax #7998 (alamb)
  • Break GroupedHashAggregateStream spill batch into smaller chunks #8004 (milenkovicm)
  • Minor: Add implementation examples to ExecutionPlan::execute #8013 (tustvold)
  • Minor: Extend wrap_into_list_array to accept multiple args #7993 (jayzhan211)
  • GroupedHashAggregateStream should register spillable consumer #8002 (milenkovicm)
  • fix: single_distinct_aggretation_to_group_by fail #7997 (haohuaijin)
  • Read only enough bytes to infer Arrow IPC file schema via stream #7962 (Jefffrey)
  • Minor: remove a strange char #8030 (haohuaijin)
  • Minor: Improve documentation for Filter Pushdown #8023 (alamb)
  • Minor: Improve ExecutionPlan documentation #8019 (alamb)
  • fix: clippy warnings from nightly rust 1.75 #8025 (waynexia)
  • Minor: Avoid recomputing compute_array_ndims in align_array_dimensions #7963 (jayzhan211)
  • Minor: fix doc and fmt CI check #8037 (alamb)
  • Minor: remove uncessary #cfg test #8036 (alamb)
  • Minor: Improve documentation for PartitionStream and StreamingTableExec #8035 (alamb)
  • Combine Equivalence and Ordering equivalence to simplify state #8006 (mustafasrepo)
  • Encapsulate ProjectionMapping as a struct #8033 (alamb)
  • Minor: Fix bugs in docs for to_timestamp, to_timestamp_seconds, ... #8040 (alamb)
  • Improve comments for PartitionSearchMode struct #8047 (ozankabak)
  • General approach for Array replace #8050 (jayzhan211)
  • Minor: Remove the irrelevant note from the Expression API doc #8053 (ongchi)
  • Minor: Add more documentation about Partitioning #8022 (alamb)
  • Minor: improve documentation for IsNotNull, DISTINCT, etc #8052 (alamb)
  • Prepare 33.0.0 Release #8057 (andygrove)
  • Minor: improve error message by adding types to message #8065 (alamb)
  • Minor: Remove redundant BuiltinScalarFunction::supports_zero_argument() #8059 (2010YOUY01)
  • Add example to ci #8060 (smallzhongfeng)
  • Update substrait requirement from 0.18.0 to 0.19.0 #8076 (dependabot[bot])
  • Fix incorrect results in COUNT(*) queries with LIMIT #8049 (msirek)
  • feat: Support determining extensions from names like foo.parquet.snappy as well as foo.parquet #7972 (Weijun-H)
  • Use FairSpillPool for TaskContext with spillable config #8072 (viirya)
  • Minor: Improve HashJoinStream docstrings #8070 (alamb)
  • Fixing broken link #8085 (edmondop)
  • fix: DataFusion suggests invalid functions #8083 (jonahgao)
  • Replace macro with function for array_repeat #8071 (jayzhan211)
  • Minor: remove unnecessary projection in single_distinct_to_group_by rule #8061 (haohuaijin)
  • minor: Remove duplicate version numbers for arrow, object_store, and parquet dependencies #8095 (andygrove)
  • fix: add encode/decode to protobuf encoding #8089 (Syleechan)
  • feat: Protobuf serde for Json file sink #8062 (Jefffrey)
  • Minor: use Expr::alias in a few places to make the code more concise #8097 (alamb)
  • Minor: Cleanup BuiltinScalarFunction::return_type() #8088 (2010YOUY01)
  • Update sqllogictest requirement from 0.17.0 to 0.18.0 #8102 (dependabot[bot])
  • Projection Pushdown in PhysicalPlan #8073 (berkaysynnada)
  • Push limit into aggregation for DISTINCT ... LIMIT queries #8038 (msirek)
  • Bug-fix in Filter and Limit statistics #8094 (berkaysynnada)
  • feat: support target table alias in update statement #8080 (jonahgao)
  • Minor: Simlify downcast functions in cast.rs. #8103 (Weijun-H)
  • Fix ArrayAgg schema mismatch issue #8055 (jayzhan211)
  • Minor: Support nulls in array_replace, avoid a copy #8054 (alamb)
  • Minor: Improve the document format of JoinHashMap #8090 (Asura7969)
  • Simplify ProjectionPushdown and make it more general #8109 (alamb)
  • Minor: clean up the code regarding clippy #8122 (Weijun-H)
  • Support remaining functions in protobuf serialization, add expr_fn for StructFunction #8100 (JacobOgle)
  • Minor: Cleanup BuiltinScalarFunction's phys-expr creation #8114 (2010YOUY01)
  • rewrite array_append/array_prepend to remove deplicate codes #8108 (Veeupup)
  • Implementation of array_intersect #8081 (Veeupup)
  • Minor: fix ci break #8136 (haohuaijin)
  • Improve documentation for calculate_prune_length method in SymmetricHashJoin #8125 (Asura7969)
  • Minor: remove duplicated array_replace tests #8066 (alamb)
  • Minor: Fix temporary files created but not deleted during testing #8115 (2010YOUY01)
  • chore: remove panics in datafusion-common::scalar by making more operations return Result #7901 (junjunjd)
  • Fix join order for TPCH Q17 & Q18 by improving FilterExec statistics #8126 (andygrove)
  • Fix: Do not try and preserve order when there is no order to preserve in RepartitionExec #8127 (alamb)
  • feat: add column statistics into explain #8112 (NGA-TRAN)
  • Add subtrait support for IS NULL and IS NOT NULL #8093 (tgujar)
  • Combine Expr::Wildcard and Wxpr::QualifiedWildcard, add wildcard() expr fn #8105 (alamb)
  • docs: show creation of DFSchema #8132 (wjones127)
  • feat: support UDAF in substrait producer/consumer #8119 (waynexia)
  • Improve documentation site to make it easier to find communication on Slack/Discord #8138 (alamb)