<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

36.0.0 (2024-02-16)

Full Changelog

Breaking changes:

  • Deprecate make_scalar_function #8878 (viirya)
  • Change Accumulator::evaluate and Accumulator::state to take &mut self #8925 (alamb)
  • Rename CatalogList to CatalogProviderList #9002 (comphead)
  • Remove some recursive cloning from logical planning #9050 (ozankabak)
  • Support FixedSizeList type coercion #8902 (Weijun-H)
  • Add ColumnarValue::values_to_arrays, deprecate columnar_values_to_array #9114 (alamb)
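The `Accumulator::evaluate` / `Accumulator::state` change (#8925) affects anyone implementing a custom aggregate. The sketch below is only an illustration of why a `&mut self` receiver can help: `SketchAccumulator` and `CollectValues` are hypothetical stand-ins defined here, not DataFusion types, and the real trait returns DataFusion result and scalar types rather than a plain `Vec<i64>`. Under those assumptions, a mutable receiver lets an accumulator move its buffered state out instead of cloning it.

```rust
// Hypothetical stand-in for the trait shape; NOT the real DataFusion trait.
trait SketchAccumulator {
    fn update(&mut self, value: i64);
    // After the change, evaluation receives `&mut self` instead of `&self`.
    fn evaluate(&mut self) -> Vec<i64>;
}

// Hypothetical accumulator that simply collects every value it sees.
#[derive(Default)]
struct CollectValues {
    values: Vec<i64>,
}

impl SketchAccumulator for CollectValues {
    fn update(&mut self, value: i64) {
        self.values.push(value);
    }

    fn evaluate(&mut self) -> Vec<i64> {
        // With `&mut self`, the buffer can be moved out rather than cloned.
        // The accumulator is left holding an empty Vec afterwards.
        std::mem::take(&mut self.values)
    }
}

fn main() {
    let mut acc = CollectValues::default();
    for v in [1, 2, 3] {
        acc.update(v);
    }
    let result = acc.evaluate();
    println!("{:?}", result);
}
```

With a `&self` receiver, the same `evaluate` would have to return `self.values.clone()`. Existing implementations mainly need their method signatures updated to the new receiver.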

Performance related:

  • Minor: Add new Extended ClickBench benchmark queries #8950 (alamb)

Implemented enhancements:

  • feat: support stride in array_slice, change indexes to be 1 based #8829 (Weijun-H)
  • feat: emitting partial join results in HashJoinStream #8020 (korowa)
  • feat:implement sql style 'ends_with' and 'instr' string function #8862 (zy-kkk)
  • feat: Support parquet bloom filter pruning for decimal128 #8930 (Ted-Jiang)
  • feat: Disable client console highlight by default #9013 (comphead)
  • feat: support the ergonomics of getting list slice with stride #8946 (Weijun-H)
  • feat: Parallel Arrow file format reading #8897 (my-vegetable-has-exploded)
  • feat: support array_reverse #9023 (Weijun-H)
  • feat: issue #8969 adding position function #8988 (Lordworms)
  • feat: support LargeList in flatten #9110 (Weijun-H)
  • feat: improve make_date performance #9112 (r3stl355)
  • feat: add github action to self-assign the issue #9132 (r3stl355)
  • feat: add ability to query the remote http(s) location directly in datafusion-cli #9150 (r3stl355)
  • feat: implement select directly from s3 and gcs locations in datafusion-cli #9199 (r3stl355)
  • feat: support block gzip for streams #9175 (tshauck)
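The `array_slice` entries above (#8829, #8946) mention 1-based indexes and a stride. As a rough mental model only, the sketch below shows what such a slice could look like over a plain Rust slice. `slice_one_based` is a hypothetical helper, not DataFusion code, and it assumes an inclusive end bound and positive indexes; negative indexes and other edge cases handled by the real function are not modeled.

```rust
// Hypothetical helper illustrating 1-based slicing with a stride.
// Assumptions: `begin` and `end` are 1-based, `end` is inclusive,
// and only positive indexes are supported.
fn slice_one_based(items: &[i64], begin: usize, end: usize, stride: usize) -> Vec<i64> {
    // Reject inputs this sketch does not model.
    if begin == 0 || stride == 0 || begin > end || begin > items.len() {
        return Vec::new();
    }
    // Clamp the inclusive end to the length of the input.
    let last = end.min(items.len());
    items[begin - 1..last]
        .iter()
        .copied()
        .step_by(stride)
        .collect()
}

fn main() {
    let data = [1, 2, 3, 4, 5, 6, 7, 8];
    println!("{:?}", slice_one_based(&data, 3, 6, 1));
    println!("{:?}", slice_one_based(&data, 3, 6, 2));
}
```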

Fixed bugs:

  • fix: recursive initialize method #8937 (waynexia)
  • fix: common_subexpr_eliminate rule should not apply to short-circuit expression #8928 (haohuaijin)
  • fix: issue #8922 make row group test more readable #8941 (Lordworms)
  • fix: allow placeholders to be substituted when coercible #8977 (kallisti-dev)
  • fix: unambiguously truncate time in date_trunc function #9068 (mhilton)
  • fix: schema metadata retrieval when listing parquet table #9134 (brayanjuls)

Documentation updates:

  • Prepare 35.0.0-rc1 #8924 (andygrove)
  • Update project links #8954 (comphead)
  • Document parallelism and thread scheduling in the architecture guide #8986 (alamb)
  • chore: fix license badge in README #9008 (suyanhanx)
  • docs: fix array_position docs #9003 (tshauck)
  • Docs: improve contributor guide to explain how to work with tickets #8999 (alamb)
  • Document minimum required rust version #9071 (comphead)
  • Minor: Add ParadeDB to the list of users #9018 (alamb)
  • Update minimum rust version to 1.72 #8997 (alamb)
  • docs: add docs and example showing how to get the expression data type #9118 (r3stl355)
  • chore: Fix incorrect comment in substrait consumer #9123 (caicancai)
  • Minor: Fix Self referential links in readme #9119 (alamb)
  • Examples link in catalogs.rs leads to a 404 #9194 (Omega359)
  • Create datafusion-functions-array crate and move ArrayToString function into it #9113 (alamb)

Merged pull requests:

  • Add hash_join_single_partition_threshold_rows config #8720 (maruschin)
  • Prepare 35.0.0-rc1 #8924 (andygrove)
  • feat: support stride in array_slice, change indexes to be 1 based #8829 (Weijun-H)
  • fix: recursive initialize method #8937 (waynexia)
  • Fix expr partial ord test #8908 (mustafasrepo)
  • Simplify windows builtin functions return type #8920 (comphead)
  • Fix handling of nested leaf columns in parallel parquet writer #8923 (devinjdangelo)
  • feat: emitting partial join results in HashJoinStream #8020 (korowa)
  • fix: common_subexpr_eliminate rule should not apply to short-circuit expression #8928 (haohuaijin)
  • Support GroupsAccumulator accumulator for udaf #8892 (guojidan)
  • test: Port tests in partitioned_csv.rs to sqllogictest #8919 (simicd)
  • [CI] Fix RUSTFLAGS #8929 (Jefffrey)
  • Minor: Update datafusion-cli README to explain why it is not in the w… #8938 (alamb)
  • Add syntax highlight to datafusion-cli #8918 (trungda)
  • Update substrait requirement from 0.22.1 to 0.23.0 #8943 (dependabot[bot])
  • Deprecate make_scalar_function #8878 (viirya)
  • Update project links #8954 (comphead)
  • fix: issue #8922 make row group test more readable #8941 (Lordworms)
  • feat:implement sql style 'ends_with' and 'instr' string function #8862 (zy-kkk)
  • [MINOR]: Extract aggregate topk function to aggregate_topk.slt #8948 (mustafasrepo)
  • Combine multiple IN lists in ExprSimplifier #8949 (jayzhan211)
  • Fix clippy failures: error: use of deprecated function functions::make_scalar_function #8972 (alamb)
  • feat: Support parquet bloom filter pruning for decimal128 #8930 (Ted-Jiang)
  • [MINOR]: Update create_window_expr to refer only input schema #8945 (mustafasrepo)
  • Don't error in simplify_expressions rule #8957 (haohuaijin)
  • Use .zip to avoid unwrap #8956 (Luv-Ray)
  • Change Accumulator::evaluate and Accumulator::state to take &mut self #8925 (alamb)
  • Enhance simplifier by adding Canonicalize #8780 (yyy1000)
  • Find the correct fields when using page filter on struct fields in parquet #8848 (manoj-inukolunu)
  • fix: allow placeholders to be substituted when coercible #8977 (kallisti-dev)
  • Minor: improve CatalogProvider documentation with rationale and info about remote catalogs #8968 (alamb)
  • Improve to_timestamp docs #8981 (Omega359)
  • Add helper function for processing scalar function input #8962 (viirya)
  • Fix optimize projections bug #8960 (mustafasrepo)
  • NOT operator not return internal error when args are not boolean value #8982 (guojidan)
  • Minor: Add new Extended ClickBench benchmark queries #8950 (alamb)
  • Minor: Add comments to MSRV CI check to help if it fails #8995 (alamb)
  • Minor: Document memory management design on MemoryPool #8966 (alamb)
  • Fix LEAD/LAG window functions when default value null #8989 (comphead)
  • Optimize MIN/MAX when relation is empty #8940 (viirya)
  • [task #8203] Port tests in joins.rs to sqllogictest #8996 (Tangruilin)
  • [task #8213]Port tests in select.rs to sqllogictest #8967 (Tangruilin)
  • test: Port (last) repartition.rs query to sqllogictest #8936 (simicd)
  • Update to sqlparser 0.42.0 #9000 (alamb)
  • [MINOR]: Fix Optimize Projections Bug #8992 (mustafasrepo)
  • Make Topk aggregate tests deterministic #8998 (mustafasrepo)
  • Add support for Postgres LIKE operators #8894 (gruuya)
  • bug: Datafusion doesn't respect case sensitive table references #8964 (xhwhis)
  • Document parallelism and thread scheduling in the architecture guide #8986 (alamb)
  • Fix None Projections in Projection Pushdown #9005 (berkaysynnada)
  • Lead and Lag window functions should support default value with datatype other than Int64 #9001 (viirya)
  • chore: fix license badge in README #9008 (suyanhanx)
  • Minor: fix: #9010 - Optimizer schema change assert error is incorrect #9012 (curtisleefulton)
  • docs: fix array_position docs #9003 (tshauck)
  • Rename CatalogList to CatalogProviderList #9002 (comphead)
  • Safeguard against potential inexact row count being smaller than exact null count #9007 (gruuya)
  • Recursive CTEs: Stage 3 - add execution support #8840 (matthewgapp)
  • sqllogictest: move the creation of the nan_table from Rust to slt #9022 (jonahgao)
  • TreeNode refactor code deduplication: Part 3 #8817 (ozankabak)
  • feat: Disable client console highlight by default #9013 (comphead)
  • [task #8917] Implement information_schema.schemata #8993 (Tangruilin)
  • Properly encode STRING_AGG, NTH_VALUE in physical plan protobufs #9027 (scsmithr)
  • [task #8201] Port tests in expr.rs to sqllogictest, finish the left c… #9014 (Tangruilin)
  • Fix the clippy error of use of deprecated method #9034 (viirya)
  • feat: support the ergonomics of getting list slice with stride #8946 (Weijun-H)
  • Cache common referred expression at the window input #9009 (mustafasrepo)
  • Optimize COUNT( DISTINCT ...) for strings (up to 9x faster) #8849 (jayzhan211)
  • feat: Parallel Arrow file format reading #8897 (my-vegetable-has-exploded)
  • Change remove from swap to shift in index map #9049 (mustafasrepo)
  • Relax join keys constraint from Column to any physical expression for physical join operators #8991 (viirya)
  • Minor: Improve memory helper trait documentation #9025 (alamb)
  • Docs: improve contributor guide to explain how to work with tickets #8999 (alamb)
  • fix issue where upper and lower functions only work correctly on ascii character #9054 (Omega359)
  • Minor: small updates to bench.sh #9035 (kmitchener)
  • Chore: explicitly list out all Expr types in TypeCoercionRewriter::mutate #9038 (guojidan)
  • Minor: improve scalar functions document #9029 (Weijun-H)
  • [MINOR] Alter a SHJ test for relaxing "on" condition #9065 (metesynnada)
  • Remove some recursive cloning from logical planning #9050 (ozankabak)
  • minor: remove useless macro #8979 (jackwener)
  • Causality Analysis for Builtin Window Functions #9048 (mustafasrepo)
  • Minor: add doc examples for RawTableAllocExt #9059 (alamb)
  • Update substrait requirement from 0.23.0 to 0.24.0 #9067 (dependabot[bot])
  • Remove single_file_output option from FileSinkConfig and Copy statement #9041 (yyy1000)
  • Add a make_date function #9040 (Omega359)
  • Speedup DFSchema::merge using HashSet indices #9020 (simonvandel)
  • Document minimum required rust version #9071 (comphead)
  • Return proper number of expressions for nth_value_agg #9044 (mustafasrepo)
  • ScalarUDF with zero arguments should be provided with one null array as parameter #9031 (viirya)
  • Update strum requirement from 0.25.0 to 0.26.1 #9046 (dependabot[bot])
  • Create datafusion-functions crate, extract encode and decode to #8705 (alamb)
  • Add documentation for streaming usecase #9070 (mustafasrepo)
  • fix: unambiguously truncate time in date_trunc function #9068 (mhilton)
  • feat: support array_reverse #9023 (Weijun-H)
  • prettier to_timestamp_invoke #9078 (Tangruilin)
  • Handle invalid types for negation #9066 (trungda)
  • Minor: reduce unwraps in datetime_expressions.rs #9072 (alamb)
  • Remove custom doubling strategy + add examples to VecAllocEx #9058 (alamb)
  • Split physical_plan_tpch into separate benchmarks #9043 (simonvandel)
  • Minor: Add ParadeDB to the list of users #9018 (alamb)
  • [MINOR]: Add check for unnecessary projection #9079 (mustafasrepo)
  • chore(placeholder): update error message and add tests #9073 (appletreeisyellow)
  • refer to #8781, convert the internal_err! in datetime_expression.rs to exec_err! #9083 (Tangruilin)
  • Add benchmarks for to_timestamp and make_date functions #9086 (Omega359)
  • chore: Clarify ParadeDB branding #9088 (philippemnoel)
  • doc: Add example how to include latest datafusion #9076 (comphead)
  • Update minimum rust version to 1.72 #8997 (alamb)
  • Fix typo in an error message #9099 (AdamGS)
  • Update InfluxDB links in Known Users section of documentation #9092 (alamb)
  • Support FixedSizeList type coercion #8902 (Weijun-H)
  • Improve Canonicalize API #8983 (alamb)
  • Update env_logger requirement from 0.10 to 0.11 #8944 (dependabot[bot])
  • Split count_distinct.rs into separate modules #9087 (alamb)
  • Fix update_expr for projection pushdown #9096 (viirya)
  • Improve InListSImplifier -- add test, commend and avoid clones #8971 (alamb)
  • feat: issue #8969 adding position function #8988 (Lordworms)
  • Cleanup regex_expressions.rs to remove _regexp_match function #9107 (Omega359)
  • Unnest with single expression #9069 (jayzhan211)
  • Minor: improve GroupsAccumulator and Accumulator documentation #8963 (alamb)
  • move InList related simplify to one place #9037 (guojidan)
  • docs: add docs and example showing how to get the expression data type #9118 (r3stl355)
  • Add http(s) support to the command line #8753 (kcolford)
  • Remove External Table Backwards Compatibility Options #9105 (yyy1000)
  • feat: support LargeList in flatten #9110 (Weijun-H)
  • feat: improve make_date performance #9112 (r3stl355)
  • Refactor min/max value update in Parquet statistics #9120 (Weijun-H)
  • chore: Fix incorrect comment in substrait consumer #9123 (caicancai)
  • Minor: Fix Self referential links in readme #9119 (alamb)
  • Add ColumnarValue::values_to_arrays, deprecate columnar_values_to_array #9114 (alamb)
  • Support Copy with Remote Object Stores in datafusion-cli #9064 (manoj-inukolunu)
  • Fix Dockerfile min rust version to 1.72 #9135 (alamb)
  • fix: schema metadata retrieval when listing parquet table #9134 (brayanjuls)
  • Update parse_protobuf_file_scan_config to remove any partition columns from the file_schema in FileScanConfig #9126 (bcmcmill)
  • feat: add github action to self-assign the issue #9132 (r3stl355)
  • Fix NULL values in FixedSizeList creation #9141 (Weijun-H)
  • Add FunctionRegistry::register_udaf and FunctionRegistry::register_udwf #9075 (alamb)
  • Change ScalarValue::Struct to ArrayRef #7893 (jayzhan211)
  • Support join filter for SortMergeJoin #9080 (viirya)
  • Typo in docstring #9149 (tv42)
  • RecordBatchReceiverStreamBuilder: don't stringify errors #9155 (tv42)
  • port position test to scalar #9128 (Lordworms)
  • Minor: Improve DataFrame docs, add examples #9159 (alamb)
  • feat: add ability to query the remote http(s) location directly in datafusion-cli #9150 (r3stl355)
  • Add regexp_like, improve docs and examples for regexp_match #9137 (Omega359)
  • Partial Sort Plan Implementation #9125 (ahmetenis)
  • Update tonic requirement from 0.10 to 0.11 #9176 (dependabot[bot])
  • minor: fix error message function naming #9168 (comphead)
  • Minor: Update DataFrame::write_table docs #9169 (alamb)
  • Improve PhysicalExpr documentation #9180 (alamb)
  • Fix sphinx warnings #9142 (ongchi)
  • Use concat to simplify Nested Scalar creation #9174 (jayzhan211)
  • Minor: Remove unecessary map_err #9186 (alamb)
  • Add example of using PruningPredicate to datafusion-examples #9183 (alamb)
  • Use prep_null_mask_filter to handle nulls in selection mask #9163 (viirya)
  • [Document] Adding UDF by impl ScalarUDFImpl #9172 (yyy1000)
  • Docs: Extend PruningPredicate with background and implementation info #9184 (alamb)
  • chore: make tokio a workspace dependency #9187 (PsiACE)
  • Examples link in catalogs.rs leads to a 404 #9194 (Omega359)
  • Add test pipeline for Mac aarch64 #9191 (viirya)
  • Add string aggregate grouping fuzz test, add MemTable::with_sort_exprs #9190 (alamb)
  • Create datafusion-functions-array crate and move ArrayToString function into it #9113 (alamb)
  • Add constant expression support to equivalence properties #9198 (mustafasrepo)
  • chore: update tpch-docker docker repository #9204 (pmcgleenon)
  • feat: implement select directly from s3 and gcs locations in datafusion-cli #9199 (r3stl355)
  • MINOR: Add "fs" feature to "tokio", fix "features" typo. #9210 (mustafasrepo)
  • Add to_char function implementation using chrono formats #9181 (Omega359)
  • Add SessionContext::read_batches #9197 (Lordworms)
  • feat: support block gzip for streams #9175 (tshauck)
  • chore(pruning): Support IS NOT NULL predicates in PruningPredicate #9208 (appletreeisyellow)
  • Add cargo audit CI #9182 (ongchi)
  • Move nullif and isnan to datafusion-functions #9216 (alamb)
  • Bugfix - Projection Removal Conditions #9215 (berkaysynnada)
  • Partitioning fixes #9207 (esheppa)
  • Return an error when a column does not exist in window function #9202 (PhVHoang)
  • Revert "chore(pruning): Support IS NOT NULL predicates in PruningPredicate (#9208)" #9232 (appletreeisyellow)
  • Improve documentation on how to build ScalarValue::Struct and add ScalarStructBuilder #9229 (alamb)
  • Minor: improve Display of output ordering of StreamTableExec #9225 (mustafasrepo)
  • Support compute return types from argument values (not just their DataTypes) #8985 (yyy1000)
  • Dont call multiunzip when no stats #9220 (matthewmturner)
  • Use setup-macos-aarch64-builder for aarch64 CI pipeline #9242 (viirya)
  • GROUP-BY prioritizes input columns in case of ambiguity #9228 (jonahgao)
  • Minor: chore: improve catalog test in mod.rs #9244 (caicancai)
  • Add example for ScalarStructBuilder::new_null, fix display for null ScalarValue::Struct #9238 (alamb)