Apache DataFusion 39.0.0 Changelog

<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

This release consists of 234 commits from 59 contributors. See credits at the end of this changelog for more information.

Breaking changes:

  • Remove ScalarFunctionDefinition #10325 (lewiszlw)
  • Introduce user-defined signature #10439 (jayzhan211)
  • Remove AggregateFunctionDefinition::Name #10441 (lewiszlw)
  • Make CREATE EXTERNAL TABLE format options consistent, remove special syntax for HEADER ROW, DELIMITER and COMPRESSION #10404 (berkaysynnada)
  • feat: allow array_slice to take an optional stride parameter #10469 (jonahgao)
  • Minor: Extend more style of udaf expr_fn, Remove order args for covar_samp and covar_pop #10492 (jayzhan211)
  • Remove file_type() from FileFormat #10499 (Jefffrey)
  • UDAF: Extend more args to state_fields and groups_accumulator_supported and introduce ReversedUDAF #10525 (jayzhan211)
  • Remove Expr::GetIndexedField, replace Expr::{field,index,range} with FieldAccessor, IndexAccessor, and SliceAccessor #10568 (jayzhan211)
  • Improve ContextProvider #10577 (lewiszlw)
  • Minor: Use slice in ConcreteTreeNode #10666 (peter-toth)
  • Add reference visitor TreeNode APIs, change ExecutionPlan::children() and PhysicalExpr::children() return references #10543 (peter-toth)
  • Introduce Sum UDAF #10651 (jayzhan211)

Implemented enhancements:

  • feat: optional args for regexp_* UDFs #10514 (Michael-J-Ward)
  • feat: Expose Parquet Schema Adapter #10515 (HawaiianSpork)
  • feat: API for collecting statistics/index for metadata of a parquet file + tests #10537 (NGA-TRAN)
  • feat: Add eliminate group by constant optimizer rule #10591 (korowa)
  • feat: extend unnest to support Struct datatype #10429 (duongcongtoai)
  • feat: add substrait support for Interval types and literals #10646 (waynexia)
  • feat: support unparsing LogicalPlan::Window nodes #10767 (devinjdangelo)
  • feat: Update Parquet row filtering to handle type coercion #10716 (jeffreyssmith2nd)

Fixed bugs:

  • fix: make columnize_expr resistant to display_name collisions #10459 (jonahgao)
  • fix: avoid compressed json files repartitioning #10470 (korowa)
  • fix: parsing timestamp with date format #10476 (shanretoo)
  • fix: array_slice panics #10547 (jonahgao)
  • fix: pass quote parameter to CSV writer #10671 (DDtKey)
  • fix: CI compilation failed on substrait #10683 (jonahgao)
  • fix: fix string repeat for negative numbers #10760 (tshauck)
  • fix: array_slice and array_element panicked on empty args #10804 (jonahgao)

Documentation updates:

  • Prepare 38.0.0 release candidate 1 #10407 (andygrove)
  • chore(docs): update subquery documentation with more information #10361 (sanderson)
  • minor: Remove docs archive #10416 (andygrove)
  • Minor: format comments in PushDownFilter rule #10437 (alamb)
  • Minor: Add usecase to comments in LogicalPlan::recompute_schema #10443 (alamb)
  • doc: fix old master branch references to main #10458 (Jefffrey)
  • Minor: Improved document string for LogicalPlanBuilder #10496 (AbrarNitk)
  • Add to_date function to scalar functions doc #10601 (Omega359)
  • Docs: Update PR workflow documentation #10532 (alamb)
  • Minor: Add examples of using TreeNode with Expr #10686 (alamb)
  • docs: add documents to substrait type variation consts #10719 (waynexia)
  • Minor: (Doc) Enable rt-multi-thread feature for sample code #10770 (hsiang-c)

Other:

  • Minor: Add more docs and examples for Expr::unalias #10406 (alamb)
  • minor: Remove [RUST][datafusion] from release vote email subject line #10411 (andygrove)
  • fix dml logical plan output schema #10394 (leoyvens)
  • [MINOR]: Move transpose code to under common #10409 (mustafasrepo)
  • Fix incorrect Schema over aggregate function, Remove unnecessary exprlist_to_fields_aggregate #10408 (jonahgao)
  • Enable user defined display_name for ScalarUDF #10417 (yyy1000)
  • Fix and improve CommonSubexprEliminate rule #10396 (peter-toth)
  • Simplify making information_schema tables #10420 (lewiszlw)
  • only consider main part of the url when deciding is_collection in listing table #10419 (y-f-u)
  • make common expression alias human-readable #10333 (MohamedAbdeen21)
  • Minor: Simplify + document EliminateCrossJoin better #10427 (alamb)
  • During expression equality, check for new ordering information #10434 (mustafasrepo)
  • Revert 10333 / changes to aliasing in CommonSubExprEliminate #10436 (MohamedAbdeen21)
  • Improve flight sql examples #10432 (lewiszlw)
  • Move Covariance (Population) covar_pop to be a User Defined Aggregate Function #10418 (yyy1000)
  • Stop copying LogicalPlan and Exprs in OptimizeProjections (2% faster planning) #10405 (alamb)
  • chore: Improve release process for next time #10447 (andygrove)
  • Move bit_and_or_xor unit tests to slt #10457 (NoeB)
  • Remove some Expr clones in EliminateCrossJoin(3%-5% faster planning) #10430 (alamb)
  • refactor: Reduce string allocations in Expr::display_name (use write instead of format!) #10454 (erratic-pattern)
  • Add simplify method to aggregate function #10354 (milenkovicm)
  • Add cast array test to sqllogictest #10474 (viirya)
  • Add Expr::try_as_col, deprecate Expr::try_into_col (speed up optimizer) #10448 (alamb)
  • Implement From<Arc<LogicalPlan>> for LogicalPlanBuilder #10466 (AbrarNitk)
  • Minor: Improve documentation for catalog.has_header config option #10452 (alamb)
  • Minor: Simplify conjunction and disjunction, improve docs #10446 (alamb)
  • Stop copying LogicalPlan and Exprs in ReplaceDistinctWithAggregate #10460 (ClSlaid)
  • Stop copying LogicalPlan and Exprs in EliminateCrossJoin (4% faster planning) #10431 (alamb)
  • Improved ergonomy for CREATE EXTERNAL TABLE OPTIONS: Don't require quotations for simple namespaced keys like foo.bar #10483 (ozankabak)
  • Replace GetFieldAccess with indexing function in SqlToRel #10375 (jayzhan211)
  • Fix values with different data types caused failure #10445 (b41sh)
  • Fix SortMergeJoin with join filter filtering all rows out #10495 (viirya)
  • chore: use fullpath in macro to avoid declaring in other module #10503 (jayzhan211)
  • Minor: remove unused source file udf.rs #10497 (jonahgao)
  • Support UDAF to align Builtin aggregate function #10493 (jayzhan211)
  • Minor: add a test for current_time (no args) #10509 (alamb)
  • [MINOR]: Move pipeline checker rule to the end #10502 (mustafasrepo)
  • Minor: Extract parent/child limit calculation into a function, improve docs #10501 (alamb)
  • Fix window expr deserialization #10506 (lewiszlw)
  • Update substrait requirement from 0.32.0 to 0.33.3 #10516 (dependabot[bot])
  • Stop copying LogicalPlan and Exprs in TypeCoercion (10% faster planning) #10356 (alamb)
  • Implement unparse IS_NULL to String and enhance the tests #10529 (goldmedal)
  • Fix panic in array_agg(distinct) query #10526 (jayzhan211)
  • Move min_max unit tests to slt #10539 (xinlifoobar)
  • Implement unparse IsNotFalse to String #10538 (goldmedal)
  • Implement Unparse TryCast Expr --> String Support #10542 (xinlifoobar)
  • Implement unparse Placeholder to String #10540 (reswqa)
  • Implement unparse OuterReferenceColumn to String #10544 (goldmedal)
  • Stop copying LogicalPlan and Exprs in PushDownFilter (4%-6% faster planning) #10444 (alamb)
  • Stop most copying LogicalPlan and Exprs in ScalarSubqueryToJoin #10489 (alamb)
  • Example for simple Expr --> SQL conversion #10528 (edmondop)
  • fix null_count on compute_record_batch_statistics to report null counts across partitions #10468 (samuelcolvin)
  • Minor: Add PullUpCorrelatedExpr::new and improve documentation #10500 (alamb)
  • Stop copying LogicalPlan and Exprs in PushDownLimit #10508 (alamb)
  • Break up contributing guide into smaller pages #10533 (alamb)
  • PhysicalExpr Orderings with Range Information #10504 (berkaysynnada)
  • Implement unparse ScalarVariable to String #10541 (reswqa)
  • Handle dictionary values in ScalarValue serde #10563 (thinkharderdev)
  • Improve signature of get_field function #10569 (lewiszlw)
  • Implement Unparse GroupingSet Expr --> String Support sql #10555 (xinlifoobar)
  • Minor: Move proxy to datafusion common #10561 (jayzhan211)
  • Update prost-build requirement from =0.12.4 to =0.12.6 #10578 (dependabot[bot])
  • Add examples of how to convert logical plan to/from sql strings #10558 (xinlifoobar)
  • Fix: Sort Merge Join LeftSemi issues when JoinFilter is set #10304 (comphead)
  • Minor: Fix ArrayFunctionRewriter name reporting #10581 (alamb)
  • Improve UserDefinedLogicalNode::from_template API to return Result #10575 (lewiszlw)
  • Migrate testing optimizer rules to use rewrite API #10576 (lewiszlw)
  • test: add more tests for statistics reading #10592 (NGA-TRAN)
  • refactor: reduce allocations in push down filter #10567 (erratic-pattern)
  • Fix compilation of datafusion-cli on 32bit targets #10594 (nathaniel-daniel)
  • Rename monotonicity as output_ordering in ScalarUDF's #10596 (berkaysynnada)
  • Implement Unparser for UNION ALL #10603 (phillipleblanc)
  • Improve UserDefinedLogicalNodeCore::from_template API to return Result #10597 (lewiszlw)
  • Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common #10574 (jayzhan211)
  • Minor: Consolidate some integration tests into core_integration #10588 (alamb)
  • Stop copying LogicalPlan and Exprs in SingleDistinctToGroupBy #10527 (appletreeisyellow)
  • [MINOR]: Update get range implementation for lead lag window functions #10614 (mustafasrepo)
  • Minor: Improve documentation in sql_to_plan example #10582 (alamb)
  • Docs: add examples for RuntimeEnv::register_object_store, improve error messages #10617 (aditanase)
  • Add support for Substrait List/EmptyList literals #10615 (Blizzara)
  • Add to_unixtime function to scalar functions doc #10620 (Omega359)
  • Test for reading statistics from parquet files without statistics and boolean & struct data type #10608 (NGA-TRAN)
  • adding benchmark for extracting arrow statistics from parquet #10610 (Lordworms)
  • Implement a dialect-specific rule for unparsing an identifier with or without quotes #10573 (goldmedal)
  • add catalog as part of the table path in plan_to_sql #10612 (y-f-u)
  • Refactor parquet row group pruning into a struct (use new statistics API, part 1) #10607 (alamb)
  • Extract Date32 parquet statistics as Date32Array rather than Int32Array #10593 (xinlifoobar)
  • Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL #10625 (phillipleblanc)
  • Fix broken build/test from merge #10637 (phillipleblanc)
  • Add SessionContext::register_object_store #10621 (alamb)
  • Minor: Move median test #10611 (jayzhan211)
  • Add support for Substrait Struct literals and type #10622 (Blizzara)
  • fix Incorrect statistics read for i8 i16 columns in parquet #10629 (Lordworms)
  • Minor: add runtime asserts to RowGroup #10641 (alamb)
  • Update cli Dockerfile to a newer ubuntu release, newer rust release #10638 (Omega359)
  • More properly handle nullability of types/literals in Substrait #10640 (Blizzara)
  • fix wrong type validation on unnest expr #10657 (duongcongtoai)
  • Fix incorrect statistics read for binary columns in parquet #10645 (xinlifoobar)
  • Fix NULL["field"] for expr_API #10655 (alamb)
  • Update substrait requirement from 0.33.3 to 0.34.0 #10632 (dependabot[bot])
  • Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) #10662 (alamb)
  • Add FileScanConfig::new() API #10623 (alamb)
  • Minor: Remove GetFieldAccessSchema #10665 (jayzhan211)
  • Move Median to functions-aggregate and Introduce Numeric signature #10644 (jayzhan211)
  • Fix Coalesce casting logic to follow what Postgres and DuckDB do. Introduce signature that does non-comparison coercion #10268 (jayzhan211)
  • Fix compilation "comparison_binary_numeric_coercion not found" #10677 (alamb)
  • refactor: simplify converting List DataTypes to ScalarValue #10675 (jonahgao)
  • Minor: Improve ObjectStoreUrl docs + examples #10619 (alamb)
  • Add tests for reading numeric limits in parquet statistics #10642 (alamb)
  • Update nix requirement from 0.28.0 to 0.29.0 #10684 (dependabot[bot])
  • refactor: Move SchemaAdapter from parquet module to data source #10680 (HawaiianSpork)
  • Convert first, last aggregate function to UDAF #10648 (mustafasrepo)
  • Minor: CastExpr Ordering Handle #10650 (berkaysynnada)
  • Factor out common datafusion types into another proto file #10649 (mustafasrepo)
  • Minor: Add tests showing aggregate behavior for NaNs #10634 (alamb)
  • Improve ParquetExec and related documentation #10647 (alamb)
  • minor: inconsistent group by position planning #10679 (korowa)
  • Remove duplicate function name in its aliases list #10661 (goldmedal)
  • Add protobuf serde support for LogicalPlan::Unnest #10681 (akoshchiy)
  • Support Substrait's VirtualTables #10531 (Blizzara)
  • support serialization and deserialization limit in the aggregation exec #10692 (liukun4515)
  • Display date32/64 in YYYY-MM-DD format #10691 (houqp)
  • Fix: array list values are leaked on nested unnest operators #10689 (duongcongtoai)
  • Support LogicalPlan::Distinct in unparser #10690 (yyy1000)
  • Remove redundant upper case aliases for median, first_value and last_value #10696 (goldmedal)
  • Minor: improve Expr documentation #10685 (alamb)
  • chore: align re-exports in functions-aggregate #10705 (waynexia)
  • Fix typo in bench.sh #10698 (vimt)
  • Fix incorrect statistics read for unsigned integers columns in parquet #10704 (xinlifoobar)
  • Separate Partitioning protobuf serialization code #10708 (lewiszlw)
  • Support consuming Substrait with compound signature function names #10653 (Blizzara)
  • Minor: Add examples of using TreeNode with LogicalPlan #10687 (alamb)
  • Add ParquetExec::builder(), deprecate ParquetExec::new #10636 (alamb)
  • feature: Add a WindowUDFImpl::simplify() API #9906 (guojidan)
  • Chore: clean up udwf example && remove redundant import #10718 (guojidan)
  • Push down filter as table partition list prefix #10693 (houqp)
  • Make swap_hash_join public API #10702 (viirya)
  • ci: fix clippy error on main #10723 (jonahgao)
  • CI: Fix complaints from newer Clippy versions #10725 (comphead)
  • Remove Eager Trait for Joins #10721 (berkaysynnada)
  • Minor: fix signature fn octet_length() #10726 (marvinlanhenke)
  • Update rstest requirement from 0.19.0 to 0.20.0 #10734 (dependabot[bot])
  • Update rstest_reuse requirement from 0.6.0 to 0.7.0 #10733 (dependabot[bot])
  • Add example for building an external secondary index for parquet files #10549 (alamb)
  • Minor: move stddev test to slt #10741 (marvinlanhenke)
  • fix(CLI): can not create external tables with format options #10739 (jonahgao)
  • Add support for AggregateExpr, WindowExpr rewrite. #10742 (mustafasrepo)
  • Fix SMJ Left Anti Join when the join filter is set #10724 (comphead)
  • Introduce FunctionRegistry dependency to optimize and rewrite rule #10714 (jayzhan211)
  • Minor: Add SMJ to TPCH benchmark usage #10747 (comphead)
  • Minor: Split physical_plan/parquet/mod.rs into smaller modules #10727 (alamb)
  • minor: consolidate unparser integration tests #10736 (devinjdangelo)
  • Minor: Move aggregate variance to slt #10750 (marvinlanhenke)
  • Extract parquet statistics from timestamps with timezones #10766 (xinlifoobar)
  • Minor: Add tests for extracting dictionary parquet statistics #10729 (alamb)
  • Update rstest requirement from 0.20.0 to 0.21.0 #10774 (dependabot[bot])
  • Minor: Refactor memory size estimation for HashTable #10748 (marvinlanhenke)
  • Reduce code repetition in datafusion/functions mod files #10700 (MohamedAbdeen21)
  • Support negatives in split part #10780 (tshauck)
  • Extract parquet statistics from LargeUtf8 columns and Add tests for UTF8 And LargeUTF8 #10762 (Weijun-H)
  • Cleanup GetIndexedField #10769 (lewiszlw)
  • Extract parquet statistics from f16 columns, add ScalarValue::Float16 #10763 (Lordworms)
  • Handle empty rows for array_sort #10786 (jayzhan211)
  • Fix extract parquet statistics from LargeBinary columns #10775 (xinlifoobar)
  • Extract parquet statistics from Time32 and Time64 columns #10771 (Lordworms)
  • chore: fix last_value coercion #10783 (appletreeisyellow)
  • Fix extract parquet statistics from Decimal256 columns #10777 (xinlifoobar)
  • Speed up arrow_statistics test #10735 (alamb)
  • minor: Refactor some unparser methods to improve readability #10788 (devinjdangelo)
  • Convert variance sample to udaf #10713 (yyin-dev)
  • Improve docs and fix a typo #10798 (lewiszlw)
  • Avoid the usage of intermediate ScalarValue to improve performance of extracting statistics from parquet files #10711 (xinlifoobar)
  • SMJ: Add more tests and improve comments #10784 (comphead)
  • Handle EmptyRelation during SQL unparsing #10803 (goldmedal)
  • Document Committer and PMC process #10778 (alamb)
  • Int64 as default type for make_array function empty or null case #10790 (jayzhan211)
  • Split SessionState into its own module #10794 (alamb)
  • Add StreamProvider for configuring StreamTable #10600 (matthewmturner)
  • Bench: Add PREFER_HASH_JOIN env variable #10809 (comphead)
  • Add ParquetAccessPlan, unify RowGroup selection and PagePruning selection #10738 (alamb)
  • Fix ScalarUDFImpl::propagate_constraints doc #10810 (lewiszlw)
  • Extract Parquet statistics from Interval column #10801 (marvinlanhenke)
  • build(deps): upgrade sqlparser to 0.47.0 #10392 (tisonkun)
  • Refactor and simplify the SQL unparser #10811 (goldmedal)
  • Minor: Remove code duplication in memory_limit derivation for datafusion-cli #10814 (comphead)
  • build(deps): update Arrow/Parquet to 52.0, object-store to 0.10 #10765 (waynexia)
  • chore: Prepare 39.0.0-rc1 #10828 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    44	Andrew Lamb
    18	Jay Zhan
    14	张林伟
    11	Andy Grove
    11	Xin Li
    10	Jonah Gao
     8	Jax Liu
     7	Mustafa Akur
     7	Oleks V
     7	dependabot[bot]
     5	Arttu
     5	Berkay Şahin
     5	Marvin Lanhenke
     4	Lordworms
     4	Ruihang Xia
     3	Bruce Ritchie
     3	Devin D'Angelo
     3	Duong Cong Toai
     3	Eduard Karacharov
     3	Junhao Liu
     3	Liang-Chi Hsieh
     3	Mohamed Abdeen
     3	Nga Tran
     3	Peter Toth
     3	Phillip LeBlanc
     2	Abrar Khan
     2	Adam Curtis
     2	Chunchun Ye
     2	Jeffrey Vo
     2	Michael Maletich
     2	QP Hou
     2	Trent Hauck
     2	Weijie Guo
     2	junxiangMu
     2	yfu
     1	Adrian Tanase
     1	Alex Huang
     1	Andrey Koshchiy
     1	Artem Medvedev
     1	ClSlaid
     1	Dan Harris
     1	Edmondo Porcu
     1	Jeffrey Smith II
     1	Kun Liu
     1	Leonardo Yvens
     1	Marko Milenković
     1	Matthew Turner
     1	Mehmet Ozan Kabak
     1	Michael J Ward
     1	NoeB
     1	Samuel Colvin
     1	Scott Anderson
     1	VimT
     1	Yue Yin
     1	baishen
     1	hsiang-c
     1	nathaniel-daniel
     1	shanretoo
     1	tison

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.