Apache DataFusion 51.0.0 Changelog
This release consists of 537 commits from 129 contributors. See credits at the end of this changelog for more information.
See the upgrade guide for information on how to upgrade from previous versions.
Breaking changes:
- TypeSignatureClass::Binary to allow accepting arbitrarily sized FixedSizeBinary arguments #17531 (Jefffrey)
- datafusion-proto to use TaskContext rather than SessionContext for physical plan serialization #17601 (milenkovicm)
- reassign_predicate_columns #17703 (rkrishn7)
- ParquetSource::predicate() and merge into FileSource::filter() #17971 (getChan)
- type_coercion/aggregate.rs functions #18091 (Jefffrey)
- DESCRIBE SELECT to show schema rather than EXPLAIN plan #18238 (djanderson)
- FileScanConfig own a list of ProjectionExprs #18253 (friendlymatthew)
- expr_fields to AccumulatorArgs to hold input argument fields #18100 (Jefffrey)
- is_ordered_set_aggregate to supports_within_group_clause for UDAFs #18397 (Jefffrey)

Performance related:
- Hash and Ord speed for dyn LogicalType #17437 (findepi)
- &&String::to_string #17583 (findepi)
- NOT(IN ..) to NOT IN and NOT (EXISTS ..) to NOT EXISTS #17848 (Tpt)
- string_agg() aggregate function (1000x speed for no DISTINCT and ORDER case) #17837 (2010YOUY01)
- regexp_like calls as ~ and *~ operator expressions when possible #17839 (pepijnve)
- aggregate_vectorized bench benchmark for PrimitiveGroupValueBuilder as well #17930 (rluvaton)
- NullBuffer::union for Spark concat #18087 (comphead)
- array_has #18161 (2010YOUY01)
- ScalarValue::to_array_of_size for Boolean and some null values #18180 (rluvaton)
- ExpressionOrExpression case evaluation method #18444 (pepijnve)

Implemented enhancements:
- DFSchema.print_schema_tree() method #17459 (comphead)
- length function #17475 (wForget)
- join_fuzz testing #17497 (jonathanc-n)
- sql feature to make sql planning optional #17332 (timsaucer)
- OR REPLACE to creating external tables #17580 (jonathanc-n)
- NullEquality in join executor's EXPLAIN output #17664 (2010YOUY01)
- make_interval function #17424 (davidlghellin)
- udafs and udwfs methods on FunctionRegistry #17650 (milenkovicm)
- Utf8View for more args of regexp_replace #17195 (mbutrovich)
- map function map_from_arrays #17456 (SparkApplicationMaster)
- make_dt_interval function #17728 (davidlghellin)
- map function map_from_entries #17779 (SparkApplicationMaster)
- RightMark Join #17651 (jonathanc-n)
- try_parse_url function #17485 (rafafrdz)
- elt function #17729 (davidlghellin)
- concat string function #18063 (comphead)
- null_treatment, distinct, and filter for window functions in proto #18024 (dqkqd)
- EXPLAIN ANALYZE detail level #18098 (2010YOUY01)
- ClassicJoin for PWMJ #17482 (jonathanc-n)
- deregister_object_store #17999 (jonathanc-n)
- DataSourceExec with parquet source #18196 (2010YOUY01)
- output_bytes to baseline metrics #18268 (2010YOUY01)
- PruningMetrics and use it in parquet file pruning metric #18297 (2010YOUY01)
- selectivity metrics to FilterExec #18406 (2010YOUY01)
- reduction_factor metric to AggregateExec for EXPLAIN ANALYZE #18455 (petern48)

Fixed bugs:
- QUALIFY #17313 (rkrishn7)
- CONTRIBUTING.md #17507 (Weijun-H)
- FileScanConfig #17546 (rkrishn7)
- SortExec TopK OOM #17622 (nuno-faria)
- OuterReferenceColumn to contain the entire outer field to prevent metadata loss #17524 (Kontinuation)
- array_reverse on FixedSizeList #17673 (chenkovsky)
- NestedLoopJoinExec #17680 (duongcongtoai)
- DataType::Null in possible types during csv type inference #17796 (dqkqd)
- ParquetSource - with_predicate() does not have to reset metrics #17858 (2010YOUY01)
- common_sub_expression_eliminate fails in a window function #17852 (dqkqd)
- PrimitiveGroupValueBuilder to match NaN correctly in scalar equal_to #17979 (rluvaton)
- array_distinct inner nullability causing type mismatch #18104 (dqkqd)
- abs() #18304 (Jefffrey)
- PartitionedFile paths during deserialization #18346 (lonless9)
- DataFrame::select_columns and DataFrame::drop_columns for qualified duplicated field names #18236 (dqkqd)

Documentation updates:
- CREATE EXTERNAL TABLE #17232 (BlakeOrth)
- introduction.md #17669 (Jefffrey)
- datafusion 50.0.0 is not released #17695 (nuno-faria)
- avg_distinct() and sum_distinct() functions to DataFrame API #17536 (Jefffrey)
- WHERE, ORDER BY, LIMIT, SELECT, EXTEND pipe operators #17278 (simonvandel)
- auto_doc_cfg with doc_cfg #17845 (mbrobbel)
- FunctionRegistry udafs and udwfs methods mandatory #17847 (milenkovicm)
- AS, UNION, INTERSECTION, EXCEPT, AGGREGATE pipe operators #17312 (simonvandel)
- Window::try_new_with_schema with a descriptive error message #17926 (dqkqd)
- JOIN pipe operator #17969 (simonvandel)
- working-with-exprs.md #18033 (Weijun-H)
- nvl a thin wrapper for coalesce #17991 (pepijnve)
- Metrics section to the user-guide #18216 (2010YOUY01)
- rust,ignore blocks #18239 (Jefffrey)
- AggregateUDFImpl::is_ordered_set_aggregate documentation #17805 (Jefffrey)
- time_zone to None (was "+00:00") #18359 (Omega359)
- nvl and nvl2 simplification #18567 (alamb)

Other:
- TableFunction clonable #17457 (sunng87)
- PartialEq, Eq speed for LexOrdering, make PartialEq and PartialOrd consistent #17442 (findepi)
- PartialOrd for logical plan nodes and expressions #17438 (findepi)
- SMJ tests into own file #17495 (jonathanc-n)
- required_status_checks for now #17537 (blaginin)
- encode_arrow_schema from arrow-rs #17543 (samueleresca)
- TableProvider::scan_with_args #17336 (adriangb)
- substring.rs #17570 (AdamGS)
- unicode_expressions in dev-dependencies to fix substring planning test #17584 (kosiew)
- datafusion/physical-expr-adapter crate #17591 (xudong963)
- Display formatting of DataType:s in error messages #17565 (emilk)
- avg(distinct) support for decimal types #17560 (Jefffrey)
- datafusion-sql package dependencies have sql flag #17644 (Jefffrey)
- IS NOT DISTINCT FROM joins as Hash Joins #17319 (2010YOUY01)
- proto crate has datetime & unicode expr flags in datafusion dev dependency #17656 (Jefffrey)
- datafusion-functions macros #17638 (Jefffrey)
- LimitPushPastWindows public #17736 (linhr)
- OptimizerContext with provided ConfigOptions #17742 (MichaelScofield)
- LargeList in array_has simplification to InList #17732 (Jefffrey)
- arrow / parquet to 56.2.0 #17631 (alamb)
- Expr::qualified_name() and Column::new() to extract partition keys from window and aggregate operators #17757 (masonh22)
- CAST from temporal to Utf8View #17535 (findepi)
- sql_planner benchmark query #17809 (alamb)
- AsRef for Expr #17819 (findepi)
- partition_statistics API for InterleaveExec #17051 (liamzwbao)
- CastColumnExpr for struct-aware column casting #17773 (kosiew)
- apply_schema_adapter_tests #17905 (alamb)
- InListExpr plan display #17884 (pepijnve)
- extended_tests #17922 (blaginin)
- print_schema_tree to tree_string #17919 (comphead)
- Execution and Internal errors #17921 (comphead)
- ListingScan projection against table schema including partition columns #17911 (mach-kernel)
- vectorized_equal_to bench mutated between iterations #17968 (rluvaton)
- replace_with_order_preserving_variants tests to use insta snapshots for easier updates #17962 (blaginin)
- join_selection tests to snapshot-based testing #17974 (blaginin)
- gather_filters_for_pushdown for CoalescePartitionsExec #18046 (xudong963)
- SortPreservingMergeExec tree formatting with limit #18009 (AdamGS)
- min_max_bytes benchmark (Reproduce quadratic runtime in min_max_bytes) #18041 (ctsk)
- skip_failed_rules config in slt #18117 (Jefffrey)
- insta #18106 (blaginin)
- datafusion-datasource-arrow crate #18082 (timsaucer)
- to_timestamp(double) for vectorized input #18147 (dqkqd)
- concat_elements_utf8view capacity initialization #18003 (samueleresca)
- no space left on device (#18141) #18151 (alamb)
- DISTINCT ON for tables with no columns (ReplaceDistinctWithAggregate: do not fail when on input without columns) #18133 (Tpt)
- CASE WHEN #18203 (rluvaton)
- nvl2 Function to Support Lazy Evaluation and Simplification via CASE Expression #18191 (kosiew)
- DataTypeExt and FieldExt #18271 (alamb)
- half to 2.7.1, ignore RUSTSEC-2025-0111 #18287 (alamb)
- is_set on first_value and last_value #18303 (marc-pydantic)
- 0.25.2 and drop ignore of RUSTSEC-2025-0111 #18305 (DDtKey)
- try_append_value from arrow-rs 57.0.0 #18313 (samueleresca)
- concat_elements_utf8view #18316 (2010YOUY01)
- merge in case expression #18369 (pepijnve)
- range/gen_series signature away from user defined #18317 (Jefffrey)
- 0.60.0 to use substrait spec v0.75.0 #17866 (benbellick)
- date_trunc granularity #18390 (comphead)
- CASE evaluation #18329 (pepijnve)
- NowFunc::new() with canonical ConfigOptions timezone and enhance documentation #18347 (kosiew)
- List(LargeList(_)) types #18363 (sdf-jkl)
- enforce_distribution tests to insta #18185 (blaginin)
- snapshot_physical_expr #18498 (AdamGS)
- parquet_encryption by default in datafusion-sqllogictests #18492 (zhuqi-lucas)
- clippy::needless_pass_by_value rule #18468 (2010YOUY01)
- calculate_binary_math in datafusion-functions #18525 (Jefffrey)
- log() signature to use coercion API + fixes #18519 (Jefffrey)

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
88 dependabot[bot]
49 Jeffrey Vo
35 Andrew Lamb
20 Yongting You
19 Adrian Garcia Badaracco
14 Blake Orth
12 Pepijn Van Eeckhoudt
12 Piotr Findeisen
11 Chen Chongchen
11 Dmitrii Blaginin
11 Yu-Chuan Hung
9 Jonathan Chen
9 Khanh Duong
9 Oleks V
9 Peter Nguyen
8 Alex Huang
8 Qi Zhu
8 Raz Luvaton
7 Adam Gutglick
7 Rohan Krishnaswamy
7 Tim Saucer
7 kosiew
6 xudong.w
5 Nuno Faria
4 Dhanush
4 Samuele Resca
4 Simon Vandel Sillesen
4 Sriram Sundar
4 Vegard Stikbakke
3 Bruce Ritchie
3 David López
3 EeshanBembi
3 Jack Kleeman
3 Kazantsev Maksim
3 Marko Milenković
3 Thomas Tanon
2 Andy Grove
2 Bruno Volpato
2 Christian
2 Colin Marc
2 Cora Sutton
2 David Stancu
2 Devam Patel
2 Eugene Tolbakov
2 Evgenii Glotov
2 Kristin Cowalcijk
2 Liam Bao
2 Marc Brinkmann
2 Michael Kleen
2 Namgung Chan
2 Ning Sun
2 Randy
2 Sergey Zhukov
2 Viktor Yershov
2 bubulalabu
2 dennis zhuang
2 jizezhang
2 wiedld
1 Ahmed Mezghani
1 Aldrin M
1 Alfonso Subiotto Marqués
1 Anders
1 Artem Medvedev
1 Aryamaan Singh
1 Ben Bellick
1 Berkay Şahin
1 Bert Vermeiren
1 Brent Gardner
1 Christopher Watford
1 Dan Lovell
1 Daniël Heres
1 Dewey Dunnington
1 Douglas Anderson
1 Duong Cong Toai
1 Emil Ernerfeldt
1 Emily Matheys
1 Enrico La Sala
1 Eshed Schacham
1 Filippo Rossi
1 Gabriel
1 Gene Bordegaray
1 Georgi Krastev
1 Haresh Khanna
1 Heran Lin
1 Hiroaki Yutani
1 Ian Lai
1 Ilya Ostanevich
1 JanKaul
1 Kosta Tarasov
1 LFC
1 Leonardo Yvens
1 Lía Adriana
1 Manasa Manoj
1 Martin
1 Martin Grigorov
1 Martin Hilton
1 Mason
1 Matt Butrovich
1 Matthew Kim
1 Matthijs Brobbel
1 Nga Tran
1 Nihal Rajak
1 Rafael Fernández
1 Renan GEHAN
1 Renato Marroquin
1 Rok Mihevc
1 Ruilei Ma
1 Sai Mahendra
1 Sergei Grebnov
1 Shiv Bhatia
1 Tobias Schwarzinger
1 UBarney
1 Victor Barua
1 Victorien
1 Vyquos
1 Weston Pace
1 XL Liang
1 Xander
1 Zhen Wang
1 aditya singh rathore
1 dario curreri
1 ding-young
1 feniljain
1 gene-bordegaray
1 harshasiddartha
1 mwish
1 peasee
1 r1b
1 theirix
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.