<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

32.0.0 (2023-10-07)

Full Changelog

Breaking changes:

  • Remove implicit interval type coercion from ScalarValue comparison #7514 (tustvold)
  • Remove get_scan_files and ExecutionPlan::file_scan_config (#7357) #7487 (tustvold)
  • Move FileCompressionType out of common and into core #7596 (haohuaijin)
  • Update arrow 47.0.0 in DataFusion #7587 (tustvold)
  • Rename bounded_order_preserving_variants config to prefer_existing_sort and update docs #7723 (alamb)

Implemented enhancements:

  • Parallelize Stateless (CSV/JSON) File Write Serialization #7452 (devinjdangelo)
  • Create a Priority Queue based Aggregation with limit #7192 (avantgardnerio)
  • feat: add guarantees to simplification #7467 (wjones127)
  • [Minor]: Produce better plan when group by contains all of the ordering requirements #7542 (mustafasrepo)
  • Make AvroArrowArrayReader possible to scan Avro backed table which contains nested records #7525 (sarutak)
  • feat: Support spilling for hash aggregation #7400 (kazuyukitanimura)
  • Parallelize Parquet Serialization #7562 (devinjdangelo)
  • feat: natively support more data types for the abs function. #7568 (jonahgao)
  • feat: Parallel collecting parquet files statistics #7573 #7595 (hengfeiyang)
  • Support hashing List columns #7616 (jonmmease)
  • feat: Better large output display in datafusion-cli with --maxrows option #7617 (2010YOUY01)
  • feat: make parse_float_as_decimal work on negative numbers #7648 (jonahgao)
  • Update Default Parquet Write Compression #7692 (devinjdangelo)
  • Support all the codecs supported by Avro #7718 (sarutak)
  • Optimize "ORDER BY + LIMIT" queries for speed / memory with special TopK operator #7721 (Dandandan)
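The TopK operator (#7721) targets queries such as `ORDER BY x LIMIT k`, where only the first k rows of the sorted output are needed. The following is a minimal, hypothetical sketch of the general bounded-heap technique on plain `i64` values, assuming an ascending sort. It is not DataFusion's implementation, which operates on Arrow record batches, and the function name `top_k_smallest` is made up for this example.

```rust
use std::collections::BinaryHeap;

/// Hypothetical helper: returns the `k` smallest values in ascending order,
/// i.e. the result of `ORDER BY value ASC LIMIT k`, while holding at most
/// `k` values in memory at any time.
fn top_k_smallest(values: impl IntoIterator<Item = i64>, k: usize) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    // Max-heap: the root is the largest of the k values kept so far,
    // so it is the one evicted when a smaller value arrives.
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k);
    for v in values {
        if heap.len() < k {
            heap.push(v);
        } else if let Some(&largest) = heap.peek() {
            if v < largest {
                heap.pop();
                heap.push(v);
            }
        }
    }
    // into_sorted_vec yields the retained values in ascending order.
    heap.into_sorted_vec()
}

fn main() {
    let input = vec![42, 7, 19, 3, 88, 7, 56, 1];
    println!("{:?}", top_k_smallest(input, 3));
}
```

Memory use stays proportional to k instead of the input size, and each row costs at most O(log k) heap work, which is why a bounded heap can beat a full sort followed by a limit when k is small.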

Fixed bugs:

  • fix: inconsistent behaviors when dividing floating numbers by zero #7503 (jonahgao)
  • fix: skip EliminateCrossJoin rule if inner join with filter is found #7529 (epsio-banay)
  • fix: check for precision overflow when parsing float as decimal #7627 (jonahgao)
  • fix: substrait limit when fetch is None #7669 (waynexia)
  • fix: coerce text to timestamps with timezones #7720 (mhilton)
  • fix: avro_to_arrow: Handle avro nested nullable struct (union) #7663 (Samrose-Ahmed)

Documentation updates:

  • Documentation Updates for New Write Related Features #7520 (devinjdangelo)
  • Create 2023 Q4 roadmap #7551 (graydenshand)
  • docs: add section on supports_filters_pushdown #7680 (tshauck)
  • Add LanceDB to the list of Known Users #7716 (alamb)
  • Document crate feature flags #7713 (alamb)

Merged pull requests:

  • Prepare 31.0.0 release #7508 (andygrove)
  • Minor(proto): Implement TryFrom<&DFSchema> for protobuf::DfSchema #7505 (jonahgao)
  • fix: inconsistent behaviors when dividing floating numbers by zero #7503 (jonahgao)
  • Parallelize Stateless (CSV/JSON) File Write Serialization #7452 (devinjdangelo)
  • Minor: Remove stray comment markings from encoding error message #7512 (devinjdangelo)
  • Remove implicit interval type coercion from ScalarValue comparison #7514 (tustvold)
  • Minor: deprecate ScalarValue::get_datatype() #7507 (Weijun-H)
  • Propagate error from spawned task reading spills #7510 (viirya)
  • Refactor the EnforceDistribution Rule #7488 (mustafasrepo)
  • Remove get_scan_files and ExecutionPlan::file_scan_config (#7357) #7487 (tustvold)
  • Simplify ScalarValue::distance (#7517) #7519 (tustvold)
  • typo: change delimeter to delimiter #7521 (Weijun-H)
  • Fix some simplification rules for floating-point arithmetic operations #7515 (jonahgao)
  • Documentation Updates for New Write Related Features #7520 (devinjdangelo)
  • [MINOR]: Move tests from repartition to enforce_distribution file #7539 (mustafasrepo)
  • Update the async-trait crate to resolve clippy bug #7541 (metesynnada)
  • Fix flaky test_sort_fetch_memory_calculation test #7534 (viirya)
  • Move common code to utils #7545 (mustafasrepo)
  • Minor: Add comments and clearer constructors to Interval #7526 (alamb)
  • fix: skip EliminateCrossJoin rule if inner join with filter is found #7529 (epsio-banay)
  • Create a Priority Queue based Aggregation with limit #7192 (avantgardnerio)
  • feat: add guarantees to simplification #7467 (wjones127)
  • [Minor]: Produce better plan when group by contains all of the ordering requirements #7542 (mustafasrepo)
  • Minor: beautify interval display #7554 (Weijun-H)
  • replace ptree with termtree #7560 (avantgardnerio)
  • Make AvroArrowArrayReader possible to scan Avro backed table which contains nested records #7525 (sarutak)
  • Fix a race condition issue on reading spilled file #7538 (sarutak)
  • [MINOR]: Add is single method #7558 (mustafasrepo)
  • Fix describe <table> to work without SessionContext #7441 (alamb)
  • Make the tests in SHJ faster #7543 (metesynnada)
  • feat: Support spilling for hash aggregation #7400 (kazuyukitanimura)
  • Make backtrace as a cargo feature #7527 (comphead)
  • Minor: Fix clippy by switching to timestamp_nanos_opt instead of (deprecated) timestamp_nanos #7572 (alamb)
  • Update sqllogictest requirement from 0.15.0 to 0.16.0 #7569 (dependabot[bot])
  • extract datafusion-physical-plan to its own crate #7432 (alamb)
  • First and Last Accumulators should update with state row excluding is_set flag #7565 (viirya)
  • refactor: simplify code of eliminate_cross_join.rs #7561 (jackwener)
  • Update release instructions for datafusion-physical-plan crate #7576 (alamb)
  • Minor: Update chrono pin to 0.4.31 #7575 (alamb)
  • [feat] Introduce cacheManager in session ctx and make StatisticsCache share in session #7570 (Ted-Jiang)
  • Enhance/Refactor Ordering Equivalence Properties #7566 (mustafasrepo)
  • fix misplaced statements in sqllogictest #7586 (jonahgao)
  • Update substrait requirement from 0.13.1 to 0.14.0 #7585 (dependabot[bot])
  • chore: use the create_udwf function in simple_udwf, consistent with simple_udf and simple_udaf #7579 (tanruixiang)
  • Implement protobuf serialization for AnalyzeExec #7574 (adhish20)
  • chore: fix catalog's usage docs error and add docs about CatalogList trait #7582 (tanruixiang)
  • Implement CardinalityAwareRowConverter while doing streaming merge #7401 (JayjeetAtGithub)
  • Parallelize Parquet Serialization #7562 (devinjdangelo)
  • feat: natively support more data types for the abs function. #7568 (jonahgao)
  • implement string_to_array #7577 (casperhart)
  • Create 2023 Q4 roadmap #7551 (graydenshand)
  • chore: reduce physical-plan dependencies #7599 (crepererum)
  • Minor: add GitHub star/fork buttons to documentation page #7588 (alamb)
  • Minor: add more examples for CREATE EXTERNAL TABLE doc #7594 (comphead)
  • Update nix requirement from 0.26.1 to 0.27.1 #7438 (dependabot[bot])
  • Update sqllogictest requirement from 0.16.0 to 0.17.0 #7606 (dependabot[bot])
  • Fix panic in TopK #7609 (avantgardnerio)
  • Move FileCompressionType out of common and into core #7596 (haohuaijin)
  • Expose contents of Constraints #7603 (tv42)
  • Change the unbounded_output API default #7605 (metesynnada)
  • feat: Parallel collecting parquet files statistics #7573 #7595 (hengfeiyang)
  • Support hashing List columns #7616 (jonmmease)
  • [MINOR] Make the sink input aware of its plan #7610 (metesynnada)
  • [MINOR] Reduce complexity on SHJ #7607 (metesynnada)
  • feat: Better large output display in datafusion-cli with --maxrows option #7617 (2010YOUY01)
  • Minor: add examples for arrow_cast and arrow_typeof to user guide #7615 (alamb)
  • [MINOR]: Fix stack overflow bug for get field access expr #7623 (mustafasrepo)
  • Group By All #7622 (berkaysynnada)
  • Implement protobuf serialization for (Bounded)WindowAggExec. #7557 (vrongmeal)
  • Make it possible to compile datafusion-common without default features #7625 (jonmmease)
  • Minor: Adding backtrace documentation #7628 (comphead)
  • fix(5975/5976): timezone handling for timestamps and date_trunc, date_part and date_bin #7614 (wiedld)
  • Minor: remove unnecessary Arcs in datetime_expressions #7630 (alamb)
  • fix: check for precision overflow when parsing float as decimal #7627 (jonahgao)
  • Update arrow 47.0.0 in DataFusion #7587 (tustvold)
  • Add test crate to compile DataFusion with wasm-pack #7633 (jonmmease)
  • Minor: Update documentation of case expression #7646 (ongchi)
  • Minor: improve docstrings on SessionState #7654 (alamb)
  • Update example in the DataFrame documentation. #7650 (jsimpson-gro)
  • Add HTTP object store example #7602 (pka)
  • feat: make parse_float_as_decimal work on negative numbers #7648 (jonahgao)
  • Minor: add doc comments to ExtractEquijoinPredicate #7658 (alamb)
  • [MINOR]: Do not add unnecessary hash repartition to the physical plan #7667 (mustafasrepo)
  • Minor: add ticket references to parallel parquet writing code #7592 (alamb)
  • Minor: Add ticket reference and add test comment #7593 (alamb)
  • Support Avro's Enum type and Fixed type #7635 (sarutak)
  • Minor: Migrate datafusion-proto tests into its own binary #7668 (ongchi)
  • Upgrade apache-avro to 0.16 #7674 (sarutak)
  • Move window analysis to the window method #7672 (mustafasrepo)
  • Don't add filters to projection in TableScan #7670 (Dandandan)
  • Minor: Improve TableProviderFilterPushDown docs #7685 (alamb)
  • FIX: Test timestamp with table #7701 (jayzhan211)
  • Fix bug in SimplifyExpressions #7699 (Dandandan)
  • Enhance EnforceDistribution capabilities to fix suboptimal plans #7671 (mustafasrepo)
  • docs: add section on supports_filters_pushdown #7680 (tshauck)
  • Improve cache usage in CI #7678 (sarutak)
  • fix: substrait limit when fetch is None #7669 (waynexia)
  • minor: revert parsing precedence between Aggr and UDAF #7682 (waynexia)
  • Minor: Move hash utils to common #7684 (jayzhan211)
  • Update Default Parquet Write Compression #7692 (devinjdangelo)
  • Stop using cache for the benchmark job #7706 (sarutak)
  • Change rust.yml to run benchmark #7708 (sarutak)
  • Extend infer_placeholder_types to support BETWEEN predicates #7703 (andrelmartins)
  • Minor: Add comment explaining why verify benchmark results uses release mode #7712 (alamb)
  • Support all the codecs supported by Avro #7718 (sarutak)
  • Update substrait requirement from 0.14.0 to 0.15.0 #7719 (dependabot[bot])
  • fix: coerce text to timestamps with timezones #7720 (mhilton)
  • Add LanceDB to the list of Known Users #7716 (alamb)
  • Enable avro reading/writing in datafusion-cli #7715 (alamb)
  • Document crate feature flags #7713 (alamb)
  • Minor: Consolidate UDF tests #7704 (alamb)
  • Minor: fix CI failure due to Cargo.lock in datafusion-cli #7733 (yjshen)
  • MINOR: change file to column index in page_filter trace log #7730 (mapleFU)
  • preserve array type / timezone in date_bin and date_trunc functions #7729 (mhilton)
  • Remove redundant is_numeric for DataType #7734 (qrilka)
  • fix: avro_to_arrow: Handle avro nested nullable struct (union) #7663 (Samrose-Ahmed)
  • Rename SessionContext::with_config_rt to SessionContext::new_with_config_from_rt, etc #7631 (alamb)
  • Rename bounded_order_preserving_variants config to prefer_existing_sort and update docs #7723 (alamb)
  • Optimize "ORDER BY + LIMIT" queries for speed / memory with special TopK operator #7721 (Dandandan)
  • Minor: Improve crate docs #7740 (alamb)
  • [MINOR]: Resolve linter errors in the main #7753 (mustafasrepo)
  • Minor: Build concat_internal() with ListArray construction instead of ArrayData #7748 (jayzhan211)
  • Minor: Add comment on input_schema from AggregateExec #7727 (viirya)
  • Fix column name for COUNT(*) set by AggregateStatistics #7757 (qrilka)
  • Add documentation about type signatures, and export TIMEZONE_WILDCARD #7726 (alamb)
  • [feat] Support cache ListFiles result cache in session level #7620 (Ted-Jiang)
  • Support SHOW ALL VERBOSE to show settings description #7735 (comphead)