17.0.0

<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

17.0.0 (2023-01-27)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add null-equals-null JOIN support in Substrait producer/consumer #5084
  • Cleaner code for Read Options in reader methods. #5024
  • Substrait donation follow-on work #4897
  • Add len method to DataFrame #1926

Fixed bugs:

  • Clippy failures in master branch and in PRs (due to new nightly Rust) #5080

Merged pull requests:

17.0.0-rc1 (2023-01-26)

Full Changelog

Breaking changes:

  • Change ExecutionPlan::maintains_input_order to return vector (to support multi children executors better) #5035 (mustafasrepo)
  • Allow overriding error type in DataFusion Result #5000 (tustvold)
  • Add dictionary_expressions feature (#4386) #4999 (tustvold)

Implemented enhancements:

  • Retain the ordering of fields in the table schema when creating the projection for an update plan #5052
  • [sqllogictest] Remove integration-tests directory #5011
  • [sqllogictest] Consolidate normalization code for the postgres and non-postgres paths #5010
  • [sqllogictest] Don't orchestrate the postgres containers with rust / docker #5009
  • Check that an external table exists before creating a table #4997
  • Implement std::error::Error for DataFusionError #4991
  • Return Vec<bool> instead of bool in ExecutionPlan::maintains_input_order #4980
  • Add support for linear range search #4979
  • Add support for bounded execution when window query involves UNBOUNDED PRECEDING #4978
  • Infer prepared statement parameter types for insert queries with values clauses #4976
  • The filter of the outer table is applied multiple times after optimizing in-subquery to join #4914
  • Support Describe FILE in datafusion-cli #4913
  • Release DataFusion 16 #4776
  • Support writing lists in the arrow csv writer #4502
  • Replace python based integration test with sqllogictest #4462
  • Support CREATE TABLE table_name(...schema_fields) #4396
  • Make Binary Dictionary Operations Optional #4386
  • Improve / Cleanup DataFusion CI #3045
  • More frequent DataFusion releases to crates.io (discussion) #2327

Fixed bugs:

  • UPDATE statement for nonexistent column doesn't error out #5068
  • Limit doesn't drop on first batch when limit size == fetch size. #5064
  • Performance regressions since DataFusion 15.x #5060
  • Quoted schema and table names result in double-quoted names in logical plan. #5058
  • Homebrew release script has an incorrect number of arguments #5043
  • CI Failing with Out of Disk #5040
  • Doc links to LogicalPlan in the core package need updating. #5036
  • EXPLAIN ANALYZE does not show CsvExec execution time metrics #5014
  • AVG(nulls) returns 0 rather than NULL #5007
  • Invalid Placeholders return internal error (rather than Plan error) #5005
  • select * from csv error #4996
  • Incorrect nested error wrapped to ArrowError:External variant for joins #4981

Documentation updates:

Closed issues:

  • Support sub directories in sqllogictest runner #4709
  • Bug displaying fractional seconds in IntervalMonthDayNano #4220

Merged pull requests:

  • Add release-crates.sh script #5070 (iajoiner)
  • Validate assignment target column existence for UPDATE statements #5069 [sql] (gruuya)
  • Fix limit when size of batch to poll == skip/fetch value #5066 (Dandandan)
  • Fix CREATE SCHEMA schema name double quoting issue. #5059 [sql] (neumark)
  • Minor: Move some aggregate error tests to sqllogictests #5055 (alamb)
  • Add decimal support to substrait serde #5054 (andygrove)
  • Retain schema order in projection #5053 [sql] (avantgardnerio)
  • Improve join type support in substrait #5051 (andygrove)
  • [Substrait] ReadRel. Get column names from TableScan source #5050 (andygrove)
  • Ensure insert projections are of correct type #5049 [sql] (avantgardnerio)
  • Remove unnecessary pyo3 dependency from datafusion crate #5048 (tustvold)
  • Cleanup CI (#5040) #5047 (tustvold)
  • Fix homebrew publish script #5044 (iajoiner)
  • Update docs links to logical plans module. #5037 (vincev)
  • [sqllogictest] Read subdirectories in test_files #5033 (melgenek)
  • minor: Fix docs for create_default_catalog_and_schema #5032 (alamb)
  • Remove Python-based postgres comparison integration test #5031 (alamb)
  • [sqllogictest] Create empty tables #5026 [sql] (melgenek)
  • Simplify the PushDownLimit. #5021 (HaoYang670)
  • [BugFix] fix explain csv/json/avro exec can not see metrics bug #5018 (xiaoyong-z)
  • Check placeholder __timeTo and return Datafusion::Plan error #5017 [sql] (matthias-Q)
  • [sqllogictest] Remove postgres container orchestration #5015 (alamb)
  • Sqllogictest: use the same normalization for all tests #5013 (melgenek)
  • Minor: Remove invalid comments #5012 [sql] (xudong963)
  • AVG(null) is NULL (not zero) #5008 (alamb)
  • Minor: improve internal error message #5006 (alamb)
  • Support for bounded execution when window frame involves UNBOUNDED PRECEDING #5003 (mustafasrepo)
  • Bump sqllogictest to v0.11.1 #5002 (xudong963)
  • Minor: Document how to create ListingTables #5001 (alamb)
  • [Enhancement] early check table exist before create #4998 (xiaoyong-z)
  • [Feature] support describe file #4995 [sql] (xiaoyong-z)
  • Implement std::error::Error::source() for DataFusionError, make DataFusionError::find_root more generic #4992 (alamb)
  • Add support for linear range calculation in WINDOW functions #4989 (mustafasrepo)
  • re-export substrait crate #4988 (jdye64)
  • minor: Update data type support documentation #4984 (alamb)
  • fix(4981): incorrect error wrapping in OnceFut #4983 (DDtKey)
  • Infer values for inserts #4977 [sql] (avantgardnerio)
  • Simplify GroupByHash implementation (to prepare for more work) #4972 (alamb)
  • Add DataFusionError::Substrait variant to DataFusionError enum #4971 (jdye64)
  • refactor: display input partitions for RepartitionExec #4969 (crepererum)
  • Upgrade to Substrait 0.4.0 #4966 (mbrobbel)
  • Expose sql_to_statement and statement_to_plan on SessionState #4958 (avantgardnerio)
  • Minor: Make messages consistent for LogicalPlan::Dml #4953 [sql] (alamb)
  • Do not resort inputs to UnionExec if they are already sorted #4946 (alamb)
  • Minor: Reduce even more redundancy creating window_agg in sort_enforcement tests #4945 (alamb)
  • Only add outer filter once when transforming exists/in subquery to join #4944 (ygf11)
  • fix: FieldNotFound error message without valid fields #4942 [sql] (DDtKey)
  • Propagate planning error back to user #4940 (fsdvh)
  • Allow specifying a session id for SessionState #4933 (yahoNanJing)
  • SUPPORT SEMI/ANTI JOIN SQL syntax in DataFusion #4932 [sql] (mingmwang)
  • Support gs:// as GCS scheme #4930 (jychen7)
  • Upgrade object_store from 0.5.0 to 0.5.3 #4929 (jychen7)
  • Reduce redundancy in sort_enforcement tests #4928 (alamb)
  • Update to arrow 31 #4927 [sql] (tustvold)
  • Unify Row hash and hash implementation #4924 (mustafasrepo)
  • Support join-filter pushdown for semi/anti join #4923 (ygf11)
  • Minor add ticket link to broken test #4919 (alamb)
  • Improve documentation for ExprVisitor, port simple uses to new walking function #4916 (alamb)
  • Add substrait label to PRs #4915 (alamb)
  • Executing ProjectionExec with no column should not return an Err #4912 (viirya)
  • Refactor: Add LogicalPlan::observe_expressions to walk expressions #4906 (alamb)
  • Minor: Port information schema tests to sqllogictest #4905 (alamb)
  • Add insert/update/delete to LogicalPlan and add SQL planner support #4902 [sql] (avantgardnerio)
  • fix: Visit subqueries in Expr::Alias #4900 (askoa)
  • [Substrait] Change API to return LogicalPlan instead of DataFrame #4896 (andygrove)
  • Upgrade to substrait 0.3 #4895 (andygrove)
  • Add datafusion-substrait crate to workspace #4893 (andygrove)
  • refactor and add simple function to deserialize and serialize proto b… #4892 (jdye64)
  • Update optimize_children to return Result<Option<LogicalPlan>> #4888 (HaoYang670)
  • Do not repartition inputs whose sort order is required #4885 (alamb)
  • Minor: Add docstrings to UnionExec #4884 (alamb)
  • Update datafusion-substrait crate to build against repo version of DataFusion #4879 (andygrove)
  • Fix column indices in EnforceDistribution optimizer in Partial AggregateMode #4878 (jonmmease)
  • refactor: improve repartition buffering #4867 (crepererum)
  • Rewrite coerce_plan_expr_for_schema to fix union type coercion #4862 (ygf11)
  • (#4462) Postgres compatibility tests using sqllogictest #4834 (melgenek)
  • Support non-tuple expression for in-subquery to join #4826 (ygf11)
  • Update to arrow 30.0.1 #4818 [sql] (tustvold)
  • Refine the statistics estimation for the limit and aggregate operator #4716 (yahoNanJing)
  • Infer prepared statement parameter types #4701 [sql] (avantgardnerio)
  • Add datafusion-substrait crate #4543 (andygrove)
  • Refactor loser tree code in SortPreservingMerge per PR comments #4407 (alamb)