Back to Datafusion

9.0.0

dev/changelog/9.0.0.md

53.1.019.5 KB
Original Source
<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

9.0.0 (2022-06-10)

Full Changelog

Breaking changes:

  • MINOR: Move simplify_expression rule to datafusion-optimizer crate #2686 (andygrove)
  • Move physical expression planning to datafusion-physical-expr crate #2682 (andygrove)
  • Create new datafusion-optimizer crate for logical optimizer rules #2675 (andygrove)
  • Remove ExecutionProps dependency from OptimizerRule #2666 (andygrove)
  • Remove ObjectStoreSchemaProvider (#2656) #2665 (tustvold)
  • Move LogicalPlanBuilder to datafusion-expr crate #2576 (andygrove)
  • LogicalPlanBuilder now uses TableSource instead of TableProvider #2569 (andygrove)
  • Remove scan_empty method from LogicalPlanBuilder #2568 (andygrove)
  • MINOR: Move expression utils from sql module to expr crate #2553 (andygrove)
  • Remove scan_json methods from LogicalPlanBuilder #2541 (andygrove)
  • Remove scan_avro methods from LogicalPlanBuilder #2540 (andygrove)
  • Remove scan_parquet methods from LogicalPlanBuilder #2539 (andygrove)
  • MINOR: Move ExprVisitable and exprlist_to_columns to datafusion-expr crate #2538 (andygrove)
  • Remove scan_csv methods from LogicalPlanBuilder #2537 (andygrove)
  • Fix Redundant ScalarValue Boxed Collection #2523 (comphead)
  • Support for OFFSET in LogicalPlan #2521 (jdye64)

Implemented enhancements:

  • [EPIC] JIT support for DataFusion #2703
  • Show column names instead of column indices in query plans #2689
  • Proposal: remove automated ballista CI checks from DataFusion #2679
  • Pass SessionState to TableProvider #2658
  • Is ObjectStoreSchemaProvider Still Needed? #2656
  • Add logical plan support to datafusion-proto #2630
  • Like, NotLike expressions work with literal NULL #2626
  • Move JOIN ON predicates push down logic from planner to optimizer #2619
  • Remove ExecutionProps from OptimizerRule trait #2614
  • Add, Minus, Multiply, divide, Modulo operator work with literal NULL #2609
  • Support DESCRIBE <table> to show table schemas #2606
  • Support CREATE OR REPLACE TABLE #2605
  • filter_push_down tests should not rely on TableProvider and ExecutionPlan #2600
  • Move logical optimizer rules out of the core datafusion crate #2599
  • Push Limit through outer Join #2579
  • datafusion_proto crate should have exhaustive match statements for handling Expr #2565
  • String representation of Expr variant #2563
  • File URI Scheme Interpretation #2562
  • Implement physical plan for OFFSET #2551
  • Update limit pushdown rule to support offsets #2550
  • Move LogicalPlanBuilder to datafusion-expr crate #2536
  • Logical optimizer rule "simplify expressions" should not depend on the core datafusion crate #2535
  • Support optional filter in Join #2509
  • Improve SQL planner & logical plan support for JOIN conditions #2496
  • Numeric, String, Boolean comparisons with literal NULL #2482
  • Redundant ScalarValue Boxed Collection #2449
  • ObjectStore Directory Semantics #2445
  • Add support for OFFSET in SQL query planner + logical plan #2377
  • SQL planner should use TableSource not TableProvider #2346
  • Move SQL query planning to new crate #2345
  • Update LogicalPlan rustdoc code to not use LogicalPlanBuilder #2308
  • [Optimizer] Refactor convert join #2256
  • [Optimizer] Infer is not null predicate from where clause #2254
  • Support ArrayIndex for ScalarValue(List) #2207
  • [Ballista] Fill functional gaps between datafusion and ballista #2062
  • [Ballista] support datafusion built_in UDAF work in ballista cluster #1985
  • Export C API #1113

Fixed bugs:

  • Fix Typos in Docs #2695
  • Unable to build a docker image #2691
  • Optimization pass AggregateStatistics changes type of output from Int64 to UInt64 #2673
  • ViewTable Circular Reference #2657
  • ScalarValue::to_array_of_size panics computing statistics for nested parquet file #2653
  • The result type of count/count_distinct #2635
  • limit_push_down is not working properly with OFFSET #2624
  • Avro Tests Fail To Compile #2570
  • Unused Window functions experssion is wrongly removed from LogicalPlan during optimalization #2542
  • Bug: ObjectStoreRegistry get_by_uri does not return correct path when "scheme" is provided #2525
  • There are duplicate and inconsistent copies of datafusion.proto #2514
  • Projection pushdown produces incorrect results when column names are reused #2462
  • Incorrect Parquet Projection For Nested Types #2453
  • LogicalPlanBuilder::scan_csv creates scans with invalid table names #2278
  • Inner join incorrectly pushdown predicate with OR operation #2271
  • Ignored alias for columns with aggregate function and incorrect results when collecting statistics is enabled #2176
  • Join on path partitioned columns fails with error #2145

Documentation updates:

Closed issues:

  • [Question] Converting TableSource to custom TableProvider #2644
  • [Question] Why DataFusion is shipped with arrow version 9.1.0 on crates.io ? #2474

Merged pull requests:

  • Test optional features in CI #2708 (tustvold)
  • support indexed fields proto #2707 (nl5887)
  • Update sqlparser-rs to 0.18.0 #2705 (alamb)
  • [MINOR]: Add documentation to datafusion-row modules #2704 (alamb)
  • Make sure that the data types are supported in hashjoin before genera… #2702 (AssHero)
  • Move remaining code out of legacy core/logical_plan module #2701 (andygrove)
  • Move some tests from core to expr #2700 (andygrove)
  • MINOR: Improve Docs Readability #2696 (ryanrussell)
  • Combine limit and offset to fetch and skip and implement physical plan support #2694 (ming535)
  • MINOR: Add datafusion-sql example #2693 (andygrove)
  • Remove Ballista related lines from Dockerfile #2692 (mocknen)
  • Show column names instead of indices in query plans #2690 (andygrove)
  • MINOR: Remove uses of TryClone for Parquet #2681 (tustvold)
  • Fix AggregateStatistics optimization so it doesn't change output type #2674 (alamb)
  • If statistics of column Max/Min value does not exists in parquet file, sent Min/Max to None #2671 (AssHero)
  • MINOR: Move more expression code to datafusion-expr crate #2669 (andygrove)
  • MINOR: Rewrite imports in optimizer moduler #2667 (andygrove)
  • Update snmalloc-rs requirement from 0.2 to 0.3 #2663 (dependabot[bot])
  • Add module doc for RuntimeEnv, SessionContext, TaskContext, etc... #2655 (tustvold)
  • Prune unused dependencies from datafusion-proto #2651 (tustvold)
  • MINOR: Implement serde for join filter #2649 (andygrove)
  • pushdown support for predicates in ON clause of joins #2647 (korowa)
  • Move SortKeyCursor and RowIndex into modules, add sort_key_cursor test #2645 (alamb)
  • Implement DESCRIBE <table> #2642 (LiuYuHui)
  • Implement LogicalPlan serde in datafusion-proto #2639 (andygrove)
  • Fix limit + offset pushdown #2638 (ming535)
  • change result type of count/count_distinct from uint64 to int64 #2636 (liukun4515)
  • if none columns in window expr are needed, remove the window exprs #2634 (AssHero)
  • Like, NotLike expressions work with literal NULL #2627 (WinkerDu)
  • MINOR: Refactor datafusion-proto dependencies and imports #2623 (andygrove)
  • MINOR: add optimizer struct #2616 (jackwener)
  • Remove FilterPushDown dependency on physical plan #2615 (andygrove)
  • Support CREATE OR REPLACE TABLE #2613 (AssHero)
  • Support binary mathematical operators work with NULL literals #2610 (WinkerDu)
  • chore: try fix CI coverage #2608 (Ted-Jiang)
  • MINOR: Rename benchmark crate #2607 (andygrove)
  • chore(dep): bump cranelift to 0.84.0 #2598 (waynexia)
  • fix some typos #2597 (ming535)
  • Support limit pushdown through left right outer join #2596 (Ted-Jiang)
  • Unignore rustdoc code examples in datafusion-expr crate #2590 (andygrove)
  • Evaluate JIT'd expression over arrays #2587 (waynexia)
  • [minor]Fix ci clippy for unused import #2586 (Ted-Jiang)
  • [Doc]add doc for enable SIMD need cargo nightly #2577 (Ted-Jiang)
  • Add DataFrame union_distinct and fix documentation for distinct #2574 (andygrove)
  • Fix avro tests (#2570) #2571 (tustvold)
  • Make datafusion-proto match exhaustive #2567 (andygrove)
  • Support limit push down for offset_plan #2566 (Ted-Jiang)
  • Introduce Expr.variant_name() function #2564 (jdye64)
  • Fix some 404 links in the contribution guide #2561 (hi-rustin)
  • Update datafusion-cli readme cli version #2559 (hi-rustin)
  • MINOR: Move expr_rewriter.rs to datafusion-expr crate #2552 (andygrove)
  • Fix JOINs with complex predicates in ON (split ON expressions only by AND operator) #2534 (korowa)
  • Reduce duplication in file scan tests #2533 (tustvold)
  • Fix size_of_scalar test #2531 (alamb)
  • Update to arrow-rs 14.0.0 #2528 (alamb)
  • ObjectStoreRegistry get_by_uri now returns correct path when "scheme" is provided #2526 (timvw)
  • MINOR: Add ORDER BY clause to test #2524 (andygrove)
  • Remove unused binary_array_op_scalar! in binary.rs #2512 (alamb)
  • fix NULL <op> column evaluation, tests for same #2510 (alamb)
  • Fix projection pushdown produces incorrect results when column names are reused #2463 (jonmmease)
  • Benchmark for sort preserving merge #2431 (alamb)
  • Support GetIndexedFieldExpr for ScalarValue #2196 (ovr)