
11.0.0

dev/changelog/11.0.0.md

<!--- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

11.0.0 (2022-08-16)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Make RowAccumulator public #3138
  • docs: proposal for consolidating docs into a Contributor Guide #3127
  • feat: support Timestamp +/- Interval #3103
  • An arrow_typeof like PostgreSQL's pg_typeof #3095
  • Add DataFrame section to user guide #3066
  • Document all scalar SQL functions in user guide #3065
  • Simplify implementation of approx_median so that it can be exposed in Python #3063
  • Support double-quoted literal strings for dialects (such as MySQL, BigQuery) #3055
  • Simplify / speed up implementation of character_length to unicode points #3049
  • Follow-up on Clickbench benchmark #3048
  • Why is the PhysicalPlanner an async trait? #3032
  • Optimize file stream metrics. #3024
  • Proposal: Enable typed strings expressions for VALUES clause #3017
  • Proposal: Add date_bin function #3015
  • The upcoming release of Arrow (20?) breaks datafusion #3006
  • Can I select some files for query based on the filtering rules in the directory? #2993
  • Rename FormatReader to FileOpener #2990
  • Derive Hash trait for JoinType #2971
  • CAST from Utf8 to Boolean #2967
  • Add baseline_metrics for FileStream to record metrics like elapsed time, record output, etc #2961
  • Example to show how to convert query result into rust struct #2959
  • simplify not clause #2957
  • Implement Debug for ColumnarValue #2950
  • Parallel fetching of column chunks when reading parquet files #2949
  • Extension mechanism for SessionConfig #2939
  • Streaming CSV/JSON Object Store Read #2935
  • Support CSV Limit Pushdown to Object Storage #2930
  • Add support for pow scalar function #2926
  • Add support for exact median aggregate function #2925
  • Support mean as synonym for avg #2922
  • Rename a column name #2919
  • Move ScalarValue tests alongside implementation, move from_slice to core #2913
  • Fail gracefully if optimization rule fails #2908
  • Make ObjectStoreRegistry a trait to allow Ballista to introduce a self registry ObjectStoreRegistry #2905
  • Remove datafusion-data-access crate #2903
  • Improve formatting of logical plans containing subquery expressions #2898
  • Atan2 added to built-in functions #2897
  • The explain statements only print logical plans for debug/other purpose. #2894
  • JSON version of display_indent() #2889
  • It would be nice to have a way to generate unique IDs in optimizer rules #2886
  • Add support for TIME literal values #2883
  • Add h2o benchmark #2879
  • Implement from_unixtime function #2871
  • Add cast function for creating logical cast expression #2870
  • Release DataFusion 10.0.0 #2862
  • Implement information_schema.views #2857
  • Migrate from avro_rs to apache_avro #2783
  • Add optimizer rule to remove OFFSET 0 #2584
  • Preserve Element Name in ScalarValue::List #2450
  • Add EXISTS subquery support to Ballista #2338
  • Add documentation on supported functions to datafusion website #1487
  • documentations for datafusion-cli can be consolidated a bit more #1352
  • Optimizer: Predicate Rewrite pass for TPCH Q19 #217
  • feat: add optimize rule rewrite_disjunctive_predicate #2858 (xudong963)

Fixed bugs:

  • Regression in SQL support for ORDER BY and aliased expressions #3160
  • Panic when dealing with the @ operator #3137
  • Incorrect type coercion rule for date + interval #3093
  • Casting a string to a timestamp crashes when the input time is before 1970 and has fractional seconds #3082
  • INTEGER type doesn't work while importing CSV #3059
  • Cannot GROUP BY Binary #3050
  • incorrect i32 coercion for to_timestamp #3046
  • Error pruning IsNull expressions: Column 'instance_null_count' is declared as non-nullable but contains null values #3042
  • I want to query some files in a directory. Is there any way? #3013
  • The expression to get an indexed field is only valid for List types (common_sub_expression_eliminate) #3002
  • Double to_timestamp_seconds produces abnormal result #2998
  • External parquet table fails when schema contains differing key / value metadata #2982
  • SELECT on column with uppercase column name fails with FieldNotFound error #2978
  • panic reading AWS-generated parquet file #2963
  • Can't filter row groups during Parquet pruning for some data types #2962
  • CI test is failing with final link failed: No space left on device #2947
  • bug: new ObjectStore breaks backward compatibility with contrib plugins #2931
  • bug: file types handled wrong #2929
  • bug: changing the number of partitions does not increase concurrency #2928
  • csv_explain fails on RC verifier #2916
  • index out of range error from datafusion_row::write::write_field #2910
  • Optimization rule CommonSubexprEliminate creates invalid projections #2907
  • serde_json requires that either std (default) or alloc feature is enabled #2896
  • Inconsistent type coercion rules with comparison expressions #2890
  • Doc error: the test directory link in CONTRIBUTING.md returns 404 #2880
  • Round trips through ScalarValue's sometimes don't preserve types (e.g. change types from DictionaryArray) #2874
  • Error with CASE and DictionaryArrays: ArrowError(InvalidArgumentError("arguments need to have the same data type")) #2873
  • window functions not supported in expressions #2869
  • Unable to work with month intervals #2796
  • Discord invite link in communication page has expired #2743
  • Test (path normalization) failures while verifying release candidate 9.0.0 RC1 #2719
  • Reading parquet with (pre-release) arrow fails with "out of order projection is not supported" #2543
  • Fix SQL planner bug when resolving columns with same name as a relation #3003 [sql] (andygrove)
  • fix RowWriter index out of bounds error #2968 (comphead)
  • fix: support decimal statistic for row group prune #2966 (liukun4515)
  • Fix invalid projection in CommonSubexprEliminate #2915 (andygrove)

Documentation updates:

Performance improvements:

  • Use code points instead of grapheme clusters for string functions #3054 (Dandandan)

Closed issues:

  • Rename do_data_time_math() to do_date_time_math() #3172
  • Automatic version updates for github actions with dependabot #3106
  • [EPIC] Proposal for Date/Time enhancement #3100
  • Upgrade prost/tonic everywhere #3028
  • [Question] interested in helping with documentation #2866
  • Introducing a new optimizer framework for datafusion. #2633
  • Enable discussion tab? #2350
  • Add support for AVG(Timestamp) types #200
  • TPC-H Query 22 #175
  • TPC-H Query 21 #172
  • TPC-H Query 20 #171
  • TPC-H Query 17 #168
  • TPC-H Query 11 #163
  • TPC-H Query 4 #160
  • TPC-H Query 2 #159
  • [Datafusion] Optimize literal expression evaluation #106

Merged pull requests: