dev/changelog/7.0.0.md
Breaking changes:
- `batch_size` #1565
- `ExecutionPlan` to know about sortedness and repartitioning optimizer pass respect the invariants #1776 (alamb)
- arrow 8.0.0 #1673 (alamb)
- `DataFusionError::into_arrow_external_error` in favor of `From` conversion #1645 (alamb)
- `Accumulator::update` and `Accumulator::merge` #1582 (Jimexist)
- `Hash` for various types and replace `PartialOrd` #1580 (Jimexist)
- `DataFusionError` with `GenericError` in `ObjectStore` interface #1541 (matthewmturner)
- `FLOAT` SQL type map to `Float32` rather than `Float64` #1423 [sql] (liukun4515)
- `REAL` SQL type to `Float32` rather than `Float64` to be consistent with pg #1390 [sql] (hntd187)

Implemented enhancements:
- `datafusion_expr` crate #1753
- `datafusion_common` crate #1752
- `DFSchema` #1725
- `Expr::ScalarFunction` programmatically #1718
- `Vec<u8>` based row-wise representation for DataFusion #1708
- `ListingTable` #1705
- `DataFusionError::into_arrow_external_error` in favor of `From` conversion #1644
- `eq_dyn_scalar`, etc. kernels #1610
- `Accumulator::update` and `Accumulator::merge` #1549
- `approx_quantile` support #1538
- `new(state: Arc<Mutex<ExecutionContextState>>)` method #1439
- `boolean == boolean` and `boolean != boolean` operators #1159
- `MemoryStream` public #150
- `approx_median()` aggregate function #1729 (realno)
- `corr` aggregate function #1561 (realno)
- `covar`, `covar_pop` and `covar_samp` aggregate functions #1551 (realno)
- `approx_quantile()` aggregation function #1539 (domodwyer)
- `stddev` and `variance` #1525 (realno)
- `rem` operation for `Expr` #1467 (liukun4515)
- `ORDER BY` on unprojected columns #1415 (viirya)
- `min` and `max` aggregate #1407 (liukun4515)
- `ConstantFolding` and `SimplifyExpression` #1375 (alamb)
- `array_agg` aggregate function #1300 (viirya)
- `=`, `<`, `<=`, `>`, `>=`, `!=`, `is distinct from`, `is not distinct from` for `BooleanArray` #1163 (alamb)

Fixed bugs:
- `Int64` to `Float64` unsuccessfully caused tpch8 to fail #1576
- `Plan("No field named 'foo.x'. Valid fields are 'MIN(foo.x)'.")` #1479
- `f.c1` cannot be named in SQL query #1432
- `Select *` returns an unexpected result #1412
- `RecordBatch`, add `SortPreservingMerge` fuzz tester #1678 (alamb)
- `.` in them #1449 [sql] (alamb)
- `send_time` metric for hash-repartition #1421 (Dandandan)

Documentation updates:
- `Accumulator::update` and `Accumulator::update_batch` #1542 (alamb)
- `cargo run --example parquet_sql` #1482 (sergey-melnychuk)

Performance improvements:
- `IS NULL` #1591

Closed issues:
- `release` compile to CI #1728
- `pyarrow` feature in CI #1635
- `SortPreservingMergeStream` (which has quite good tests of what is often quite tricky code, and it will be performance critical) #1572
- `SortExec` code (so there is only a single sort operator that does in-memory sorting if it has enough memory budget but then spills to disk if needed) #1571
- `LogicalPlan::Values` #1170
- `IntoIterator<Item = Expr>` in logical plan builder window fn #372

Merged pull requests:
- Row format backed by raw bytes #1782 (yjshen)
- `Accumulator` and `ColumnarValue` to `datafusion-expr` #1765 (Jimexist)
- `Expr` to `datafusion-expr` module #1762 [sql] (Jimexist)
- `cargo check --release` to CI #1737 (xudong963)
- `DFSchema` #1726 (alamb)
- `cargo run --release` error #1723 (xudong963)
- `parking_lot::Mutex` for `std::sync::Mutex` #1720 (xudong963)
- `select_to_plan` clearer #1714 [sql] (xudong963)
- `signature` #1713 (HaoYang670)
- `create_physical_expr` and `ExecutionContextState` or `DefaultPhysicalPlanner` for faster speed #1700 (alamb)
- `MemTrackingMetrics` to ease memory tracking for non-limited memory consumers #1691 (yjshen)
- `info!` to `debug!` #1689 (alamb)
- `SortPreservingMergeStream` stable on input stream order #1687 (alamb)
- `information_schema` tests out of `execution/context.rs` to `sql_integration` tests #1684 (alamb)
- `Gauge` + `CurrentMemoryUsage` to metrics #1682 (yjshen)
- `update` and `merge` #1681 (Jimexist)
- `String` in `DiskManager` #1680 (alamb)
- `MemoryManager` and `DiskManager` #1668 (alamb)
- `MemoryManager` and `MemoryStream` public #1664 (yjshen)
- `AggregatedMetricsSet` to metrics for further reuse #1663 (yjshen)
- `DataFusionError` -> `ArrowError` conversion #1643 (alamb)
- `spill_count` and `spilled_bytes` to `BaselineMetrics`, test sort with spill #1641 (yjshen)
- `SortPreservingMergeStream` to avoid `SortKeyCursor` sharing #1624 (yjshen)
- `binary_rule.rs` module #1607 (alamb)
- `mod` to `sql_integration` #1575 (alamb)
- `batch_size` configuration in `ExecutionConfig`, `RuntimeConfig` and `PhysicalPlanConfig` #1562 (yjshen)
- `update` and `merge` implementations from Aggregates and supporting `ScalarValue` arithmetic #1550 (alamb)
- `predicate_builder` --> `pruning_predicate` for consistency #1481 (alamb)
- `simplify` and `Simplifier` #1401 (alamb)
- `simplify` and `Simplifier` consistent #1376 (alamb)
- `simplify_expression.rs` #1374 (alamb)
- `BufReader` for `LocalFileReader` to revert performance regression in parquet reading #1366 (Dandandan)
- `EmptyRelation`, `Limit`, `Values` from `LogicalPlan` #1325 (liukun4515)