21.0.0 (2023-03-24)
Breaking changes:
- Support arbitrary user defined partition column in
ListingTable
(rather than assuming they are always Dictionary encoded) #5545 (crepererum) - Use TableReference for TableScan #5615 (alamb)
- Update the type of
param_values
to&[ScalarValue]
in functionreplace_params_with_values
#5640 (HaoYang670)
Implemented enhancements:
- feat: extract (epoch from col) #5555 (Weijun-H)
- INSERT INTO support for MemTable #5520 (metesynnada)
- Memory limited nested-loop join #5564 (korowa)
- Timestamp subtraction and interval operations for
ScalarValue
#5603 (berkaysynnada) - Substrait: Add cast expression with bool, integers and decimal128 support #5137 (nseekhao)
- Support
date_bin
with 2 arguments #5643 (Weijun-H) - improve: support combining multiple grouping expressions #5559 (yukkit)
- Substrait: Add support for WindowFunction #5653 (nseekhao)
- feat:
date_bin
supports MonthDayNano, microsecond and nanosecond units #5698 (stuartcarnie) - Handle serialization of TryCast #5692 (thinkharderdev)
Fixed bugs:
- fix: failed to execute sql with subquery #5542 (MichaelScofield)
- fix: cast literal to timestamp #5517 (Weijun-H)
- fix dataframe only boolean/binary column got error on describe #5585 (jiangzhx)
- Median returns null on empty input instead of error #5624 (toppyy)
- add CountWildcardRule to fix error on Count(Expr:Wildcard) with DataFrame API #5627 (jiangzhx)
- fix: correct CountWildcardRule and move analyzer into a new directory. #5671 (jackwener)
Documentation updates:
- Minor: improve docstrings for
ObjectStoreRegistry
andObjectStoreProvider
#5577 (alamb) - Clarify differences of DataFusion with other systems in README.md #5578 (alamb)
- Minor: Document docs build process #5687 (alamb)
Merged pull requests:
- Refactor DecorrelateWhereExists and add back Distinct if needs #5345 (ygf11)
- Simplify simplify test cases, support
^
,&
,|
,<<
and>>
operators for building exprs #5511 (alamb) - minor: improve sqllogictest docs #5553 (alamb)
- Remove unused dependencies found by cargo-machete #5552 (Jefffrey)
- make AggregateStatistics return the same result whether optimizer disabled or enabled #5485 (jiangzhx)
- Avoid circular(ish) dependency parquet-test-utils on datafusion, try 2 #5536 (alamb)
- Enforce ambiguity check whilst normalizing columns #5509 (Jefffrey)
- Generated changelog for 20.0.0 #5563 (andygrove)
- fix: failed to execute sql with subquery #5542 (MichaelScofield)
- Revert describe count() workaround #5556 (Jefffrey)
- fix: cast literal to timestamp #5517 (Weijun-H)
- feat: extract (epoch from col) #5555 (Weijun-H)
- Minor: improve docstrings for
ObjectStoreRegistry
andObjectStoreProvider
#5577 (alamb) - Minor: Move RuntimeEnv to
datafusion_execution
#5580 (alamb) - INSERT INTO support for MemTable #5520 (metesynnada)
- Minor: restore explicit match to help avoid subtle bugs in the future when new
Expr
variants are added #5579 (alamb) - refactor: add more error info when array is empty #5560 (Weijun-H)
- Memory limited nested-loop join #5564 (korowa)
- Support catalog.schema.table.column in SQL SELECT and WHERE #5343 (Jefffrey)
- Minor: clean up aggregates.slt tests #5599 (alamb)
- Minor: Port more aggregate tests to sqllogictests #5574 (alamb)
- Add a utility function to get all of the PartitionedFile for an ExecutionPlan #5572 (yahoNanJing)
- minor: port some join tests to sqllogictests #5567 (ygf11)
- Support arbitrary user defined partition column in
ListingTable
(rather than assuming they are always Dictionary encoded) #5545 (crepererum) - feat: add the similar optimization function for bitwise negative #5516 (izveigor)
- Clarify differences of DataFusion with other systems in README.md #5578 (alamb)
- Minor: Add more documentation about table_partition_columns #5576 (alamb)
- Add Analyzer phase to DataFusion , add basic validation logic to Subquery Plans and Expressions #5570 (mingmwang)
- Use TableReference for TableScan #5615 (alamb)
- Preserve casts in rewrite_sort_cols_by_aggs #5611 (mpurins-coralogix)
- Miscellaneous ArrayData Cleanup #5612 (tustvold)
- Update substrait requirement from 0.4 to 0.5 #5620 (dependabot[bot])
- Do not break pipeline for window queries with GROUPS #5587 (mustafasrepo)
- fix dataframe only boolean/binary column got error on describe #5585 (jiangzhx)
- Minor: Add Documentation and Examples to
TableReference
#5616 (alamb) - [FOLLOWUP] eliminate the duplicated sort keys in Order By clause #5607 (mingmwang)
- Update default behaviour of compression algorithms (support multistreams) #5629 (metesynnada)
- Timestamp subtraction and interval operations for
ScalarValue
#5603 (berkaysynnada) - Use modulus dyn kernels for arithmetic expressions #5634 (viirya)
- Minor: reduce cloning in
infer_placeholder_types
#5638 (alamb) - Move
SessionConfig
todatafusion_execution
#5581 (alamb) - Update the type of
param_values
to&[ScalarValue]
in functionreplace_params_with_values
#5640 (HaoYang670) - WITH ORDER support on CREATE EXTERNAL TABLE #5618 (metesynnada)
- Median returns null on empty input instead of error #5624 (toppyy)
- feat: Memory limited merge join #5632 (korowa)
- Update rstest requirement from 0.16.0 to 0.17.0 #5648 (dependabot[bot])
- add CountWildcardRule to fix error on Count(Expr:Wildcard) with DataFrame API #5627 (jiangzhx)
- Add OuterReferenceColumn to Expr to represent correlated expression #5593 (mingmwang)
- Minor: Simplify
Result<T, DataFusionError>
#5659 (comphead) - minor: remove redundant
DataFusionError
and fixclippy
#5669 (jackwener) - Substrait: Add cast expression with bool, integers and decimal128 support #5137 (nseekhao)
- Support
date_bin
with 2 arguments #5643 (Weijun-H) - Add LogicalPlanSignature and use in the optimizer loop #5623 (mslapek)
- fix: correct CountWildcardRule and move analyzer into a new directory. #5671 (jackwener)
- refactoring: added tests and fixed comments in "math_expressions" #5656 (izveigor)
- improve: support combining multiple grouping expressions #5559 (yukkit)
- community: polish issue template #5668 (jackwener)
- minor: correct issue template #5679 (jackwener)
- Change ObjectStoreRegistry from struct to trait to provide polymorphism #5543 (yahoNanJing)
- Minor: Add
Extensions::new()
#5676 (alamb) - minor: add with_plan for Subquery #5680 (jackwener)
- minor: reduce replication in
date_bin
implementation #5673 (alamb) - Fixes #5500 - Add a GitHub Actions workflow that builds the docs #5670 (martin-g)
- Minor: port some content to the docs #5684 (alamb)
- Docs: Add logo back to sidebar #5688 (alamb)
- Substrait: Add support for WindowFunction #5653 (nseekhao)
- Add -o option to all e2e benches #5658 (jaylmiller)
- create table default to null #5606 (Weijun-H)
- Minor: Document docs build process #5687 (alamb)
- Minor: change doc formatting to force a republish #5702 (alamb)
- Move
TaskContext
to datafusion-execution #5677 (alamb) - feat:
date_bin
supports MonthDayNano, microsecond and nanosecond units #5698 (stuartcarnie) - Return plan error when adding utf8 and timestamp #5696 (Weijun-H)
- Handle serialization of TryCast #5692 (thinkharderdev)
- analyzer: move InlineTableScan into Analyzer. #5683 (jackwener)
- minor: Add doc comments to clarify what Analyzer is for #5705 (alamb)