Move SMJ filtered Right outer join out of join_partial phase #310

Closed
wants to merge 29 commits into from


comphead
Owner

Which issue does this PR close?

Related apache#12359.
Followup on apache#12764

Rationale for this change

The same as apache#12764 but for Right Outer Join

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jcsherin and others added 28 commits October 18, 2024 07:26
… function (apache#12857)

* Move `lead-lag` to `functions-window` package

* Builds with warnings

* Adds `PartitionEvaluatorArgs`

* Extracts `shift_offset` from input expressions

* Computes shift offset

* Get default value from input expression

* Implements `partition_evaluator`

* Fixes compiler warnings

* Comments out failing tests

* Fixes `cargo test` errors and warnings

* Minor: taplo formatting

* Delete code

* Define `lead`, `lag` user-defined window functions

* Fixes `cargo build` errors

* Export udwf and expression public APIs

* Mark result field as nullable

* Delete `return_type` tests for `lead` and `lag`

* Disables test: window function case insensitive

* Fixes: lowercase name in logical plan

* Reverts to old methods for computing `shift_offset`, `default_value`

* Implements expression reversal

* Fixes: lowercase name in logical plans

* Fixes: doc test compilation errors
Fixes: doc test build errors

* Temporarily quiet clippy errors

* Fixes proto definition

* Minor: fixes formatting

* Fixes: doc tests

* Uses macro for defining `lag_udwf()` and `lead_udwf()`

* Fixes: window fuzz test cases

* Copies doc comments verbatim from `BuiltInWindowFunction` enum

* Deletes from window function case insensitive test

* Deletes `BuiltInWindowFunction` expression APIs

* Delete from `create_built_in_window_expr`

* Deletes proto serialization

* Delete from `BuiltInWindowFunction` enum

* Deletes test for finding built-in window function

* Fixes build errors + deletes redundant code

* Deletes more code

* Delete unnecessary structs

* Refactors shift offset computation

* Passes range unit test

* Fixes: clippy::get-first error

* Rewrite unit tests for WindowUDF

* Fixes: unit test for lag with default value

* Consistent input expressions and data types in unit tests

* Minor: fixes formatting

* Restore original helper method for unit tests

* Revert "Refactors shift offset computation"

This reverts commit 000ceb7.

* Moves helper functions into `functions-window-common` package

* Uses common helper functions in `{lead, lag}`

* Minor: formatting

* Revert "Moves helper functions into `functions-window-common` package"

This reverts commit ab8a83c.

* Moves common functions to utils

* Minor: formatting fixes

* Update lowercase names in explain output

* Adds doc for `lead()` and `lag()` expression functions

* Add doc for `WindowShiftKind::shift_offset`

* Remove `arrow` dev dependency

* Minor: formatting

* Update inner doc comment

* Serialize 1 or more window function arguments

* Adds logical plan roundtrip test cases

* Refactor: readability of unit tests

* Minor: rename variable bindings

* Minor: copy edit

* Revert "Remove `arrow` dev dependency"

This reverts commit 3eb0985.

* Move null argument handling helper to utils

* Disable failing sqllogic tests for handling NULL input

* Revert "Disable failing sqllogic tests for handling NULL input"

This reverts commit 270a203.

* Fixes: incorrect NULL handling in `lead`/`lag` window function

* Adds more test cases

---------

Co-authored-by: Andrew Lamb <[email protected]>
…12969)

* Add BatchSplitter to joins that do not respect batch size

* Group relevant imports

* Update configs.md

* Update SQL logic tests for config

* Review

* Use PrimitiveBuilder for PrimitiveArray concatenation

* Fix into_builder() bug

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <[email protected]>

* Update config docs

* Format

* Update config SQL Logic Test

---------

Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
* swap_hash_join works with joins with projections

* use non swapped hash join's projection

* clean up

* fix hashjoin projection swap.

* assert hashjoinexec.

* Update datafusion/core/src/physical_optimizer/join_selection.rs

Co-authored-by: Eduard Karacharov <[email protected]>

* fix clippy.

---------

Co-authored-by: Onur Satici <[email protected]>
Co-authored-by: Eduard Karacharov <[email protected]>
* feat(substrait): add wildcard expand rule in producer

* add comment describing need for ExpandWildcardRule
…Semi (apache#12764)

* WIP: move filtered join out of join_output stage

* WIP: move filtered join out of join_output stage

* WIP: move filtered join out of join_output stage

* cleanup

* cleanup

* Move Left/LeftAnti filtered SMJ join out of join partial stage

* Move Left/LeftAnti filtered SMJ join out of join partial stage

* Address comments
* Implement regexp_count

* Update document

* fix check

* add more tests

* Update the world to 1.80

* Fix doc format

* Add null tests

* Add utf8 support and bench

* Refactoring regexp_count

* Refactoring regexp_count

* Revert ci change

* Fix ci

* Updates for documentation, minor improvements.

* Updates for documentation, minor improvements.

* updates to fix scalar tests, doc updates.

* updated regex and string features to remove deps on other features.

---------

Co-authored-by: Xin Li <[email protected]>
* regexp_match

* update generated docs

---------

Co-authored-by: Andrew Lamb <[email protected]>
…che#12950)

* Add a benchmark for physical plan creation with many aggregates

* Wrap AggregateFunctionExpr with Arc

Patch f5c47fa removed the Arc wrappers for AggregateFunctionExpr, but this can be inefficient.
When the physical optimizer decides to replace a node's child with another,
it clones the node (with `with_new_children`). If that node is an `AggregateExec` that contains
hundreds of aggregates, all of those aggregates are cloned each time.

This patch restores the Arc wrapping so that cloning copies a pointer rather than the AggregateFunctionExpr itself.
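
To illustrate the cost difference described above, here is a minimal sketch using only the standard library. `AggExpr`, `AggNode`, `make_node`, and `shares_all` are hypothetical stand-ins, not DataFusion types; the point is only that cloning a node whose aggregates sit behind `Arc` copies pointers instead of the expressions themselves.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an aggregate expression that is costly to deep-clone.
#[derive(Clone, Debug)]
struct AggExpr {
    name: String,
    args: Vec<String>,
}

impl AggExpr {
    // Renders the expression, e.g. "sum_0(c0)".
    fn display(&self) -> String {
        format!("{}({})", self.name, self.args.join(", "))
    }
}

// Hypothetical plan node that holds its aggregates behind `Arc`.
#[derive(Clone)]
struct AggNode {
    aggregates: Vec<Arc<AggExpr>>,
}

// Builds a node with `n` aggregates.
fn make_node(n: usize) -> AggNode {
    let aggregates = (0..n)
        .map(|i| {
            Arc::new(AggExpr {
                name: format!("sum_{i}"),
                args: vec![format!("c{i}")],
            })
        })
        .collect();
    AggNode { aggregates }
}

// Returns true when both nodes point at the very same aggregate allocations.
fn shares_all(a: &AggNode, b: &AggNode) -> bool {
    a.aggregates.len() == b.aggregates.len()
        && a.aggregates
            .iter()
            .zip(&b.aggregates)
            .all(|(x, y)| Arc::ptr_eq(x, y))
}
```

Calling `make_node(300).clone()` in this sketch bumps 300 reference counts and allocates one new `Vec`; without the `Arc`, every `String` and `Vec<String>` inside each expression would be copied as well.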

* Do not build mapping if parent does not require any

This patch adds a small optimization that can smooth the rough edges on
some queries: if there are no parent requirements, we do not need to
build the column mapping.
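
The early exit described above could look roughly like the following sketch. `build_column_mapping` and its parameters are hypothetical names chosen for illustration; the actual optimizer code in the patch is not shown on this page and may be structured differently.

```rust
use std::collections::HashMap;

// Hypothetical helper: maps each column index the parent requires to the
// name of the matching child column. Returns `None` without doing any work
// when the parent has no requirements.
fn build_column_mapping(
    parent_required: &[usize],
    child_columns: &[&str],
) -> Option<HashMap<usize, String>> {
    // Early exit: no requirements means no mapping is needed.
    if parent_required.is_empty() {
        return None;
    }
    let mapping = parent_required
        .iter()
        .filter_map(|&i| child_columns.get(i).map(|name| (i, name.to_string())))
        .collect();
    Some(mapping)
}
```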
* Remove logical cross join in planning

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* Implement some more substrait pieces

* Update datafusion/core/src/physical_planner.rs

Co-authored-by: Oleks V <[email protected]>

* Remove incorrect comment

---------

Co-authored-by: Oleks V <[email protected]>
* refactor

* refactor unnest options

* more test

* resolve comments

* add back doc

* fix proto

* flaky test

* clippy

* use indexmap

* chore: compile err

* chore: update cargo

* chore: fmt cargotoml

---------

Co-authored-by: Andrew Lamb <[email protected]>
* Update version to 42.1.0, add CHANGELOG (apache#12986)

* CHANGELOG for 42.1.0

* Update version to 42.1.0

* Update datafusion-cli/Cargo.lock

* update config docs

* update datafusion-cli
…an (#44) (apache#12979)

* Don't preserve functional dependency when generating UNION logical plan

* Remove extra lines
* Decorrelate more predicate subqueries

* Added sqllogictest explain tests
* Added default value to 'with_standard_argument'

* small fix

* change function

* small changes

* with_argument change

* ran build

* small fix
* fix: Dialect requires table alias (#46)

* fix: Add Dialect option for requiring table aliases

* feat: Add CustomDialectBuilder for requires_table_alias

* docs: Spelling

* refactor: rename requires_derived_table_alias

* refactor: rename requires_derived_table_alias

* review: Rewrite match to if, add another test case

* test: Update RHS expected

* test: Update tests with more cases
* Initial commit

* Fix formatting, minor changes

* Minor changes

* Move test to fuzz tests

* Add comment to test
…to use arrow unary/binary for performance (apache#12972)

* removed last uses of make_function_scalar_inputs

* delete make_function_scalar_inputs

* fix

* refactored other macros

* fix unary CI

* fix base f32/f64 mismatch not caught by tests

* import order changes

* Update log.rs

* stylistic changes

---------

Co-authored-by: berkaysynnada <[email protected]>
* Extract CSE logic

* address review comments, move `HashNode` to `datafusion_common::cse`, shorter names for eliminator and controller, change `CSE::extract_common_nodes()` to return `Result<FoundCommonNodes<N>>` (instead of `Result<Transformed<FoundCommonNodes<N>>>`)
This adds a type check when planning the unary plus operator. Since we
currently do not represent the operator in our logical plan, we cannot
check it later. Instead of introducing a new `Expr`, this patch just
verifies the type during the translation.
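
A minimal sketch of that idea, assuming a toy AST: the unary plus exists only in the parsed form (`SqlExpr`), so the type check has to happen while translating to the plan form (`Expr`), which has no variant for it. All names here, including the error text, are hypothetical and not taken from DataFusion.

```rust
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

// Hypothetical parsed expression; unary plus is still visible here.
#[derive(Debug, Clone, PartialEq)]
enum SqlExpr {
    Int(i64),
    Float(f64),
    Str(String),
    UnaryPlus(Box<SqlExpr>),
}

// Hypothetical logical expression; note there is no unary plus variant.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Float(f64),
    Str(String),
}

fn data_type(expr: &Expr) -> DataType {
    match expr {
        Expr::Int(_) => DataType::Int64,
        Expr::Float(_) => DataType::Float64,
        Expr::Str(_) => DataType::Utf8,
    }
}

// Translates the parsed form to the logical form. Unary plus is dropped,
// but only after checking that its operand is numeric.
fn plan(sql: &SqlExpr) -> Result<Expr, String> {
    match sql {
        SqlExpr::Int(v) => Ok(Expr::Int(*v)),
        SqlExpr::Float(v) => Ok(Expr::Float(*v)),
        SqlExpr::Str(s) => Ok(Expr::Str(s.clone())),
        SqlExpr::UnaryPlus(inner) => {
            let planned = plan(inner)?;
            match data_type(&planned) {
                DataType::Int64 | DataType::Float64 => Ok(planned),
                other => Err(format!("unary plus requires a numeric type, got {other:?}")),
            }
        }
    }
}
```

In this sketch, `+1` plans to the same `Expr` as `1`, while `+'a'` is rejected during planning because nothing downstream could detect it.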
* Test Count accumulator with all-nulls

* Fix count on null values

Before the change, a `ValuesExec` containing a `NullArray` would
incorrectly report column statistics as being non-null, which would
mislead the `AggregateStatistics` optimizer into folding `count(always_null)`
into the row count instead of 0.

This commit fixes the column statistics derivation for values with
`NullArray` and therefore fixes execution of logical plans with count
over such values.

Note that the bug was not reproducible using DataFusion SQL frontend,
because in DataFusion SQL the `VALUES (NULL)` doesn't have type
`DataType::Null` (it has some apparently arbitrarily picked type
instead).

As a follow-up, all usages of `Array::null_count` should be inspected.
The function can easily be misused (it returns "physical nulls", which
do not exist for null type).
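
The distinction between physical and logical nulls can be modeled with a small standard-library sketch. `Column` and its methods are hypothetical and only mimic the behavior the commit message describes; they are not the arrow API.

```rust
// Hypothetical column model.
enum Column {
    // Null-typed column: stores only a length, no validity bitmap.
    Null(usize),
    // Nullable integer column; `None` plays the role of a cleared validity bit.
    Int64(Vec<Option<i64>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Null(n) => *n,
            Column::Int64(values) => values.len(),
        }
    }

    // Nulls recorded in a validity bitmap. A null-typed column has no
    // bitmap, so this reports 0 even though every row is null.
    fn physical_null_count(&self) -> usize {
        match self {
            Column::Null(_) => 0,
            Column::Int64(values) => values.iter().filter(|v| v.is_none()).count(),
        }
    }

    // Rows that are null from the query's point of view.
    fn logical_null_count(&self) -> usize {
        match self {
            Column::Null(n) => *n,
            Column::Int64(values) => values.iter().filter(|v| v.is_none()).count(),
        }
    }
}

// Mirrors the bug: derives count(col) from physical nulls.
fn buggy_count(col: &Column) -> usize {
    col.len() - col.physical_null_count()
}

// Derives count(col) from logical nulls.
fn count_non_null(col: &Column) -> usize {
    col.len() - col.logical_null_count()
}
```

For a three-row null-typed column, `buggy_count` returns the row count while `count_non_null` returns 0; for an ordinary nullable column the two agree, which is why the problem only surfaces with the null type.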
@github-actions github-actions bot added the documentation and sql labels Oct 22, 2024
@comphead comphead closed this Oct 22, 2024