forked from apache/datafusion
Move SMJ filtered Right outer join out of join_partial phase #310
… function (apache#12857)
* Move `lead-lag` to `functions-window` package
* Builds with warnings
* Adds `PartitionEvaluatorArgs`
* Extracts `shift_offset` from input expressions
* Computes shift offset
* Get default value from input expression
* Implements `partition_evaluator`
* Fixes compiler warnings
* Comments out failing tests
* Fixes `cargo test` errors and warnings
* Minor: taplo formatting
* Delete code
* Define `lead`, `lag` user-defined window functions
* Fixes `cargo build` errors
* Export udwf and expression public APIs
* Mark result field as nullable
* Delete `return_type` tests for `lead` and `lag`
* Disables test: window function case insensitive
* Fixes: lowercase name in logical plan
* Reverts to old methods for computing `shift_offset`, `default_value`
* Implements expression reversal
* Fixes: lowercase name in logical plans
* Fixes: doc test compilation errors
  Fixes: doc test build errors
* Temporarily quiet clippy errors
* Fixes proto definition
* Minor: fixes formatting
* Fixes: doc tests
* Uses macro for defining `lag_udwf()` and `lead_udwf()`
* Fixes: window fuzz test cases
* Copies doc comments verbatim from `BuiltInWindowFunction` enum
* Deletes from window function case insensitive test
* Deletes `BuiltInWindowFunction` expression APIs
* Delete from `create_built_in_window_expr`
* Deletes proto serialization
* Delete from `BuiltInWindowFunction` enum
* Deletes test for finding built-in window function
* Fixes build errors + deletes redundant code
* Deletes more code
* Delete unnecessary structs
* Refactors shift offset computation
* Passes range unit test
* Fixes: clippy::get-first error
* Rewrite unit tests for WindowUDF
* Fixes: unit test for lag with default value
* Consistent input expressions and data types in unit tests
* Minor: fixes formatting
* Restore original helper method for unit tests
* Revert "Refactors shift offset computation"
  This reverts commit 000ceb7.
* Moves helper functions into `functions-window-common` package
* Uses common helper functions in `{lead, lag}`
* Minor: formatting
* Revert "Moves helper functions into `functions-window-common` package"
  This reverts commit ab8a83c.
* Moves common functions to utils
* Minor: formatting fixes
* Update lowercase names in explain output
* Adds doc for `lead()` and `lag()` expression functions
* Add doc for `WindowShiftKind::shift_offset`
* Remove `arrow` dev dependency
* Minor: formatting
* Update inner doc comment
* Serialize 1 or more window function arguments
* Adds logical plan roundtrip test cases
* Refactor: readability of unit tests
* Minor: rename variable bindings
* Minor: copy edit
* Revert "Remove `arrow` dev dependency"
  This reverts commit 3eb0985.
* Move null argument handling helper to utils
* Disable failing sqllogic tests for handling NULL input
* Revert "Disable failing sqllogic tests for handling NULL input"
  This reverts commit 270a203.
* Fixes: incorrect NULL handling in `lead`/`lag` window function
* Adds more test cases
---------
Co-authored-by: Andrew Lamb <[email protected]>
…12969)
* Add BatchSplitter to joins that do not respect batch size
* Group relevant imports
* Update configs.md
* Update SQL logic tests for config
* Review
* Use PrimitiveBuilder for PrimitiveArray concatenation
* Fix into_builder() bug
* Apply suggestions from code review
  Co-authored-by: Andrew Lamb <[email protected]>
* Update config docs
* Format
* Update config SQL Logic Test
---------
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>

* swap_hash_join works with joins with projections
* use non swapped hash join's projection
* clean up
* fix hashjoin projection swap.
* assert hashjoinexec.
* Update datafusion/core/src/physical_optimizer/join_selection.rs
  Co-authored-by: Eduard Karacharov <[email protected]>
* fix clippy.
---------
Co-authored-by: Onur Satici <[email protected]>
Co-authored-by: Eduard Karacharov <[email protected]>

* feat(substrait): add wildcard expand rule in producer
* add comment describing need for ExpandWildcardRule

…Semi (apache#12764)
* WIP: move filtered join out of join_output stage
* WIP: move filtered join out of join_output stage
* WIP: move filtered join out of join_output stage
* cleanup
* cleanup
* Move Left/LeftAnti filtered SMJ join out of join partial stage
* Move Left/LeftAnti filtered SMJ join out of join partial stage
* Address comments

* Implement regexp_count
* Update document
* fix check
* add more tests
* Update the world to 1.80
* Fix doc format
* Add null tests
* Add utf8 support and bench
* Refactoring regexp_count
* Refactoring regexp_count
* Revert ci change
* Fix ci
* Updates for documentation, minor improvements.
* Updates for documentation, minor improvements.
* updates to fix scalar tests, doc updates.
* updated regex and string features to remove deps on other features.
---------
Co-authored-by: Xin Li <[email protected]>

* regexp_match
* update generated docs
---------
Co-authored-by: Andrew Lamb <[email protected]>
…che#12950)
* Add a benchmark for physical plan creation with many aggregates
* Wrap AggregateFunctionExpr with Arc

  Patch f5c47fa removed the Arc wrappers for AggregateFunctionExpr, but that can be inefficient. When the physical optimizer decides to replace a node's child with another, it clones the node (with `with_new_children`). If that node is an `AggregateExec` that contains hundreds of aggregates, all of those aggregates are cloned each time. This patch restores the Arc wrapping so that only a pointer is cloned, not the AggregateFunctionExpr itself.
* Do not build mapping if parent does not require any

  This patch adds a small optimization that can soften the edges on some queries. If there are no parent requirements, we do not need to build the column mapping.
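To make the cloning argument in this commit message concrete, here is a minimal std-only sketch. `AggExpr`, `AggNode`, `with_new_child`, and `build_node` are hypothetical stand-ins invented for illustration; they are not DataFusion types, and the real `AggregateExec` is considerably more involved. The sketch only shows that cloning a `Vec<Arc<T>>` shares the elements instead of copying them.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an aggregate expression that is costly to deep-clone.
#[derive(Clone, Debug)]
pub struct AggExpr {
    pub name: String,
    pub args: Vec<String>,
}

/// Hypothetical plan node that holds many aggregates behind `Arc`.
#[derive(Clone, Debug)]
pub struct AggNode {
    pub aggregates: Vec<Arc<AggExpr>>,
    pub child: String,
}

impl AggNode {
    /// Rebuilds the node with a different child, loosely mirroring what a
    /// `with_new_children`-style method does. Cloning the `Vec<Arc<_>>` only
    /// bumps reference counts; no `AggExpr` is copied.
    pub fn with_new_child(&self, child: &str) -> AggNode {
        AggNode {
            aggregates: self.aggregates.clone(),
            child: child.to_string(),
        }
    }
}

/// Builds a node with `n` placeholder aggregates (names are made up).
pub fn build_node(n: usize) -> AggNode {
    let aggregates = (0..n)
        .map(|i| {
            Arc::new(AggExpr {
                name: format!("sum_{i}"),
                args: vec![format!("col_{i}")],
            })
        })
        .collect();
    AggNode {
        aggregates,
        child: "scan".to_string(),
    }
}
```

After `let rebuilt = build_node(3).with_new_child("filter");`, the original and rebuilt nodes point at the same `AggExpr` allocations, which is the property the patch relies on when a node carries hundreds of aggregates.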
* Remove logical cross join in planning
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* Implement some more substrait pieces
* Update datafusion/core/src/physical_planner.rs
  Co-authored-by: Oleks V <[email protected]>
* Remove incorrect comment
---------
Co-authored-by: Oleks V <[email protected]>

* refactor
* refactor unnest options
* more test
* resolve comments
* add back doc
* fix proto
* flaky test
* clippy
* use indexmap
* chore: compile err
* chore: update cargo
* chore: fmt cargotoml
---------
Co-authored-by: Andrew Lamb <[email protected]>

* Update version to 42.1.0, add CHANGELOG (apache#12986)
* CHANGELOG for 42.1.0
* Update version to 42.1.0
* Update datafusion-cli/Cargo.lock
* update config docs
* update datafusion-cli

…an (#44) (apache#12979)
* Don't preserve functional dependency when generating UNION logical plan
* Remove extra lines

* Decorrelate more predicate subqueries
* Added sqllogictest explain tests

* Added default value to 'with_standard_argument'
* small fix
* change function
* small changes
* with_argument change
* ran build
* small fix

* fix: Dialect requires table alias (#46)
* fix: Add Dialect option for requiring table aliases
* feat: Add CustomDialectBuilder for requires_table_alias
* docs: Spelling
* refactor: rename requires_derived_table_alias
* refactor: rename requires_derived_table_alias
* review: Rewrite match to if, add another test case
* test: Update RHS expected
* test: Update tests with more cases

* Initial commit
* Fix formatting, minor changes
* Minor changes
* Move test to fuzz tests
* Add comment to test

…to use arrow unary/binary for performance (apache#12972)
* removed last uses of make_function_scalar_inputs
* delete make_function_scalar_inputs
* fix
* refactored other macros
* fix unary CI
* fix base f32/f64 mismatch not caught by tests
* import order changes
* Update log.rs
* stylistic changes
---------
Co-authored-by: berkaysynnada <[email protected]>

* Extract CSE logic
* address review comments, move `HashNode` to `datafusion_common::cse`, shorter names for eliminator and controller, change `CSE::extract_common_nodes()` to return `Result<FoundCommonNodes<N>>` (instead of `Result<Transformed<FoundCommonNodes<N>>>`)
This adds a type check when planning the unary plus operator. Since we currently do not represent the operator in our logical plan, we cannot check it later. Instead of introducing a new `Expr`, this patch just verifies the type during the translation.
* Test Count accumulator with all-nulls
* Fix count on null values

  Before the change, the `ValuesExec` containing `NullArray` would incorrectly report column statistics as being non-null, which would misinform the `AggregateStatistics` optimizer and fold `count(always_null)` into the row count instead of 0. This commit fixes the column statistics derivation for values with `NullArray` and therefore fixes execution of logical plans with count over such values.

  Note that the bug was not reproducible using the DataFusion SQL frontend, because in DataFusion SQL the `VALUES (NULL)` doesn't have type `DataType::Null` (it has some apparently arbitrarily picked type instead).

  As a follow-up, all usages of `Array::null_count` should be inspected. The function can easily be misused (it returns "physical nulls", which do not exist for null type).
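The "physical nulls" pitfall described above can be illustrated with a toy model. This is a std-only sketch, not the arrow API: `Column`, `physical_null_count`, `logical_null_count`, and `count_non_null` are hypothetical names chosen for illustration. The assumption being modelled is the one stated in the commit message: a null-typed column carries no validity mask, so a count of mask-recorded nulls comes back as zero even though every row is null.

```rust
/// Toy column model (hypothetical; not an arrow type).
pub enum Column {
    /// Typed column with an explicit validity mask (`true` means valid).
    Int { values: Vec<i64>, validity: Vec<bool> },
    /// Null-typed column: only a length, with no validity mask at all.
    Null { len: usize },
}

impl Column {
    pub fn len(&self) -> usize {
        match self {
            Column::Int { values, .. } => values.len(),
            Column::Null { len } => *len,
        }
    }

    /// Nulls recorded in the validity mask. A null-typed column has no mask,
    /// so this is 0 even though every row is null.
    pub fn physical_null_count(&self) -> usize {
        match self {
            Column::Int { validity, .. } => validity.iter().filter(|v| !**v).count(),
            Column::Null { .. } => 0,
        }
    }

    /// Nulls as a query would see them: every row of a null-typed column.
    pub fn logical_null_count(&self) -> usize {
        match self {
            Column::Null { len } => *len,
            other => other.physical_null_count(),
        }
    }

    /// `count(col)` semantics: number of non-null rows.
    pub fn count_non_null(&self) -> usize {
        self.len() - self.logical_null_count()
    }
}
```

With `Column::Null { len: 3 }`, deriving statistics from `physical_null_count` would suggest three non-null rows, while `count_non_null` gives 0, which is the result the fix restores for `count(always_null)`.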
github-actions bot added the documentation (Improvements or additions to documentation) and sql labels on Oct 22, 2024
Labels
common
core
development-process
documentation
Improvements or additions to documentation
execution
functions
logical-expr
optimizer
physical-expr
proto
sql
sqllogictest
substrait
Which issue does this PR close?
Related apache#12359.
Follow-up to apache#12764.
Rationale for this change
The same as apache#12764 but for Right Outer Join
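For reviewers who want the expected semantics at hand, here is a small reference sketch of a filtered right outer join. It is a naive nested-loop model written for illustration, not the sort-merge implementation touched by this PR; the function name, the `(key, value)` row shape, and the filter signature are all assumptions. The point it shows is that a right row can only be emitted with a NULL left side once every left row with an equal key has been checked against the filter.

```rust
/// A row is a `(key, value)` pair (hypothetical shape for illustration).
pub type Row = (i32, i32);

/// Reference semantics for `left RIGHT OUTER JOIN right ON l.key = r.key AND filter(l, r)`.
/// Every right row appears at least once; the left side is `None` (NULL)
/// when no left row with the same key passes the filter.
pub fn right_outer_join_filtered<F>(left: &[Row], right: &[Row], filter: F) -> Vec<(Option<Row>, Row)>
where
    F: Fn(&Row, &Row) -> bool,
{
    let mut out = Vec::new();
    for r in right {
        let mut matched = false;
        for l in left {
            // Equal keys alone are not enough: the join filter must pass too.
            if l.0 == r.0 && filter(l, r) {
                out.push((Some(*l), *r));
                matched = true;
            }
        }
        // Only after all candidates are checked do we know the row is unmatched.
        if !matched {
            out.push((None, *r));
        }
    }
    out
}
```

For example, with `left = [(1, 10), (1, 20)]`, `right = [(1, 15), (1, 30)]`, and a filter `l.1 > r.1`, the row `(1, 15)` joins with `(1, 20)`, while `(1, 30)` has key matches but no filter match and is emitted once with `None` on the left.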
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?