forked from NVIDIA/spark-rapids
Improve GpuExpand by pre-projecting complicated expressions #24
Closed
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
Signed-off-by: Peixin Li <[email protected]>
…lure. (NVIDIA#10143) Fixes NVIDIA#10134. This commit fixes test failures in `test_window_aggs_for_batched_finite_row_windows_partitioned`, resulting from ambiguous ordering in the window function input. The failing tests partition by `a`, and order by `b,c`. When `b,c` contain repeated values, the results of the window function execution are indeterminate. This commit changes the definition of the aggregation column `c` (which is also included in the order-by clause) to use unique long values. This guarantees deterministic output. Signed-off-by: MithunR <[email protected]>
… machines (NVIDIA#10150) Signed-off-by: Jim Brennan <[email protected]>
) Signed-off-by: Jim Brennan <[email protected]>
…IA#10140) * Removed unnecessary whitespaces * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>
…on-UTC time zones (NVIDIA#10100) Signed-off-by: Navin Kumar <[email protected]>
workaround for NVIDIA#10115, closes NVIDIA#10105 Signed-off-by: Gera Shegalov <[email protected]>
To fix issue NVIDIA#10154: install 'procps' to fix 'ps: command not found' in NVIDIA Rocky9 Docker containers when running integration tests with jenkins/spark-test.sh. 'procps' is required for Rocky Docker containers to run a Spark standalone cluster; see https://github.com/apache/spark/blob/v3.3.2/bin/load-spark-env.sh#L68 Signed-off-by: Tim Liu <[email protected]>
… files [databricks] (NVIDIA#10146) Signed-off-by: Robert (Bobby) Evans <[email protected]>
* Fixed Spark 334 build * Signing off Signed-off-by: Raza Jafri <[email protected]> * added 334 to noSnapshots * updated 2.13 pom --------- Signed-off-by: Raza Jafri <[email protected]>
* Support to_utc_timestamp Signed-off-by: Haoyang Li <[email protected]> * add tests Signed-off-by: Haoyang Li <[email protected]> * perf tests Signed-off-by: Haoyang Li <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> * address comment Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>
…9790) * Use format_float kernel Signed-off-by: Haoyang Li <[email protected]> * Add tests and doc Signed-off-by: Haoyang Li <[email protected]> * use new name from jni change Signed-off-by: Haoyang Li <[email protected]> * move inf/nan replacement to kernel Signed-off-by: Haoyang Li <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> * Address comments Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>
…ith GpuStructsToJson [databricks] (NVIDIA#9975) * upmerge * Revert change to csv_test Signed-off-by: Andy Grove <[email protected]> * scalastyle * introduce shim * remove unreachable code * fix some failures with 311 * save progress * fix more failures with 340 * fix test failures with 341 * tests pass with 341 * Add 330 shim and fix failures in test_basic_json_read * save progress on 330 shim * tests pass with 330 * 320 shim * test all date formats with from_json * remove redundant and confusing use of failOnInvalid parameter * Revert unrelated change * Remove comment * Remove blank line * Remove blank line * Revert accidental change to test_basic_json_read * Fix compilation error caused by refactor * Scala style * Scala style * update compatibility guide * move json-specific date parsing into GpuJsonScan to fix regression in CSV tests * fix regression introduced during refactor * fall back to CPU if timestampFormat specified in 320 shim * fall back to CPU if timestampFormat specified in 340 shim * fix ci failure with 341db * add 334 shim * add clarifying comment * update copyright years to 2024 * use None instead of empty string in tests * fix copyright years * remove xfail from tests * fix regression * Update integration_tests/src/main/python/json_test.py Co-authored-by: Jason Lowe <[email protected]> * Update integration_tests/src/main/python/json_test.py Co-authored-by: Jason Lowe <[email protected]> * fix regression * Revert "fix regression" This reverts commit 4b183a4. * update more tests to use None instead of empty string * allow fallback for non-utc in test_json_read_generated_dates * update more tests to use None instead of empty string --------- Signed-off-by: Andy Grove <[email protected]> Co-authored-by: Jason Lowe <[email protected]>
…10171) Signed-off-by: Navin Kumar <[email protected]>
* check shim exists in all.buildversions * Signing off Signed-off-by: Raza Jafri <[email protected]> * addressed review comments * return the shim version from the map method --------- Signed-off-by: Raza Jafri <[email protected]>
…stamp column (NVIDIA#10074) * Update the legacy mode check: only take effect when reading date/timestamp column Signed-off-by: Chong Gao <[email protected]>
* Fix format_number NaN symbol in high jdk version Signed-off-by: Haoyang Li <[email protected]> * Avoid using multi-byte characters Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>
…10189) Signed-off-by: Chong Gao <[email protected]>
* fix regression against Spark 3.2.x * add comments * Revert a change * signoff Signed-off-by: Andy Grove <[email protected]> --------- Signed-off-by: Andy Grove <[email protected]>
…IA#10131) * With a single row GpuExplode tries to split the generator array Signed-off-by: Alessandro Bellina <[email protected]> --------- Signed-off-by: Alessandro Bellina <[email protected]>
…IDIA#10191) * Updating the config documentation for filecache configs Signed-off-by: Chong Gao <[email protected]> * Update --------- Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
…andling (NVIDIA#10193) * Fixes bug where on split and retry the carry along columns were corrupted --------- Signed-off-by: Alessandro Bellina <[email protected]>
* Update string to float compatibility doc[skip ci] Signed-off-by: Haoyang Li <[email protected]> * Address comment Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Firestarman <[email protected]>
firestarman force-pushed the improve-expand branch from 926724b to 74810e0 (January 17, 2024 06:49)
Need to rebase the target branch.
Some Spark platforms put non-leaf expressions in GpuExpand projections, which prevents GpuExpand from leveraging the GPU tiered projection across its projection lists.
This PR factors out these complicated expressions and places them into an existing child GpuProject, or a new one, to avoid evaluating semantically equal (sub)expressions more than once.
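The idea can be illustrated with a toy model. The sketch below is not the plugin's Scala implementation: `Attr`, `Lit`, `Call`, `pre_project`, and the `_pre_N` alias names are all hypothetical, and plain structural equality stands in for Spark's semantic equality. It only pulls whole top-level non-leaf expressions out of the projection lists, so that each distinct one is computed once by a preceding projection and the Expand projections are left with leaves only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Attr:
    """Leaf: a reference to a column produced by the child."""
    name: str


@dataclass(frozen=True)
class Lit:
    """Leaf: a literal value (for example the nulls Expand fills in)."""
    value: object


@dataclass(frozen=True)
class Call:
    """Non-leaf: a function applied to argument expressions."""
    fn: str
    args: Tuple["Expr", ...]


Expr = Union[Attr, Lit, Call]


def pre_project(projections: List[List[Expr]], child_output: List[Attr]):
    """Factor non-leaf expressions out of the Expand projection lists.

    Returns (pre_list, rewritten):
      pre_list  - (output name, expression) pairs for the projection that
                  runs before Expand: child columns passed through, plus
                  one aliased entry per distinct non-leaf expression.
      rewritten - the projection lists with every non-leaf expression
                  replaced by a reference to its pre-computed column.
    """
    extracted: Dict[Call, Attr] = {}  # deduplicated by structural equality

    def rewrite(e: Expr) -> Expr:
        if isinstance(e, Call):
            if e not in extracted:
                extracted[e] = Attr(f"_pre_{len(extracted)}")
            return extracted[e]
        return e  # leaves stay where they are

    rewritten = [[rewrite(e) for e in proj] for proj in projections]
    pre_list = [(a.name, a) for a in child_output]
    pre_list += [(alias.name, expr) for expr, alias in extracted.items()]
    return pre_list, rewritten


# Two projection lists that repeat the same non-leaf expression.
a, b = Attr("a"), Attr("b")
heavy = Call("upper", (Call("concat", (a, b)),))
projections = [[heavy, a, Lit(None)], [heavy, Lit(None), b]]
pre_list, rewritten = pre_project(projections, [a, b])
```

In this example `heavy` appears in both projection lists but ends up as a single `_pre_0` entry in `pre_list`, and both rewritten lists refer to it by name. Whether the extra columns go into an existing child projection or a new one is a planning decision that this sketch does not model.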