Improve GpuExpand by pre-projecting complicated expressions #24

Closed
wants to merge 207 commits into from

firestarman
Owner

Some Spark platforms put non-leaf expressions into the GpuExpand projection lists, so
GpuExpand cannot leverage the GPU tiered projection across those lists.

This PR factors out these complicated expressions and places them into an existing
child GpuProject, or a new one, to avoid evaluating semantically equal (sub)expressions
more than once.
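
The idea can be illustrated with a small, self-contained sketch. This is not the plugin's Scala code: the `Attr`, `Lit`, and `Add` classes, the `pre_project` function, and the `_pre_N` alias names are all hypothetical, and plain structural equality stands in for Spark's semantic equality. It only shows how non-leaf expressions shared by several projection lists could be pulled out, computed once in a child projection, and replaced by attribute references.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


# Toy expression tree (hypothetical; not the Spark or spark-rapids classes).
@dataclass(frozen=True)
class Attr:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Attr, Lit, Add]


def is_leaf(expr: Expr) -> bool:
    """Attribute references and literals need no pre-projection."""
    return isinstance(expr, (Attr, Lit))


def pre_project(
    projections: List[List[Expr]],
) -> Tuple[List[Tuple[str, Expr]], List[List[Expr]]]:
    """Split expand projections into (child projection, rewritten projections).

    Each distinct non-leaf expression is extracted once and given an alias;
    every occurrence in the projection lists is replaced by a reference to it.
    A real child projection would also pass through the original attributes.
    """
    extracted: Dict[Expr, Attr] = {}

    def replace(expr: Expr) -> Expr:
        if is_leaf(expr):
            return expr
        if expr not in extracted:
            extracted[expr] = Attr(f"_pre_{len(extracted)}")
        return extracted[expr]

    rewritten = [[replace(e) for e in plist] for plist in projections]
    child_project = [(alias.name, expr) for expr, alias in extracted.items()]
    return child_project, rewritten


a, b = Attr("a"), Attr("b")
projections = [
    [Add(a, b), a, Lit(None)],
    [Add(a, b), Lit(None), b],
]
child_project, rewritten = pre_project(projections)
```

In this sketch `a + b` appears in both projection lists but ends up in the child projection only once, so the rewritten lists contain nothing but leaf expressions.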

nvauto and others added 30 commits November 14, 2023 14:14
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
mythrocks and others added 24 commits January 3, 2024 22:34
…lure. (NVIDIA#10143)

Fixes NVIDIA#10134.

This commit fixes test failures in `test_window_aggs_for_batched_finite_row_windows_partitioned` that result from ambiguous
ordering in the window function input.
The failing tests partition by `a` and order by `b,c`. When `b,c` contain repeated values, the results of
the window function execution are indeterminate.
This commit changes the definition of the aggregation column `c` (which is also included in the order-by clause) to use unique
long values. This guarantees deterministic output.
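
The effect of tied order-by keys can be sketched in plain Python. This is a simplification under stated assumptions, not the integration test itself: the `row_numbers` helper and the `payload` column are hypothetical, and a row number stands in for the finite row-window aggregations used by the real test.

```python
def row_numbers(rows):
    """Assign a row number per partition `a`, ordered by (`b`, `c`).

    Each row is a tuple (a, b, c, payload). Python's sort is stable, so rows
    with equal (a, b, c) keep their input order, which mimics an engine that
    is free to emit tied rows in any order.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    counters = {}
    result = {}
    for a, _b, _c, payload in ordered:
        counters[a] = counters.get(a, 0) + 1
        result[payload] = counters[a]
    return result


# Tied (b, c): the result depends on the order in which the rows arrive.
tied = [(1, 5, 7, "x"), (1, 5, 7, "y")]
tied_forward = row_numbers(tied)
tied_reversed = row_numbers(list(reversed(tied)))

# Unique c: the result is the same for any arrival order.
unique = [(1, 5, 0, "x"), (1, 5, 1, "y")]
unique_forward = row_numbers(unique)
unique_reversed = row_numbers(list(reversed(unique)))
```

With tied keys the two arrival orders assign different row numbers to `x` and `y`, while the unique values of `c` give one answer regardless of input order.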

Signed-off-by: MithunR <[email protected]>
…IA#10140)

* Removed unnecessary whitespaces

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

---------

Signed-off-by: Raza Jafri <[email protected]>
workaround for NVIDIA#10115, closes NVIDIA#10105

Signed-off-by: Gera Shegalov <[email protected]>
To fix issue: NVIDIA#10154

Install 'procps' to fix 'ps: command not found' in NVIDIA Rocky9 Docker containers
when running integration tests with jenkins/spark-test.sh.

'procps' is required for Rocky Docker containers to run a Spark standalone cluster, see:

    https://github.com/apache/spark/blob/v3.3.2/bin/load-spark-env.sh#L68

Signed-off-by: Tim Liu <[email protected]>
* Fixed Spark 334 build

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

* added 334 to noSnapshots

* updated 2.13 pom

---------

Signed-off-by: Raza Jafri <[email protected]>
* Support to_utc_timestamp

Signed-off-by: Haoyang Li <[email protected]>

* add tests

Signed-off-by: Haoyang Li <[email protected]>

* perf tests

Signed-off-by: Haoyang Li <[email protected]>

* clean up

Signed-off-by: Haoyang Li <[email protected]>

* clean up

Signed-off-by: Haoyang Li <[email protected]>

* address comment

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
…9790)

* Use format_float kernel

Signed-off-by: Haoyang Li <[email protected]>

* Add tests and doc

Signed-off-by: Haoyang Li <[email protected]>

* use new name from jni change

Signed-off-by: Haoyang Li <[email protected]>

* move inf/nan replacement to kernel

Signed-off-by: Haoyang Li <[email protected]>

* clean up

Signed-off-by: Haoyang Li <[email protected]>

* Address comments

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
…ith GpuStructsToJson [databricks] (NVIDIA#9975)

* upmerge

* Revert change to csv_test

Signed-off-by: Andy Grove <[email protected]>

* scalastyle

* introduce shim

* remove unreachable code

* fix some failures with 311

* save progress

* fix more failures with 340

* fix test failures with 341

* tests pass with 341

* Add 330 shim and fix failures in test_basic_json_read

* save progress on 330 shim

* tests pass with 330

* 320 shim

* test all date formats with from_json

* remove redundant and confusing use of failOnInvalid parameter

* Revert unrelated change

* Remove comment

* Remove blank line

* Remove blank line

* Revert accidental change to test_basic_json_read

* Fix compilation error caused by refactor

* Scala style

* Scala style

* update compatibility guide

* move json-specific date parsing into GpuJsonScan to fix regression in CSV tests

* fix regression introduced during refactor

* fall back to CPU if timestampFormat specified in 320 shim

* fall back to CPU if timestampFormat specified in 340 shim

* fix ci failure with 341db

* add 334 shim

* add clarifying comment

* update copyright years to 2024

* use None instead of empty string in tests

* fix copyright years

* remove xfail from tests

* fix regression

* Update integration_tests/src/main/python/json_test.py

Co-authored-by: Jason Lowe <[email protected]>

* Update integration_tests/src/main/python/json_test.py

Co-authored-by: Jason Lowe <[email protected]>

* fix regression

* Revert "fix regression"

This reverts commit 4b183a4.

* update more tests to use None instead of empty string

* allow fallback for non-utc in test_json_read_generated_dates

* update more tests to use None instead of empty string

---------

Signed-off-by: Andy Grove <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
* check shim exists in all.buildversions

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

* addressed review comments

* return the shim version from the map method

---------

Signed-off-by: Raza Jafri <[email protected]>
…stamp column (NVIDIA#10074)

* Update the legacy mode check: only take effect when reading date/timestamp column
Signed-off-by: Chong Gao <[email protected]>
* Fix format_number NaN symbol in high jdk version

Signed-off-by: Haoyang Li <[email protected]>

* Avoid using multiple-byte characters

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
* fix regression against Spark 3.2.x

* add comments

* Revert a change

* signoff

Signed-off-by: Andy Grove <[email protected]>

---------

Signed-off-by: Andy Grove <[email protected]>
…IA#10131)

* With a single row GpuExplode tries to split the generator array

Signed-off-by: Alessandro Bellina <[email protected]>

---------

Signed-off-by: Alessandro Bellina <[email protected]>
…IDIA#10191)

* Updating the config documentation for filecache configs

Signed-off-by: Chong Gao <[email protected]>

* Update

---------

Signed-off-by: Chong Gao <[email protected]>
Co-authored-by: Chong Gao <[email protected]>
…andling (NVIDIA#10193)

* Fixes bug where on split and retry the carry along columns were corrupted

---------

Signed-off-by: Alessandro Bellina <[email protected]>
* Update string to float compatibility doc[skip ci]

Signed-off-by: Haoyang Li <[email protected]>

* Address comment

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
@firestarman
Owner Author

need to rebase the target branch

@firestarman firestarman deleted the improve-expand branch January 17, 2024 06:51
@firestarman firestarman restored the improve-expand branch January 17, 2024 06:53
@firestarman firestarman deleted the improve-expand branch January 17, 2024 06:53