Improve GpuExpand by pre-projecting complicated expressions #24

firestarman · 2024-01-16T07:36:01Z

Some Spark platforms put non-leaf expressions in the GpuExpand projections, so GpuExpand
cannot leverage the GPU tiered projection across the projection lists.

This PR factors out these complicated expressions and places them into an existing
child GpuProject, or a new one, to avoid duplicate evaluation of semantically equal
(sub)expressions.
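The idea can be illustrated with a toy model. The sketch below is not the plugin's code: `Attr`, `Lit`, `Call`, `pre_project`, and the `_pre_N` alias names are hypothetical stand-ins for Spark expression classes and for whatever naming the real change uses. It assumes that semantically equal expressions compare equal structurally, and it only extracts top-level non-leaf expressions from each projection list.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Attr:
    """Leaf: a reference to a column produced by the child plan."""
    name: str


@dataclass(frozen=True)
class Lit:
    """Leaf: a literal, e.g. the null padding or grouping id in an Expand row."""
    value: object


@dataclass(frozen=True)
class Call:
    """Non-leaf: a function applied to child expressions."""
    fn: str
    args: Tuple["Expr", ...]


Expr = Union[Attr, Lit, Call]


def pre_project(
    projections: List[List[Expr]],
) -> Tuple[List[Tuple[str, Expr]], List[List[Expr]]]:
    """Move non-leaf expressions out of the Expand projection lists.

    Returns (pre_projected, rewritten):
      pre_projected: (alias, expression) pairs to evaluate once in a child project
      rewritten:     projection lists that contain only leaf expressions
    """
    extracted: Dict[Call, Attr] = {}  # structurally equal expression -> its alias

    def replace(expr: Expr) -> Expr:
        if isinstance(expr, Call):
            # Reuse the alias if an equal expression was already extracted.
            if expr not in extracted:
                extracted[expr] = Attr(f"_pre_{len(extracted)}")
            return extracted[expr]
        return expr  # leaves stay where they are

    rewritten = [[replace(e) for e in row] for row in projections]
    pre_projected = [(alias.name, expr) for expr, alias in extracted.items()]
    return pre_projected, rewritten


# upper(a) appears in two projection lists but is extracted only once.
upper_a = Call("upper", (Attr("a"),))
b_plus_1 = Call("add", (Attr("b"), Lit(1)))
projections = [
    [upper_a, Attr("b"), Lit(0)],
    [upper_a, Lit(None), Lit(1)],
    [Lit(None), b_plus_1, Lit(2)],
]
pre_projected, rewritten = pre_project(projections)
```

After the rewrite, every projection list holds only attribute references and literals, and each extracted expression is evaluated once in the child projection instead of once per projection list.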

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Signed-off-by: Peixin Li <[email protected]>

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

…lure. (NVIDIA#10143) Fixes NVIDIA#10134. This commit fixes test failures in `test_window_aggs_for_batched_finite_row_windows_partitioned`, resulting from ambiguous ordering in the window function input. The failing tests partition by `a` and order by `b,c`. When `b,c` contain repeated values, the results of the window function execution are indeterminate. This commit changes the definition of the aggregation column `c` (which is also included in the order-by clause) to use unique long values. This guarantees deterministic output. Signed-off-by: MithunR <[email protected]>

… machines (NVIDIA#10150) Signed-off-by: Jim Brennan <[email protected]>

) Signed-off-by: Jim Brennan <[email protected]>

…IA#10140) * Removed unnecessary whitespaces * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>

…on-UTC time zones (NVIDIA#10100) Signed-off-by: Navin Kumar <[email protected]>

workaround for NVIDIA#10115, closes NVIDIA#10105 Signed-off-by: Gera Shegalov <[email protected]>

To fix issue NVIDIA#10154: install 'procps' to fix 'ps: command not found' in NVIDIA Rocky9 Docker containers when running integration tests with jenkins/spark-test.sh. 'procps' is required for Rocky Docker containers to run a Spark standalone cluster; see: https://github.com/apache/spark/blob/v3.3.2/bin/load-spark-env.sh#L68 Signed-off-by: Tim Liu <[email protected]>

… files [databricks] (NVIDIA#10146) Signed-off-by: Robert (Bobby) Evans <[email protected]>

* Fixed Spark 334 build * Signing off Signed-off-by: Raza Jafri <[email protected]> * added 334 to noSnapshots * updated 2.13 pom --------- Signed-off-by: Raza Jafri <[email protected]>

* Support to_utc_timestamp Signed-off-by: Haoyang Li <[email protected]> * add tests Signed-off-by: Haoyang Li <[email protected]> * perf tests Signed-off-by: Haoyang Li <[email protected]> * claen up Signed-off-by: Haoyang Li <[email protected]> * claen up Signed-off-by: Haoyang Li <[email protected]> * address comment Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>

…9790) * Use format_float kernel Signed-off-by: Haoyang Li <[email protected]> * Add tests and doc Signed-off-by: Haoyang Li <[email protected]> * use new name from jni change Signed-off-by: Haoyang Li <[email protected]> * move inf/nan replacement to kernel Signed-off-by: Haoyang Li <[email protected]> * claen up Signed-off-by: Haoyang Li <[email protected]> * Address comments Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>

…ith GpuStructsToJson [databricks] (NVIDIA#9975) * upmerge * Revert change to csv_test Signed-off-by: Andy Grove <[email protected]> * scalastyle * introduce shim * remove unreachable code * fix some failures with 311 * save progress * fix more failures with 340 * fix test failures with 341 * tests pass with 341 * Add 330 shim and fix failures in test_basic_json_read * save progress on 330 shim * tests pass with 330 * 320 shim * test all date formats with from_json * remove redundant and confusing use of failOnInvalid parameter * Revert unrelated change * Remove comment * Remove blank line * Remove blank line * Revert accidental change to test_basic_json_read * Fix compilation error caused by refactor * Scala style * Scala style * update compatibility guide * move json-specific date parsing into GpuJsonScan to fix regression in CSV tests * fix regression introduced during refactor * fall back to CPU if timestampFormat specified in 320 shim * fall back to CPU if timestampFormat specified in 340 shim * fix ci failure with 341db * add 334 shim * add clarifying comment * update copyright years to 2024 * use None instead of empty string in tests * fix copyright years * remove xfail from tests * fix regression * Update integration_tests/src/main/python/json_test.py Co-authored-by: Jason Lowe <[email protected]> * Update integration_tests/src/main/python/json_test.py Co-authored-by: Jason Lowe <[email protected]> * fix regression * Revert "fix regression" This reverts commit 4b183a4. * update more tests to use None instead of empty string * allow fallback for non-utc in test_json_read_generated_dates * update more tests to use None instead of empty string --------- Signed-off-by: Andy Grove <[email protected]> Co-authored-by: Jason Lowe <[email protected]>

…10171) Signed-off-by: Navin Kumar <[email protected]>

…abricks] (NVIDIA#10168)

* check shim exists in all.buildversions * Signing off Signed-off-by: Raza Jafri <[email protected]> * addressed review comments * return the shim version from the map method --------- Signed-off-by: Raza Jafri <[email protected]>

…stamp column (NVIDIA#10074) * Update the legacy mode check: only take effect when reading date/timestamp column Signed-off-by: Chong Gao <[email protected]>

* Fix format_number NaN symbol in high jdk version Signed-off-by: Haoyang Li <[email protected]> * Avoid using mutlitple bytes character Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>

…10189) Signed-off-by: Chong Gao <[email protected]>

* fix regression against Spark 3.2.x * add comments * Revert a change * signoff Signed-off-by: Andy Grove <[email protected]> --------- Signed-off-by: Andy Grove <[email protected]>

…IA#10131) * With a single row GpuExplode tries to split the generator array Signed-off-by: Alessandro Bellina <[email protected]> --------- Signed-off-by: Alessandro Bellina <[email protected]>

…IDIA#10191) * Updating the config documentation for filecache configs Signed-off-by: Chong Gao <[email protected]> * Update --------- Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>

…andling (NVIDIA#10193) * Fixes bug where on split and retry the carry along columns were corrupted --------- Signed-off-by: Alessandro Bellina <[email protected]>

* Update string to float compatibility doc[skip ci] Signed-off-by: Haoyang Li <[email protected]> * Address comment Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>

Signed-off-by: Firestarman <[email protected]>

firestarman · 2024-01-17T06:51:22Z

need to rebase the target branch

nvauto and others added 30 commits November 14, 2023 14:14

Merge pull request NVIDIA#9694 from NVIDIA/branch-23.12

7118506

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9699 from NVIDIA/branch-23.12

1a548eb

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9700 from NVIDIA/branch-23.12

2f088e3

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9704 from NVIDIA/branch-23.12

eacf2c8

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9706 from NVIDIA/branch-23.12

3b0d65d

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9709 from NVIDIA/branch-23.12

03496bf

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9710 from NVIDIA/branch-23.12

b21e21f

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9713 from NVIDIA/branch-23.12

a3eee5c

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9725 from NVIDIA/branch-23.12

f092553

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9727 from NVIDIA/branch-23.12

aeb70db

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9730 from NVIDIA/branch-23.12

a46849d

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9734 from NVIDIA/branch-23.12

a3d1e46

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9735 from NVIDIA/branch-23.12

ef427f4

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Initiate project version 24.02.0-SNAPSHOT (NVIDIA#9716)

342b67b

Signed-off-by: Peixin Li <[email protected]>

Merge pull request NVIDIA#9740 from NVIDIA/branch-23.12

3f1cddc

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9742 from NVIDIA/branch-23.12

42f38a2

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9749 from NVIDIA/branch-23.12

9ff3b7c

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9752 from NVIDIA/branch-23.12

c18d6ef

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9754 from NVIDIA/branch-23.12

198cfbd

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9756 from NVIDIA/branch-23.12

ecea3f4

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9757 from NVIDIA/branch-23.12

9a85791

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9762 from NVIDIA/branch-23.12

88d88ac

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9764 from NVIDIA/branch-23.12

ef79e11

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9772 from NVIDIA/branch-23.12

12fa043

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9791 from NVIDIA/branch-23.12

6e3881b

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9793 from NVIDIA/branch-23.12

5e53ed5

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9796 from NVIDIA/branch-23.12

2b116a2

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9797 from NVIDIA/branch-23.12

e61ce58

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9799 from NVIDIA/branch-23.12

fb7b8fc

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9803 from NVIDIA/branch-23.12

2660b0c

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

mythrocks and others added 24 commits January 3, 2024 22:34

Increase timeouts in HostAllocSuite to avoid timeout failures on slow…

beba547

… machines (NVIDIA#10150) Signed-off-by: Jim Brennan <[email protected]>

fix test_hash_agg_with_nan_keys floating point sum failure (NVIDIA#10148

4e0c16e

) Signed-off-by: Jim Brennan <[email protected]>

Removed Unnecessary Whitespaces From Spark 3.3.4 Shim [skip ci] (NVID…

b1452dd

…IA#10140) * Removed unnecessary whitespaces * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>

Enable to_date (via gettimestamp and casting timestamp to date) for n…

240d661

…on-UTC time zones (NVIDIA#10100) Signed-off-by: Navin Kumar <[email protected]>

Disable multi-threaded Maven (NVIDIA#10153)

26bdf0c

workaround for NVIDIA#10115, closes NVIDIA#10105 Signed-off-by: Gera Shegalov <[email protected]>

Refactor the window code so it is not mostly kept in a few very large…

ef0b037

… files [databricks] (NVIDIA#10146) Signed-off-by: Robert (Bobby) Evans <[email protected]>

Fix Spark 334 Build (NVIDIA#10162)

d4ec3e9

* Fixed Spark 334 build * Signing off Signed-off-by: Raza Jafri <[email protected]> * added 334 to noSnapshots * updated 2.13 pom --------- Signed-off-by: Raza Jafri <[email protected]>

test needs marks for non-UTC and for non_supported timezones (NVIDIA#…

4e57f5f

…10171) Signed-off-by: Navin Kumar <[email protected]>

Prevent a degenerative join in test_dpp_reuse_broadcast_exchange [dat…

c92bbc0

…abricks] (NVIDIA#10168)

Defined Shims Should Be Declared In POM (NVIDIA#10167)

dfc1d97

* check shim exists in all.buildversions * Signing off Signed-off-by: Raza Jafri <[email protected]> * addressed review comments * return the shim version from the map method --------- Signed-off-by: Raza Jafri <[email protected]>

Update the legacy mode check: only take effect when reading date/time…

708bbac

…stamp column (NVIDIA#10074) * Update the legacy mode check: only take effect when reading date/timestamp column Signed-off-by: Chong Gao <[email protected]>

test needs marks for non-UTC and for non_supported timezones (NVIDIA#…

ffcae6f

…10189) Signed-off-by: Chong Gao <[email protected]>

Fix build regression against Spark 3.2.x [databricks] (NVIDIA#10179)

490b77c

* fix regression against Spark 3.2.x * add comments * Revert a change * signoff Signed-off-by: Andy Grove <[email protected]> --------- Signed-off-by: Andy Grove <[email protected]>

With a single row GpuExplode tries to split the generator array (NVID…

97bc92a

…IA#10131) * With a single row GpuExplode tries to split the generator array Signed-off-by: Alessandro Bellina <[email protected]> --------- Signed-off-by: Alessandro Bellina <[email protected]>

Fix explode with carry-along columns on GpuExplode single row retry h…

df144b1

…andling (NVIDIA#10193) * Fixes bug where on split and retry the carry along columns were corrupted --------- Signed-off-by: Alessandro Bellina <[email protected]>

Update string to float compatibility doc[skip ci] (NVIDIA#10156)

6419da6

* Update string to float compatibility doc[skip ci] Signed-off-by: Haoyang Li <[email protected]> * Address comment Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>

Improve Expand by pre-projecting some columns

74810e0

Signed-off-by: Firestarman <[email protected]>

firestarman force-pushed the improve-expand branch from 926724b to 74810e0 Compare January 17, 2024 06:49

firestarman closed this Jan 17, 2024

firestarman deleted the improve-expand branch January 17, 2024 06:51

firestarman restored the improve-expand branch January 17, 2024 06:53

firestarman deleted the improve-expand branch January 17, 2024 06:53
