Make tables spillable by default #8264

Merged: 11 commits into NVIDIA:branch-23.06 on May 23, 2023

Conversation

@abellina (Collaborator) commented May 9, 2023

Closes #7672

This depends on rapidsai/cudf#13260 (rapidsai/cudf#13180 overarching issue)

The idea behind this is to be able to add a Table as an object into the spillable store. The Table is not made contiguous until that is needed at spill time (to host memory), which is done using chunked_pack. Once it has gone through chunked_pack, the table is reconstituted as a regular RapidsDeviceMemoryBuffer with a single contiguous allocation.
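To make the flow concrete, here is a minimal conceptual sketch of the deferred-packing idea. The names (Packer, SpillableTableHandle, spillToHost) are hypothetical and stand in for the plugin's real spill-store classes and cuDF's chunked_pack machinery; this illustrates the approach, not the actual implementation.

    // Hypothetical sketch only: Packer stands in for cuDF's chunked_pack and
    // SpillableTableHandle for the plugin's spill-store entry. The point is that
    // the table stays in its original, non-contiguous form on the device and is
    // only packed into a single contiguous buffer when a spill is requested.
    trait Packer {
      def totalContiguousSize: Long                      // size of the packed result in bytes
      def hasNext: Boolean                               // whether more chunks remain to copy
      def packNext(dest: Array[Byte], offset: Int): Int  // pack one chunk, return bytes written
    }

    final class SpillableTableHandle[T](private var table: Option[T],
                                        makePacker: T => Packer,
                                        freeTable: T => Unit) {
      private var hostCopy: Option[Array[Byte]] = None

      // Invoked by the spill framework under device-memory pressure.
      def spillToHost(): Unit = table.foreach { t =>
        val packer = makePacker(t)
        val dest = new Array[Byte](packer.totalContiguousSize.toInt)
        var offset = 0
        while (packer.hasNext) {
          offset += packer.packNext(dest, offset)        // one bounce-buffer's worth at a time
        }
        hostCopy = Some(dest)                            // single contiguous host allocation
        freeTable(t)                                     // device-side table can now be freed
        table = None
      }

      def isSpilled: Boolean = hostCopy.isDefined
    }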

It is a draft because I still need to clean it up some and add unit tests specific to this scenario. I also did a bunch of cleanup today and have not run the latest, so there could be bugs here. Nevertheless, I think it is worth starting to take a look at it.

It won't pass the build because the cuDF PRs are not in yet, and I have one test failure that I know of.

@revans2 (Collaborator) left a comment


Mostly nits and comment requests. But I want to spend some more time looking at the tests.

@sameerz added the reliability and performance labels May 11, 2023
@jbrennan333 (Contributor)

Overall this is excellent work @abellina! I will likely do another pass, but so far my comments all overlap with those from @revans2.

@abellina (Collaborator, Author)

Running the full NDS benchmark with rapidsai/cudf#13260 + rapidsai/cudf#13278 + #8264 (this PR), I am seeing a ~4% improvement vs the last nightly.

I looked at one of the queries (q95), which becomes ~14% faster. Before the change, ~7% of kernel time was spent in this function, whereas now it is less than 0.1%.

Name = benchmark
Means = 417757.4, 401496.8
Time diff = 16260.600000000035
Speedup = 1.0404999491901306
T-Test (test statistic, p value, df) = 9.948692331384382, 8.820392981431996e-06, 8.0
T-Test Confidence Interval = 12491.56085504279, 20029.63914495728
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
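
As a quick sanity check of those numbers (assuming the two means are end-to-end benchmark times in milliseconds, before and after this change), the reported diff and speedup follow directly:

    // Assumed interpretation: the two means are total times in ms (before, after this PR).
    val before  = 417757.4
    val after   = 401496.8
    val diff    = before - after   // 16260.6 ms saved per run
    val speedup = before / after   // ~1.0405, i.e. the ~4% improvement quoted above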

@abellina (Collaborator, Author)

Working on review comments locally and will push, but this won't be mergeable until Monday.

@revans2 previously approved these changes May 22, 2023
@abellina marked this pull request as ready for review May 22, 2023 18:44
@abellina (Collaborator, Author)

build

@revans2 previously approved these changes May 22, 2023
@abellina (Collaborator, Author)

build

@abellina (Collaborator, Author)

build

@abellina (Collaborator, Author)

Ok, @revans2, the reason for c3c2179 was a change I had made in the test locally and forgotten to undo. It should be a single close because the code throws a retry-and-split within a retry (this tests that specific exception), so every batch that was open was told to close. I do not remember why I had changed this locally, but overall it removes the whole file from the unit test changes.

@abellina merged commit 6bd9768 into NVIDIA:branch-23.06 May 23, 2023
@abellina deleted the spillable_tables branch May 23, 2023 21:18
Labels: performance (A performance related task/issue), reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)
Development
Successfully merging this pull request may close: Make all buffers/columnar batches spillable by default
4 participants