
Add in a GpuMemoryLeaseManager #7361

Closed · wants to merge 7 commits

Conversation

@revans2 (Collaborator) commented Dec 14, 2022

This depends on rapidsai/cudf#12390

This fixes #7253
It is the first step towards better memory coordination in the plugin.

Signed-off-by: Robert (Bobby) Evans <[email protected]>
@revans2 self-assigned this Dec 14, 2022
@tgravescs (Collaborator):

It would be nice to have a description here and in the code explaining what this is, how it works, and how it replaces the semaphore.

* Initializes the GPU memory lease manager
* @param savedForShuffle the amount of memory that is saved for shuffle
*/
def initialize(savedForShuffle: Long): Unit = synchronized {
Contributor:

Do we want to call it reserved memory or something similarly generic? We may come up with other situations outside of shuffle that need memory set aside from this. Or are you thinking we will add more parameters to this for each use case that needs memory reserved in the future?

Along those same lines, I'm also wondering if it's this class's responsibility to calculate the memory amount to manage or the one calling it. The benefit of having the caller specify the amount of memory to manage is that it makes it easier to unit test and reduces the coupling to other things (like shuffle memory reservation).

Collaborator Author (revans2):

If you want me to refactor it so we only have a pool size passed in, I am fine with it.

Contributor:

I do think it would be cleaner to move this out. As it is now, the caller could ask for more memory to be reserved than there is memory available on the GPU, for example, which isn't covered. By having the caller calculate the amount of memory for the lease manager to manage, I think we might be able to more cleanly handle these error cases and may be able to remove the need for standard vs. testing init.
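
A minimal sketch of that refactor, assuming the caller computes the pool size and that the manager keeps its existing memoryForLease field (the names here are illustrative, not from the PR):

    /**
     * Initializes the GPU memory lease manager.
     * @param poolSize the total number of bytes the manager may lease out. The caller is
     *                 responsible for subtracting any reserved memory (e.g. for shuffle)
     *                 and for validating the value against what the GPU actually has.
     */
    def initialize(poolSize: Long): Unit = synchronized {
      require(poolSize > 0, s"poolSize must be positive, but got $poolSize")
      memoryForLease = poolSize
    }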

assert(lease.leaseAmount == 900)
child.start()
// The child should block so give it some time to come up...
Thread.sleep(100)
Contributor:

Sleeps are almost always an indication of a flaky test, especially a sleep that's this small. One full-GC hiccup could easily make this essentially a sleep(0) in practice, which means the test doesn't actually test what we want it to, despite that it passes this check.

Collaborator Author (revans2):

Yes, I was being lazy. Thanks for making me fix it.

Contributor:

This test is still racy, since the child could fail to block on a lease (or still be in the middle of calculating whether or not it should block) when the main thread is checking for the result of getting a lease. Completely eliminating the race probably will require exposing the internals of the lease manager to some degree so the test can verify a specific thread has indeed been blocked by the lease manager (e.g.: add a test-only predicate function to the lease manager that can be used to see if a particular thread is known to be blocked by the manager).
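
A rough sketch of the kind of test hook that suggests, assuming a hypothetical blockedTasks set tracked inside the lease manager and a hypothetical isBlockedForTesting predicate; this is illustrative only, not the PR's actual API:

    // Test-only hook on GpuMemoryLeaseManager: report whether a given task attempt is
    // currently parked inside requestLease waiting on memory.
    def isBlockedForTesting(taskAttemptId: Long): Boolean = synchronized {
      blockedTasks.contains(taskAttemptId)
    }

    // In the test, replace Thread.sleep(100) with a bounded poll on that predicate.
    def waitUntilBlocked(taskAttemptId: Long, timeoutMs: Long = 5000): Unit = {
      val deadline = System.nanoTime() + timeoutMs * 1000000L
      while (!GpuMemoryLeaseManager.isBlockedForTesting(taskAttemptId)) {
        assert(System.nanoTime() < deadline, "child thread never blocked on the lease manager")
        Thread.sleep(10)
      }
    }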

@sameerz added the reliability label (Features to improve reliability or bugs that severely impact the reliability of the plugin) on Dec 18, 2022
@revans2 marked this pull request as ready for review December 19, 2022 14:49
@revans2 (Collaborator, Author) commented Dec 19, 2022

@jlowe @tgravescs I think I have addressed all of the current issues, but I am happy to make more changes if needed.

@revans2 marked this pull request as draft December 19, 2022 21:18
@revans2 (Collaborator, Author) commented Dec 19, 2022

Moved this to draft because I want to get some feedback from @abellina

@tgravescs (Collaborator) left a review comment:

One concern I have is that we now potentially override what users had set for spark.rapids.sql.concurrentGpuTasks. I specifically know of one customer using T4s who has this set to 3, and it performed better than 2 even with just the default batch size. Perhaps we need some override or logic for when it is explicitly set, until we get smarter logic throughout.

It would be interesting to get some NDS runs on smaller gpus to see if it has a performance impact there as well.
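
One possible shape for such a guard, purely illustrative and assuming a hypothetical flag on the conf object that reports whether the user set the value explicitly:

    // Hypothetical: only let the lease manager drive the concurrency when the user did
    // not explicitly set spark.rapids.sql.concurrentGpuTasks themselves.
    val concurrentGpuTasks = if (conf.isConcurrentGpuTasksExplicitlySet) {
      conf.concurrentGpuTasks
    } else {
      concurrencyDerivedFromGpuMemory // hypothetical value computed from the lease pool
    }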

* Be aware that the numbers here are just made up, so only take this as an example of what
* could be done, and not exactly what will be done.
* <br/>
* An ORC write can take a number of different forms. We are doing to start with the simplest
Collaborator:

Suggested change
* An ORC write can take a number of different forms. We are doing to start with the simplest
* An ORC write can take a number of different forms. We are going to start with the simplest

* the operation - 4 GiB available to use = 5 GiB more needed to complete the operation.
* and call `requestLease(taskContext, 0, moreNeeded)` where the `moreNeeded` holds 5 GiB, the
* and call `requestLease(taskContext, 0, moreNeeded)` where the `moreNeeded` holds 5 GiB, the
* result of the math, and is the `requiredAmount` parameter to `requestLEase`. The method may
Collaborator:

Suggested change
* result of the math, and is the `requiredAmount` parameter to `requestLEase`. The method may
* result of the math, and is the `requiredAmount` parameter to `requestLease`. The method may

* memory to fulfill this request an exception will be thrown.
* @return a MemoryLease that allows the task to allocate more memory.
*/
def requestLease(tc: TaskContext, optionalAmount: Long, requiredAmount: Long = 0): MemoryLease =
Collaborator:

Naming ideas: use maximumAmount instead of optionalAmount and minimumAmount instead of requiredAmount? That seems clearer than optional/required, and if you specify both, maximumAmount would be greater than minimumAmount. Having the existing parameters be independent (e.g. "this is on top of any required amount of memory") seems confusing.

Collaborator:

I think this API suggestion makes it clearer as well and ports well to the non-blocking lease.
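
A sketch of the renamed signature being discussed, purely illustrative:

    /**
     * Request a lease of up to maximumAmount bytes, blocking as needed until at least
     * minimumAmount bytes can be granted (hypothetical rename of optionalAmount and
     * requiredAmount from the comments above).
     */
    def requestLease(tc: TaskContext, maximumAmount: Long, minimumAmount: Long = 0): MemoryLease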

} else {
0
}
GpuMemoryLeaseManager.initialize(reservedShuffleMemory)
val concurrentGpuTasks = conf.concurrentGpuTasks
logInfo(s"The number of concurrent GPU tasks allowed is $concurrentGpuTasks")
Collaborator:

Nit: could we include in this line the config for the memory lease manager (or ask it to log its own line)? It should include the total amount of memory it sees, and optionally the reserved amount it discounted.
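
For example, something along these lines, reusing the names visible in the surrounding snippets (reservedShuffleMemory, memoryForLease); the exact wording is only a suggestion:

    logInfo(s"GpuMemoryLeaseManager initialized with a pool of $memoryForLease bytes " +
      s"(reserved for shuffle: $reservedShuffleMemory bytes)")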

val alreadyRequested = getTotalLease(tc)
require(requiredAmount + alreadyRequested <= memoryForLease,
s"Task: $taskAttemptId requested at least $requiredAmount more bytes, but already has " +
s"leased $alreadyRequested bytes which would got over the total for the worker " +
Collaborator:

Suggested change
s"leased $alreadyRequested bytes which would got over the total for the worker " +
s"leased $alreadyRequested bytes which would go over the total for the executor " +

} else if (totalAmountRequested + alreadyRequested > memoryForLease) {
logWarning(s"Task: $taskAttemptId requested $totalAmountRequested bytes, but has " +
s"already requested $alreadyRequested bytes. This would go over the total for the " +
s"worker $memoryForLease reducing the request on the hope that this was an " +
Collaborator:

Suggested change
s"worker $memoryForLease reducing the request on the hope that this was an " +
s"executor $memoryForLease reducing the request on the hope that this was an " +


def getTotalLease(tc: TaskContext): Long = synchronized {
val taskAttemptId = tc.taskAttemptId()
val data = activeTasks.get(taskAttemptId)
Collaborator:

Nit: instead of `data` here and below in releaseLease, we could use `task` as we did above.

val taskAttemptId = tc.taskAttemptId()
val data = activeTasks.get(taskAttemptId)
if (data != null) {
// For now we are just going to ignore that it is gone. Could be a race with releaseAllForTask
Collaborator:

I think if we lose this race we should throw. No code should call releaseLease if the task has already completed.
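
A sketch of what throwing on that race might look like inside releaseLease, reusing the names from the snippet above:

    val data = activeTasks.get(taskAttemptId)
    if (data == null) {
      // Nothing should release a lease after the task's state has already been removed,
      // so losing the race with releaseAllForTask indicates a bug in the caller.
      throw new IllegalStateException(
        s"Task $taskAttemptId released a lease after its state was already removed")
    }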

@revans2 (Collaborator, Author) commented Dec 20, 2022

I checked in some code that tries to address issues with users configuring the concurrent number of tasks but not setting the targetBatchSize when tuning their jobs. It is not great, but I checked it in just in case my desktop dies while I am off for the holidays.

@revans2 (Collaborator, Author) commented Jan 23, 2023

Closing in favor of a different approach.

@revans2 closed this Jan 23, 2023
@mattahrens added the feature request label (New feature or request) on Jan 27, 2023
Labels
feature request: New feature or request
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
Development

Successfully merging this pull request may close these issues.

[FEA] Implement OOM retry framework
6 participants