
Add gpu memory watermark apis to JNI #11950

Merged 15 commits into rapidsai:branch-22.12 on Oct 24, 2022

Conversation

Contributor

@abellina abellina commented Oct 20, 2022

This PR addresses #11949.

We are adding methods to get the current and maximum memory usage watermarks at the whole-process level, plus a "scoped" maximum: the user can reset it to an initial value, run cuDF functions, and then call the API to see the peak usage since the reset.

For the scoped maximum, getScopedMaximumOutstanding can return somewhat surprising results. If the scoped maximum is reset to 0, for example, and we only see frees of allocations made before the reset, the scoped maximum returned is 0, because memory usage relative to the reset point is literally negative in that scenario.

The APIs here assume that the caller process is using a single thread to call into the GPU (for Spark it would be 1 concurrent task).

Note that this assumes Rmm.initialize has been called; allocations made before initialization are not tracked.
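
A minimal sketch of the bookkeeping described above (illustrative names and structure only, not the actual RmmJni.cpp code):

#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of odometer-style watermark tracking.
struct watermarks {
  std::size_t total_allocated{0};      // outstanding bytes, process-wide
  std::size_t max_total_allocated{0};  // global watermark since Rmm.initialize
  std::size_t local_allocated{0};      // outstanding bytes at the last reset
  std::size_t local_max{0};            // scoped ("odometer") watermark

  void on_alloc(std::size_t bytes) {
    total_allocated += bytes;
    max_total_allocated = std::max(max_total_allocated, total_allocated);
    update_local_max();
  }

  void on_free(std::size_t bytes) {
    total_allocated -= bytes;
    update_local_max();
  }

  void reset_local(std::size_t initial) {
    local_allocated = total_allocated;  // new baseline for the scoped maximum
    local_max = initial;
  }

  void update_local_max() {
    // total_allocated - local_allocated is "negative" (it wraps) when we free
    // allocations made before the reset, hence the signed cast and the clamp.
    auto diff = std::max(static_cast<std::int64_t>(total_allocated - local_allocated),
                         std::int64_t{0});
    local_max = std::max<std::size_t>(local_max, static_cast<std::size_t>(diff));
  }
};

With this sketch, resetting to 0 and then seeing only frees of pre-reset allocations leaves local_max at 0, matching the behavior described above.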

@github-actions github-actions bot added the Java Affects Java cuDF API. label Oct 20, 2022
@abellina abellina added feature request New feature or request non-breaking Non-breaking change and removed Java Affects Java cuDF API. labels Oct 20, 2022
@github-actions github-actions bot added the Java Affects Java cuDF API. label Oct 20, 2022
@abellina abellina added the Spark Functionality that helps Spark RAPIDS label Oct 20, 2022
Contributor

@jrhemstad jrhemstad left a comment

I'm not sure how much you care about performance here or how often you intend to use this functionality. There's definitely room for optimization, but may not be worth the effort if you don't care much about the perf.

Resolved review threads on java/src/main/native/src/RmmJni.cpp
@abellina
Contributor Author

@jrhemstad fyi, we are going to simplify this a lot, removing the stack you just commented on, so it may not be worth a review right now. The idea is now to simply keep a watermark of the maximum memory used at the global level, and then add a "local" watermark that can be reset (essentially like an odometer). This means we'd use this functionality single-threaded while devs are trying to debug an issue, which removes a whole host of issues as well.

@jrhemstad
Contributor

You may want to check out the tracking_resource_adaptor that RMM provides as well: https://github.com/rapidsai/rmm/blob/1394c281a9294c87342802e8392d4f60d17ed7de/include/rmm/mr/device/tracking_resource_adaptor.hpp#L45
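
A minimal usage sketch of that adaptor, assuming the constructor and get_allocated_bytes() accessor visible in the linked header:

#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/tracking_resource_adaptor.hpp>
#include <iostream>

int main() {
  rmm::mr::cuda_memory_resource cuda_mr;
  // capture_stacks=true records a call stack for each outstanding allocation
  rmm::mr::tracking_resource_adaptor<rmm::mr::cuda_memory_resource> mr{&cuda_mr, true};
  void* p = mr.allocate(1024);
  std::cout << mr.get_allocated_bytes() << '\n';  // 1024
  mr.deallocate(p, 1024);
  std::cout << mr.get_allocated_bytes() << '\n';  // 0
  return 0;
}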

@abellina
Contributor Author

abellina commented Oct 20, 2022

Yes @jrhemstad, on tracking_resource_adaptor: one of the things we discussed locally was to propose modifying it, or creating something similar, where a stack trace could be dumped when we reach a certain level (so we know what part of the code "triggered" the OOM). That said, the same could be accomplished by enabling the tracking_resource_adaptor and limiting the RMM pool, so on OOM we could get the cuDF/JNI stack trace that caused us to go overboard.

The end result is that we want to figure out ways to stay within some limits, which dictates how big a table we should be aiming for and how much concurrency we should allow. So far we think we can limit concurrency to 1 thread and run with a reduced pool or some debug flags that tell us what part of a query violated our assumption (not a production setup, obviously). We are also trying to estimate how much memory we might use, but all of that is Spark plugin-side and will be separate changes.
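
Assuming RMM's tracking_resource_adaptor and limiting_resource_adaptor (the 1 GiB cap is an arbitrary example value), the stack-trace-on-OOM setup described above could look roughly like:

#include <cstddef>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/limiting_resource_adaptor.hpp>
#include <rmm/mr/device/tracking_resource_adaptor.hpp>

int main() {
  rmm::mr::cuda_memory_resource cuda_mr;
  // Track every allocation, capturing call stacks for outstanding ones.
  rmm::mr::tracking_resource_adaptor<rmm::mr::cuda_memory_resource>
      tracking_mr{&cuda_mr, /*capture_stacks=*/true};
  // Cap usable memory so we hit OOM early instead of exhausting the device.
  rmm::mr::limiting_resource_adaptor<decltype(tracking_mr)>
      limiting_mr{&tracking_mr, std::size_t{1} << 30};
  // Allocations beyond the cap throw; catching that is the point at which the
  // tracking layer can report what was still outstanding and where it came
  // from, e.g. via tracking_mr.get_outstanding_allocations().
  return 0;
}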

@abellina abellina changed the title Add gpu memory tracking api in JNI to track maximum memory usage Add gpu memory watermark apis to JNI Oct 20, 2022
@abellina abellina marked this pull request as ready for review October 20, 2022 19:23
@abellina abellina requested a review from a team as a code owner October 20, 2022 19:23
@abellina
Contributor Author

So the above has a bug: because I am using size_t, some of the reset logic can make the value go negative (and wrap around). I'll put up a patch + tests.

@codecov

codecov bot commented Oct 20, 2022

Codecov Report

Base: 88.11% // Head: 88.14% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (bec9818) compared to base (5c2150e).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11950      +/-   ##
================================================
+ Coverage         88.11%   88.14%   +0.03%     
================================================
  Files               133      133              
  Lines             21982    21982              
================================================
+ Hits              19369    19376       +7     
+ Misses             2613     2606       -7     
Impacted Files Coverage Δ
python/cudf/cudf/core/dataframe.py 93.77% <0.00%> (+0.04%) ⬆️
python/cudf/cudf/core/column/string.py 88.65% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.51% <0.00%> (+0.20%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 93.75% <0.00%> (+0.96%) ⬆️
python/strings_udf/strings_udf/__init__.py 86.27% <0.00%> (+1.96%) ⬆️




// `total_allocated - local_allocated` can be negative in the case where we free
// after we call `reset_local_max_outstanding`
std::size_t local_diff = std::max(static_cast<long>(total_allocated - local_allocated), 0L);
Contributor

@jbrennan333 jbrennan333 Oct 20, 2022

Maybe use static_cast<intptr_t> instead of static_cast<long>
I don't think long is guaranteed to be big enough to hold a size_t.

Contributor Author

So I think long == long long here (the difference shows up between 32-bit and 64-bit compiled programs). To cover all of std::size_t, I'd have to go to unsigned long, and that's a lot of GPU memory ;) I am not sure we need to worry too much about that, especially since we are going to send this to Spark shortly, which runs Java, and Java's long is 64-bit and signed.

size_t max value: 18446744073709551615
long max value: 9223372036854775807
unsigned long max value: 18446744073709551615
long long max value: 9223372036854775807
unsigned long long max value: 18446744073709551615
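
As a tiny standalone demo of the wrap-around that the signed cast guards against (converting the wrapped value to a signed 64-bit type recovers the intended negative number on two's-complement platforms):

#include <cstddef>
#include <cstdint>
#include <iostream>

int main() {
  std::size_t total = 100, local = 200;
  // size_t subtraction wraps instead of going negative:
  std::cout << total - local << '\n';                             // 18446744073709551516
  // the signed cast used in the snippet under review recovers the intent:
  std::cout << static_cast<std::int64_t>(total - local) << '\n';  // -100
  return 0;
}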

Contributor

I think in this case long is sufficient because we are on an LP64 architecture (we don't run on Windows, do we?).
std::intptr_t is guaranteed to be the same width as std::size_t, but signed (I don't think ssize_t is standard?). You could use int64_t here, since as you say we know we are going to pass it to Java, which uses 64 bits. This was more of a technical nit than an actual concern that it will break (too much history cross-porting to different architectures...)

Contributor

@ttnghia ttnghia Oct 20, 2022

These type names are confusing because they can be aliases of different underlying types depending on the system. For clarity, please always use the fixed-width types (u)int32_t and (u)int64_t; they guarantee known limits.

Contributor Author

So this is a debug feature; we don't use the (u)int* types right now (i.e. I'd make things more inconsistent unless I changed the whole file), and I am not sure whether cuDF is moving away from the alias types. I think we can move from these old types to the better ones in one dedicated PR in the future.

Resolved review threads on java/src/main/java/ai/rapids/cudf/Rmm.java and java/src/main/native/src/RmmJni.cpp
@abellina
Contributor Author

@jlowe this should be ready for another look

Resolved review threads on java/src/main/java/ai/rapids/cudf/Rmm.java
@abellina
Contributor Author

I see some unrelated test failures on the Python side. Upmerging for now.

Contributor

@jbrennan333 jbrennan333 left a comment

+1 lgtm

@abellina
Contributor Author

@jlowe this should be ready for another look

@abellina
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 1e93af8 into rapidsai:branch-22.12 Oct 24, 2022
@abellina abellina deleted the oom/memory_track branch October 24, 2022 19:33
rapids-bot bot pushed a commit that referenced this pull request Nov 4, 2022
This adds `onAllocated` and `onDeallocated` to `RmmEventHandler` as debug callbacks. If the event handler is installed with debug enabled (in `Rmm.setEventHandler`) these callbacks will be invoked when an allocation or deallocation finishes.

It also fixes a bug from #11950 where the initial allocated amount was not being set appropriately: it was set to 0 when it should have been set to the new initial value/maximum.

Closes #11949.

Authors:
  - Alessandro Bellina (https://github.com/abellina)

Approvers:
  - Jason Lowe (https://github.com/jlowe)

URL: #12054