Add RMM PyTorch allocator #1168
Q: `device` is ignored by design? (I was reviewing cupy/cupy#7210 and noticed this.)
Ah, great catch! This brings out a subtle problem: in RMM, each device has its own memory resource. Thus, to do the allocation on a specified `device` with RMM, I would write the torch `allocate` function along the lines of the sketch below. Unfortunately, the deallocation function does not accept a `device` argument, so we cannot retrieve the memory resource that was used for allocation. I don't really see a way around this other than for the `deallocate` signature to include the `device` argument. cc: @emcastillo
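A rough illustration of the problem (a hypothetical sketch, not the code in this PR; it assumes Python-level `allocate`/`deallocate` methods on the memory resource object for illustration):

```python
# Hypothetical sketch: what a device-aware allocate/deallocate pair could
# look like if both hooks received `device`.
import rmm


def allocate(size, device, stream):
    # Each device has its own memory resource in RMM, so pick the one
    # registered for the requested device.
    mr = rmm.mr.get_per_device_resource(device)
    return mr.allocate(size)  # assumes a Python-level allocate() for illustration


def deallocate(ptr, size, stream):
    # Problem: PyTorch's free hook does not pass `device`, so we can only
    # ask for the *current* device's resource, which may not be the one
    # that performed the allocation.
    mr = rmm.mr.get_current_device_resource()
    mr.deallocate(ptr, size)  # assumes a Python-level deallocate() for illustration
```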
I would love to know too. TBH I am puzzled by PyTorch's (long-time) behavior of asking for `device`. It should just honor the current device...
+1 to that
I'll submit a follow-up PR adding support for `device`, once pytorch/pytorch#91398 is merged.
Could this bite a user that accesses this attribute and passes it around thinking that it's fine when it's really just `None`? It might be safer to override `__getattr__` for the module and have it raise an error to prevent the user from accessing this attribute when `CUDAPluggableAllocator` failed to import.
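Something along these lines (a hypothetical sketch of the suggested module-level `__getattr__`, per PEP 562; names and messages are illustrative, not the PR's code):

```python
# Hypothetical sketch: raise a descriptive error instead of exposing None
# when CUDAPluggableAllocator could not be imported.
try:
    from torch.cuda.memory import CUDAPluggableAllocator  # noqa: F401
    _torch_allocator_available = True
except ImportError:
    _torch_allocator_available = False


def __getattr__(name):
    # Only consulted when normal attribute lookup fails, i.e. when the
    # import above failed and rmm_torch_allocator was never defined.
    if name == "rmm_torch_allocator" and not _torch_allocator_available:
        raise AttributeError(
            "rmm_torch_allocator is unavailable because "
            "torch.cuda.memory.CUDAPluggableAllocator could not be imported"
        )
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```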
Could we alternatively `pass` on `ImportError` to achieve the same effect as defining that module `__getattr__`?
I think you'd get close to the same effect, just a slightly less user-friendly version. With a `__getattr__` override you could provide a more friendly error message indicating that this happened because the torch allocator failed to import, whereas if you just avoid defining it the user will see an `AttributeError` without any additional diagnostics and may think it's a bug in rmm.

It's a very minor point though; I'm fine leaving this as is for now and only revisiting in the future if we get a lot of user questions about why the allocator is `None`.
This is neat!
Q: Would this honor `rmm.reinitialize()` if a user changes the MR?
`rmm.reinitialize()` resets the default memory resource used by RMM. Each call to `allocate()` and `deallocate()` queries the default memory resource via a call to `get_current_device_resource()`, so -- yes.
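For example (a hypothetical usage sketch; it assumes PyTorch exposes `torch.cuda.memory.change_current_allocator`):

```python
# Hypothetical sketch: the installed allocator queries RMM's current default
# memory resource on every call, so a resource swapped in via
# rmm.reinitialize() is used for subsequent torch allocations.
import rmm
import torch
from torch.cuda.memory import change_current_allocator

change_current_allocator(rmm.rmm_torch_allocator)

rmm.reinitialize(pool_allocator=True)       # swap in a pooled default resource
t = torch.zeros(1_000_000, device="cuda")   # served by the new pool resource
```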
There's a pytest utility for this if you want to use it: `pytest.importorskip`.
Does that handle `import`ing specific functions? This is using `importorskip` for `torch` generally above. The `torch.cuda.memory` module has been around for a while, though the functionality we need from it is pretty new.

Maybe in the future this could require a specific PyTorch version. There doesn't seem to be one yet that has what we need, though.
No, `pytest.importorskip` only handles modules, and you have to use attribute accessors to get functions. It's kind of messy. The current solution is probably easier to read; let's keep it as-is.
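For reference, a hedged sketch of the two approaches being compared (hypothetical test-setup code, not the PR's test file):

```python
import pytest

# Module-granularity skip: importorskip returns the module, and specific
# functions must then be reached via attribute access.
torch = pytest.importorskip("torch")

# Function-granularity skip: a plain try/except that skips the whole test
# module when the newer API is missing.
try:
    from torch.cuda.memory import change_current_allocator
except ImportError:
    pytest.skip(
        "torch.cuda.memory.change_current_allocator is not available",
        allow_module_level=True,
    )
```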
How does this function behave if passed None (the case where the torch allocator hasn't been defined)?
Hmm - I wouldn't expect it to be `None` if we were able to import `change_current_allocator`, since the existence of `change_current_allocator` implies that `rmm_torch_allocator` was defined (although somewhat implicitly: `change_current_allocator` and `CUDAPluggableAllocator` in PyTorch were introduced together).

Should we also skip this test if `rmm.rmm_torch_allocator` is `None` for some reason?
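If we do want the extra guard, a minimal sketch (with a hypothetical test name) could be:

```python
import pytest
import rmm


# Hypothetical extra guard: skip when the torch allocator could not be
# constructed (e.g. CUDAPluggableAllocator was unavailable at import time).
@pytest.mark.skipif(
    rmm.rmm_torch_allocator is None,
    reason="rmm_torch_allocator is not available",
)
def test_rmm_torch_allocator():
    ...
```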
Ah, OK, I wasn't sure about that; I thought they weren't introduced entirely concurrently. Up to you on the extra skip; it sounds like it would be pedantically correct but not practically necessary.