-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opal/cuda: Handle stream-ordered allocations and assign primary device context #12835
opal/cuda: Handle stream-ordered allocations and assign primary device context #12835
Conversation
a251ab3
to
419e9b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you ever checked that the mpool are supported in the setup (using the CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED attribute for cuDeviceGetAttribute
) ?
Hi @bosilca I have not. Are you suggesting we check the attribute and always return 0 for cuda_check_mpool if mpools are unsupported? If so, DeviceAttribute queries are more expensive and I would say that Pointer Attribute query we have effectively does the same but with lesser overhead. But if we assume homogeneity of GPUs then I agree that your suggestion would eliminate even the pointer query if we have a static variable initialized to reflect mpool support. Question then is if we can assume homogeneity. What are your thoughts on this? |
We can determine all that upfront. Check for mpool support once for each visible devices, generate a warning if they are different and activate the most generic checks for the pointers. If homogeneous, we can remove some of the checks. I wonder how many heterogeneous setups are really used for anything else than toys ? |
Indeed this will not be common. |
419e9b7
to
5328616
Compare
CUmemAccess_flags flags; | ||
CUmemLocation location; | ||
|
||
if (device_count == -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here is a little too lax for the cases with 0 devices. If I understand the code correctly, without a device (aka. without a valid device 0) the call to cuDeviceGetAttribute
will never change mpool_supported
, so the check cuDeviceGetAttribute
will be executed every time. You should bail out of this function if device_count
is 0.
5328616
to
6516714
Compare
…e context Signed-off-by: Akshay Venkatesh <[email protected]>
6516714
to
cafcce9
Compare
This PR is similar to #12757 and it detects if a given pointer belongs to CUDA memory pools.
Additionally, we assign primary device context to calling thread in the absence of a device context as VMM and memory pool pointers do not have a device context associated with them by design. While ideally we release references on the primary context by making an equal number of cuDevicePrimaryCtxRelease calls as cuDevicePrimaryCtxRetain calls, unfortunately there is no good place to make the release call as we don't know up front when the last reference to the given pointer will be in process lifetime (especially as we don't intercept user CUDA calls to free said memory). For this reason, there will be at most one unreleased reference against the primary device context per user process thread and this shouldn't have any undesired effects on GPU resources as such.