Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opal/cuda: Handle stream-ordered allocations and assign primary device context #12835

Merged

Conversation

Akshay-Venkatesh
Copy link
Contributor

This PR is similar to #12757 and it detects if a given pointer belongs to CUDA memory pools.

Additionally, we assign primary device context to calling thread in the absence of a device context as VMM and memory pool pointers do not have a device context associated with them by design. While ideally we release references on the primary context by making an equal number of cuDevicePrimaryCtxRelease calls as cuDevicePrimaryCtxRetain calls, unfortunately there is no good place to make the release call as we don't know up front when the last reference to the given pointer will be in process lifetime (especially as we don't intercept user CUDA calls to free said memory). For this reason, there will be at most one unreleased reference against the primary device context per user process thread and this shouldn't have any undesired effects on GPU resources as such.

Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you ever checked that the mpool are supported in the setup (using the CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED attribute for cuDeviceGetAttribute) ?

@Akshay-Venkatesh
Copy link
Contributor Author

Akshay-Venkatesh commented Oct 1, 2024

Have you ever checked that the mpool are supported in the setup (using the CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED attribute for cuDeviceGetAttribute) ?

Hi @bosilca I have not. Are you suggesting we check the attribute and always return 0 for cuda_check_mpool if mpools are unsupported? If so, DeviceAttribute queries are more expensive and I would say that Pointer Attribute query we have effectively does the same but with lesser overhead. But if we assume homogeneity of GPUs then I agree that your suggestion would eliminate even the pointer query if we have a static variable initialized to reflect mpool support. Question then is if we can assume homogeneity. What are your thoughts on this?

@bosilca
Copy link
Member

bosilca commented Oct 1, 2024

We can determine all that upfront. Check for mpool support once for each visible devices, generate a warning if they are different and activate the most generic checks for the pointers. If homogeneous, we can remove some of the checks.

I wonder how many heterogeneous setups are really used for anything else than toys ?

@Akshay-Venkatesh
Copy link
Contributor Author

I wonder how many heterogeneous setups are really used for anything else than toys ?

Indeed this will not be common.

CUmemAccess_flags flags;
CUmemLocation location;

if (device_count == -1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is a little too lax for the cases with 0 devices. If I understand the code correctly, without a device (aka. without a valid device 0) the call to cuDeviceGetAttribute will never change mpool_supported, so the check cuDeviceGetAttribute will be executed every time. You should bail out of this function if device_count is 0.

@Akshay-Venkatesh Akshay-Venkatesh force-pushed the topic/handle-masync-assign-ctx branch from 5328616 to 6516714 Compare October 3, 2024 18:36
@Akshay-Venkatesh Akshay-Venkatesh force-pushed the topic/handle-masync-assign-ctx branch from 6516714 to cafcce9 Compare October 3, 2024 20:53
@janjust janjust merged commit 041a904 into open-mpi:main Oct 4, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants