Q27 intermittent failure in nightly automation #159
Comments
CUDA version mismatches would be the normal culprit, but I'm not sure how this would only show up intermittently. @jakirkham, do you have any thoughts on this?
cc @anaruse (in case you have thoughts here 🙂)
Hmm, it might have something to do with the issue below.
Based on the comment they recommend setting the QQ: Is the cc: @beckernick
I don't think so (meaning it shouldn't use the pool). Edit: To expand on this, we would need an API in cuBLAS (that CuPy would then use), which would allow us to specify a chunk of memory to use for the initialization.
Is there a way for us to trigger cuBLAS initialization early? If so, maybe we can do this as part of Dask-CUDA startup.
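One way this could look, sketched purely as an assumption (the plugin name below is made up and nothing in it comes from code in this thread), is a `distributed` `WorkerPlugin` whose `setup` hook runs once per worker at startup; the cuBLAS call inside it anticipates the `get_cublas_handle` approach discussed further down.

```python
from distributed.diagnostics.plugin import WorkerPlugin

class CuBLASWarmup(WorkerPlugin):  # hypothetical name, not part of Dask-CUDA
    def setup(self, worker):
        import cupy
        # Touch cuBLAS once per worker so its workspace is allocated
        # before the RMM pool grows toward total device capacity.
        cupy.cuda.device.get_cublas_handle()

# Assumes `client` is an existing dask.distributed.Client for the cluster.
client.register_worker_plugin(CuBLASWarmup())
```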
Thanks for the pointer @anaruse @jakirkham. Let's look into triggering this early and/or perhaps reserving memory.
@anaruse @jakirkham, do you think there is any downside/risk of initializing a handle to the cuBLAS library context and then essentially "throwing it away" before we do anything else on the cluster? I.e., running something like this on every Dask worker?

```python
def init_cublas():
    from cupy.cuda import cublas
    cublas.create()  # allocates 64MB of GPU memory and returns a handle
    return None

client.run(init_cublas)
```
Initializing the cuBLAS context beforehand wouldn't necessarily change the allocation dynamics of the workload that triggers RMM to grow the pool to just at the edge of total capacity. Without visibility into that chunk of memory, this might still be a risk, right? Perhaps in combination with
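A related knob, shown below only as a sketch and not as something settled in this thread, would be sizing the RMM pool so it deliberately leaves headroom below total device memory, giving out-of-pool allocations like the cuBLAS workspace somewhere to go. The sizes here are illustrative assumptions.

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Illustrative only: cap the per-GPU RMM pool below total device memory
# (e.g. 28 GB on a 32 GB GPU) so late allocations made outside the pool,
# such as the cuBLAS workspace, are less likely to hit an OOM.
cluster = LocalCUDACluster(rmm_pool_size="28GB")
client = Client(cluster)
```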
Idk about creating a cuBLAS handle like that. The library may expect us to do cleanup. Not sure what happens if we don't do that cleanup. That said, maybe we can do some warmup step (like a matrix multiplication), which would get CuPy to initialize cuBLAS. Admittedly that's a bit hacky, but perhaps workable. @anaruse may have a better suggestion
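A hedged sketch of that kind of warmup, assuming a tiny float32 matmul is enough to route through cuBLAS and make CuPy create its handle (the function name is just illustrative):

```python
import cupy

def warmup_cublas():
    # A small GEMM goes through cuBLAS, so the first call forces CuPy to
    # create (and cache) its cuBLAS handle and allocate the workspace now.
    a = cupy.ones((8, 8), dtype=cupy.float32)
    b = cupy.ones((8, 8), dtype=cupy.float32)
    _ = a @ b
    cupy.cuda.Stream.null.synchronize()  # make sure the work actually ran
    return None
```

This could be pushed to every worker with `client.run(warmup_cublas)`, same as the `init_cublas` snippet above.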
I poked around the CuPy code and think something like this might work. Should add that CuPy takes care of cleaning up the handle in this case.

```
In [1]: import cupy

In [2]: cupy.cuda.device.get_cublas_handle()
Out[2]: 94517515719584

In [3]: cupy.cuda.device.get_cublas_handle()
Out[3]: 94517515719584
```

The first run is a bit slow (as it allocates the handle), but the second one is a bit faster (as it is cached). Note the pointer returned as a Python We could call this with
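For completeness, a minimal sketch of doing that across an existing Dask cluster; the helper name and the use of `client.run` here are assumptions rather than something spelled out above:

```python
def _init_cublas_handle():
    import cupy
    # First call creates and caches CuPy's per-device cuBLAS handle;
    # CuPy owns the handle, so no explicit cleanup is needed here.
    cupy.cuda.device.get_cublas_handle()
    return None

# Assumes `client` is an existing dask.distributed.Client for the cluster.
client.run(_init_cublas_handle)
```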
Mentioned in an edit in the RMM thread, but it seems spaCy just uses CuPy for cuBLAS and doesn't use cuBLAS directly. So I think that initialization step should be sufficient.
This is the intermittent `CUBLAS_STATUS_NOT_INITIALIZED` error that I thought was in Q28. @VibhuJawa