[FEA] PyTorch and RMM sharing memory pool #501
Comments
There was some internal discussion about a related issue that plagued the 27 HF implementation, and it was suggested that a path forward could be:
Another idea that came up was using RMM within PyTorch, either through an external memory allocator (as was done with CuPy and Numba) or possibly even through direct usage (as has recently been done with XGBoost). Have filed this as issue pytorch/pytorch#43144.
On this usage pattern, it's worth looking at how CuPy did something similar. xref: pytorch/pytorch#33860
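For reference, a minimal sketch of the external-allocator pattern the comments refer to, assuming a recent RMM that exposes `rmm.allocators.cupy.rmm_cupy_allocator` (older releases exposed it as `rmm.rmm_cupy_allocator`); the pool size is illustrative only:

```python
# Sketch: route CuPy's allocations through an RMM pool so CuPy and
# RAPIDS libraries draw from the same pool of GPU memory.
import cupy
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator

# Make a pooled memory resource the current RMM resource (1 GiB initial pool).
rmm.reinitialize(pool_allocator=True, initial_pool_size=2**30)

# Tell CuPy to allocate from RMM instead of its own memory pool.
cupy.cuda.set_allocator(rmm_cupy_allocator)

x = cupy.arange(1000)  # now backed by the RMM pool
```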
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.
This was closed by #1168. Can we close this?
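For context, PR #1168 added a PyTorch-facing allocator to RMM. A minimal sketch of how it can be wired up, assuming RMM >= 23.02 and PyTorch >= 2.0 (which provides `torch.cuda.memory.change_current_allocator`):

```python
# Sketch: have PyTorch allocate CUDA memory through RMM so that PyTorch
# and RAPIDS libraries (cuDF, CuPy, ...) share one memory pool.
import rmm
import torch
from rmm.allocators.torch import rmm_torch_allocator

# Use a pooled memory resource for all RMM allocations.
rmm.reinitialize(pool_allocator=True)

# Must be called before PyTorch allocates any CUDA memory.
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)

t = torch.zeros(1024, device="cuda")  # served from the RMM pool
```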
Is your feature request related to a problem? Please describe.
Currently I'm running a streamz workflow that uses PyTorch. I notice that I continue to encounter errors like the one below, where PyTorch is not able to allocate enough memory.
I'm wondering if PyTorch and RMM are competing for memory and, if so, whether there's a recommended way to manage this.
Describe the solution you'd like
If possible, for PyTorch and RMM to use the same memory pool, or a recommended method to resolve this type of memory issue.
Describe alternatives you've considered
None
Additional context
The streamz workflow end-to-end can be found here. In short summary, it first initializes a streamz workflow that uses Dask to read in data from Kafka. It then processes that data using cyBERT inferencing, which can be found here. cyBERT uses cudf for data pre-processing steps and a BERT model for inferencing. The processed data is then published back to Kafka.