[BUG] Consider disabling managed memory in cudf.pandas on WSL2 #16551
I can confirm that this currently fails on WSL2:

```python
import cudf.pandas
cudf.pandas.install()  # Enables managed memory and prefetching

cudf.Series([1, 2, 3])  # Fails!
```

Traceback:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/coder/cudf/python/cudf/cudf/utils/performance_tracking.py", line 51, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/cudf/python/cudf/cudf/core/series.py", line 656, in __init__
    column = as_column(
             ^^^^^^^^^^
  File "/home/coder/cudf/python/cudf/cudf/core/column/column.py", line 2241, in as_column
    return as_column(arbitrary, nan_as_null=nan_as_null, dtype=dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/cudf/python/cudf/cudf/core/column/column.py", line 1868, in as_column
    col = ColumnBase.from_arrow(arbitrary)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/cudf/python/cudf/cudf/core/column/column.py", line 364, in from_arrow
    result = libcudf.interop.from_arrow(data)[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/.conda/envs/rapids/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "interop.pyx", line 162, in cudf._lib.interop.from_arrow
  File "/home/coder/.conda/envs/rapids/lib/python3.11/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "interop.pyx", line 142, in cudf._lib.pylibcudf.interop._from_arrow_table
RuntimeError: CUDA error at: /home/coder/.conda/envs/rapids/include/rmm/prefetch.hpp:53: cudaErrorInvalidDevice invalid device ordinal
```

This means that all cudf operations currently fail on WSL2 with managed memory enabled. In #16552, I have a fix. It restores the behavior of prior releases on WSL2: it uses a normal pool resource rather than a managed pool and does not enable prefetching (it detects whether concurrent managed access between the CPU and GPU is supported).
The CUDA docs state that prefetching (cudaMemPrefetchAsync) requires the destination device to have a non-zero cudaDevAttrConcurrentManagedAccess attribute. I suspect that's why we get a cudaErrorInvalidDevice error here.
We had more discussion offline, so summarizing here:
Describe the bug
cudf.pandas turns on a managed pool allocator by default to support larger-than-memory workloads. However, this does not work on WSL2 because UVM on Windows does not actually allow oversubscription. Moreover, using UVM could result in far worse slowdowns on WSL2 than observed on Windows due to how it is implemented on that platform.
Expected behavior
We should consider changing cudf.pandas to only enable managed memory by default when oversubscription is properly supported. This can be done by querying the CUDA driver for the appropriate attribute. In addition, we should run some benchmarks to evaluate the relative performance impact of using managed memory on WSL2 in undersubscribed situations.
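Querying the driver for that attribute can be done without any third-party dependencies. The sketch below uses ctypes against the CUDA driver API (`cuInit`, `cuDeviceGetAttribute`, and the enum value `CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS = 89` from the driver headers); it conservatively reports no support if the driver is missing or any call fails:

```python
import ctypes

# Enum value from the CUDA driver API headers (cuda.h):
# CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS = 89
CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS = 89

def concurrent_managed_access_supported(device: int = 0) -> bool:
    """Ask the CUDA driver whether the CPU and GPU can concurrently
    access managed memory. Returns False conservatively if the driver
    is unavailable or any call fails (no GPU, no CUDA install, etc.)."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return False
    if libcuda.cuInit(0) != 0:
        return False
    value = ctypes.c_int(0)
    status = libcuda.cuDeviceGetAttribute(
        ctypes.byref(value),
        CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS,
        device,
    )
    return status == 0 and value.value != 0
```

On WSL2 (and on Windows) this attribute is 0, so the check would correctly disable managed memory there while leaving it on for native Linux.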