[BUG] Consider disabling managed memory in cudf.pandas on WSL2 #16551

Closed
vyasr opened this issue Aug 13, 2024 · 4 comments · Fixed by #16552
Assignees: bdice
Labels: bug (Something isn't working)

Comments

vyasr (Contributor) commented Aug 13, 2024

Describe the bug
cudf.pandas turns on a managed pool allocator by default to support larger-than-memory workloads. However, this does not work on WSL2 because UVM on Windows does not actually allow oversubscription. Moreover, using UVM could result in far worse slowdowns on WSL2 than on native Windows, due to how UVM is implemented on that platform.

Expected behavior
We should consider changing cudf.pandas to only enable managed memory by default when oversubscription is properly supported. This can be done by querying the CUDA driver for the appropriate device attribute (cudaDevAttrConcurrentManagedAccess). In addition, we should run some benchmarks to evaluate the relative performance impact of using managed memory on WSL2 in undersubscribed situations.
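
For reference, a minimal sketch of that driver query, assuming the cuda-python runtime bindings are available; the helper name supports_managed_oversubscription is hypothetical and not part of cudf:

from cuda import cudart

def supports_managed_oversubscription(device: int = 0) -> bool:
    # Ask the driver whether the device supports concurrent managed access
    # between CPU and GPU, the prerequisite for UVM oversubscription.
    err, value = cudart.cudaDeviceGetAttribute(
        cudart.cudaDeviceAttr.cudaDevAttrConcurrentManagedAccess, device
    )
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"cudaDeviceGetAttribute failed: {err}")
    return bool(value)

# Expected to return False on WSL2 and True on native Linux with a supported GPU.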

vyasr added the bug label Aug 13, 2024
bdice (Contributor) commented Aug 14, 2024

I can confirm that cudf.pandas currently fails on WSL2:

import cudf.pandas
cudf.pandas.install()  # Enables managed memory and prefetching
cudf.Series([1, 2, 3])  # Fails!

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/coder/cudf/python/cudf/cudf/utils/performance_tracking.py", line 51, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/cudf/python/cudf/cudf/core/series.py", line 656, in __init__
    column = as_column(
             ^^^^^^^^^^
  File "/home/coder/cudf/python/cudf/cudf/core/column/column.py", line 2241, in as_column
    return as_column(arbitrary, nan_as_null=nan_as_null, dtype=dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/cudf/python/cudf/cudf/core/column/column.py", line 1868, in as_column
    col = ColumnBase.from_arrow(arbitrary)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/cudf/python/cudf/cudf/core/column/column.py", line 364, in from_arrow
    result = libcudf.interop.from_arrow(data)[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coder/.conda/envs/rapids/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "interop.pyx", line 162, in cudf._lib.interop.from_arrow
  File "/home/coder/.conda/envs/rapids/lib/python3.11/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "interop.pyx", line 142, in cudf._lib.pylibcudf.interop._from_arrow_table
RuntimeError: CUDA error at: /home/coder/.conda/envs/rapids/include/rmm/prefetch.hpp:53: cudaErrorInvalidDevice invalid device ordinal

This means that all cudf.pandas calls will fall back to CPU, and cudf.pandas is effectively just pandas on WSL2. This affects the 24.08 release, too.

I have a fix in #16552. It restores the behavior of prior releases on WSL2: it detects whether concurrent managed access between CPU and GPU is supported, and if not, uses a normal pool resource rather than a managed pool and does not enable prefetching.
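
For illustration, here is roughly the shape of that fallback expressed with RMM's Python API. This is only a sketch assuming the concurrent-managed-access attribute is the signal to key off of, not the actual #16552 patch:

import rmm
from cuda import cudart

# Ask the driver whether device 0 supports concurrent managed access.
err, concurrent = cudart.cudaDeviceGetAttribute(
    cudart.cudaDeviceAttr.cudaDevAttrConcurrentManagedAccess, 0
)

if err == cudart.cudaError_t.cudaSuccess and concurrent:
    # Managed (UVM) pool: supports larger-than-memory workloads.
    mr = rmm.mr.PoolMemoryResource(rmm.mr.ManagedMemoryResource())
else:
    # No concurrent managed access (e.g. WSL2): fall back to an ordinary
    # device-memory pool and leave prefetching disabled.
    mr = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource())

rmm.mr.set_current_device_resource(mr)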

bdice (Contributor) commented Aug 14, 2024

@vyasr Given that cudf.pandas is broken (CPU only) on WSL2 without #16552, should we consider a hotfix for 24.08?

bdice self-assigned this Aug 14, 2024
bdice (Contributor) commented Aug 14, 2024

The CUDA docs for cudaMemPrefetchAsync state:

If dstDevice is a GPU, then the device attribute cudaDevAttrConcurrentManagedAccess must be non-zero.

I suspect that's why we get a cudaErrorInvalidDevice here. In RMM, we already handle the case of attempting to prefetch non-managed memory returning cudaErrorInvalidValue. Should we add similar logic to ignore errors from prefetching on devices that do not support managed memory? That would make the RMM API always succeed, as "try to prefetch if possible." Or should we instead require developers to skip all prefetching code if managed memory is not supported? (This is what I implemented in #16552, because I do not enable the experimental prefetching options if managed memory is not supported.)
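
For comparison, a hedged sketch of the "try to prefetch if possible" alternative, written against the cuda-python runtime bindings rather than RMM internals; prefetch_if_possible is a hypothetical helper, not an existing RMM API:

from cuda import cudart

# Error codes that mean prefetching is simply not applicable:
#   cudaErrorInvalidValue  -> the allocation is not managed memory
#   cudaErrorInvalidDevice -> the device lacks concurrent managed access (e.g. WSL2)
_IGNORABLE = (
    cudart.cudaError_t.cudaErrorInvalidValue,
    cudart.cudaError_t.cudaErrorInvalidDevice,
)

def prefetch_if_possible(ptr: int, nbytes: int, device: int, stream) -> None:
    # Prefetch managed memory to `device`, treating "not supported" as a no-op.
    (err,) = cudart.cudaMemPrefetchAsync(ptr, nbytes, device, stream)
    if err != cudart.cudaError_t.cudaSuccess and err not in _IGNORABLE:
        raise RuntimeError(f"cudaMemPrefetchAsync failed: {err}")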

vyasr (Contributor, Author) commented Aug 14, 2024

We had more discussion offline, so summarizing here:

  • Yes, we will be hotfixing 24.08.
  • We're not going to make any changes to rmm/prefetching internals in the hotfix; we'll just disable managed memory whenever we're on a system where it's not supported.
  • We'll consider more updates to improve testing of this kind of issue in 24.10.
