Error serializing CuPy==12.0.0 array #1174

Closed
pentschev opened this issue May 15, 2023 · 6 comments · Fixed by #1191

@pentschev
Member

pentschev commented May 15, 2023

Since cuDF began requiring CuPy 12, a new failure has appeared in the serialization tests. It does not occur with cupy=11.6.0, and it only affects the first attempt to serialize a CuPy array in a process; later attempts in the same process succeed. The following is a workaround for the tests:

diff --git a/dask_cuda/tests/test_device_host_file.py b/dask_cuda/tests/test_device_host_file.py
index 4a48079..fc0b4f2 100644
--- a/dask_cuda/tests/test_device_host_file.py
+++ b/dask_cuda/tests/test_device_host_file.py
@@ -51,7 +51,11 @@ def test_device_host_file_short(
     random.shuffle(full)

     for k, v in full:
-        dhf[k] = v
+        try:
+            dhf[k] = v
+        except TypeError as e:
+            print(f"{e=}, retrying")
+            dhf[k] = v

     random.shuffle(full)

However, even without the change above, only the first parametrized test fails; all later instances pass. This hints at a change in how CuPy is loaded and how Dask sees it: probably the CuPy serializer is only registered during the first serialization attempt.

@wence-
Contributor

wence- commented May 15, 2023

The tests are run with warnings as errors. In cupy 12, import cupy.cusparse produces a DeprecationWarning (it needs to be imported from cupyx now).
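
To illustrate the mechanism described above, here is a minimal sketch using only the standard library. The function names are hypothetical, and `warnings.warn` stands in for the deprecated import; it is not the actual CuPy or distributed code. It shows that once warnings are escalated to errors, a `DeprecationWarning` is raised as an exception that an `except ImportError` clause does not catch.

```python
import warnings


def load_optional_feature():
    """Hypothetical stand-in for an optional import that emits a DeprecationWarning."""
    try:
        # Stand-in for the deprecated import; under an "error" filter this raises.
        warnings.warn("module moved; import it from the new location", DeprecationWarning)
        return "loaded"
    except ImportError:
        # Only ImportError is handled, so an escalated warning passes straight through.
        return None


def outcome_with_filter(action):
    """Run the loader under the given warnings filter action and report what happened."""
    with warnings.catch_warnings():
        warnings.simplefilter(action)
        try:
            return load_optional_feature()
        except DeprecationWarning as exc:
            return f"raised {type(exc).__name__}"


if __name__ == "__main__":
    print(outcome_with_filter("ignore"))
    print(outcome_with_filter("error"))
```

With the `"ignore"` action the loader completes normally; with `"error"` the warning escapes the `try` block because it is not an `ImportError`.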

So this distributed patch fixes things:

diff --git a/distributed/protocol/cupy.py b/distributed/protocol/cupy.py
index dfc19a34..4b16a8a1 100644
--- a/distributed/protocol/cupy.py
+++ b/distributed/protocol/cupy.py
@@ -68,7 +68,11 @@ def dask_deserialize_cupy_ndarray(header, frames):
 
 
 try:
-    from cupy.cusparse import MatDescriptor
+    from packaging.version import Version
+    if Version(cupy.__version__) >= Version("12"):
+        from cupyx.cusparse import MatDescriptor
+    else:
+        from cupy.cusparse import MatDescriptor
     from cupyx.scipy.sparse import spmatrix
 except ImportError:
     MatDescriptor = None
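
As an alternative to the version check in the patch above (and not what the patch does), one could try the candidate locations in order and take the first that imports. The helper below is a hypothetical sketch, not part of distributed or CuPy, and it assumes the new location should be preferred whenever it is importable.

```python
import importlib


def import_first_available(module_names, attribute):
    """Return `attribute` from the first importable module in `module_names`.

    Hypothetical helper for illustration only. Returns None if no candidate
    module can be imported or none of them provides the attribute.
    """
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # this location is unavailable; try the next candidate
        if hasattr(module, attribute):
            return getattr(module, attribute)
    return None


# Intended use (assumption: prefer the new location, fall back to the old one):
# MatDescriptor = import_first_available(
#     ["cupyx.cusparse", "cupy.cusparse"], "MatDescriptor"
# )
```

Because the preferred location is tried first, the deprecated module is only imported when the new one is missing. If the deprecated import is reached while warnings are treated as errors, the `DeprecationWarning` would still propagate, since only `ImportError` is handled here.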

@wence-
Contributor

wence- commented May 15, 2023

dask/distributed#7836

@pentschev
Member Author

Thanks @wence-, that fixes things. However, we're still pinning to distributed=2023.3.2.1, so we'll need to work around that test for now. I'll implement the patch from the description for distributed<=2023.5.0.
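
A version-gated workaround of the kind mentioned above could be sketched as follows. The function names are hypothetical, and the parser assumes plain dotted numeric versions such as "2023.3.2.1"; versions with local or pre-release suffixes would need a real parser such as `packaging.version.Version`.

```python
def release_tuple(version):
    """Turn a plain dotted version such as "2023.3.2.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_workaround(distributed_version, last_affected="2023.5.0"):
    """Return True if the given version is at or below the last affected release.

    Hypothetical helper; the default bound comes from the
    distributed<=2023.5.0 condition stated in this thread.
    """
    return release_tuple(distributed_version) <= release_tuple(last_affected)


if __name__ == "__main__":
    for candidate in ("2023.3.2.1", "2023.5.0", "2023.5.1"):
        print(candidate, needs_workaround(candidate))
```

In a test, the result could decide whether to apply the retry from the diff in the issue description.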

@pentschev
Member Author

I opened #1175 to work around the issue for now.

rapids-bot bot pushed a commit that referenced this issue May 15, 2023
As discussed in #1174, we must work around test failures until Distributed can be unpinned.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1175
@jakirkham
Member

Now that Dask & Distributed 2023.5.1 are out (and have the fix), what are the next steps here?

@pentschev
Member Author

I opened #1191 to remove the compatibility code introduced in #1175 and #1190.

@rapids-bot rapids-bot bot closed this as completed in #1191 Jun 7, 2023
rapids-bot bot pushed a commit that referenced this issue Jun 7, 2023