
StridedMemoryView fails with Jax arrays #285

Closed
leofang opened this issue Dec 11, 2024 · 15 comments · Fixed by #292

Labels: bug (Something isn't working) · cuda.core (Everything related to the cuda.core module) · P0 (High priority - Must do!)

Comments

leofang (Member) commented Dec 11, 2024

@yangcal was trying StridedMemoryView, but it doesn't seem to work for JAX arrays, at least with CUDA 12.2. Below are the script and the error log:

import cupy as cp
import numpy as np
import jax.numpy as jnp

from cuda.core.experimental.utils import args_viewable_as_strided_memory

@args_viewable_as_strided_memory((0,))
def parse_tensor(arr):
    view = arr.view(-1)  # -1: consumer claims no stream ordering is needed
    print(type(arr), type(view))
    print(f"shape={view.shape}")
    print(f"strides={view.strides}")

for module in (np, cp, jnp):
    arr = module.eye(2)
    print(f"module={module.__name__}")
    parse_tensor(arr)

Error Log:

E1211 13:52:21.674104 2256128 ptx_compiler_helpers.cc:71] *** WARNING *** Invoking ptxas with version 12.2.140, which corresponds to a CUDA version <=12.6.2. CUDA versions 12.x.y up to and including 12.6.2 miscompile certain edge cases around clamping.
Please upgrade to CUDA 12.6.3 or newer.
module=jax.numpy
Traceback (most recent call last):
  File "/home/scratch.yangg_sw/software/cuda-python/cuda_core/examples/tmp.py", line 18, in <module>
    parse_tensor(arr)
  File "cuda/core/experimental/_memoryview.pyx", line 372, in cuda.core.experimental._memoryview.args_viewable_as_strided_memory.wrapped_func_with_indices.wrapped_func
  File "/home/scratch.yangg_sw/software/cuda-python/cuda_core/examples/tmp.py", line 10, in parse_tensor
    view = arr.view(-1)
           ^^^^^^^^^^^^
  File "cuda/core/experimental/_memoryview.pyx", line 146, in cuda.core.experimental._memoryview._StridedMemoryViewProxy.view
  File "cuda/core/experimental/_memoryview.pyx", line 148, in cuda.core.experimental._memoryview._StridedMemoryViewProxy.view
  File "cuda/core/experimental/_memoryview.pyx", line 180, in cuda.core.experimental._memoryview.view_as_dlpack
  File "/home/Self/marie/miniconda3/envs/jax/lib/python3.12/site-packages/jax/_src/array.py", line 446, in __dlpack__
    return to_dlpack(self, stream=stream,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/Self/marie/miniconda3/envs/jax/lib/python3.12/site-packages/jax/_src/dlpack.py", line 134, in to_dlpack
    return _to_dlpack(
           ^^^^^^^^^^^
  File "/home/Self/marie/miniconda3/envs/jax/lib/python3.12/site-packages/jax/_src/dlpack.py", line 65, in _to_dlpack
    return xla_client._xla.buffer_to_dlpack_managed_tensor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: CUDA error: : CUDA_ERROR_INVALID_HANDLE: invalid resource handle

Is this a semantics error on my side, or a CUDA version issue?

@leofang leofang added this to the cuda.core beta 2 milestone Dec 11, 2024
@leofang leofang added triage Needs the team's attention P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Dec 11, 2024
yangcal commented Dec 12, 2024

On a side note, while args_viewable_as_strided_memory is a very helpful decorator, it would be nice if there were APIs that let users directly construct StridedMemoryView instances from ndarray objects of various packages. This offers more freedom, and users could write their own decorators.

leofang (Member, Author) commented Dec 12, 2024

> I think it would be nice if there are APIs to allow users to directly construct StridedMemoryView instances from ndarray objects of various packages.

Check out the StridedMemoryView docstring; it is supported! But we want to encourage the decorator use case, because it allows scoped access (as defined by the decorated function) rather than having the view dangle forever.
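
For reference, a minimal sketch of the direct-construction path (check the docstring for the exact signature; here arr is any object supporting DLPack or the CUDA Array Interface, and s is a cuda.core Stream):

from cuda.core.experimental.utils import StridedMemoryView

# Construct the view directly, ordered after stream s; the decorator
# effectively does this for each flagged argument.
view = StridedMemoryView(arr, stream_ptr=s.handle)
print(view.shape, view.strides, view.ptr)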

ksimpson-work (Contributor) commented Dec 12, 2024

This appears to be an issue specific to Jax's implementation of ArrayImpl(basearray.Array).__dlpack__. I am going to keep digging to get a more specific diagnosis, but I am wondering at what point I should call it a Jax bug and move on @leofang

leofang (Member, Author) commented Dec 12, 2024

Could you share what you found here? Then we can determine whether we need to file a bug with Jax, work around it ourselves, or whether it's simply a user error.

ksimpson-work (Contributor) commented Dec 12, 2024

For context, I have repro'd this locally using CUDA 12.6 and jax.numpy for CUDA 12.

Using the same logic with a CuPy array works.
Jax is failing here: https://github.com/jax-ml/jax/blob/99d675ac25f583c0c2e61631355e0d11ba3abf18/jax/_src/dlpack.py#L65
It would appear that the CUDA_ERROR_INVALID_HANDLE is raised because -1 is being passed all the way through. If I create a stream and pass that as the handle, the call hangs and then incurs a segmentation fault. I ran this test quickly just to see the behaviour of a stream value != -1. I may be responsible for the segfault; I need to look into that as well.

Next I am planning to find the implementation of xla_client._xla.buffer_to_dlpack_managed_tensor to see what is going on, and possibly build CuPy locally so I can investigate how CuPy handles __dlpack__() and the stream argument.

leofang (Member, Author) commented Dec 12, 2024

@yangcal I think passing -1 in view = arr.view(-1) could be problematic for Jax arrays. In a normal (stream-ordered) CUDA program you should know which stream to use. Could you modify your toy code and see if it works?

Hi @jakevdp, we hit a possibly non-compliant DLPack implementation in either Jax or XLA: stream=-1 is a valid input as described in the __dlpack__ docs. Could you advise on which repo is the best place to raise this issue? Maybe https://github.com/openxla/xla/?

leofang (Member, Author) commented Dec 12, 2024

> and also possibly build CuPy locally so I can investigate how CuPy handles __dlpack__() and the stream argument

FWIW, -1 is handled around here in CuPy (the logic is a bit convoluted).
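
For completeness, what the spec expects from a producer looks roughly like this (a sketch with hypothetical helper names, not CuPy's or Jax's actual code):

# Hypothetical producer-side handling of the DLPack stream argument
# for data on a CUDA device:
def __dlpack__(self, stream=None, max_version=None):
    if stream == -1:
        # The consumer explicitly requests no synchronization; the
        # value must not be interpreted as a stream handle.
        pass
    elif stream is not None:
        # stream is the consumer's stream (1 = legacy default stream,
        # 2 = per-thread default stream, >2 = a real stream handle);
        # order it after the producer's stream, e.g. via an event.
        wait_on_producer_stream(stream)  # hypothetical helper
    return make_dlpack_capsule(self)  # hypothetical helper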

leofang (Member, Author) commented Dec 12, 2024

There are 2 bugs in Jax/XLA that cause the DLPack exchange to not work:

  1. Jax/XLA does not support DLPack 1.0, but this path exists and returns a (legacy, pre-1.0) capsule.
  2. In XLA, stream is passed as-is, without checking whether it's -1.

So, in short, even if Yang tested this

> I think passing -1 in view = arr.view(-1) could be problematic for Jax arrays. In a normal (stream-ordered) CUDA program you should know which stream to use. Could you modify your toy code and see if it works?

it would not work, because we'd then hit an AssertionError (I tried).

ksimpson-work (Contributor) commented

+1. All of that is consistent with what I have found.

leofang (Member, Author) commented Dec 12, 2024

@yangcal, could you check whether this patch works on your side? Let's work around the Jax bugs, assuming they can be fixed in the next version (cc @jakevdp for visibility):

diff --git a/cuda_core/cuda/core/experimental/_memoryview.pyx b/cuda_core/cuda/core/experimental/_memoryview.pyx
index d8eba46..1d4d977 100644
--- a/cuda_core/cuda/core/experimental/_memoryview.pyx
+++ b/cuda_core/cuda/core/experimental/_memoryview.pyx
@@ -7,6 +7,7 @@ cimport cython
 from ._dlpack cimport *
 
 import functools
+import importlib.metadata
 from typing import Any, Optional
 
 from cuda import cuda
@@ -181,6 +182,13 @@ cdef StridedMemoryView view_as_dlpack(obj, stream_ptr, view=None):
             stream=stream_ptr,
             max_version=(DLPACK_MAJOR_VERSION, DLPACK_MINOR_VERSION))
         versioned = True
+        try:
+            if "jax.numpy" in str(obj.__array_namespace__()):
+                ver = tuple(int(i) for i in importlib.metadata.version("jax").split("."))
+                if ver <= (0, 4, 38):
+                    versioned = False
+        except AttributeError:
+            pass
     except TypeError:
         capsule = obj.__dlpack__(
             stream=stream_ptr)

You'd also need to avoid passing -1 as the stream, like this:

import cupy as cp
import numpy as np
import jax.numpy as jnp

from cuda.core.experimental import Device
from cuda.core.experimental.utils import args_viewable_as_strided_memory

@args_viewable_as_strided_memory((0,))
def parse_tensor(arr, s, mod):
    view = arr.view(s.handle if mod is not np else -1)  # NumPy is CPU-only, so no stream is needed there
    print(type(arr), type(view))
    print(f"shape={view.shape}")
    print(f"strides={view.strides}")


dev = Device(0)
dev.set_current()
s = dev.create_stream()
for module in (np, cp, jnp):
    arr = module.eye(2)
    parse_tensor(arr, s, module)

leofang (Member, Author) commented Dec 12, 2024

It could also be that the bug is on our side... For example, the docs for the max_version argument say

> This means the consumer must verify the version even when max_version is passed.

and in this case we're the consumer (we create the view).
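
Concretely, the consumer can tell a versioned capsule from a legacy one by its name; something like this sketch using the CPython capsule API:

import ctypes

_get_name = ctypes.pythonapi.PyCapsule_GetName
_get_name.restype = ctypes.c_char_p
_get_name.argtypes = [ctypes.py_object]

def is_versioned_capsule(capsule):
    # DLPack >= 1.0 producers name the capsule "dltensor_versioned";
    # legacy producers use "dltensor" even if max_version was passed.
    return _get_name(capsule) == b"dltensor_versioned"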

leofang (Member, Author) commented Dec 12, 2024

It could also be that the bug is on our side...

See #292.

@leofang leofang added bug Something isn't working and removed triage Needs the team's attention labels Dec 12, 2024
ksimpson-work (Contributor) commented

I have verified locally that #292 solves the bug on our side. Jax still does not handle -1 correctly; consider passing a stream handle explicitly until they (hopefully) resolve it on their end. Thanks for bringing this to our attention @yangcal!

yangcal commented Dec 17, 2024

Sorry for the slow response. I can confirm that the fix works. A follow-up question: does the script above need to know which stream was used to populate the operand on the device, or would any stream on that device work? I tried a random integer as the handle input and hit a segfault, but when I generated a new stream object with Device.create_stream() the script worked. So does the script just need a dummy stream/handle?

leofang (Member, Author) commented Dec 17, 2024

Hi Yang, no, it's not any random stream; it's the stream that you will be using to access the content in the decorated function (assuming you're using the decorator). We order it properly after the stream on which the data is being generated/processed.
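
In other words, the intended pattern looks like this (a sketch reusing the names from the example above):

@args_viewable_as_strided_memory((0,))
def consume(arr, stream):
    # Pass the stream on which *you* will consume the data; cuda.core
    # orders it after the producer's stream, so kernels launched on
    # stream below see fully materialized data.
    view = arr.view(stream.handle)
    # ... launch kernels on stream that read from view.ptr ...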
