-
Notifications
You must be signed in to change notification settings - Fork 933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add __cuda_array_interface__
to Buffer
#4023
Conversation
To make it easier to coerce `Buffer` to Numba or CuPy arrays as needed, add the `__cuda_array_interface__` to `Buffer`.
fe2f7a3
to
e778560
Compare
This winds up being quite a bit faster to build. As that is important to us for serialization purposes, switch Numba for CuPy in this case. Note that when it comes to copying the data back to host, other things start dominating the time spent.
e778560
to
aeb3978
Compare
To simplify generic handling of GPU array-like data, coerce it to CuPy first (known to be fast) and then coerce it to NumPy (same time as using Numba). This should keep things flexible when working with other GPU arrays while maintaining performance.
Seems that `gdf_dtype_from_dtype` isn't able to handle unsigned values at the moment. So switch back to signed 1-byte integers for now. Should clear up some failures.
Works around an issue where a CuPy scalar may be passed. Previously `__array__` would be called by NumPy, but CuPy will throw there. So this calls `int` on the object first. If it is a CuPy array, it will actually coerce itself to a Python `int`. Then we can coerce it to a NumPy `int32` value.
@@ -377,7 +377,7 @@ def test_reflected_ops_scalar(func, dtype, obj_class): | |||
ps_result = func(random_series) | |||
|
|||
# verify | |||
np.testing.assert_allclose(ps_result, gs_result) | |||
utils.assert_eq(ps_result, gs_result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be hitting some sort of edge case here. Tried switching to the utility function used elsewhere, but it seems to have issues comparing. Have a rough idea of the issue, but additional pointers would be welcome. 🙂
______________ test_reflected_ops_scalar[<lambda>-int160-Series] _______________
func = <function <lambda> at 0x7f531193f320>, dtype = <class 'numpy.int16'>
obj_class = 'Series'
@pytest.mark.parametrize("obj_class", ["Series", "Index"])
@pytest.mark.parametrize("func, dtype", list(product(_reflected_ops, _dtypes)))
def test_reflected_ops_scalar(func, dtype, obj_class):
# create random series
np.random.seed(12)
random_series = utils.gen_rand(dtype, 100, low=10)
# gpu series
gs = Series(random_series)
# class typing
if obj_class == "Index":
gs = as_index(gs)
gs_result = func(gs)
# class typing
if obj_class == "Index":
gs = Series(gs)
# pandas
ps_result = func(random_series)
# verify
> utils.assert_eq(ps_result, gs_result)
cudf/tests/test_binops.py:380:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = 0 True
1 True
2 True
3 True
4 True
...
95 True
96 True
97 True
98 True
99 True
Length: 100, dtype: bool
def __nonzero__(self):
raise ValueError(
"The truth value of a {0} is ambiguous. "
"Use a.empty, a.bool(), a.item(), a.any() or a.all().".format(
> self.__class__.__name__
)
)
E ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
/conda/envs/gdf/lib/python3.7/site-packages/pandas/core/generic.py:1555: ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing we're somehow getting down to this condition: https://github.com/rapidsai/cudf/blob/branch-0.13/python/cudf/cudf/tests/utils.py#L77
What are the types of ps_result
and gs_result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow reply.
Actually it is getting stuck here. The comparison works, but the result is a Pandas Series, which can't be converted to a bool
.
Backing up a bit, gs_result
is a cuDF Series. However ps_result
appears to be a NumPy array. Is the latter expected?
FWIW it appears this has always been the case. Not sure what caused us to now stumble into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add forcing ps_result
to a Pandas Series fixes the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what op this is that's causing the issue? It's a bit surprising that your PR changes are causing this to change behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried locally with just lambda x: 1 + x
, but would guess any of them have this issue.
Are we sure this is suppose to be gen_rand
and not gen_rand_series
though? ( #4091 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷♂ not sure but if gen_rand_series
fixes it that sounds good to me. This test is already pretty janky to test both indexes and series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah me neither honestly. So I could easily be wrong 🙂
Not that we should pick up more things here, but would it makes sense to break this test into two separate ones to handle Series and Index? If so, happy to raise an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly in the future, but the level of work for the reward is low right now. All of our index binaryops are implemented by casting to/from Series so once we fix that we can possibly address this.
If the underlying object contains a singleton value, treat it as if it is a 1-D array with 1 value. Should fix some issues where a 0-D array is passed.
Use an RMM-backed CuPy array instead of `rmm.device_array`.
__cuda_array_interface__
to Buffer
__cuda_array_interface__
to Buffer
Given the |
Adds the
__cuda_array_interface__
toBuffer
. This simplifies its coercion to Numba arrays, which is then leveraged to get away from some legacy code in RMM around array handling.