Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __cuda_array_interface__ to Buffer #4023

Closed
wants to merge 27 commits into from

Conversation

jakirkham
Copy link
Member

Adds the __cuda_array_interface__ to Buffer. This simplifies its coercion to Numba arrays, which is then leveraged to get away from some legacy code in RMM around array handling.

To make it easier to coerce `Buffer` to Numba or CuPy arrays as needed,
add the `__cuda_array_interface__` to `Buffer`.
@jakirkham jakirkham requested a review from a team as a code owner January 31, 2020 07:16
@jakirkham jakirkham force-pushed the add_buf_cai branch 2 times, most recently from fe2f7a3 to e778560 Compare January 31, 2020 07:39
This winds up being quite a bit faster to build. As that is important to
us for serialization purposes, switch Numba for CuPy in this case. Note
that when it comes to copying the data back to host, other things start
dominating the time spent.
To simplify generic handling of GPU array-like data, coerce it to CuPy
first (known to be fast) and then coerce it to NumPy (same time as using
Numba). This should keep things flexible when working with other GPU
arrays while maintaining performance.
@kkraus14 kkraus14 added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. labels Jan 31, 2020
Seems that `gdf_dtype_from_dtype` isn't able to handle unsigned values
at the moment. So switch back to signed 1-byte integers for now. Should
clear up some failures.
kkraus14
kkraus14 previously approved these changes Jan 31, 2020
@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 2 - In Progress Currently a work in progress labels Jan 31, 2020
Works around an issue where a CuPy scalar may be passed. Previously
`__array__` would be called by NumPy, but CuPy will throw there. So this
calls `int` on the object first. If it is a CuPy array, it will actually
coerce itself to a Python `int`. Then we can coerce it to a NumPy
`int32` value.
@@ -377,7 +377,7 @@ def test_reflected_ops_scalar(func, dtype, obj_class):
ps_result = func(random_series)

# verify
np.testing.assert_allclose(ps_result, gs_result)
utils.assert_eq(ps_result, gs_result)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be hitting some sort of edge case here. Tried switching to the utility function used elsewhere, but it seems to have issues comparing. Have a rough idea of the issue, but additional pointers would be welcome. 🙂

______________ test_reflected_ops_scalar[<lambda>-int160-Series] _______________

func = <function <lambda> at 0x7f531193f320>, dtype = <class 'numpy.int16'>
obj_class = 'Series'

    @pytest.mark.parametrize("obj_class", ["Series", "Index"])
    @pytest.mark.parametrize("func, dtype", list(product(_reflected_ops, _dtypes)))
    def test_reflected_ops_scalar(func, dtype, obj_class):
        # create random series
        np.random.seed(12)
        random_series = utils.gen_rand(dtype, 100, low=10)
    
        # gpu series
        gs = Series(random_series)
    
        # class typing
        if obj_class == "Index":
            gs = as_index(gs)
    
        gs_result = func(gs)
    
        # class typing
        if obj_class == "Index":
            gs = Series(gs)
    
        # pandas
        ps_result = func(random_series)
    
        # verify
>       utils.assert_eq(ps_result, gs_result)

cudf/tests/test_binops.py:380: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 0     True
1     True
2     True
3     True
4     True
      ... 
95    True
96    True
97    True
98    True
99    True
Length: 100, dtype: bool

    def __nonzero__(self):
        raise ValueError(
            "The truth value of a {0} is ambiguous. "
            "Use a.empty, a.bool(), a.item(), a.any() or a.all().".format(
>               self.__class__.__name__
            )
        )
E       ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

/conda/envs/gdf/lib/python3.7/site-packages/pandas/core/generic.py:1555: ValueError

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we're somehow getting down to this condition: https://github.com/rapidsai/cudf/blob/branch-0.13/python/cudf/cudf/tests/utils.py#L77

What are the types of ps_result and gs_result?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow reply.

Actually it is getting stuck here. The comparison works, but the result is a Pandas Series, which can't be converted to a bool.

Backing up a bit, gs_result is a cuDF Series. However ps_result appears to be a NumPy array. Is the latter expected?

FWIW it appears this has always been the case. Not sure what caused us to now stumble into that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add forcing ps_result to a Pandas Series fixes the issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what op this is that's causing the issue? It's a bit surprising that your PR changes are causing this to change behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried locally with just lambda x: 1 + x, but would guess any of them have this issue.

Are we sure this is suppose to be gen_rand and not gen_rand_series though? ( #4091 )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷‍♂ not sure but if gen_rand_series fixes it that sounds good to me. This test is already pretty janky to test both indexes and series.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah me neither honestly. So I could easily be wrong 🙂

Not that we should pick up more things here, but would it makes sense to break this test into two separate ones to handle Series and Index? If so, happy to raise an issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly in the future, but the level of work for the reward is low right now. All of our index binaryops are implemented by casting to/from Series so once we fix that we can possibly address this.

If the underlying object contains a singleton value, treat it as if it
is a 1-D array with 1 value. Should fix some issues where a 0-D array is
passed.
@kkraus14 kkraus14 self-requested a review February 5, 2020 06:14
@kkraus14 kkraus14 added 2 - In Progress Currently a work in progress and removed 5 - Ready to Merge Testing and reviews complete, ready to merge labels Feb 5, 2020
@jakirkham jakirkham changed the title Add __cuda_array_interface__ to Buffer Use CuPy for array views Feb 10, 2020
@jakirkham jakirkham changed the title Use CuPy for array views Add __cuda_array_interface__ to Buffer Feb 10, 2020
@jakirkham
Copy link
Member Author

Given the __cuda_array_interface__ bits are in and this now is mostly a PR about using CuPy, have extracted those bits into a fresh draft PR ( #4110 ). We can then consider how we want to proceed with that on its own merits.

@jakirkham jakirkham closed this Feb 10, 2020
@jakirkham jakirkham deleted the add_buf_cai branch February 10, 2020 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants