Implement comprehensive set of ufuncs #256

mrocklin · 2018-09-25T23:08:19Z

There are a variety of ufuncs that are not yet implemented on pygdf. Operations like exp or round don't seem to be available. Some of these come through Numpy:

np.exp(df.x)

And some of them are methods:

df.round(...)


mrocklin · 2018-09-25T23:13:52Z

One approach to this would be to combine the __array_ufunc__ protocol with Numba's ability to compile Numpy functions. I imagine that a solution might look something like the following:

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method == '__call__':
            numba_gpu_ufunc = numba.gpu_vectorize(ufunc)
            input_arrays = [x.to_gpu_array() if isinstance(x, pygdf.Series) else x for x in inputs]
            data = numba_gpu_ufunc(*input_arrays, **kwargs)
            return pygdf.Series(data, index=...)
        else:
            return NotImplemented

Where I'm hoping that a function like numba.gpu_vectorize exists and works as I've expressed above.

cc @sklam is something like this possible?
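To make the protocol side of this concrete, here is a minimal CPU-only sketch of how `__array_ufunc__` intercepts a NumPy call. `ToySeries` is a hypothetical stand-in for pygdf.Series, and it simply forwards to the NumPy ufunc on host memory where a real implementation would launch a GPU kernel. It only handles single-output ufuncs.

```python
import numpy as np


class ToySeries:
    """Hypothetical stand-in for pygdf.Series backed by a host NumPy array."""

    def __init__(self, data, index=None):
        self.data = np.asarray(data)
        self.index = index

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.exp(s) are handled; reduce,
        # accumulate, outer, etc. are declined.
        if method != "__call__":
            return NotImplemented
        # Unwrap ToySeries arguments and pass scalars/arrays through as-is.
        arrays = [x.data if isinstance(x, ToySeries) else x for x in inputs]
        # A GPU-backed class would dispatch to a device kernel here.
        result = ufunc(*arrays, **kwargs)
        return ToySeries(result, index=self.index)


s = ToySeries([0.0, 1.0, 2.0], index=["a", "b", "c"])
exp_s = np.exp(s)        # routed through ToySeries.__array_ufunc__
sum_s = np.add(s, 10.0)  # mixed ToySeries / scalar inputs
```

Both results come back as `ToySeries` objects carrying the original index, which is the behaviour the pseudocode above is aiming for.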

kkraus14 · 2018-09-26T00:09:30Z

Even if this is possible, we will want implementations of round, exp, and other typical functions that don't depend on JIT compilation: the time to compile the functions is non-trivial in many cases, and we can often create more optimized implementations.

kkraus14 · 2018-09-26T00:09:52Z

That being said, if we could do this for functionality first and optimize later that would be amazing.

mrocklin · 2018-09-26T00:19:02Z

Here is a list of NumPy ufuncs:

In [1]: import numpy as np

In [2]: for x in dir(np):
   ...:     if isinstance(getattr(np, x), np.ufunc):
   ...:         print(x)
   ...:         
abs
absolute
add
arccos
arccosh
arcsin
arcsinh
arctan
arctan2
arctanh
bitwise_and
bitwise_not
bitwise_or
bitwise_xor
cbrt
ceil
conj
conjugate
copysign
cos
cosh
deg2rad
degrees
divide
divmod
equal
exp
exp2
expm1
fabs
float_power
floor
floor_divide
fmax
fmin
fmod
frexp
gcd
greater
greater_equal
heaviside
hypot
invert
isfinite
isinf
isnan
isnat
lcm
ldexp
left_shift
less
less_equal
log
log10
log1p
log2
logaddexp
logaddexp2
logical_and
logical_not
logical_or
logical_xor
maximum
minimum
mod
modf
multiply
negative
nextafter
not_equal
positive
power
rad2deg
radians
reciprocal
remainder
right_shift
rint
sign
signbit
sin
sinh
spacing
sqrt
square
subtract
tan
tanh
true_divide
trunc

mrocklin · 2018-09-26T00:21:11Z

I think that we should implement __array_ufunc__ regardless. Someone might arrive with a custom ufunc and if Numba is able to compile it then we should probably do that if no other solution exists.

It sounds like the solution probably looks like the following:

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method == '__call__':
            try:
                func = getattr(pygdf, ufunc.__name__)
            except AttributeError:
                func = numba.gpu_vectorize(ufunc)
            input_arrays = [x.to_gpu_array() if isinstance(x, pygdf.Series) else x for x in inputs]
            data = func(*input_arrays, **kwargs)
            return pygdf.Series(data, index=...)
        else:
            return NotImplemented

(There are many things wrong with the above implementation. It's just there for demonstration.)

mrocklin · 2018-09-28T17:39:59Z

@sklam @seibert any thoughts on this? My guess is that this is easy-ish to do with numba, and possibly a nice win.

seibert · 2018-09-28T20:18:06Z

Roughly, this requires generating the scalar body of the ufunc (string templating is usually what we have to do) from the ufunc name and then wrapping with @vectorize(target='gpu'). Not all these operations have Numba-known equivalents on the GPU, so you would need to implement them.
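A rough sketch of the string-templating step, using only the standard library so it runs anywhere. The template, the helper name `make_scalar_body`, and the choice of the `math` module as the source of scalar equivalents are all assumptions for illustration. The wrapping step is left as a comment because it needs Numba and a GPU.

```python
import math

# Source template for the scalar body of a unary ufunc.
TEMPLATE = "def {name}_scalar(x):\n    return math.{name}(x)\n"


def make_scalar_body(name):
    """Generate and compile a scalar function for the ufunc called `name`."""
    # Not every NumPy ufunc name has a scalar equivalent under the same
    # name (np.arccos is math.acos, for example), so those need manual work.
    if not hasattr(math, name):
        raise NotImplementedError(f"no scalar equivalent known for {name!r}")
    source = TEMPLATE.format(name=name)
    namespace = {"math": math}
    exec(source, namespace)
    return namespace[f"{name}_scalar"], source


exp_scalar, exp_source = make_scalar_body("exp")

# With Numba and a CUDA device available, the scalar function could then
# be wrapped along these lines (not executed here):
#     from numba import vectorize
#     gpu_exp = vectorize(["float64(float64)"], target="cuda")(exp_scalar)
```

Names such as `arccos` raise `NotImplementedError` in this sketch, which mirrors the point that some operations have no ready-made equivalent and must be written by hand.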

Another way to bootstrap could be with CuPy, which has implemented most of these functions with exact NumPy-like names and signatures.

mrocklin · 2019-02-21T21:00:10Z

It looks like we could use this for np.sqrt at least for groupby std. cc @quasiben

kkraus14 · 2019-10-03T02:13:04Z

Considering that most of these are now implemented, I'm going to close this issue and encourage users to raise an issue for an individual function if it's not implemented already.

kkraus14 added the "feature request" label Sep 26, 2018

kkraus14 added the "libcudf" and "Python" labels Dec 29, 2018

mrocklin added the "dask" label Feb 26, 2019

mrocklin mentioned this issue Feb 26, 2019

[FEA] Square Root via __array_ufunc__ protocol #1055

Closed

kkraus14 closed this as completed Oct 3, 2019

beckernick mentioned this issue Aug 20, 2021

[FEA] Expand support for cupy universal functions (ufuncs) #9083

Closed
