
Implement comprehensive set of ufuncs #256

Closed
mrocklin opened this issue Sep 25, 2018 · 9 comments
Labels
dask Dask issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.

Comments

@mrocklin
Collaborator

A variety of ufuncs are not yet implemented in pygdf. Operations like exp or round don't seem to be available. Some of these come through NumPy:

    np.exp(df.x)

And some of them are methods:

    df.round(...)
@mrocklin
Collaborator Author

One approach to this would be to combine the __array_ufunc__ protocol with Numba's ability to compile Numpy functions. I imagine that a solution might look something like the following:

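    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method == '__call__':
            # Hypothetical API: compile the NumPy ufunc into a GPU kernel with Numba
            numba_gpu_ufunc = numba.gpu_vectorize(ufunc)
            # Unwrap Series arguments into device arrays; pass scalars through as-is
            input_arrays = [x.to_gpu_array() if isinstance(x, pygdf.Series) else x
                            for x in inputs]
            data = numba_gpu_ufunc(*input_arrays, **kwargs)
            return pygdf.Series(data, index=...)
        else:
            # Defer reduce, accumulate, outer, etc. back to NumPy
            return NotImplemented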

I'm hoping that a function like numba.gpu_vectorize exists and works as expressed above.

cc @sklam, is something like this possible?
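Because `numba.gpu_vectorize` is a hoped-for API rather than a known one, the protocol itself can be sketched on the CPU. This assumes a hypothetical `MiniSeries` class (not part of pygdf) holding a NumPy array and an index; NumPy calls its `__array_ufunc__` for expressions like `np.exp(s)`, so the method only has to unwrap, compute, and rewrap.

```python
import numpy as np


class MiniSeries:
    """Hypothetical stand-in for pygdf.Series: a NumPy array plus an index."""

    def __init__(self, data, index=None):
        self.data = np.asarray(data)
        self.index = np.arange(len(self.data)) if index is None else np.asarray(index)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain elementwise calls such as np.exp(s) are handled; returning
        # NotImplemented for reduce/accumulate/etc. makes NumPy raise TypeError.
        # Assumes no out= argument is passed.
        if method != "__call__":
            return NotImplemented
        # Unwrap MiniSeries arguments; scalars and plain arrays pass through.
        arrays = [x.data if isinstance(x, MiniSeries) else x for x in inputs]
        # This is where a GPU kernel would run; here NumPy computes on the CPU.
        result = ufunc(*arrays, **kwargs)
        # Rewrap, reusing this object's index (no index alignment is attempted).
        if isinstance(result, tuple):  # multi-output ufuncs such as np.divmod
            return tuple(MiniSeries(r, index=self.index) for r in result)
        return MiniSeries(result, index=self.index)
```

With this in place, `np.exp(MiniSeries([0.0, 1.0], index=[10, 20]))` returns a new `MiniSeries` that keeps the index `[10, 20]`, while `np.add.reduce(s)` raises `TypeError` because the method declines anything other than `__call__`.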

@kkraus14 kkraus14 added the feature request New feature or request label Sep 26, 2018
@kkraus14
Collaborator

Even if this is possible, for common functions like round and exp we will want implementations that don't depend on JIT compilation: compile time is non-trivial in many cases, and we can often write more optimized implementations ourselves.

@kkraus14
Collaborator

That being said, if we could use this to get the functionality first and optimize later, that would be amazing.

@mrocklin
Collaborator Author

Here is a list of NumPy ufuncs:

In [1]: import numpy as np

In [2]: for x in dir(np):
   ...:     if isinstance(getattr(np, x), np.ufunc):
   ...:         print(x)
   ...:         
abs
absolute
add
arccos
arccosh
arcsin
arcsinh
arctan
arctan2
arctanh
bitwise_and
bitwise_not
bitwise_or
bitwise_xor
cbrt
ceil
conj
conjugate
copysign
cos
cosh
deg2rad
degrees
divide
divmod
equal
exp
exp2
expm1
fabs
float_power
floor
floor_divide
fmax
fmin
fmod
frexp
gcd
greater
greater_equal
heaviside
hypot
invert
isfinite
isinf
isnan
isnat
lcm
ldexp
left_shift
less
less_equal
log
log10
log1p
log2
logaddexp
logaddexp2
logical_and
logical_not
logical_or
logical_xor
maximum
minimum
mod
modf
multiply
negative
nextafter
not_equal
positive
power
rad2deg
radians
reciprocal
remainder
right_shift
rint
sign
signbit
sin
sinh
spacing
sqrt
square
subtract
tan
tanh
true_divide
trunc
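The list mixes ufuncs of different arity, which matters when generating wrappers: most take one input, many take two, and a few (`divmod`, `frexp`, `modf`) return two outputs. A small sketch that groups them by `nin`/`nout`; the exact set of names depends on the installed NumPy version, and generalized ufuncs such as `matmul` are skipped because they are not elementwise.

```python
import warnings

import numpy as np


def ufuncs_by_arity():
    """Group NumPy's public elementwise ufuncs by (number of inputs, number of outputs)."""
    groups = {}
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # some attributes emit deprecation warnings
        for name in dir(np):
            obj = getattr(np, name, None)
            # Skip non-ufuncs and generalized ufuncs, which have a core signature.
            if not isinstance(obj, np.ufunc) or obj.signature is not None:
                continue
            groups.setdefault((obj.nin, obj.nout), []).append(name)
    return groups


for (nin, nout), names in sorted(ufuncs_by_arity().items()):
    print(f"{nin} in / {nout} out: {len(names)} ufuncs, e.g. {names[:3]}")
```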

@mrocklin
Collaborator Author

mrocklin commented Sep 26, 2018

I think we should implement __array_ufunc__ regardless. Someone might arrive with a custom ufunc, and if Numba is able to compile it, we should probably do so when no other solution exists.

The solution probably looks something like the following:

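    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method == '__call__':
            try:
                # Prefer a native pygdf function of the same name, if one exists
                func = getattr(pygdf, ufunc.__name__)
            except AttributeError:
                # Otherwise fall back to a JIT-compiled GPU version (hypothetical API)
                func = numba.gpu_vectorize(ufunc)
            # Unwrap Series arguments into device arrays; pass scalars through as-is
            input_arrays = [x.to_gpu_array() if isinstance(x, pygdf.Series) else x
                            for x in inputs]
            data = func(*input_arrays, **kwargs)
            return pygdf.Series(data, index=...)
        else:
            return NotImplemented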

(There are many things wrong with the above implementation; it's just there for demonstration.)
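The lookup-then-fallback dispatch can be sketched on the CPU as follows. It assumes a hypothetical registry `NATIVE` of hand-written kernels keyed by `ufunc.__name__`, and a fallback that stands in for Numba compilation by simply calling the NumPy ufunc. `NATIVE`, `native_sqrt` and `ToySeries` are illustrative names only.

```python
import numpy as np


def native_sqrt(x):
    # Stand-in for a hand-written, precompiled (non-JIT) kernel.
    return np.sqrt(x)


# Hypothetical registry of native kernels, keyed by ufunc.__name__.
NATIVE = {"sqrt": native_sqrt}


class ToySeries:
    """Hypothetical Series-like wrapper used only to demonstrate dispatch."""

    def __init__(self, data, index=None, path=None):
        self.data = np.asarray(data)
        self.index = np.arange(len(self.data)) if index is None else np.asarray(index)
        self.path = path  # which dispatch path produced this object

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        arrays = [x.data if isinstance(x, ToySeries) else x for x in inputs]
        func = NATIVE.get(ufunc.__name__)
        if func is not None:
            path = "native"
        else:
            # Fallback: where Numba would JIT-compile the ufunc; NumPy runs it here.
            func, path = ufunc, "fallback"
        # Single-output ufuncs only, to keep the sketch short.
        return ToySeries(func(*arrays, **kwargs), index=self.index, path=path)
```

An explicit registry, rather than `getattr` on the whole module, avoids accidentally picking up an unrelated attribute that happens to share a ufunc's name.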

@mrocklin
Collaborator Author

@sklam @seibert, any thoughts on this? My guess is that this is easy-ish to do with Numba, and possibly a nice win.

@seibert
Contributor

seibert commented Sep 28, 2018

Roughly, this requires generating the scalar body of the ufunc from the ufunc name (string templating is usually what we have to do) and then wrapping it with @vectorize(target='gpu'). Not all of these operations have Numba-supported equivalents on the GPU, so you would need to implement those yourself.
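A CPU-only sketch of the string-templating step, assuming the scalar body is simply a call into Python's `math` module with the same name as the ufunc. `np.vectorize` stands in for Numba's `@vectorize` so the example runs without a GPU; with Numba the generated function would instead be passed to `numba.vectorize` with a GPU target (`target='cuda'` in current Numba). The template only covers unary ufuncs, and names with no `math` equivalent (e.g. `rint`) are exactly the ones that would need hand-written bodies.

```python
import math

import numpy as np

# Template for the scalar body of a unary ufunc, filled in from the ufunc name.
TEMPLATE = """
def {name}_scalar(x):
    return math.{name}(x)
"""


def make_scalar_kernel(name):
    """Generate a scalar Python function for math.<name> via string templating."""
    if not hasattr(math, name):
        # No scalar equivalent known: this is the case that needs a hand-written body.
        raise NotImplementedError(f"no scalar implementation for {name!r}")
    namespace = {"math": math}
    exec(TEMPLATE.format(name=name), namespace)
    return namespace[f"{name}_scalar"]


def make_vectorized(ufunc):
    if ufunc.nin != 1:
        raise NotImplementedError("this template only covers unary ufuncs")
    scalar = make_scalar_kernel(ufunc.__name__)
    # CPU stand-in for wrapping the scalar body with Numba's @vectorize.
    return np.vectorize(scalar, otypes=[np.float64])
```

For example, `make_vectorized(np.exp)` yields a function that maps `np.array([0.0, 1.0])` to approximately `[1.0, 2.718]`, while `make_vectorized(np.rint)` raises `NotImplementedError`.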

Another way to bootstrap this would be CuPy, which implements most of these functions with the same names and signatures as NumPy.
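Since CuPy mirrors NumPy's names, bootstrapping can be as simple as a name lookup in the array module. This sketch assumes CuPy may or may not be installed and falls back to NumPy so it still runs on a machine without a GPU; `lookup` and `apply_by_name` are hypothetical helpers, not part of either library.

```python
import numpy as np

try:
    import cupy as xp  # GPU arrays with NumPy-compatible function names
except ImportError:
    xp = np  # CPU fallback so the sketch still runs without CuPy


def lookup(ufunc):
    """Find the array-module function with the same name as a NumPy ufunc, or None."""
    return getattr(xp, ufunc.__name__, None)


def apply_by_name(ufunc, *arrays):
    func = lookup(ufunc)
    if func is None:
        # e.g. a custom ufunc built with np.frompyfunc has no counterpart
        raise NotImplementedError(f"{xp.__name__} has no {ufunc.__name__!r}")
    # Convert inputs to the module's array type (a no-op copy-free pass for NumPy).
    return func(*(xp.asarray(a) for a in arrays))
```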

@kkraus14 kkraus14 added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels Dec 29, 2018
@mrocklin
Collaborator Author

It looks like we could use this for np.sqrt, at least for groupby std (a group's standard deviation is the square root of its variance). cc @quasiben
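The arithmetic behind that: a groupby `std` can be built from a groupby `var` followed by an elementwise `sqrt`. A NumPy-only sketch with hypothetical `keys`/`values` arrays, using the sample (`ddof=1`) convention that pandas uses for `std`. The sum-of-squares formula is the simplest to show but is less numerically robust than a two-pass or Welford-style update.

```python
import numpy as np


def groupby_var(keys, values, ddof=1):
    """Per-group variance from per-group counts, sums and sums of squares."""
    uniq, inv = np.unique(keys, return_inverse=True)
    count = np.bincount(inv)
    total = np.bincount(inv, weights=values)
    total_sq = np.bincount(inv, weights=values ** 2)
    mean = total / count
    # sum((x - mean)^2) == sum(x^2) - count * mean^2
    var = (total_sq - count * mean ** 2) / (count - ddof)
    return uniq, var


def groupby_std(keys, values, ddof=1):
    uniq, var = groupby_var(keys, values, ddof)
    return uniq, np.sqrt(var)  # the elementwise ufunc step
```

For keys `[0, 0, 0, 1, 1]` and values `[1, 2, 3, 10, 14]`, the group variances are `1.0` and `8.0`, so the standard deviations are `1.0` and `sqrt(8)`.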

@kkraus14
Collaborator

kkraus14 commented Oct 3, 2019

Since most of these are now implemented, I'm going to close this issue. Please raise a new issue for any individual function that isn't implemented yet.


3 participants