Series apply method backed by masked UDFs #9217

Merged
merged 60 commits into rapidsai:branch-21.12 on Oct 1, 2021


brandon-b-miller
Contributor

@brandon-b-miller brandon-b-miller commented Sep 10, 2021

Depends on #9174

Adds `Series.apply`, which applies a scalar UDF elementwise to the series data and returns a new series. The UDF is null-sensitive and works in terms of our numba `MaskedType` extension type. Similar to `pd.Series.apply`.
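
For readers unfamiliar with masked semantics, here is a rough pure-Python sketch of what "null sensitive" elementwise application could look like. It is not the cuDF or numba implementation: the `Masked` class and the `apply_elementwise` helper are hypothetical names invented for illustration, `None` stands in for a null, and only `+` and `*` are modeled.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Masked:
    """Hypothetical stand-in for a masked scalar: a value plus a validity flag."""
    value: object
    valid: bool = True

    def _binary(self, other, op):
        # Promote plain Python scalars to valid masked values.
        if not isinstance(other, Masked):
            other = Masked(other)
        # A null operand on either side makes the result null.
        if not (self.valid and other.valid):
            return Masked(None, False)
        return Masked(op(self.value, other.value))

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)


def apply_elementwise(data, udf):
    """Apply a scalar UDF to each element of a list; None marks a null."""
    out = []
    for x in data:
        result = udf(Masked(x, x is not None))
        # Allow the UDF to return a plain scalar as well as a Masked.
        if not isinstance(result, Masked):
            result = Masked(result)
        out.append(result.value if result.valid else None)
    return out


def f(x):
    return x * 2 + 1


result = apply_elementwise([1, None, 3], f)
print(result)  # [3, None, 7]
```

The UDF itself is written as ordinary scalar code; the null handling lives entirely in the masked wrapper, which is the property that lets the same function body be used on data with or without nulls.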

@brandon-b-miller brandon-b-miller added the feature request, 2 - In Progress, numba, and Python labels Sep 10, 2021
@brandon-b-miller brandon-b-miller removed the gpuCI, libcudf, CMake, and Java labels Sep 29, 2021
@brandon-b-miller
Contributor Author

rerun tests

@brandon-b-miller brandon-b-miller added the 3 - Ready for Review label and removed the 2 - In Progress label Sep 29, 2021
@brandon-b-miller
Contributor Author

This is now ready for review.

Contributor

@shwina shwina left a comment

This looks great to me!

@brandon-b-miller
Contributor Author

rerun tests

@brandon-b-miller brandon-b-miller added the 5 - Ready to Merge label and removed the 3 - Ready for Review label Oct 1, 2021
@brandon-b-miller
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 3648783 into rapidsai:branch-21.12 Oct 1, 2021
rapids-bot bot pushed a commit that referenced this pull request Oct 12, 2021
Depends on #9217

Introduces a row-like abstraction to the numba UDF pipeline, which enables functions of the following form:

```python
def f(row):
    return row['a'] + row['b']
```

Such functions can be applied to dataframes with the corresponding column labels using:

```python
df.apply(f, axis=1)
```
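
To make the row-wise calling convention concrete, here is a minimal pure-Python sketch of applying a row function across columns. It is only an illustration under stated assumptions, not how cuDF compiles or runs the UDF: `apply_rows` is a hypothetical helper, columns are plain lists keyed by label, `None` stands in for a null, and a row with any null input is simply given a null result.

```python
def apply_rows(columns, udf):
    """Call udf once per row; `columns` maps a label to a list of equal length."""
    labels = list(columns)
    n_rows = len(columns[labels[0]]) if labels else 0
    out = []
    for i in range(n_rows):
        # Build a row-like mapping so the UDF can index by column label.
        row = {label: columns[label][i] for label in labels}
        # Simplifying assumption: any null input makes the row's result null.
        if any(v is None for v in row.values()):
            out.append(None)
        else:
            out.append(udf(row))
    return out


def f(row):
    return row['a'] + row['b']


result = apply_rows({'a': [1, None, 3], 'b': [10, 20, None]}, f)
print(result)  # [11, None, None]
```

The point of the sketch is the shape of `f`: it only indexes the row by label, so the function body does not depend on how the rows are produced.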


Removes the `nulludf` decorator, which makes this a breaking change. Since the decorator was only recently introduced as a stopgap, the impact should be low. Users can still write functions the old way, but such a function needs the `numba.cuda.jit(device=True)` decorator to work when wrapped in a lambda:

```python
from numba import cuda

@cuda.jit(device=True)
def f(x, y):
    return x + y

# Wrap the device function in a lambda that unpacks the row
df.apply(lambda row: f(row['a'], row['b']), axis=1)
```

Makes it so that pandas and cudf can consume the exact same UDF in the same way.

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - Graham Markall (https://github.com/gmarkall)
  - Ashwin Srinath (https://github.com/shwina)

URL: #9343
Labels
5 - Ready to Merge, feature request, non-breaking, numba, Python
4 participants