Support `args=` in `Series.apply` #9982

brandon-b-miller · 2022-01-06T16:44:32Z

A lot of code was moved around but also slightly tweaked, making the diff a little harder to parse. Here's a summary of the changes:

- `Series.apply` used to turn the incoming scalar (lambda) function into a row UDF, convert the Series into a DataFrame, and run the UDF through the normal DataFrame path. It now does its own processing and goes through `Frame._apply` directly.
- `pipeline.py` was split into `row_function.py` and `lambda_function.py`, each holding only what is specific to its type of UDF. Everything common to both was moved to `utils.py` and generalized as much as possible.
- A new `templates.py` module holds all the templates and initializers that are concatenated together to build the kernel we need, including a new template specific to Series lambdas.
- The caching machinery was abstracted out into `compile_or_get`, which now expects a Python function object it can call to produce the right kernel. `DataFrame` and `Series` decide which one to pass in at the top-level API.
- Moved `_apply` from `Frame` to `IndexedFrame`.

vyasr · 2022-01-06T22:33:43Z

@brandon-b-miller I've been looking at this code a bit recently and we discussed some of this refactoring, so feel free to explicitly request my review whenever you convert this from a draft.

brandon-b-miller · 2022-01-07T21:52:57Z

I think this is getting there - cc @vyasr

brandon-b-miller · 2022-01-19T16:55:42Z

rerun tests

vyasr

One question (out of scope for this PR). Is it possible to enable users passing in kernels that they've already compiled, or is that too difficult? And if it is possible, should we at least throw a more friendly error if a user tries that? I remember testing this once and being surprised by the behavior.
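On the friendlier-error part of that question, one possible shape is a check that runs before any compilation. This is a hypothetical sketch, not cuDF code: `validate_udf` is an invented name, and detecting a compiled object through `py_func` assumes it behaves like numba's dispatchers, which keep the original Python function in that attribute. The check is deliberately strict and rejects anything that is not a plain `def` or lambda.

```python
import inspect
from typing import Callable


def validate_udf(func: Callable) -> Callable:
    """Hypothetical up-front check for the function passed to Series.apply."""
    if inspect.isfunction(func):
        # A plain `def` function or lambda: what the UDF machinery
        # traces and compiles itself.
        return func
    # Assumption: an already-compiled object (like a numba dispatcher)
    # exposes the original Python function as `py_func`.
    py_func = getattr(func, "py_func", None)
    if inspect.isfunction(py_func):
        raise TypeError(
            f"apply() received an already-compiled function "
            f"({type(func).__name__} wrapping {py_func.__name__!r}). "
            "Pass the original Python function instead; it is compiled "
            "for the GPU automatically."
        )
    raise TypeError(
        f"apply() expected a Python function, got {type(func).__name__}"
    )
```

If accepting such objects were preferred instead, the `py_func` branch could return `py_func` rather than raising, although kernels written with `cuda.jit` write into output arrays instead of returning values, so rejecting them may be the safer default.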

Files with review comments:

- python/cudf/cudf/core/indexed_frame.py
- python/cudf/cudf/core/series.py
- python/cudf/cudf/core/udf/row_function.py
- python/cudf/cudf/core/udf/scalar_function.py
- python/cudf/cudf/core/udf/templates.py
- python/cudf/cudf/core/udf/utils.py


codecov · 2022-01-24T23:37:40Z

Codecov Report

Merging #9982 (0af638f) into branch-22.04 (e24fa8f) will increase coverage by 0.03%.
The diff coverage is n/a.

```diff
@@               Coverage Diff                @@
##           branch-22.04    #9982      +/-   ##
================================================
+ Coverage         10.37%   10.41%   +0.03%     
================================================
  Files               119      122       +3     
  Lines             20149    20629     +480     
================================================
+ Hits               2091     2148      +57     
- Misses            18058    18481     +423
```

| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/errors.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/io/csv.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/io/hdf.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/io/orc.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/__init__.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/_version.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/core/abc.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/api/types.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/io/dlpack.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/core/frame.py | `0.00% <0.00%> (ø)` |

... and 64 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e2474d...0af638f.
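The headline percentages can be re-derived from the Hits and Lines rows of the coverage diff. A small sketch, assuming Codecov's default of two-decimal precision with rounding down: truncation is what reproduces the reported 10.37% for 2091/20149 (rounding to nearest would give 10.38%), and the +0.03% delta only comes out when it is computed from the untruncated ratios (10.41 - 10.37 would be 0.04).

```python
import math


def truncated_pct(numerator: float, denominator: float, places: int = 2) -> float:
    """Percentage truncated (rounded down) to `places` decimal places."""
    scale = 10 ** places
    return math.floor(numerator / denominator * 100 * scale) / scale


base = truncated_pct(2091, 20149)  # branch-22.04: hits / lines
head = truncated_pct(2148, 20629)  # #9982: hits / lines
# Delta of the exact ratios, 2148/20629 - 2091/20149, over a common denominator.
delta = truncated_pct(2148 * 20149 - 2091 * 20629, 20629 * 20149)
print(base, head, delta)  # 10.37 10.41 0.03
```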

vyasr

I have a couple of small questions, but I think this looks mostly good from my end. Happy to move this forward once other reviewers are happy.


vyasr · 2022-01-26T22:33:15Z

python/cudf/cudf/core/indexed_frame.py

```diff
+            offsets.append(col.offset)
+        launch_args += offsets
+        launch_args += list(args)
+        kernel.forall(len(self))(*launch_args)
```


If we always generate a kernel with `len(self)` tasks, do we really need to pass `len(self)` as one of the `launch_args`? AFAICT that's just used to avoid out-of-bounds accesses, but it looks like we always launch a grid with one thread per row, right?

brandon-b-miller · 2022-01-27

I had taken the numba docs for `forall` to mean that inserting this check was a requirement for kernels that expect to be configured this way. Indeed, taking it out causes numerous tests to fail, with nulls in the wrong places everywhere. My assumption was that something happening inside `forall` causes unpredictable behavior if this guard is not included. cc @gmarkall for more insight.

vyasr · 2022-01-27

I guess numba may be doing something like generating only a limited set of kernels (templates, maybe?) based on the block size and then dispatching to the closest size based on the argument to `forall`? I'd be curious to learn more about how this works from @gmarkall. It sounds like you should ignore my suggestion to change anything here, though.
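For readers of this thread, the requirement in numba's `forall` documentation can be illustrated without a GPU. Those docs say the kernel is assumed to map `cuda.grid(1)` one-to-one onto tasks and to do nothing when the thread index is not below `ntasks`. A grid consists of whole blocks, so the number of threads launched is generally rounded up past `ntasks`, and the extra threads must be filtered out by the kernel itself. The sketch below emulates that rounding sequentially in plain Python. The block size of 4 is an arbitrary assumption (numba chooses its own), and the flat `memory` list, with output slots followed by a neighboring region, is a hypothetical stand-in for device memory; it only suggests how unguarded writes could plausibly corrupt adjacent data such as null masks.

```python
def blocks_needed(ntasks: int, threads_per_block: int) -> int:
    """Ceiling division: whole blocks required to cover ntasks threads."""
    return (ntasks + threads_per_block - 1) // threads_per_block


def emulate_1d_launch(kernel, ntasks, threads_per_block, *args) -> int:
    """Run `kernel` once per thread index, sequentially, as a 1D grid would."""
    total = blocks_needed(ntasks, threads_per_block) * threads_per_block
    for tid in range(total):
        kernel(tid, *args)
    return total


def guarded_kernel(tid, memory, size):
    if tid < size:  # the bounds check under discussion
        memory[tid] = 1


def unguarded_kernel(tid, memory, size):
    memory[tid] = 1  # no check: threads with tid >= size write as well


def neighbor_after_launch(kernel, size=5, threads_per_block=4):
    # Hypothetical layout: `size` output slots followed by slots that belong
    # to some adjacent buffer (for example another column or a validity mask).
    padded = blocks_needed(size, threads_per_block) * threads_per_block
    memory = [0] * padded
    emulate_1d_launch(kernel, size, threads_per_block, memory, size)
    return memory[size:]


print(emulate_1d_launch(lambda tid: None, 5, 4))  # 8 threads for 5 tasks
print(neighbor_after_launch(guarded_kernel))      # [0, 0, 0]
print(neighbor_after_launch(unguarded_kernel))    # [1, 1, 1]
```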

isVoid

Stellar work. With the current function naming, the code reads like prose.

python/cudf/cudf/core/udf/utils.py


brandon-b-miller · 2022-01-28T14:15:54Z

@gpucibot merge

brandon-b-miller added 9 commits January 4, 2022 17:49

- basic (cff4d1f)
- enough for now (4b7181d)
- stuff works (be9deb4)
- unify get_udf_return_type and change how its called (c3ed817)
- lots of progress here (76fe430)
- bugfixes (9131e23)
- all passing (4e9c876)
- moving a few things around (af56462)
- rename things (21c4b95)

github-actions bot added the Python label (Affects Python cuDF API) Jan 6, 2022

- updates (1feed89)

brandon-b-miller added 2 commits January 7, 2022 10:04

- more updates, slowly refactoring (c77333d)
- pretty close now (3543e15)

brandon-b-miller marked this pull request as ready for review January 7, 2022 21:51

brandon-b-miller requested a review from a team as a code owner January 7, 2022 21:51

brandon-b-miller requested review from isVoid and charlesbluca January 7, 2022 21:51

brandon-b-miller added the feature request (New feature or request), non-breaking (Non-breaking change), numba (Numba issue), and 2 - In Progress (Currently a work in progress) labels Jan 7, 2022

- move _apply from Frame to IndexedFrame (caff641)

brandon-b-miller added 3 commits January 10, 2022 18:27

- merge latest (8338061)
- Merge branch 'branch-22.02' into fea-series-apply-args (ebdd9d4)
- rename confusing function names (ac442b2)

brandon-b-miller removed the 2 - In Progress (Currently a work in progress) label Jan 11, 2022

brandon-b-miller added 9 commits January 18, 2022 07:09

- lambda -> scalar (e1635f8)
- bugfix (34f5c57)
- prefix everything with an underscore (841cad8)
- address more reviews (c784b12)
- style (809b4c1)
- factor out common logic (1f320e6)
- Merge branch 'branch-22.02' into fea-series-apply-args (b099ddc)
- dont use a locals dict (ac0bb27)
- merge latest and resolve conflicts (f9b6bbd)

vyasr requested changes Jan 19, 2022


brandon-b-miller and others added 2 commits January 19, 2022 15:59

- Apply suggestions from code review (18c256d)
  Co-authored-by: Vyas Ramasubramani
- partially address reviews (25ffcdb)

shwina changed the base branch from branch-22.02 to branch-22.04 January 20, 2022 21:23

brandon-b-miller added 3 commits January 24, 2022 11:54

- shorten comments (1844344)
- move to a TypingError (c84ac0a)
- address more reviews (8d41c89)
- merge 22.04 and resolve conflicts (2415a98)

vyasr approved these changes Jan 26, 2022


isVoid approved these changes Jan 27, 2022


python/cudf/cudf/core/udf/utils.py (resolved)

brandon-b-miller and others added 2 commits January 27, 2022 13:34

- Update python/cudf/cudf/core/udf/utils.py (643d55e)
  Co-authored-by: Michael Wang
- updates (0af638f)

rapids-bot bot merged commit 896564a into rapidsai:branch-22.04 Jan 28, 2022

shwina mentioned this pull request in [DOC] RAPIDS 22.04 Release Blog Outline #10383 (Closed) Mar 23, 2022
