Update guide to UDFs with notes about Series.applymap
deprecation and related changes
#10607
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks.
rerun tests
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10607 +/- ##
================================================
+ Coverage 86.31% 86.34% +0.02%
================================================
Files 140 140
Lines 22255 22280 +25
================================================
+ Hits 19209 19237 +28
+ Misses 3046 3043 -3
@@ -7,13 +7,24 @@ | |||
"# Overview of User Defined Functions with cuDF" |
While it's good to do this in tests, maybe we don't want to compare the invocations of apply()
with Pandas -- unless there's a difference we want to call out.
@@ -7,13 +7,24 @@ | |||
"# Overview of User Defined Functions with cuDF" |
Functions used with `cudf.Series.apply` and `cudf.DataFrame.apply` can be expected to handle nulls using the same rules as the rest of cuDF. In most cases this translates to nulls propagating through unary and binary operations and yielding more nulls.
I think we could simplify while also being a bit more explicit here:
The null value `NA` propagates through unary and binary operations. Thus, `NA + 1`, `abs(NA)`, and `NA == NA` all return `NA`. To make this concrete, let's look at the same example from above, this time using nullable data:
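For reference, here is a minimal sketch of the propagation rule, using pandas' `pd.NA` on the CPU as a stand-in. This is an assumption for illustration: cuDF's `NA` is expected to follow the same rules, but the snippet does not exercise cuDF itself, and the variable names are hypothetical.

```python
import pandas as pd

# pandas' null scalar, used here as a CPU stand-in for cuDF's NA (assumption).
NA = pd.NA

# Binary, unary, and comparison operations all propagate the null.
binary_result = NA + 1
unary_result = abs(NA)
comparison_result = NA == NA

for label, value in [
    ("NA + 1", binary_result),
    ("abs(NA)", unary_result),
    ("NA == NA", comparison_result),
]:
    print(f"{label} is NA: {value is pd.NA}")

# The same rule applied element-wise to a nullable integer column:
# only the null row stays null after the addition.
s = pd.Series([1, None, 3], dtype="Int64")
shifted = s + 1
print(shifted.isna().tolist())
```

Note that `NA == NA` yields `NA` rather than `True`, so null checks should go through `isna()` instead of an equality comparison.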
@@ -7,13 +7,24 @@ | |||
"# Overview of User Defined Functions with cuDF" |
In other parts of this notebook, we use the second person "you", whereas here we use "the user". Let's be consistent -- I have a strong preference for the second person "you".
@@ -7,13 +7,24 @@ | |||
"# Overview of User Defined Functions with cuDF" |
Many problems in data science and engineering are well studied and there exist known parallel algorithms for making some desired transformation to some data. Many have corresponding CUDA solutions that may not exist as column level API in cuDF. To expose the ability to use these custom kernels, cuDF supports directly using custom cuda kernels written using `numba` on cuDF `Series` objects. In short, this means that if a user has knowledge of how to write a CUDA kernel in numba, they may simply pass cuDF `Series` objects to that kernel as if they were numba device arrays. Let's look at a basic example of how to do this.
Two observations here:
- We should avoid categorizing users as coming from a data science or engineering background
- It takes a bit of reading before I understand what this section is about.
I think we should be more direct here; something along the lines of:
In addition to the `Series.apply()` method for performing custom operations, you can also pass `Series` objects directly into [CUDA kernels written with Numba](https://numba.pydata.org/numba-doc/latest/cuda/kernels.html).
@@ -7,13 +7,24 @@ | |||
"# Overview of User Defined Functions with cuDF" |
This example is adapted from cuDF's API documentation.
Orthogonal to this PR, but the link here is broken. Ditto below with apply_grouped.
I fixed this link, but I think the docs for `apply_grouped` are actually missing, as I don't see them anywhere in our stable docs. Looking into this separately.
@gpucibot merge
This PR updates our guide to UDFs notebook in the following ways:
- `cudf.Series.applymap`
- `cudf.Series.apply` and `cudf.DataFrame.apply` are encountered before `applymap`, `apply_rows` and `apply_chunks`

EDIT: decided to just remove the docs for `applymap` at this point.