Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update guide to UDFs with notes about Series.applymap deprecation and related changes #10607

Merged

Conversation

brandon-b-miller
Copy link
Contributor

@brandon-b-miller brandon-b-miller commented Apr 6, 2022

This PR updates our guide to UDFs notebook in the following ways:

  • Notes deprecation of cudf.Series.applymap
  • reorder sections such that cudf.Series.apply and cudf.DataFrame.apply are encountered before applymap, apply_rows and apply_chunks
  • Adds recommendations about use cases for each of the above APIs to distinguish them
  • Minor updates all over to better describe current state

EDIT: decided to just remove the docs for applymap at this point.

@brandon-b-miller brandon-b-miller added feature request New feature or request doc Documentation Python Affects Python cuDF API. non-breaking Non-breaking change labels Apr 6, 2022
@brandon-b-miller brandon-b-miller self-assigned this Apr 6, 2022
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot removed the Python Affects Python cuDF API. label Apr 6, 2022
@brandon-b-miller
Copy link
Contributor Author

rerun tests

@brandon-b-miller brandon-b-miller added doc Documentation Python Affects Python cuDF API. and removed doc Documentation feature request New feature or request labels Apr 6, 2022
@codecov
Copy link

codecov bot commented Apr 6, 2022

Codecov Report

Merging #10607 (6e1920b) into branch-22.06 (6675c75) will increase coverage by 0.02%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.06   #10607      +/-   ##
================================================
+ Coverage         86.31%   86.34%   +0.02%     
================================================
  Files               140      140              
  Lines             22255    22280      +25     
================================================
+ Hits              19209    19237      +28     
+ Misses             3046     3043       -3     
Impacted Files Coverage Δ
python/cudf/cudf/core/indexed_frame.py 91.77% <0.00%> (-0.87%) ⬇️
python/cudf/cudf/core/series.py 95.28% <0.00%> (-0.01%) ⬇️
python/cudf/cudf/core/column/string.py 89.10% <0.00%> (ø)
python/cudf/cudf/core/column/column.py 89.45% <0.00%> (+0.10%) ⬆️
python/cudf/cudf/core/column/lists.py 90.62% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/column/numerical.py 96.17% <0.00%> (+0.29%) ⬆️
python/cudf/cudf/core/scalar.py 89.32% <0.00%> (+0.56%) ⬆️
python/cudf/cudf/core/frame.py 94.75% <0.00%> (+1.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f359ec7...6e1920b. Read the comment docs.

@github-actions github-actions bot removed the Python Affects Python cuDF API. label Apr 7, 2022
@@ -7,13 +7,24 @@
"# Overview of User Defined Functions with cuDF"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's good to do this in tests, maybe we don't want to compare the invocations of apply() with Pandas -- unless there's a difference we want to call out.


Reply via ReviewNB

@@ -7,13 +7,24 @@
"# Overview of User Defined Functions with cuDF"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions used with cudf.Series.apply and cudf.DataFrame.apply can be expected to handle nulls using the same rules as the rest of cuDF. In most cases this translates to nulls propagating through unary and binary operations and yielding more nulls.

I think we could simplify while also being a bit more explicit here:

The null value NA an propagates through unary and binary operations. Thus, NA + 1,abs(NA), and NA == NA all return NA. To make this concrete, let's look at the same example from above, this time using nullable data:


Reply via ReviewNB

@@ -7,13 +7,24 @@
"# Overview of User Defined Functions with cuDF"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other parts of this notebook, we use the second person "you", whereas here we use "the user". Let's be consistent -- I have a strong preference for the second person "you".


Reply via ReviewNB

@@ -7,13 +7,24 @@
"# Overview of User Defined Functions with cuDF"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many problems in data science and engineering are well studied and there exist known parallel algorithms for making some desired transformation to some data. Many have corresponding CUDA solutions that may not exist as column level API in cuDF. To expose the ability to use these custom kernels, cuDF supports directly using custom cuda kernels written using numba on cuDF Series objects. In short, this means that if a user has knowledge of how to write a CUDA kernel in numba, they may simply pass cuDF Series objects to that kernel as if they were numba device arrays. Let's look at a basic example of how to do this.

Two observations here:

  • We should avoid categorizing users as coming from a data science or engineering background
  • It takes a bit of reading before I understand what this section is about.

I think we should be more direct here; something along the lines of:

In addition to the Series.apply() method for performing custom operations, you can also pass Series objects directly into [CUDA kernels written with Numba](https://numba.pydata.org/numba-doc/latest/cuda/kernels.html).

Reply via ReviewNB

@@ -7,13 +7,24 @@
"# Overview of User Defined Functions with cuDF"
Copy link
Contributor

@shwina shwina Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>we just need to pre-allocate an output array and leverage the forall method mentioned above

I can't remember where I first learned about avoiding the 'J' word just - but it was probably here

Reply via ReviewNB

@@ -7,13 +7,24 @@
"# Overview of User Defined Functions with cuDF"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is adapted from cuDF's API documentation.

Orthogonal to this PR, but the link here is broken. Ditto below with apply_grouped.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this link but actually I think the docs for apply_grouped are missing as I dont see them anywhere in our stable docs. Looking into this separately.

@brandon-b-miller
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 28aa895 into rapidsai:branch-22.06 Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Documentation non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants