Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move compute_column API out of ast namespace #8957

Merged

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Aug 4, 2021

The compute_column API is functionally analogous to cudf::transform, except that the former expresses the transformation in terms of a cudf expression while the latter accepts string UDFs to be JITted and evaluated. Moving compute_column into transform.hpp means that the only public APIs in the ast namespace are those required to define an expression, so we clearly separate the logic for defining and parsing of expressions from APIs (like conditional joins and compute_column) that evaluate those expressions on device and then use the output in different ways.

Contributes to #8145.

@vyasr vyasr requested review from a team as code owners August 4, 2021 18:30
@vyasr vyasr requested review from trxcllnt and vuule August 4, 2021 18:30
@vyasr vyasr self-assigned this Aug 4, 2021
@github-actions github-actions bot added CMake CMake build issue Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Aug 4, 2021
@vyasr vyasr added 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Java) Reviewer 4 - Needs Review Waiting for reviewer to review or respond breaking Breaking change improvement Improvement / enhancement to an existing function and removed CMake CMake build issue labels Aug 4, 2021
@vyasr vyasr added this to the Conditional Joins milestone Aug 4, 2021
@vyasr
Copy link
Contributor Author

vyasr commented Aug 4, 2021

@harrism you requested a better name for the compute_column API if it's moved outside of the ast namespace here. The closest API currently in cudf is the transform function, which is also extremely generic. Any thoughts on how you'd like to move forward with that?

@jlowe I've updated the JNI bindings, so we should be good to go for now. If the function name changes I'll attempt to propagate that through as well, but I may need a bit of help with that to make sure that I get everything since that will also affect all the Java code using this API (right now the change is just a namespace change when calling the C++ API, which is internal to the JNI).

@vyasr vyasr mentioned this pull request Aug 4, 2021
10 tasks
@github-actions github-actions bot added CMake CMake build issue conda labels Aug 4, 2021
@codecov
Copy link

codecov bot commented Aug 4, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@3b391ae). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 6fb1117 differs from pull request most recent head 3339f62. Consider uploading reports for the commit 3339f62 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #8957   +/-   ##
===============================================
  Coverage                ?   10.61%           
===============================================
  Files                   ?      116           
  Lines                   ?    19003           
  Branches                ?        0           
===============================================
  Hits                    ?     2017           
  Misses                  ?    16986           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b391ae...3339f62. Read the comment docs.

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Not sure if I understand why the changes are made, but PR itself is 👍

@vyasr
Copy link
Contributor Author

vyasr commented Aug 5, 2021

Looks good. Not sure if I understand why the changes are made, but PR itself is 👍

This API just says "compute a new column by performing some operation on the rows of a table", which is essentially the same as cudf::transform. That behavior should dictate where the API lives, not the fact that it happens to represent the operation by an AST expression, otherwise APIs like conditional joins would also live in the cudf::ast namespace. In general, since AST expression evaluation could be used in many places we want to limit the cudf::ast namespace to just the logic for defining expressions (and cudf::ast::detail to internal code for parsing and evaluating those expressions in device code).

Copy link
Contributor

@robertmaynard robertmaynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake changes LGTM

@vyasr
Copy link
Contributor Author

vyasr commented Aug 6, 2021

@gpucibot merge

@vyasr vyasr added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond 4 - Needs cuDF (Java) Reviewer labels Aug 6, 2021
@trxcllnt
Copy link
Contributor

trxcllnt commented Aug 6, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit a4eabf0 into rapidsai:branch-21.10 Aug 7, 2021
shwina pushed a commit to shwina/cudf that referenced this pull request Aug 9, 2021
The `compute_column` API is functionally analogous to `cudf::transform`, except that the former expresses the transformation in terms of a cudf expression while the latter accepts string UDFs to be JITted and evaluated. Moving `compute_column` into `transform.hpp` means that the only public APIs in the `ast` namespace are those required to _define_ an expression, so we clearly separate the logic for defining and parsing of expressions from APIs (like conditional joins and `compute_column`) that evaluate those expressions on device and then use the output in different ways.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Robert Maynard (https://github.com/robertmaynard)
  - Paul Taylor (https://github.com/trxcllnt)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#8957
@vyasr vyasr deleted the refactor/move_compute_column branch March 9, 2022 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants