Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Port contents of functions.h to new cudf::column and associated types #2931

Closed
12 tasks done
harrism opened this issue Oct 3, 2019 · 10 comments
Closed
12 tasks done
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

@harrism
Copy link
Member

harrism commented Oct 3, 2019

Is your feature request related to a problem? Please describe.

Now that #2207 is merged we need to start porting functionality to use it. This issue covers functionality defined in cudf/functions.h.

Describe the solution you'd like

Specifically:

  • gdf_order_by (already implemented in cudf::column redesign #2207, so this is simply a header/namespace change)
  • gdf_context_view (not sure what to do with this -- need to look at usage
    Also, functionality in functions.h largely uses the legacy "gdf" file structure and
  • gdf_valid_allocation_size and gdf_num_bitmask_elements won't be needed once the transition is complete, so for now they should just move to a header in the legacy folder and legacy namespace.
  • nvtx functions: useful to keep, should just be updated to throw exceptions and moved to their own header and in the cudf::utility namespace or similar.
  • Error handling functions gdf_error_get_name etc.
  • gdf_extract_datetime_second and similar function may need to be handled in a separate refactor of date time functions.
  • gdf_digitize
  • gdf_transpose
  • gdf_hash
  • gdf_hash_partition
  • gdf_to_dlpack
  • gdf_from_dlpack

Not all of the above needs to be done in a single PR. In fact, I recommend grouping it logically into multiple PRs to make reviewing and merging easy (and to enable collaboration).

@harrism harrism added feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. tech debt labels Oct 3, 2019
@harrism harrism added the Spark Functionality that helps Spark RAPIDS label Oct 4, 2019
@jrhemstad
Copy link
Contributor

gdf_order_by can be removed entirely.

gdf_error_get_name and error handling functions as well as gdf_context_view can/should just be removed all together.

gdf_valid_allocation_size and gdf_num_bitmask_elements are superseded by https://github.com/rapidsai/cudf/blob/branch-0.10/cpp/include/cudf/null_mask.hpp#L47

@trevorsm7
Copy link
Contributor

I started on this today

@harrism
Copy link
Member Author

harrism commented Oct 11, 2019

Please follow the two-step approach (Start with a PR that only moves files to legacy and points include paths at them)

@trevorsm7
Copy link
Contributor

trevorsm7 commented Oct 11, 2019

Why are the gdf_nvtx_range_* functions declared in functions.h instead of nvtx_utils.h? These don't have anything to do with columns, but I could update them to CUDF_EXPECTS instead of returning gdf_error.
EDIT: Oh, I missed this at the bottom of your list, Mark. Will do just that.

@trevorsm7
Copy link
Contributor

First step here: #3070

@jrhemstad
Copy link
Contributor

Why are the gdf_nvtx_range_* functions declared in functions.h instead of nvtx_utils.h?

Because all external APIs used to be defined in functions.h :)

@trevorsm7
Copy link
Contributor

datetime_ops functions are covered by #3201

@trevorsm7
Copy link
Contributor

digitize is being deprecated by upper_bound and lower_bound: #3258

@trevorsm7
Copy link
Contributor

All PRs merged. Closing!

@harrism
Copy link
Member Author

harrism commented Dec 17, 2019

Congrats and great work @trevorsm7, this was a tall order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

No branches or pull requests

3 participants