-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Review] Add optimized 2x string column renumbering code #1996
[Review] Add optimized 2x string column renumbering code #1996
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start!
I have a bunch of comments, largely related to making this a bit more consistent with other cugraph software which will ultimately make this easier to maintain.
@@ -16,9 +16,954 @@ | |||
#include <cugraph_etl/functions.hpp> | |||
|
|||
#include <cugraph/utilities/error.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAPIDS convention is to group includes from nearest to farthest with a blank line between groups. So these should be reordered so that:
- All of the cugraph_etl files and files that are part of lib_cugraph_etl/src are listed in the first group
- All of the cugraph files included next
- All of the cudf files included next
- All of the rmm files included next
- All of the thrust/cub files included next
- System includes (cuda, stl, etc) last
@@ -0,0 +1,545 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is copied from cudf. I don't recall the exact reason we had to copy this code instead of using cuco.
It would be helpful to add some documentation here describing this so that we will remember down the road.
Convention within cugraph is to insert comments prefixed with FIXME:
that describe this. So something like:
/*
* FIXME: This file is copied from cudf because XXX
* The plan is to migrate to using the cuco version (or the libcudacxx version) once YYY is
* completed. At that point this file can be deleted.
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment.
cpp/libcugraph_etl/include/hash/concurrent_unordered_multimap.cuh
Outdated
Show resolved
Hide resolved
constexpr uint32_t hash_inc_constant = 9999; | ||
|
||
typedef struct str_hash_value{ | ||
__host__ __device__ str_hash_value() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using initialization instead of assignment here.
If you defined them below as:
size_type row_{std::numeric_limits<size_type>::max()};
accum_type count_{std::numeric_limits<accum_type>::max()};
int32_t col_{std::numeric_limits<int32_t>::max()};
It might be a little cleaner and you wouldn't need to define the default constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to initialization, however empty default constructor still required otherwise thrust::sort is complaining.
str_col_view.offsets().data<str_offset_type>())); | ||
} | ||
|
||
accum_type *hist_insert_counter = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this an std::vector? The RAAI construct is generally cleaner and easier to maintain.
The cugraph code uses raft utilities to support easy transfers between host and device. https://github.com/rapidsai/raft/blob/6a8c7a3bebe85d8fef34951e6f09c93fa733b06f/cpp/include/raft/cudart_utils.h#L251
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to efficiently use raft utilities, we might need to add the raft::handle_t
as a parameter to the renumber_cudf_tables method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this to a raft::mr::host::buffer for RAII. I directly write some values to pinned memory so can't use a std::vector.
A couple of other notes...
|
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #1996 +/- ##
================================================
+ Coverage 69.90% 72.50% +2.59%
================================================
Files 142 146 +4
Lines 8689 9470 +781
================================================
+ Hits 6074 6866 +792
+ Misses 2615 2604 -11
Continue to review full report at Codecov.
|
@ChuckHastings all the suggested edits are done. Please take a look. |
@gpucibot merge |
No description provided.