Refactor Uniform Neighborhood Sampling #2258
@@ -1503,6 +1503,7 @@ void core_number(raft::handle_t const& handle,

/**
 * @brief Multi-GPU Uniform Neighborhood Sampling.
 * @deprecated will be removed later in this release (22.06)
 *
 * @tparam graph_view_t Type of graph view.
 * @tparam gpu_t Type of rank (GPU) indices;

@@ -1536,6 +1537,32 @@ uniform_nbr_sample(raft::handle_t const& handle,
                   std::vector<int> const& h_fan_out,
                   bool with_replacement = true);

/**
 * @brief Multi-GPU Uniform Neighborhood Sampling.
 *
 * @tparam graph_view_t Type of graph view.
 * @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
 * handles to various CUDA libraries) to run graph algorithms.
 * @param graph_view Graph View object to generate NBR Sampling on.
 * @param d_starting_vertices Device span of starting vertex IDs for the NBR Sampling.
 * @param h_fan_out Host span defining branching out (fan-out) degree per source vertex for each
 * level
 * @param with_replacement boolean flag specifying if random sampling is done with replacement
 * (true); or, without replacement (false); default = true;
 * @return tuple of tuple of device vectors and counts:
 * ((vertex_t source_vertex, vertex_t destination_vertex, int rank, edge_t index), rx_counts)
I guess this comment is an out-dated copy-and-paste from the previous implementation. I assume we are returning a tuple of edge source, edge destination, and edge weight vectors (the last might actually be an edge ID right at this moment?).

Fixed in next push.
 */
template <typename graph_view_t>
Yeah... and we are sort of mixing the two template conventions here (a single graph_view_t parameter vs. explicitly spelling out the vertex/edge/weight template parameters). I think we'd better be consistent — any preference for one over the other?

No strong preference for me. There is, I think, an advantage to the graph_view_t form, but the syntax is a bit cleaner with your original approach. I don't think it's likely that we will frequently change the template signature of the API, and we will eventually get rid of the legacy graph class. I'd be happy to change this back to your original approach, or to keep it if we like this style better.

Yeah... I don't have a strong preference either, but I do have a strong preference for consistency. I am also using the graph_view_t style for primitives but wondering whether I should better switch. I am getting more inclined to the explicit parameters, and hopefully we can eliminate the legacy code sooner than later; at that point, the choice matters less and I only have a slight preference.

Sounds good. I will make those changes in the next push. I will leave Louvain as it is now; I plan to create a PR to add Louvain to the C API, and I will refactor the Louvain API in that PR.
std::tuple<rmm::device_uvector<typename graph_view_t::vertex_type>,
           rmm::device_uvector<typename graph_view_t::vertex_type>,
           rmm::device_uvector<typename graph_view_t::weight_type>>
uniform_nbr_sample(raft::handle_t const& handle,
                   graph_view_t const& graph_view,
                   raft::device_span<typename graph_view_t::vertex_type> d_starting_vertices,
                   raft::host_span<const int> h_fan_out,
I guess the d_ and h_ prefixes are unnecessary now that the span types already say where the data lives.

Yeah... and this API is way more intuitive than the previous one!

I love how the span variants clean up the API. I'll drop the extra prefixes in the next push.
                   bool with_replacement = true,
                   uint64_t seed = 0);
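For orientation, here is a hedged usage sketch of the new span-based overload above. It is not code from this PR: the `sample_two_hops` wrapper, the fan-out values, and the header paths (which differ across RAFT/RMM releases) are illustrative assumptions.

```cpp
#include <cugraph/algorithms.hpp>

#include <raft/handle.hpp>  // raft::handle_t (path varies by RAFT release)
#include <raft/span.hpp>    // raft::device_span / raft::host_span (path varies by RAFT release)

#include <rmm/device_uvector.hpp>

#include <cstdint>
#include <vector>

// Hypothetical helper: run a 2-hop uniform neighborhood sample starting from d_start.
template <typename graph_view_t>
void sample_two_hops(raft::handle_t const& handle,
                     graph_view_t const& graph_view,
                     rmm::device_uvector<typename graph_view_t::vertex_type>& d_start)
{
  using vertex_t = typename graph_view_t::vertex_type;

  std::vector<int> fan_out{10, 25};  // up to 10 neighbors at hop 1, 25 at hop 2

  auto [srcs, dsts, weights] = cugraph::uniform_nbr_sample(
    handle,
    graph_view,
    raft::device_span<vertex_t>{d_start.data(), d_start.size()},
    raft::host_span<int const>{fan_out.data(), fan_out.size()},
    true /* with_replacement */,
    0 /* seed */);

  // srcs, dsts, and weights are device vectors describing the sampled edge list.
}
```

The spans keep the interface non-owning: the caller retains ownership of the starting-vertex buffer and the host fan-out array, which is what the review thread above is praising.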
/*
 * @brief Compute triangle counts.
 *
@@ -188,6 +188,7 @@ __global__ void partially_decompress_to_edgelist_high_degree(
  vertex_t input_major_count,
  vertex_t* output_majors,
  vertex_t* output_minors,
  thrust::optional<weight_t*> output_weights,
  thrust::optional<thrust::tuple<prop_t const*, prop_t*>> property,
  thrust::optional<thrust::tuple<edge_t const*, edge_t*>> global_edge_index)
{

@@ -204,6 +205,8 @@ __global__ void partially_decompress_to_edgelist_high_degree(
    for (edge_t i = threadIdx.x; i < local_degree; i += blockDim.x) {
      output_majors[major_offset + i] = major;
      output_minors[major_offset + i] = indices[i];

      if (output_weights) (*output_weights)[major_offset + i] = (*weights)[i];
    }
    if (property) {
      auto input_property = thrust::get<0>(*property)[idx];

@@ -231,6 +234,7 @@ __global__ void partially_decompress_to_edgelist_mid_degree(
  vertex_t input_major_count,
  vertex_t* output_majors,
  vertex_t* output_minors,
  thrust::optional<weight_t*> output_weights,
  thrust::optional<thrust::tuple<prop_t const*, prop_t*>> property,
  thrust::optional<thrust::tuple<edge_t const*, edge_t*>> global_edge_index)
{
@@ -242,11 +246,18 @@ __global__ void partially_decompress_to_edgelist_mid_degree( | |
  auto major = input_majors[idx];
  auto major_partition_offset = static_cast<size_t>(major - edge_partition.major_range_first());
  vertex_t const* indices{nullptr};
  thrust::optional<weight_t const*> weights{thrust::nullopt};
  edge_t local_degree{};

  thrust::tie(indices, weights, local_degree) =
    edge_partition.local_edges(major_partition_offset);

  auto major_offset = input_major_start_offsets[idx];
  for (edge_t i = threadIdx.x; i < local_degree; i += blockDim.x) {
    output_majors[major_offset + i] = major;
    output_minors[major_offset + i] = indices[i];

    if (output_weights) (*output_weights)[major_offset + i] = (*weights)[i];
  }
  if (property) {
    auto input_property = thrust::get<0>(*property)[idx];

@@ -275,6 +286,7 @@ void partially_decompress_edge_partition_to_fill_edgelist(
  std::vector<vertex_t> const& segment_offsets,
  vertex_t* majors,
  vertex_t* minors,
  thrust::optional<weight_t*> weights,
  thrust::optional<thrust::tuple<prop_t const*, prop_t*>> property,
  thrust::optional<thrust::tuple<edge_t const*, edge_t*>> global_edge_index)
{

@@ -297,6 +309,7 @@ void partially_decompress_edge_partition_to_fill_edgelist(
      segment_offsets[1],
      majors,
      minors,
      weights,
      property ? thrust::make_optional(thrust::make_tuple(
                   thrust::get<0>(*property) + segment_offsets[0], thrust::get<1>(*property)))
               : thrust::nullopt,

@@ -317,6 +330,7 @@ void partially_decompress_edge_partition_to_fill_edgelist(
      segment_offsets[2] - segment_offsets[1],
      majors,
      minors,
      weights,
      property ? thrust::make_optional(thrust::make_tuple(
                   thrust::get<0>(*property) + segment_offsets[1], thrust::get<1>(*property)))
               : thrust::nullopt,
@@ -333,10 +347,11 @@ void partially_decompress_edge_partition_to_fill_edgelist(
       input_major_start_offsets + segment_offsets[2] - segment_offsets[0],
       majors,
       minors,
       output_weights = weights,
       property = property
         ? thrust::make_optional(thrust::make_tuple(
             thrust::get<0>(*property) + segment_offsets[2], thrust::get<1>(*property)))
         : thrust::nullopt,
       global_edge_index] __device__(auto idx) {
        auto major = input_majors[idx];
        auto major_offset = input_major_start_offsets[idx];
@@ -350,6 +365,10 @@ void partially_decompress_edge_partition_to_fill_edgelist(
        thrust::fill(
          thrust::seq, majors + major_offset, majors + major_offset + local_degree, major);
        thrust::copy(thrust::seq, indices, indices + local_degree, minors + major_offset);
        if (weights)

This can lead to thread-divergence. You may add a similar FIXME; later, we may address this together by adding something like a (delayed) copy.

          thrust::copy(
            thrust::seq, *weights, *weights + local_degree, *output_weights + major_offset);

        if (property) {
          auto major_input_property = thrust::get<0>(*property)[idx];
          auto minor_output_property = thrust::get<1>(*property);
@@ -379,10 +398,11 @@ void partially_decompress_edge_partition_to_fill_edgelist(
       input_major_start_offsets + segment_offsets[3] - segment_offsets[0],
       majors,
       minors,
       output_weights = weights,
       property = property
         ? thrust::make_optional(thrust::make_tuple(
             thrust::get<0>(*property) + segment_offsets[3], thrust::get<1>(*property)))
         : thrust::nullopt,
       global_edge_index] __device__(auto idx) {
        auto major = input_majors[idx];
        auto major_offset = input_major_start_offsets[idx];

@@ -395,6 +415,9 @@ void partially_decompress_edge_partition_to_fill_edgelist(
        thrust::fill(
          thrust::seq, majors + major_offset, majors + major_offset + local_degree, major);
        thrust::copy(thrust::seq, indices, indices + local_degree, minors + major_offset);
        if (output_weights)
          thrust::copy(
            thrust::seq, *weights, *weights + local_degree, *output_weights + major_offset);
        if (property) {
          auto major_input_property = thrust::get<0>(*property)[idx];
          auto minor_output_property = thrust::get<1>(*property);
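As a side note on the `thrust::optional<weight_t*> output_weights` plumbing added throughout this file, here is a minimal standalone sketch of the pattern (illustrative names only, not the cuGraph kernels): an optional raw device pointer that is written only when it is engaged, so the same kernel body serves weighted and unweighted graphs.

```cuda
#include <thrust/optional.h>

// Toy kernel using the optional-output pattern: weights are copied only when both the
// input and output weight pointers are present (they are engaged for all threads or
// for none, so the branch itself is uniform across the launch).
__global__ void fill_edges(int const* indices,
                           thrust::optional<float const*> weights,
                           int n,
                           int major,
                           int* out_majors,
                           int* out_minors,
                           thrust::optional<float*> out_weights)
{
  for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += blockDim.x * gridDim.x) {
    out_majors[i] = major;
    out_minors[i] = indices[i];
    if (out_weights) { (*out_weights)[i] = (*weights)[i]; }  // skipped for unweighted graphs
  }
}
```

Passing thrust::nullopt for both weight arguments gives the unweighted path without a second kernel specialization.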
@@ -25,6 +25,7 @@
 #include <rmm/exec_policy.hpp>

 #include <cuco/detail/hash_functions.cuh>
 #include <thrust/binary_search.h>
 #include <thrust/sort.h>
 #include <thrust/tabulate.h>
 #include <thrust/transform.h>

@@ -47,6 +48,22 @@ struct compute_gpu_id_from_vertex_t {
  }
};

template <typename vertex_t>
struct compute_gpu_id_from_int_vertex_t {
Should we better rename the other functors working on external vertex IDs (e.g. to something like compute_gpu_id_from_ext_vertex_t) so the int/ext distinction is explicit?

Done for vertex in the next push. Do we ever try and use these functors on an edge?

Gotcha, agreed.
  vertex_t const* vertex_partition_range_lasts;
  size_t num_vertex_partitions;
Yeah... maybe just a FIXME statement, but we should eventually replace these (pointer, size) pairs with raft::device_span.

Changed to span in the next push.
  __device__ int operator()(vertex_t v) const
  {
    return static_cast<int>(
      thrust::distance(vertex_partition_range_lasts,
                       thrust::upper_bound(thrust::seq,
                                           vertex_partition_range_lasts,
                                           vertex_partition_range_lasts + num_vertex_partitions,
                                           v)));
  }
};
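Per the review exchange above ("Changed to span in the next push"), here is a hedged sketch of what the span-based form of this functor might look like. The raft span header path is an assumption that varies by RAFT release; the search logic is unchanged from the functor shown in the diff.

```cpp
#include <raft/span.hpp>  // raft::device_span (path varies by RAFT release)

#include <thrust/binary_search.h>
#include <thrust/distance.h>
#include <thrust/execution_policy.h>

template <typename vertex_t>
struct compute_gpu_id_from_int_vertex_t {
  raft::device_span<vertex_t const> vertex_partition_range_lasts;

  __device__ int operator()(vertex_t v) const
  {
    // Owning rank = index of the first partition whose exclusive upper bound exceeds v.
    return static_cast<int>(thrust::distance(
      vertex_partition_range_lasts.data(),
      thrust::upper_bound(thrust::seq,
                          vertex_partition_range_lasts.data(),
                          vertex_partition_range_lasts.data() + vertex_partition_range_lasts.size(),
                          v)));
  }
};
```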
template <typename vertex_t>
struct compute_gpu_id_from_edge_t {
  int comm_size{0};
Is this really a multi-GPU-only thing, or is it for both SG & MG?

Both. Updated the comment.
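For context on how functors like the ones above get used (on both SG and MG, per the reply), here is a hypothetical host-side sketch; `assign_ranks` and its parameters are made-up names, not cuGraph code, and it assumes the compute_gpu_id_from_int_vertex_t functor from the diff is visible in the translation unit.

```cpp
#include <raft/handle.hpp>  // raft::handle_t (path varies by RAFT release)

#include <rmm/device_uvector.hpp>
#include <rmm/exec_policy.hpp>

#include <thrust/transform.h>

// Map each vertex to the rank that owns it, given the partition boundary array.
// d_ranks must be pre-sized to d_vertices.size().
template <typename vertex_t>
void assign_ranks(raft::handle_t const& handle,
                  rmm::device_uvector<vertex_t> const& d_vertices,
                  rmm::device_uvector<vertex_t> const& d_vertex_partition_range_lasts,
                  rmm::device_uvector<int>& d_ranks)
{
  thrust::transform(rmm::exec_policy(handle.get_stream()),
                    d_vertices.begin(),
                    d_vertices.end(),
                    d_ranks.begin(),
                    compute_gpu_id_from_int_vertex_t<vertex_t>{
                      d_vertex_partition_range_lasts.data(),
                      d_vertex_partition_range_lasts.size()});
}
```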