
Refactor Uniform Neighborhood Sampling #2258

Conversation

@ChuckHastings (Collaborator) commented Apr 30, 2022

This PR will refactor the Uniform Neighborhood Sampling implementation to meet the new C API.

Major elements:

  • Moved old implementation details into cugraph::detail::original
  • Edge IDs will be passed in as the edge weight, to allow them to be controlled by the caller. The edge weight will be an integer type, but we will treat int32_t * as float * (or int64_t * as double *). The algorithms will be flagged so that they won't do computations on the weight if it is an edge ID (a minimal sketch follows this list)
  • Adding an SG implementation (only partially done as of the creation of this PR; it will be finished before this PR is ready for review)
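
As a rough illustration of the edge-id-as-weight bullet (a sketch only, not the PR's actual code; the function name is hypothetical), the edge-id buffer can be handed to the weight parameter by reinterpreting the pointer, relying on int32_t/float (or int64_t/double) having the same size and on the algorithm never computing on the values:

#include <cstddef>
#include <cstdint>

void run_sampling_with_edge_ids(std::int32_t* edge_ids, std::size_t num_edges)
{
  static_assert(sizeof(std::int32_t) == sizeof(float), "edge id and weight types must match in size");

  // Treat the edge-id array as the weight array; the sampling code is flagged so it only
  // carries these values through to the output and never does arithmetic on them.
  float* weights_as_edge_ids = reinterpret_cast<float*>(edge_ids);

  // ... pass weights_as_edge_ids wherever the sampling call expects edge weights ...
  (void)weights_as_edge_ids;
  (void)num_edges;
}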

@ChuckHastings requested review from a team as code owners on April 30, 2022 00:56
@ChuckHastings self-assigned this on Apr 30, 2022
@ChuckHastings added the labels 2 - In Progress, improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) on Apr 30, 2022
@ChuckHastings added this to the 22.06 milestone on Apr 30, 2022
@ChuckHastings changed the title from "[skip-ci] Refactor Uniform Neighborhood Sampling" to "Refactor Uniform Neighborhood Sampling" on May 11, 2022

@seunghwak (Contributor) left a comment:

Review Part 1.

* @param with_replacement boolean flag specifying if random sampling is done with replacement
* (true); or, without replacement (false); default = true;
* @return tuple of tuple of device vectors and counts:
* ((vertex_t source_vertex, vertex_t destination_vertex, int rank, edge_t index), rx_counts)

Contributor:

I guess this comment is an outdated copy-and-paste from the previous implementation. I assume we are returning a tuple of edge source, edge destination, and edge weight vectors (the last may actually be an edge ID at the moment?).

Collaborator (Author):

Fixed in next push

* @return tuple of tuple of device vectors and counts:
* ((vertex_t source_vertex, vertex_t destination_vertex, int rank, edge_t index), rx_counts)
*/
template <typename graph_view_t>

Contributor:

Yeah... and we are sort of mixing

template <typename graph_view_t> with using typename graph_view_t::vertex_type, ...
and
template <typename vertex_t, typename edge_t, typename weight_t, bool store_transposed, bool multi_gpu> with graph_view_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu>.

I think we'd better be consistent. Any preference for one over the other?
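
For reference, a minimal sketch of the two styles being compared (the graph_view_t stub and function names below are illustrative, not cugraph's actual declarations):

// Stand-in for cugraph's graph view; only the nested typedef matters for this sketch.
template <typename vertex_t, typename edge_t, typename weight_t, bool store_transposed, bool multi_gpu>
struct graph_view_t {
  using vertex_type = vertex_t;
};

// Style 1: a single template parameter; vertex/edge/weight types come from nested
// typedefs, so the signature survives changes to graph_view_t's parameter list.
template <typename graph_view_type>
void algorithm_style1(graph_view_type const& graph_view)
{
  using vertex_t = typename graph_view_type::vertex_type;  // used by the real implementation
  (void)graph_view;
}

// Style 2: spell out the parameters and instantiate graph_view_t explicitly; this only
// works with this specific graph view type, matching the fact that the implementation
// already depends on its member functions.
template <typename vertex_t, typename edge_t, typename weight_t, bool store_transposed, bool multi_gpu>
void algorithm_style2(graph_view_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu> const& graph_view)
{
  (void)graph_view;
}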

Collaborator (Author):

No strong preference for me.

There is, I think, an advantage to the template <typename graph_view_t> approach in that if we change the implementation of graph_view (adding or removing a template parameter), as long as typename graph_view_t::vertex_type is still defined, the API works without modification. I believe Andrei copied this from my Louvain definition, which uses this approach. I implemented Louvain this way so that I could support both the legacy graph and the graph_t with the same API.

But the syntax is a bit cleaner with your original approach. I don't think it's likely that we will frequently change the template signature of the API, and we will eventually get rid of the legacy graph class.

I'd be happy to change this back to your original approach, or if we like the template <typename graph_view_t> approach better I can add that to the list of things to gradually update in the code.

Contributor:

Yeah... I don't have a strong preference either, but I do have a strong preference for consistency.

I am also using the template <typename graph_view_t> approach for primitives, but I am wondering whether I should use graph_view_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu> instead.

I am getting more inclined toward the graph_view_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu> approach, as this code does not work for a general graph view type but only with our graph_view_t (e.g., the implementation depends on multiple member functions that exist only in graph_view_t).

And hopefully we can eliminate the legacy code sooner rather than later; at that point, I slightly prefer graph_view_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu>, even though this will have very minimal impact on the end-user experience.

Collaborator (Author):

Sounds good. I will make those changes in the next push. I will leave Louvain as it is for now; I plan to create a PR to add Louvain to the C API and will refactor the Louvain API in that PR.

uniform_nbr_sample(raft::handle_t const& handle,
graph_view_t const& graph_view,
raft::device_span<typename graph_view_t::vertex_type> d_starting_vertices,
raft::host_span<const int> h_fan_out,

Contributor:

I guess d_ and h_ here are a bit redundant (especially with device_span and host_span). Or we should use this naming convention in all the functions in the public API. My current practice is to use d_ and h_ only when we have both host and device vectors with the same name, but I'm open to discussion.

Contributor:

Yeah.... and this API is way more intuitive than the previous one!!!

Collaborator (Author):

I love how the span variants clean up the API. I'll drop the extra prefixes in the next push.
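
To make the suggestion concrete, here is a sketch of what the prefix-free, span-based declaration could look like (the return type is a guess based on the earlier discussion of returning source/destination/weight vectors, and the raft/rmm/std includes are omitted; this is not the merged code):

template <typename graph_view_t>
std::tuple<rmm::device_uvector<typename graph_view_t::vertex_type>,
           rmm::device_uvector<typename graph_view_t::vertex_type>,
           rmm::device_uvector<typename graph_view_t::weight_type>>
uniform_nbr_sample(raft::handle_t const& handle,
                   graph_view_t const& graph_view,
                   raft::device_span<typename graph_view_t::vertex_type> starting_vertices,
                   raft::host_span<const int> fan_out,
                   bool with_replacement = true);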

@@ -1536,6 +1537,32 @@ uniform_nbr_sample(raft::handle_t const& handle,
std::vector<int> const& h_fan_out,
bool with_replacement = true);

/**
* @brief Multi-GPU Uniform Neighborhood Sampling.

Contributor:

Is this really a Multi-GPU-only thing, or is it for both SG & MG?

Collaborator (Author):

Both. Updated the comment.

handle.get_stream());

return d_rx_vertices;
}

template <typename vertex_t>
rmm::device_uvector<vertex_t> shuffle_vertices_by_gpu_id(raft::handle_t const& handle,

Contributor:

Would it be better to rename this to shuffle_ext_vertices_by_gpu_id?

Collaborator (Author):

Done in next push

@@ -47,6 +48,22 @@ struct compute_gpu_id_from_vertex_t {
}
};

template <typename vertex_t>
struct compute_gpu_id_from_int_vertex_t {

Contributor:

Would it be better to rename the other functors working on external vertex IDs to use ext_vertex_t and ext_edge_t?

Collaborator (Author):

Done for vertex in the next push.

Do we ever try to use these functors on an int_edge_t? I'm inclined not to add the ext to the name unless we need to distinguish.

Contributor:

Gotcha, agreed.

template <typename vertex_t>
struct compute_gpu_id_from_int_vertex_t {
vertex_t const* vertex_partition_range_lasts;
size_t num_vertex_partitions;

Contributor:

Yeah... maybe just a FIXME statement, but we should eventually replace these (pointer, size) pairs with raft::device_span.

Collaborator (Author):

Changed to span in the next push.
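
As a sketch of the span-based version (also reflecting the ext_ rename discussed above; the operator() body is illustrative and may not match the actual functor):

#include <thrust/binary_search.h>
#include <thrust/distance.h>
#include <thrust/execution_policy.h>
// raft::device_span include omitted; its header location varies across RAFT versions.

template <typename vertex_t>
struct compute_gpu_id_from_ext_vertex_t {
  raft::device_span<vertex_t const> vertex_partition_range_lasts{};

  __device__ int operator()(vertex_t v) const
  {
    // The owning GPU is the index of the first partition whose last vertex ID exceeds v.
    return static_cast<int>(thrust::distance(
      vertex_partition_range_lasts.begin(),
      thrust::upper_bound(thrust::seq,
                          vertex_partition_range_lasts.begin(),
                          vertex_partition_range_lasts.end(),
                          v)));
  }
};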

zip_iter,
zip_iter + d_vertices.size(),
zip_iter,
[] __device__(auto pair) { return thrust::get<1>(pair) > 0; });

Contributor:

FYI: https://github.com/NVIDIA/thrust/issues/1302
Maybe do the copy_if in chunks, or add a check on d_vertices.size() and throw an exception if it overflows a 32-bit integer (if you expect this is unlikely to happen and we'd better wait for the thrust folks to fix it).

Contributor:

And I guess thrust::remove_if is more intuitive than copy_if here (unless you're willing to copy in chunks). You may look at https://github.com/rapidsai/cugraph/pull/2253/files#diff-ce8c8b8ffdc670a97313ca4ce20de7bf8a18daa81f5a1fde50f3b162bf75b75bR1238

#if 1  // FIXME: work-around for the 32 bit integer overflow issue in thrust::remove,
       // thrust::remove_if, and thrust::copy_if (https://github.com/NVIDIA/thrust/issues/1302)
    rmm::device_uvector<vertex_t> tmp_indices(
      thrust::count_if(handle.get_thrust_policy(),
                       nbr_intersection_indices.begin(),
                       nbr_intersection_indices.end(),
                       detail::not_equal_t<vertex_t>{invalid_vertex_id<vertex_t>::value}),
      handle.get_stream());
    size_t num_copied{0};
    size_t num_scanned{0};
    while (num_scanned < nbr_intersection_indices.size()) {
      size_t this_scan_size = std::min(
        size_t{1} << 30,
        static_cast<size_t>(thrust::distance(nbr_intersection_indices.begin() + num_scanned,
                                             nbr_intersection_indices.end())));
      num_copied += static_cast<size_t>(thrust::distance(
        tmp_indices.begin() + num_copied,
        thrust::copy_if(handle.get_thrust_policy(),
                        nbr_intersection_indices.begin() + num_scanned,
                        nbr_intersection_indices.begin() + num_scanned + this_scan_size,
                        tmp_indices.begin() + num_copied,
                        detail::not_equal_t<vertex_t>{invalid_vertex_id<vertex_t>::value})));
      num_scanned += this_scan_size;
    }
    nbr_intersection_indices = std::move(tmp_indices);
#else
    nbr_intersection_indices.resize(
      thrust::distance(nbr_intersection_indices.begin(),
                       thrust::remove(handle.get_thrust_policy(),
                                      nbr_intersection_indices.begin(),
                                      nbr_intersection_indices.end(),
                                      invalid_vertex_id<vertex_t>::value)),
      handle.get_stream());
#endif

Collaborator (Author):

Switched to remove_if.

Seems unlikely to have an overflow issue, at least with current memory sizes, as the number of elements in a vertex array on each partition is likely to be < 2^31-1. But I added a FIXME so we can remember.

Contributor:

Maybe add CUGRAPH_EXPECTS(d_vertices.size() < std::numeric_limits<int32_t>::max()) as well. I agree that this is unlikely to happen, but if it does happen on the user side or in large-scale testing, it would be very difficult for us to figure out that it is actually due to the overflow. With the check, it will be much easier to figure out what went awry.

Collaborator (Author):

Added the CUGRAPH_EXPECTS here and in the other two places where I call remove_if.
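
For concreteness, a sketch of such a guard placed just before the remove_if call (CUGRAPH_EXPECTS is cugraph's existing error-checking macro; the message text here is illustrative and assumes <limits> is included):

CUGRAPH_EXPECTS(
  d_vertices.size() <= static_cast<size_t>(std::numeric_limits<int32_t>::max()),
  "Invalid input: d_vertices.size() exceeds the 32-bit element count currently assumed by "
  "thrust::remove_if (https://github.com/NVIDIA/thrust/issues/1302).");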

@seunghwak (Contributor) left a comment:

Review Part 2

namespace detail {

/**
* @brief Compute local out degrees of the majors belonging to the adjacency matrices

Contributor:

I need to double-check, but I guess this computes out-degrees if major == source and in-degrees if major == destination.

Collaborator (Author):

Yes, I think that's correct. The sampling code forces store_transposed=false, so this function assumes that.

I'm not sure that's a good long-term assumption (feels like sampling on incoming vertices would be a reasonable thing to do). But at the moment this is sufficient.

Perhaps a FIXME to address this later?

Collaborator (Author):

I added a FIXME near the beginning of these function definitions to reflect that we should revisit this.

* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Non-owning graph object.
* @return A single vector containing the local out degrees of the majors belong to the adjacency

Contributor:

out degrees may not be accurate here.

Collaborator (Author):

Same observation as above: store_transposed=false for the sampling algorithms.

const rmm::device_uvector<typename GraphViewType::edge_type>& global_out_degrees);

/**
* @brief Gather active majors across gpus in a column communicator

Contributor:

Is this a gather or an allgather (will the results be stored only in the root, or in every process in the column communicator)? If it is an allgather, better to rename it to avoid confusion.

Collaborator (Author):

Fixed in next push

rmm::device_uvector<vertex_t>&& d_in);

/**
* @brief Return global out degrees of active majors

Contributor:

I need to double-check that "out" degrees is correct here.

Collaborator (Author):

Sampling forces store_transposed=false


template <typename vertex_t>
rmm::device_uvector<vertex_t> gather_active_majors(raft::handle_t const& handle,
rmm::device_uvector<vertex_t>&& d_in)

Contributor:

OK, this is using allgatherv, so this function should be renamed to "allgather_active_majors".

Collaborator (Author):

Fixed in next push
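
For clarity, the renamed declaration would simply mirror the quoted one (a sketch; only the name changes):

template <typename vertex_t>
rmm::device_uvector<vertex_t> allgather_active_majors(raft::handle_t const& handle,
                                                      rmm::device_uvector<vertex_t>&& d_in);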

template <typename GraphViewType>
rmm::device_uvector<typename GraphViewType::edge_type> compute_local_major_degrees(
raft::handle_t const& handle, GraphViewType const& graph_view)
{

Contributor:

Collaborator (Author):

I'll add a FIXME.

I actually think much of this logic should be moved into the graph_view; we assume too much about the implementation in these functions.


auto compacted_length = thrust::distance(
input_iter,
thrust::remove_if(

Contributor:

FYI: the current version of thrust::remove_if does not work properly if minors.size() overflows a 32-bit integer.

Collaborator (Author):

Added a FIXME to both of these remove_if calls in this file (both branches of the if).

thrust::make_optional(rmm::device_uvector<weight_t>(0, handle.get_stream()));

size_t level{0};
size_t num_rows{1};

Contributor:

Better to rename this to row_comm_size (this is more of a consistency thing).

Collaborator (Author):

Done

@codecov-commenter commented May 12, 2022

Codecov Report

Merging #2258 (bad27dc) into branch-22.06 (e906c98) will decrease coverage by 5.93%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.06    #2258      +/-   ##
================================================
- Coverage         69.91%   63.97%   -5.94%     
================================================
  Files               175      100      -75     
  Lines             11503     4436    -7067     
================================================
- Hits               8042     2838    -5204     
+ Misses             3461     1598    -1863     
Impacted Files Coverage Δ
python/pylibcugraph/pylibcugraph/_version.py 0.00% <0.00%> (ø)
python/cugraph/cugraph/tests/test_hypergraph.py
...ugraph/cugraph/tests/test_maximum_spanning_tree.py
python/cugraph/cugraph/tests/test_core_number.py
python/cugraph/cugraph/tests/mg/test_mg_hits.py
...hon/cugraph/cugraph/tests/test_k_truss_subgraph.py
.../cugraph/cugraph/tests/test_subgraph_extraction.py
python/cugraph/cugraph/tests/test_ecg.py
...ython/cugraph/cugraph/tests/test_triangle_count.py
...thon/pylibcugraph/pylibcugraph/tests/test_utils.py
... and 71 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e906c98...bad27dc.

@seunghwak (Contributor) left a comment:

Looks good to me except for a few minor complaints about documentation.

* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph View object to generate NBR Sampling on.
* @param d_starting_vertices Device span of starting vertex IDs for the NBR Sampling.

Contributor:

d_starting_vertices => starting_vertices, as we renamed the input parameters.

* @param graph_view Graph View object to generate NBR Sampling on.
* @param d_starting_vertices Device span of starting vertex IDs for the NBR Sampling.
* @param h_fan_out Host span defining branching out (fan-out) degree per source vertex for each
* level

Contributor:

h_fan_out => fan_out as well.
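
A sketch of the corrected doc lines with the renamed parameters (wording taken from the quoted snippets above; not the final merged text):

 * @param starting_vertices Device span of starting vertex IDs for the NBR Sampling.
 * @param fan_out Host span defining branching out (fan-out) degree per source vertex for
 * each level.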

@@ -350,6 +365,10 @@ void partially_decompress_edge_partition_to_fill_edgelist(
thrust::fill(
thrust::seq, majors + major_offset, majors + major_offset + local_degree, major);
thrust::copy(thrust::seq, indices, indices + local_degree, minors + major_offset);
if (weights)

Contributor:

This can lead to thread divergence if local_degree values vary significantly across the threads in a single warp. Maybe add a FIXME statement. I have the same issue in the Triangle Counting implementation (https://github.com/rapidsai/cugraph/pull/2253/files#diff-ce8c8b8ffdc670a97313ca4ce20de7bf8a18daa81f5a1fde50f3b162bf75b75bR434).

You may add a similar FIXME. Later, we may address this together by adding something like a (delayed) segmented_copy (or fill).

@ChuckHastings (Collaborator, Author) commented:

@gpucibot merge

@rapids-bot (bot) merged commit 4a6263a into rapidsai:branch-22.06 on May 13, 2022
@ChuckHastings deleted the fea_uniform_neighborhood_sampling_refactor branch on August 4, 2022 18:26