
Ktruss implementation #4059

Merged: 169 commits, Mar 26, 2024
Conversation

@jnke2016 (Contributor) commented Dec 13, 2023:

Implements SG and MG K-truss using graph primitives and drops cuHornet.

Closes #3447
Closes #3448
Closes #3449
Closes #3450
Closes #3451
Closes #3452
Closes #3453

@seunghwak (Contributor) left a comment:

Review part 1/2.

cpp/include/cugraph/algorithms.hpp (outdated, resolved)
@@ -1978,6 +1978,26 @@ void triangle_count(raft::handle_t const& handle,
raft::device_span<edge_t> counts,
bool do_expensive_check = false);

/*
* @brief Compute ktruss.
Contributor:

ktruss -> K-truss

cpp/include/cugraph/algorithms.hpp (outdated, resolved)
cpp/include/cugraph/algorithms.hpp (outdated, resolved)
cpp/src/community/ktruss_impl.cuh (outdated, resolved)
Comment on lines 476 to 482
rmm::device_uvector<size_t> intersection_offsets(size_t{0}, handle.get_stream());
rmm::device_uvector<vertex_t> intersection_indices(size_t{0}, handle.get_stream());
rmm::device_uvector<edge_t> r_nbr_intersection_property_values0(size_t{0}, handle.get_stream());
rmm::device_uvector<edge_t> r_nbr_intersection_property_values1(size_t{0}, handle.get_stream());

// FIXME: Initially each edge should have an edge property of 0
std::tie(intersection_offsets, intersection_indices) =
Contributor:

auto [intersection_offsets, intersection_indices] = detail::nbr_intersection(...);

This also cuts your code size a bit. And better to define r_nbr_intersection_property_values0 and 1 right before using them (minimize the scope of a variable).

Contributor:

We explicitly define them when we are taking returned variables with std::tie and use std::ignore as we can't use std::ignore with auto [...]. We are not using std::ignore here, so no need to explicitly define variables.
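The two styles under discussion can be illustrated with a small host-only sketch (`make_offsets_and_indices` is a hypothetical stand-in for `detail::nbr_intersection`, not cuGraph code):

```cpp
#include <tuple>
#include <vector>

// Hypothetical stand-in for a function returning two results, like detail::nbr_intersection.
std::tuple<std::vector<int>, std::vector<int>> make_offsets_and_indices() {
  return {std::vector<int>{0, 2, 5}, std::vector<int>{1, 2, 0, 1, 2}};
}

// std::tie style: variables must be declared up front; required when using std::ignore.
std::vector<int> tie_style_offsets() {
  std::vector<int> offsets;
  std::tie(offsets, std::ignore) = make_offsets_and_indices();
  return offsets;
}

// Structured-binding style: declaration and call collapse into one line, but
// std::ignore cannot be used, so it only fits when every result is consumed.
std::vector<int> binding_style_offsets() {
  auto [offsets, indices] = make_offsets_and_indices();
  (void)indices;  // every result must still be named
  return offsets;
}
```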

Contributor Author:

I am only using intersection_offsets and intersection_indices so I will remove r_nbr_intersection_property_values0 and r_nbr_intersection_property_values1

Comment on lines 495 to 496
auto vertex_pair_buffer = allocate_dataframe_buffer<thrust::tuple<vertex_t, vertex_t>>(
num_vertex_pairs, handle.get_stream());
Contributor:

Please do not pre-allocate memory too far in advance. This unnecessarily increases peak memory usage.

Contributor Author:

In order to populate this vertex_pair_buffer with thrust::tabulate, am I not required to know its size upfront?

Contributor Author:

auto vertex_pair_buffer_tmp = allocate_dataframe_buffer<thrust::tuple<vertex_t, vertex_t>>(
          intersection_indices.size() * 3, handle.get_stream());

The above pre-allocates memory for (p, q), (p, r) and (q, r), but I can do one group at a time and resize to reduce peak memory usage.
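The one-group-at-a-time idea can be sketched on the host (names and types are illustrative, not the PR's code): scratch memory peaks at n pairs instead of 3n.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative host sketch (not the PR's device code): enumerate the three edges of
// each triangle (p, q, r) one group at a time, reusing a scratch buffer of n pairs.
std::vector<std::pair<int, int>> enumerate_triangle_edges(
  int p, int q, const std::vector<int>& r_intersection) {
  const std::size_t n = r_intersection.size();
  std::vector<std::pair<int, int>> result;
  result.reserve(3 * n);  // final size is known up front
  std::vector<std::pair<int, int>> scratch(n);  // reused for each group
  for (std::size_t i = 0; i < n; ++i) scratch[i] = {p, r_intersection[i]};  // (p, r) group
  result.insert(result.end(), scratch.begin(), scratch.end());
  for (std::size_t i = 0; i < n; ++i) scratch[i] = {q, r_intersection[i]};  // (q, r) group
  result.insert(result.end(), scratch.begin(), scratch.end());
  for (std::size_t i = 0; i < n; ++i) scratch[i] = {p, q};  // one (p, q) per triangle
  result.insert(result.end(), scratch.begin(), scratch.end());
  return result;
}
```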

Comment on lines 553 to 565
rmm::device_uvector<vertex_t> num_triangles_(3 * intersection_indices.size(), handle.get_stream());

thrust::fill(handle.get_thrust_policy(), num_triangles_.begin(), num_triangles_.end(), size_t{1});

rmm::device_uvector<vertex_t> num_triangles(num_vertex_pairs, handle.get_stream());

thrust::reduce_by_key(handle.get_thrust_policy(),
get_dataframe_buffer_begin(vertex_pair_buffer_),
get_dataframe_buffer_end(vertex_pair_buffer_),
num_triangles_.begin(),
get_dataframe_buffer_begin(vertex_pair_buffer),
num_triangles.begin(),
thrust::equal_to<thrust::tuple<vertex_t, vertex_t>>{});
Contributor:

See the code from https://github.com/rapidsai/cugraph/blob/branch-24.02/cpp/src/sampling/sampling_post_processing_impl.cuh

    auto num_unique_labels = thrust::count_if(
      handle.get_thrust_policy(),
      thrust::make_counting_iterator(size_t{0}),
      thrust::make_counting_iterator((*renumber_map_label_indices).size()),
      detail::is_first_in_run_t<label_index_t const*>{(*renumber_map_label_indices).data()});
    rmm::device_uvector<label_index_t> unique_label_indices(num_unique_labels, handle.get_stream());
    rmm::device_uvector<vertex_t> vertex_counts(num_unique_labels, handle.get_stream());
    thrust::reduce_by_key(handle.get_thrust_policy(),
                          (*renumber_map_label_indices).begin(),
                          (*renumber_map_label_indices).end(),
                          thrust::make_constant_iterator(size_t{1}),
                          unique_label_indices.begin(),
                          vertex_counts.begin());

You can use the same logic to perform this with less memory.
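A hedged host-side analogue of the pattern being suggested: counting run lengths of a sorted key sequence without first materializing an array of ones (the role thrust::make_constant_iterator plays on the device):

```cpp
#include <utility>
#include <vector>

// Host analogue of reduce_by_key over sorted keys with an implicit stream of 1s,
// instead of first filling a num_triangles_-style buffer with 1s.
template <typename T>
std::pair<std::vector<T>, std::vector<int>> run_length_counts(const std::vector<T>& sorted_keys) {
  std::vector<T> unique_keys;
  std::vector<int> counts;
  for (const T& key : sorted_keys) {
    if (unique_keys.empty() || !(unique_keys.back() == key)) {
      unique_keys.push_back(key);  // first element of a new run
      counts.push_back(1);
    } else {
      ++counts.back();  // same run: add the implicit "1"
    }
  }
  return {unique_keys, counts};
}
```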

printf("\nbefore sorting\n");
raft::print_device_vector("src", std::get<0>(vertex_pair_buffer_).data(), std::get<0>(vertex_pair_buffer_).size(), std::cout);
raft::print_device_vector("dst", std::get<1>(vertex_pair_buffer_).data(), std::get<1>(vertex_pair_buffer_).size(), std::cout);

Contributor:

You need to shuffle the resulting vertex pairs before reducing to be multi-GPU ready.

num_triangles.begin(),
thrust::equal_to<thrust::tuple<vertex_t, vertex_t>>{});


Contributor:

Now you can run thrust::partition here (Step 7 from #3446 (comment)) to find the edges to unroll.

@seunghwak (Contributor) commented:

@jnke2016

I thought more about implementing #3446 (comment); especially the steps 6, 7, and 8.

After step 6, sort the (edge source, edge destination, triangle count) triplets using edge destination as the primary key and edge source as the secondary key; this is to implement step 8-2.

  1. Run thrust::stable_partition to place the edges with triangle counts smaller than K-2 at the end. These edges should be "unrolled" and removed (unrolled means undoing their contributions to triangle counts; see Pierce & Sanders for additional details). <= Run thrust::stable_partition instead of thrust::partition, so each partition stays sorted.
  2. While (# edges to be remove > 0)
    8-1. Find (p, q, intersection of p&q's neighbor lists) for each edge to be removed. For each vertex r in the intersection, enumerate (p,r,-1) and (q,r,-1)<=call detail::nbr_intersection again but only for the edges to be removed. This covers the case the edge to be removed is a p->q edge. Note that we should eventually run step 6 in chunks as the aggregate size of the intersection indices can far exceed the number of edges. So, we cannot store all the intersection indices and we need to re-compute. This is a memory footprint vs compute trade-off.
    8-2. For each edge to be removed (say (q,r)), find all the incoming edges of r, then query the existence of (p,q) assuming p is an incoming neighbor of q. Accumulate (p,q,-1) and (p,r,-1) if (p,q) exists. <= This covers the case the edge to be removed is a q->r edge. We can run thrust::lower_bound & thrust::upper_bound on (s, d, triangle count) triplets to find all the incoming edges of q. Note that we need to run searches on both partitions of the (edge source, edge destination, triangle count) triplet array. In multi-GPU, we need to inspect all the GPUs that can possibly have r as an edge destination. This may sound a bit complicated unless you fully understand the partitioning and you may just add if constexpr (multi_gpu) { CUGRAPH_FAIL("unimplemented."); } for the initial SG implementation. We can come back to this once we validated the SG implementation.
    8-3. Similarly for each edge to be removed (say (p,r)), find all the incoming edges of r, then query the existence of (p,q) assuming q is an incoming neighbor of r. Accumulate (p,q,-1) and (q,r,-1). <= This covers the case the edge to be removed is a p->r edge. I assume there will be significant overlap in code for 8-2 & 8-3.
    8-4. Shuffle similar to 6-3 in multi-GPU.
    8-5. Update triangle counts based on triplets (similar to 6-4 & 6-5).
    8-6. Mask out the edges to be deleted. Also, resize the (edge source, edge destination, triangle count) triplet array so the deleted edges will no longer be considered.
    8-7. Identify the new set of edges to be deleted by running thrust::stable_partition again on the edge list based on the newly updated triangle counts.

Please read this and let me know when you are ready to sync again.
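The loop in steps 7-8 above can be sketched on the host with std::stable_partition. This is only an illustrative skeleton under stated assumptions: the unroll work in 8-1..8-5 (decrementing surviving edges' triangle counts) is deliberately elided, and none of this is cuGraph's primitive-based implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <tuple>
#include <vector>

using EdgeCount = std::tuple<int, int, int>;  // (src, dst, triangle count)

// Skeleton of steps 7-8: stable_partition keeps each partition sorted; edges below
// the k-2 threshold are trimmed each round until no invalid edges remain.
std::size_t k_truss_loop_skeleton(std::vector<EdgeCount> edges, int k) {
  while (true) {
    auto valid_last = std::stable_partition(
      edges.begin(), edges.end(),
      [k](const EdgeCount& e) { return std::get<2>(e) >= k - 2; });
    auto num_invalid = static_cast<std::size_t>(std::distance(valid_last, edges.end()));
    if (num_invalid == 0) { break; }
    // 8-1..8-5: subtract the removed edges' triangle contributions here (omitted).
    edges.erase(valid_last, edges.end());  // 8-6: drop the deleted edges
    // 8-7: the next stable_partition re-identifies edges to delete.
  }
  return edges.size();
}
```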

@ChuckHastings (Collaborator) left a comment:

Will wait for @seunghwak to approve before merging, but LGTM.

@seunghwak (Contributor) left a comment:

I think this code is now nearly in a mergeable state. Let's quickly address a few minor issues and let this PR pass the CI.

* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <community/edge_triangle_count_impl.cuh>
Contributor:

Don't forget to address this.

cpp/src/community/k_truss_impl.cuh (outdated, resolved)
cpp/src/community/k_truss_impl.cuh (outdated, resolved)
return dist_valid;
});
thrust::exclusive_scan(
handle.get_thrust_policy(), prefix_sum.begin(), prefix_sum.end(), prefix_sum.begin());
Contributor:

And you can provide thrust::make_transform_iterator(thrust::make_counting_iterator(size_t{0}), [query_vertices, num_edges = ..., sorted_vertices = ...]__device__(auto idx) { ... }); to thrust::exclusive_scan.

In the current code, you need to write dist values to prefix_sum; read dist values from prefix_sum; and compute prefix sum and write the results back to prefix_sum. (2 writes and 1 read). If you use transform_iterator, 1 write will be sufficient. This is a big cut in memory bandwidth and also reduces kernel launch overhead (2 kernel launches => 1 kernel launch).
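A host-side analogue of the fusion being suggested is std::transform_exclusive_scan, which computes the per-element value inside the scan rather than in a separate write pass ('offsets' and 'query' are illustrative names, not the PR's variables):

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Fuse the per-element "dist" computation (here: a degree from a CSR offsets array)
// into the exclusive scan itself: one pass instead of write-then-scan.
std::vector<int> degrees_prefix_sum(const std::vector<int>& offsets,
                                    const std::vector<int>& query) {
  std::vector<int> prefix(query.size());
  std::transform_exclusive_scan(
    query.begin(), query.end(), prefix.begin(), 0, std::plus<int>{},
    [&offsets](int v) { return offsets[v + 1] - offsets[v]; });  // dist computed on the fly
  return prefix;
}
```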

Contributor Author:

Got it, and thanks for the background information.

}

// FIXME: Remove because it yields to incorrect results
// 3. Find (k+1)-core and exclude edges that do not belong to (k+1)-core
Contributor:

(k+1) => (k-1), and re-enable this. This shouldn't yield incorrect results.

Contributor Author:

I tried, and it still yields incorrect results. I even tried k-2.

Contributor Author:

I added a FIXME for this.


num_invalid_edges = static_cast<size_t>(
thrust::distance(invalid_transposed_edge_triangle_count_first,
transposed_edge_triangle_count_pair_first + edgelist_srcs.size()));
Contributor:

edge_t num_invalid_edges{0};
      num_invalid_edges = ...

=>

auto num_invalid_edges = ...;

#include <rmm/mr/device/cuda_memory_resource.hpp>

#include <gtest/gtest.h>
#include <utilities/base_fixture.hpp>
Contributor:

Don't forget to address this.

cpp/tests/community/k_truss_test.cpp (outdated, resolved)
cpp/tests/community/k_truss_test.cpp (outdated, resolved)
Comment on lines 218 to 246
std::optional<cugraph::graph_t<vertex_t, edge_t, false, false>> modified_graph{std::nullopt};

std::optional<
cugraph::edge_property_t<cugraph::graph_view_t<vertex_t, edge_t, false, false>, weight_t>>
modified_edge_weight{std::nullopt};
std::tie(*modified_graph, modified_edge_weight, std::ignore, std::ignore, std::ignore) =
cugraph::
create_graph_from_edgelist<vertex_t, edge_t, weight_t, edge_t, int32_t, false, false>(
handle,
std::nullopt,
std::move(d_cugraph_src),
std::move(d_cugraph_dst),
std::move(d_cugraph_wgt),
std::nullopt,
std::nullopt,
cugraph::graph_properties_t{true, false},
renumber);

// Convert cugraph results to CSR
auto [h_cugraph_offsets, h_cugraph_indices, h_cugraph_values] =
cugraph::test::graph_to_host_csr(
handle,
(*modified_graph).view(),
modified_edge_weight ? std::make_optional((*modified_edge_weight).view()) : std::nullopt,
std::optional<raft::device_span<vertex_t const>>(std::nullopt));

// Remove isolated vertices.
h_cugraph_offsets.erase(std::unique(h_cugraph_offsets.begin() + 1, h_cugraph_offsets.end()),
h_cugraph_offsets.end()); // CSR start from 0
Contributor:

Should we really do this? Can we just compare cugraph COO output and reference COO output?

Contributor Author:

Yeah, but that means I would have to convert the reference K-truss output to COO.

Contributor Author:

I think I will have to rely on CSR comparison because my cuGraph results are not sorted. I have the same concern I had last week regarding sorting edges.

Contributor Author:

Never mind. I extended the sorting utility functions in thrust_wrapper like you suggested.
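The sort-then-compare idea mentioned here can be sketched in plain C++ (illustrative only; the PR does this on device via the extended thrust_wrapper utilities):

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

using Edge = std::tuple<int, int>;  // (src, dst) in COO form

// Order-insensitive comparison of two COO edge lists: sort both lexicographically,
// then compare element-wise.
bool same_edge_set(std::vector<Edge> lhs, std::vector<Edge> rhs) {
  std::sort(lhs.begin(), lhs.end());
  std::sort(rhs.begin(), rhs.end());
  return lhs == rhs;
}
```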

@seunghwak (Contributor) left a comment:

LGTM, and thanks for all the hard work!

Comment on lines +75 to +77
k_truss_reference(std::vector<vertex_t> h_offsets,
std::vector<vertex_t> h_indices,
std::optional<std::vector<weight_t>> h_values,
Contributor:

No need to address this in this PR, but just fyi,

You're passing h_offsets, h_indices, and h_values by value, and depending on the size of the vectors, this can be expensive. You can pass a const reference (const &) instead.
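The point can be shown with a minimal sketch (hypothetical functions, not the PR's k_truss_reference):

```cpp
#include <numeric>
#include <vector>

// Pass-by-value copies the whole vector on every call; a const reference binds
// to the caller's vector without copying.
long sum_by_value(std::vector<long> v) {        // copies the argument
  return std::accumulate(v.begin(), v.end(), 0L);
}
long sum_by_cref(const std::vector<long>& v) {  // no copy
  return std::accumulate(v.begin(), v.end(), 0L);
}
```

Both return the same result; the difference is only the allocation and copy cost, which grows with the vector size.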

Contributor Author:

Right. I will resolve it in my next PR.

@rlratzel (Contributor):

/merge

@rapids-bot rapids-bot bot merged commit f753e51 into rapidsai:branch-24.04 Mar 26, 2024
147 of 153 checks passed
Labels: CMake, cuGraph, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
5 participants