Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Implement HITS using the graph primitives #1657

Closed
4 tasks
ChuckHastings opened this issue Jun 8, 2021 · 1 comment · Fixed by #1898
Closed
4 tasks

[ENH] Implement HITS using the graph primitives #1657

ChuckHastings opened this issue Jun 8, 2021 · 1 comment · Fixed by #1898
Assignees
Labels
improvement Improvement / enhancement to an existing function
Milestone

Comments

@ChuckHastings
Copy link
Collaborator

Our current implementation of HITS uses the gunrock implementation which is limited to 1 CPU.

Inherently HITS is the same computational pattern as Pagerank and Katz Centrality. It should be possible to implement HITS using the graph primitives we use for Pagerank and Katz, giving us a HITS computation that can scale to multiple GPUs.

Suggest a sequence of PRs for this:

  • Move the existing C++ implementation into a cugraph::legacy namespace, updating the python to reference the legacy implementation. Define the new C++ API, define C++ unit tests using the C++ API.
  • Implement the algorithm in C++, all unit tests should pass
  • Implement the cython/python changes to support the new API
  • Delete the legacy implementation
@ChuckHastings ChuckHastings added the ? - Needs Triage Need team to review and classify label Jun 8, 2021
@ChuckHastings ChuckHastings added improvement Improvement / enhancement to an existing function multi-GPU labels Jun 8, 2021
@ChuckHastings ChuckHastings added this to the 21.08 milestone Jun 8, 2021
@seunghwak
Copy link
Contributor

To give little more detail about the implementation strategy,

Compare

https://github.com/networkx/networkx/blob/main/networkx/algorithms/centrality/katz.py#L172

        for n in x:
            for nbr in G[n]:
                x[nbr] += xlast[n] * G[n][nbr].get(weight, 1)

https://github.com/networkx/networkx/blob/main/networkx/algorithms/link_analysis/hits_alg.py#L116

        for n in h:
            for nbr in G[n]:
                a[nbr] += hlast[n] * G[n][nbr].get("weight", 1)
        # now multiply h=Ga        
        for n in h:            
            for nbr in G[n]:
                h[n] += a[nbr] * G[n][nbr].get("weight", 1)

The first part of the HITS's core implementation is practically identical to katz, but HITS has the second part as well.

Katz implements the first part using
copy_v_transform_reduce_in_nbr (assuming that the matrix is stored transposed)
https://github.com/rapidsai/cugraph/blob/branch-21.08/cpp/src/experimental/katz_centrality.cu#L106

but HITS will require calling both

copy_v_transform_reduce_in_nbr and copy_v_transform_reduce_out_nbr (the order will vary based on whether we assume the graph adjacency matrix is stored as is (row-major) or transposed (column-major). For PageRank & Katz, storing the transposed matrix is beneficial in performance to avoid atomics, but for HITS, there is no benefit in performance as we are using both copy_v_transform_reduce_in_nbr and copy_v_transform_reduce_out_nbr.

@ChuckHastings ChuckHastings modified the milestones: 21.08, 21.10 Jul 20, 2021
@BradReesWork BradReesWork removed the ? - Needs Triage Need team to review and classify label Aug 5, 2021
@ChuckHastings ChuckHastings modified the milestones: 21.10, 21.12 Sep 22, 2021
@rapids-bot rapids-bot bot closed this as completed in #1898 Nov 8, 2021
rapids-bot bot pushed a commit that referenced this issue Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants