Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transpose #1834

Merged
merged 47 commits into from
Oct 15, 2021
Merged

Transpose #1834

merged 47 commits into from
Oct 15, 2021

Conversation

seunghwak
Copy link
Contributor

@seunghwak seunghwak commented Sep 22, 2021

  • implement transpose_edgelist
  • implement graph_t::transpose() using transpose_edgelist
  • implement C++ test cases

@seunghwak seunghwak added feature request New feature or request 2 - In Progress non-breaking Non-breaking change labels Sep 22, 2021
@seunghwak seunghwak added this to the 21.10 milestone Sep 22, 2021
@seunghwak seunghwak requested review from a team as code owners September 22, 2021 19:00
@BradReesWork BradReesWork modified the milestones: 21.10, 21.12 Sep 28, 2021
@BradReesWork BradReesWork changed the base branch from branch-21.10 to branch-21.12 September 28, 2021 15:43
@seunghwak seunghwak added DO NOT MERGE Hold off on merging; see PR for details and removed 2 - In Progress labels Oct 5, 2021
@seunghwak seunghwak changed the title [WIP][skip-ci] transpose Transpose Oct 6, 2021
@seunghwak seunghwak removed the DO NOT MERGE Hold off on merging; see PR for details label Oct 15, 2021
@seunghwak
Copy link
Contributor Author

@rlratzel

Could you review the API of transpose_storage functions?

https://github.com/rapidsai/cugraph/pull/1834/files#diff-f09339cd755b9bc46cc805cf64d1850fd9dff92a185fb05b0e9e70850752f13fR110
https://github.com/rapidsai/cugraph/pull/1834/files#diff-f09339cd755b9bc46cc805cf64d1850fd9dff92a185fb05b0e9e70850752f13fR243

With this function, we don't need to lazily create a graph. We create a CSR-ish graph by default (with an option to create a CSC-ish graph for advanced users or for benchmarking). If an algorithm requires a CSC-ish graph, we can use call this function to change the storage format.

@seunghwak seunghwak requested a review from rlratzel October 15, 2021 17:39
* @return rmm::device_uvector<vertex_t> Return a new renumber map (to recover the original vertex
* IDs).
*/
rmm::device_uvector<vertex_t> transpose(raft::handle_t const& handle,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question... this will work for callers that use C++ renumbering.

If we do python renumbering (for multi-column or strings), how would this function work? Would we have to provide a fake renumbering map (where i = i), and then use those values to look up values when we get back to python?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SG, renumbering is optional, so you can provide std::nullopt.

In MG, renumbering is mandatory, so this question is relevant.

So, this depends on how the python renumbering is implemented. So, as you know, the C++ renumbering routine renumbers in a specific way to exploit this in optimizations as well. If the python renumbering satisfies all these requirements, we can create an identity mapping renumbering map (as you suggested) or look for a better option.

But if asking the python renumbering to satisfy all the detailed renumbering requirements is not very desirable, we may consider the renumbering process as the following.

Python renumbering
Multi-column AND|OR string IDs => integer IDs

C++ renumbering
Integer IDs to Integer IDs (mainly for optimization and can be mostly transparent to python users)

Then, really not that much changes with/or without python renumbering. This may add some additional overhead, but I guess String/multi-column renumbering is much more expensive than integer-to-integer renumbering and this leads to cleaner software architecture.

We run python renumbering, and then, run analytics as we have integer vertex ID inputs, and unrenumber to strings AND|OR multi-columns only when we need to return results to users.

What do you think and any concerns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this really depends on how python renumbering is implemented, but if python renumbering just maps multi-column/string IDs to integers, it just needs to find unique IDs, and create a mapping between original IDs to integer IDs.

If python renumbering also sorts vertex IDs within a partition by their degrees (as C++ layer uses this for various optimizations), it needs to compute the degrees for each unique vertex IDs, and I believe this can be done much more efficiently after converting to integer IDs than directly working on multi-column/string IDs.

So, the performance loss may not be really that significant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I was mainly asking for clarification purposes (making sure we all understand the implications - especially @rlratzel whose team will have to use this feature).

I don't have any concerns over what you described. I believe the current python renumbering satisfies all the detailed renumbering requirements today. I suspect that the new string/multi-column renumbering can also satisfy those requirements. We can, while it is being developed, explore whether a two-phase renumbering would be more efficient or result in lower maintain cost at a minimal performance loss. For now we can just use the identity mapping to keep track of the numbering changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for covering this, I don't have any questions beyond what @ChuckHastings already asked.

*/
std::tuple<graph_t<vertex_t, edge_t, weight_t, !store_transposed, multi_gpu>,
rmm::device_uvector<vertex_t>>
transpose_storage(raft::handle_t const& handle,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this variation returns a new graph and a new renumbering map. Same question about multi-column/string renumbering.

Also, it's a shame that this function can't be const. Would it be better to have two separate functions, one that is const (where destroy is forced to false) and one that is not const (where destroy is forced to be true)? I suppose it depends upon how we plan on using this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I can create a const one if necessary; not sure it will be actually necessary. If store_transposed is not a template parameter, this can be a void function changing internal representation, so non-const. If one wants to keep both representations, yeah... having a const might be necessary (but this requires 2x memory so undesirable for large scale analysis).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to leave this as is unless/until we discover a need for a const version. I was just initially surprised that it was not... and then realized the destroy = true option would require the ability to modify the contents.

@codecov-commenter
Copy link

codecov-commenter commented Oct 15, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@1c47f70). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 9f16be4 differs from pull request most recent head becac46. Consider uploading reports for the commit becac46 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.12    #1834   +/-   ##
===============================================
  Coverage                ?   70.08%           
===============================================
  Files                   ?      143           
  Lines                   ?     8817           
  Branches                ?        0           
===============================================
  Hits                    ?     6179           
  Misses                  ?     2638           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c47f70...becac46. Read the comment docs.

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

* @return rmm::device_uvector<vertex_t> Return a new renumber map (to recover the original vertex
* IDs).
*/
rmm::device_uvector<vertex_t> transpose(raft::handle_t const& handle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for covering this, I don't have any questions beyond what @ChuckHastings already asked.

@ChuckHastings
Copy link
Collaborator

@gpucibot merge

@rapids-bot rapids-bot bot merged commit a5718b1 into rapidsai:branch-21.12 Oct 15, 2021
@seunghwak seunghwak deleted the fea_transpose branch October 20, 2021 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants