-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transpose #1834
Transpose #1834
Conversation
seunghwak
commented
Sep 22, 2021
•
edited
Loading
edited
- implement transpose_edgelist
- implement graph_t::transpose() using transpose_edgelist
- implement C++ test cases
…atrix (no change in actual grpah but just convert graph format between CSR-ish & CSC-ish)
…lue to 0 as this value should always be 0 (this parameter exists to be compatible with the MG case
Could you review the API of transpose_storage functions? https://github.com/rapidsai/cugraph/pull/1834/files#diff-f09339cd755b9bc46cc805cf64d1850fd9dff92a185fb05b0e9e70850752f13fR110 With this function, we don't need to lazily create a graph. We create a CSR-ish graph by default (with an option to create a CSC-ish graph for advanced users or for benchmarking). If an algorithm requires a CSC-ish graph, we can use call this function to change the storage format. |
* @return rmm::device_uvector<vertex_t> Return a new renumber map (to recover the original vertex | ||
* IDs). | ||
*/ | ||
rmm::device_uvector<vertex_t> transpose(raft::handle_t const& handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question... this will work for callers that use C++ renumbering.
If we do python renumbering (for multi-column or strings), how would this function work? Would we have to provide a fake renumbering map (where i = i
), and then use those values to look up values when we get back to python?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In SG, renumbering is optional, so you can provide std::nullopt.
In MG, renumbering is mandatory, so this question is relevant.
So, this depends on how the python renumbering is implemented. So, as you know, the C++ renumbering routine renumbers in a specific way to exploit this in optimizations as well. If the python renumbering satisfies all these requirements, we can create an identity mapping renumbering map (as you suggested) or look for a better option.
But if asking the python renumbering to satisfy all the detailed renumbering requirements is not very desirable, we may consider the renumbering process as the following.
Python renumbering
Multi-column AND|OR string IDs => integer IDs
C++ renumbering
Integer IDs to Integer IDs (mainly for optimization and can be mostly transparent to python users)
Then, really not that much changes with/or without python renumbering. This may add some additional overhead, but I guess String/multi-column renumbering is much more expensive than integer-to-integer renumbering and this leads to cleaner software architecture.
We run python renumbering, and then, run analytics as we have integer vertex ID inputs, and unrenumber to strings AND|OR multi-columns only when we need to return results to users.
What do you think and any concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this really depends on how python renumbering is implemented, but if python renumbering just maps multi-column/string IDs to integers, it just needs to find unique IDs, and create a mapping between original IDs to integer IDs.
If python renumbering also sorts vertex IDs within a partition by their degrees (as C++ layer uses this for various optimizations), it needs to compute the degrees for each unique vertex IDs, and I believe this can be done much more efficiently after converting to integer IDs than directly working on multi-column/string IDs.
So, the performance loss may not be really that significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I was mainly asking for clarification purposes (making sure we all understand the implications - especially @rlratzel whose team will have to use this feature).
I don't have any concerns over what you described. I believe the current python renumbering satisfies all the detailed renumbering requirements today. I suspect that the new string/multi-column renumbering can also satisfy those requirements. We can, while it is being developed, explore whether a two-phase renumbering would be more efficient or result in lower maintain cost at a minimal performance loss. For now we can just use the identity mapping to keep track of the numbering changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for covering this, I don't have any questions beyond what @ChuckHastings already asked.
*/ | ||
std::tuple<graph_t<vertex_t, edge_t, weight_t, !store_transposed, multi_gpu>, | ||
rmm::device_uvector<vertex_t>> | ||
transpose_storage(raft::handle_t const& handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this variation returns a new graph and a new renumbering map. Same question about multi-column/string renumbering.
Also, it's a shame that this function can't be const. Would it be better to have two separate functions, one that is const (where destroy is forced to false) and one that is not const (where destroy is forced to be true)? I suppose it depends upon how we plan on using this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I can create a const one if necessary; not sure it will be actually necessary. If store_transposed is not a template parameter, this can be a void function changing internal representation, so non-const. If one wants to keep both representations, yeah... having a const might be necessary (but this requires 2x memory so undesirable for large scale analysis).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to leave this as is unless/until we discover a need for a const version. I was just initially surprised that it was not... and then realized the destroy = true
option would require the ability to modify the contents.
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #1834 +/- ##
===============================================
Coverage ? 70.08%
===============================================
Files ? 143
Lines ? 8817
Branches ? 0
===============================================
Hits ? 6179
Misses ? 2638
Partials ? 0 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
* @return rmm::device_uvector<vertex_t> Return a new renumber map (to recover the original vertex | ||
* IDs). | ||
*/ | ||
rmm::device_uvector<vertex_t> transpose(raft::handle_t const& handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for covering this, I don't have any questions beyond what @ChuckHastings already asked.
@gpucibot merge |