-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MG weighted similarity test failure #4054
Fix MG weighted similarity test failure #4054
Conversation
…to bug_mg_weighted_similarity
@@ -1024,6 +1029,11 @@ remove_self_loops(raft::handle_t const& handle, | |||
* @param edgelist_weights Optional list of edge weights | |||
* @param edgelist_edge_ids Optional list of edge ids | |||
* @param edgelist_edge_types Optional list of edge types | |||
* @param keep_min_value_edge Flag indicating whether to keep an arbitrary edge (false) or the | |||
* minimum value edge (true) among the edges in a multi-edge. Relevant only if @p | |||
* edgelist_wegihts.has_value() | @p edgelist_edge_ids.has_value() | @p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wegihts => weights
* true, the edge with the minimum value will be selected. The edge weights will be first compared | ||
* (if @p edgelist_weights.has_value() is true); edge IDs will be compared next (if @p | ||
* edgelist_edge_ids.has_value() is true); and edge types (if @p edgelist_edge_types.has_value() is | ||
* true) will compared last. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about this when I put this function in. The decision to select an arbitrary edge was both simpler, and didn't assume what criteria would be desired. But it does limit the usefulness of the feature.
It seems like there are more reasonable choices than just the minimum edge. Here some thoughts:
- Arbitrary (original code, or this code with the flag set to false)
- Minimum edge weight (this code with the flag set to true)
- Maximum edge weight
- Sum of edge weights
- Average edge weight
We also support other edge properties (currently we use edge type and edge id), but the data structures and primitives support arbitrary properties.
Would this better be handled by passing in some sort of struct that creates the sorting and reduction criteria? Making it an optional would allow the arbitrary behavior if std::nullopt
or use the struct if we want some sort of specific criteria.
Thinking off the top of my head (leaving out lots of details), perhaps something like:
struct edge_reduction_t {
...
template ...
auto key_first() { // function that returns iterator to the first sort key }
auto value_first() { // function that returns iterator to the first value key }
auto reduce_function(auto key, auto value) { // returns the reduced edge }
};
Doesn't quite cover all of the cases (averaging would require both summing values and counting values so we could divide at the end... so perhaps there's an optional transform at the end.
I'd be fine marking this with a FIXME and letting the smallest value fix our immediate problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I was also thinking about edge ID & type... If we want to maintain a symmetric graph, should we also assume that the reverse edge to have the same ID & type? (or this can really be a case by case thing?)
We can discuss what options are really necessary in cuGraph's context. NetworkX supports "arbitrary".
"If both edges exist in digraph and their edge data is different, only one edge is created with an arbitrary choice of which edge data to use."
https://networkx.org/documentation/stable/reference/classes/generated/networkx.DiGraph.to_undirected.html
And I wasn't sure how much should we go beyond this without clear use cases, but now we have at least one use case (maintaining weight symmetry). We should clearly make updates if we can identify more use cases (e.g. anything related to edge ID & types).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue to track this idea. Definitely need to see what kind of use cases would drive this feature before we spend to much time on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to maintain a symmetric graph, should we also assume that the reverse edge to have the same ID & type? (or this can really be a case by case thing?)
Sorry to interject on this late but the python API has a tangential issue. In fact, the python API does not symmetrize edges that have edge_ids
under the assumption that each edge has a unique edge id and will throw an exception if the user attempts to create such undirected graph. If the reverse edge has the same ID, wouldn't that violate uniqueness of the edge_ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this really depends on the application context. If a user considers an undirected edge as a single edge, representing this single edge as two edges with the opposite direction is just an internal implementation issue and those two edges may have the same edge ID. If a user considers a directed graph which happens to be symmetric, then, the two edges may better have two different edge IDs. A similar issue can happen with edge types.
I assume this needs more in-depth discussion.
…to bug_mg_weighted_similarity
…to bug_mg_weighted_similarity
@@ -1038,6 +1048,7 @@ remove_multi_edges(raft::handle_t const& handle, | |||
rmm::device_uvector<vertex_t>&& edgelist_dsts, | |||
std::optional<rmm::device_uvector<weight_t>>&& edgelist_weights, | |||
std::optional<rmm::device_uvector<edge_t>>&& edgelist_edge_ids, | |||
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types); | |||
std::optional<rmm::device_uvector<edge_type_t>>&& edgelist_edge_types, | |||
bool keep_min_value_edge = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this flag also be exposed to the CAPI? In fact the new cugraph_graph_create_sg
takes drop_multi_edges
however the users on the higher stack (C, PLC, python) won't be able to control this parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, yes, if the graph is symmetric, we need to set keep_min_value_edge to true to maintain symmetry. See a68868f for the update.
…to bug_mg_weighted_similarity
/merge |
MG weighted similarity tests assume symmetric graphs (undirected). When we remove multi-edges, we pick arbitrary edges and this can lead to an asymmetry in edge weights. This PR adds an additional flag to keep minimum value edges in
remove_multi_edges
and use this function if the input graph is symmetric to maintain weight symmetry.Applying the non-breaking label as this PR does not change existing code behavior if
keep_min_value_edge
is not provided.