Incorrect results return by the similarity algorithms #3708

jnke2016 · 2023-07-14T15:54:23Z

Version

23.08

Which installation method(s) does this occur on?

Docker, Conda, Pip, Source

Describe the bug.

All the Similarity algorithms return incorrect results when calling the method number_of_vertices() before calling the algorithm if the vertices are not numbered from 0 to number_of_vertices -1. In fact, Jaccard for instance gets the number of vertices from the offsets array since the legacy version creates a graph from an adjacency list internally But when we explicitly call the method number_of_vertices() prior , this sets the number of vertices to a value different than what the offsets counterpart would do if the vertices are not renumbered (the offsets and indices array are different based on the number of vertices provided when creating those arrays.

Minimum reproducible example

data = 'PATH_TO/karate.csv'
gdf = cudf.read_csv(data, delimiter=' ', names=['src', 'dst', 'wt'], dtype=['int32', 'int32', 'float']  )
gdf = gdf.add(1)
G = cugraph.Graph(directed=False)
G.from_cudf_edgelist(gdf, source='src', destination='dst', renumber=True)
G.number_of_vertices()
sdf = cugraph.sorensen(G)
sdf.max()

Relevant log output

Case 1: don’t add 1 to gdf:
graph_offsets=[0,16,25,35,41,44,48,52,56,61,63,66,67,69,74,76,78,80,82,84,87,89,91,93,98,101,104,106,110,113,117,121,127,139];
graph_indices=[1,2,3,4,5,6,7,8,10,11,12,13,17,19,21,31,0,2,3,7,13,17,19,21,30,0,1,3,7,8,9,13,27,28,32,0,1,2,7,12,13,0,6,10,0,6,10,16,0,4,5,16,0,1,2,3,0,2,30,32,33,2,33,0,4,5,0,0,3,0,1,2,3,33,32,33,32,33,5,6,0,1,32,33,0,1,33,32,33,0,1,32,33,25,27,29,32,33,25,27,31,23,24,31,29,33,2,23,24,33,2,31,33,23,26,32,33,1,8,32,33,0,24,25,28,32,33,2,8,14,15,18,20,22,23,29,30,31,33,8,9,13,14,15,18,19,20,22,23,26,27,28,29,30,31,32];

Case 2: add 1 to gdf:
graph_offsets=[0,0,16,25,35,41,44,48,52,56,61,63,66,67,69,74,76,78,80,82,84,87,89,91,93,98,101,104,106,110,113,117,121,127];
graph_indices=[2,3,4,5,6,7,8,9,11,12,13,14,18,20,22,32,1,3,4,8,14,18,20,22,31,1,2,4,8,9,10,14,28,29,33,1,2,3,8,13,14,1,7,11,1,7,11,17,1,5,6,17,1,2,3,4,1,3,31,33,34,3,34,1,5,6,1,1,4,1,2,3,4,34,33,34,33,34,6,7,1,2,33,34,1,2,34,33,34,1,2,33,34,26,28,30,33,34,26,28,32,24,25,32,30,34,3,24,25,34,3,32,34,24,27,33,34,2,9,33,34,1,25,26,29,33,34,3,9,15,16,19,21,23,24,30,31,32,34,9,10,14,15,16,19,20,21,23,24,27,28,29,30,31,32,33];

The print from case 2 suggests that the graph is not, in fact, renumbered.  Thus there are 35 vertices (from 0 to 34).  The fact that we overwrite the number of vertices to 34 results in incorrect behavior

Code of Conduct

I agree to follow cuGraph's Code of Conduct
I have searched the open bugs and have found no duplicates for this bug report

The text was updated successfully, but these errors were encountered:

kingmesal · 2023-08-03T15:46:45Z

Is this resolved?

jnke2016 · 2023-08-04T16:27:53Z

This issue is not resolved yet but we made progress in 23.08 to deliver a full fledged solution. In fact, we got the last piece that enables support for weights in the C++ layer which was merged during burndown. Prior to that as a safety guard, we merged a PR highlighting that the similarity algorithms don't currently support certain datasets and we throw an exception if this condition is not satisfied. Once the roadmap of 23.10 release is establish, we will have a better estimate as to when this will be resolved.

jnke2016 · 2023-09-26T16:54:18Z

Resolved by PR 3828.

jnke2016 added ? - Needs Triage Need team to review and classify bug Something isn't working labels Jul 14, 2023

acostadon mentioned this issue Jul 14, 2023

Created landing spot for centrality and similarity algorithms #3620

Merged

BradReesWork removed the ? - Needs Triage Need team to review and classify label Jul 17, 2023

kingmesal assigned jnke2016 Aug 3, 2023

BradReesWork changed the title ~~[BUG]: Incorrect results return by the similarity algorithms~~ Incorrect results return by the similarity algorithms Aug 3, 2023

ChuckHastings mentioned this issue Aug 7, 2023

Algorithm Migration - Jaccard #3260

Closed

rlratzel assigned naimnv Aug 23, 2023

jnke2016 closed this as completed Sep 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect results return by the similarity algorithms #3708

Incorrect results return by the similarity algorithms #3708

jnke2016 commented Jul 14, 2023 •

edited

Loading

kingmesal commented Aug 3, 2023

jnke2016 commented Aug 4, 2023

jnke2016 commented Sep 26, 2023

Incorrect results return by the similarity algorithms #3708

Incorrect results return by the similarity algorithms #3708

Comments

jnke2016 commented Jul 14, 2023 • edited Loading

Version

Which installation method(s) does this occur on?

Describe the bug.

Minimum reproducible example

Relevant log output

Code of Conduct

kingmesal commented Aug 3, 2023

jnke2016 commented Aug 4, 2023

jnke2016 commented Sep 26, 2023

jnke2016 commented Jul 14, 2023 •

edited

Loading