
[REVIEW] dask personalization, fix df query #1237

Merged: 16 commits merged into rapidsai:branch-0.17 on Nov 20, 2020

Conversation

@Iroy30 (Contributor) commented Oct 20, 2020

No description provided.

@Iroy30 requested a review from a team as a code owner on October 20, 2020 at 22:03
@GPUtester (Contributor) commented:

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

Review comment from a contributor on the diff hunk (pagerank call arguments):

    vertex_partition_offsets,
    alpha,
    max_iter,
    tol,

I think we need to discuss tol a bit more.

The code below is what NetworkX does (it compares err with N * tol). This requires setting tol to a smaller value as N gets large (tol here is essentially the tolerance for a single PageRank value, not for the entire PageRank vector).

https://github.com/networkx/networkx/blob/master/networkx/algorithms/link_analysis/pagerank_alg.py#L155

err = sum([abs(x[n] - xlast[n]) for n in x])
if err < N * tol:
    return x

The new PageRank algorithm adopts this NetworkX logic to determine convergence. Should we follow it here as well (and set tol to a smaller value for larger N), or would it be better to remove the N * factor from the new PageRank code?
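
For clarity, a minimal sketch of the two interpretations of tol being discussed (plain Python with hypothetical helper names, not cuGraph code):

def converged_networkx_style(x, xlast, tol):
    # NetworkX compares the total L1 change against N * tol,
    # so tol effectively bounds the change of a single PageRank value.
    err = sum(abs(x[n] - xlast[n]) for n in x)
    return err < len(x) * tol

def converged_total_style(x, xlast, tol):
    # Alternative: compare the total L1 change against tol directly,
    # so tol bounds the change of the entire PageRank vector.
    err = sum(abs(x[n] - xlast[n]) for n in x)
    return err < tol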

@seunghwak (Contributor) left a comment

Please check that the personalization vector is normalized. The new MNMG PageRank code does not require this, but it seems like the old SG PageRank code does, and this leads to a test failure with netscience.csv.
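
For illustration, a sketch of how a test could normalize the personalization values before calling the legacy SG PageRank; the 'vertex'/'values' column names and the example data are assumptions here, not taken from the PR:

import cudf

# Hypothetical, unnormalized personalization vector.
personalization = cudf.DataFrame(
    {"vertex": [0, 5, 17], "values": [3.0, 2.0, 5.0]}
)

# Normalize so the values sum to 1.0, which the legacy SG PageRank appears to require.
personalization["values"] = personalization["values"] / personalization["values"].sum()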

@codecov-io commented Oct 26, 2020

Codecov Report

Merging #1237 (73b42e4) into branch-0.17 (1be056b) will decrease coverage by 0.18%.
The diff coverage is 33.33%.


@@               Coverage Diff               @@
##           branch-0.17    #1237      +/-   ##
===============================================
- Coverage        57.84%   57.65%   -0.19%     
===============================================
  Files               63       62       -1     
  Lines             2780     2657     -123     
===============================================
- Hits              1608     1532      -76     
+ Misses            1172     1125      -47     
Impacted Files Coverage Δ
python/cugraph/dask/link_analysis/pagerank.py 25.80% <0.00%> (ø)
python/cugraph/structure/graph.py 66.60% <0.00%> (ø)
python/cugraph/cores/k_core.py 85.71% <75.00%> (-1.79%) ⬇️
python/cugraph/utilities/utils.py 69.23% <0.00%> (-3.39%) ⬇️
python/cugraph/dask/__init__.py 100.00% <0.00%> (ø)
python/cugraph/utilities/__init__.py 100.00% <0.00%> (ø)
python/cugraph/structure/symmetrize.py 70.73% <0.00%> (ø)
python/cugraph/dask/centrality/katz_centrality.py
python/cugraph/comms/comms.py 35.36% <0.00%> (+0.84%) ⬆️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1be056b...2e88e12.

@BradReesWork added the 'feature request' label (New feature or request) on Oct 27, 2020
@Iroy30 changed the title from '[WIP] dask personalization, fix df query' to '[REVIEW] dask personalization, fix df query' on Oct 30, 2020
@Iroy30 (Contributor, Author) commented Nov 2, 2020

rerun tests

k-core currently doesn't work on asymmetric Graphs
@Iroy30 (Contributor, Author) commented Nov 9, 2020

rerun tests

@seunghwak (Contributor) commented:

Have you tested this with netscience.csv? The legacy SG personalized PageRank expects the personalization values to be normalized, and it seems like our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.

@Iroy30 (Contributor, Author) commented Nov 13, 2020

> Have you tested this with netscience.csv? The legacy SG personalized PageRank expects the personalization values to be normalized, and it seems like our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.

The legacy SG personalized PageRank normalizes internally and doesn't require the user to pass in normalized values. However, netscience.csv still fails in the test. I need to look deeper into what is happening.

@seunghwak (Contributor) commented Nov 13, 2020

> Have you tested this with netscience.csv? The legacy SG personalized PageRank expects the personalization values to be normalized, and it seems like our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.
>
> The legacy SG personalized PageRank normalizes internally and doesn't require the user to pass in normalized values. However, netscience.csv still fails in the test. I need to look deeper into what is happening.

When I tested, the sum of PageRank values decreases in every iteration with the legacy SG PageRank; it should stay at 1.0 if it's working properly.
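
A small sanity-check sketch of that invariant; the 'pagerank' result column name and helper name are assumptions here, not code from the PR:

def check_pagerank_mass(result, atol=1e-6):
    # PageRank values form a probability distribution, so their sum
    # should stay (close to) 1.0; a shrinking sum indicates leaked mass.
    total = float(result["pagerank"].sum())
    assert abs(total - 1.0) < atol, f"PageRank sum drifted to {total}"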

@afender (Member) left a comment

Looks good.
Let's try and squeeze the notebook fix into this one.

@Iroy30 (Contributor, Author) commented Nov 20, 2020

> Looks good. Let's try and squeeze the notebook fix into this one.

p2p now defaults to True, so the notebooks work without explicitly setting it. However, I edited the notebooks regardless so users are aware that p2p is set to True.
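
For reference, a rough sketch of the MNMG setup the notebooks walk through; the exact module paths and the Comms initialization signature in this cuGraph version are assumptions here:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from cugraph.comms import comms as Comms  # module path assumed

cluster = LocalCUDACluster()
client = Client(cluster)

# p2p now defaults to True, so passing it explicitly is optional;
# the notebooks keep it visible so users know the setting is on.
Comms.initialize(p2p=True)

# ... build the distributed graph and run the dask cugraph algorithms here ...

Comms.destroy()
client.close()
cluster.close()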

@Iroy30 (Contributor, Author) commented Nov 20, 2020

> Have you tested this with netscience.csv? The legacy SG personalized PageRank expects the personalization values to be normalized, and it seems like our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.
>
> The legacy SG personalized PageRank normalizes internally and doesn't require the user to pass in normalized values. However, netscience.csv still fails in the test. I need to look deeper into what is happening.
>
> When I tested, the sum of PageRank values decreases in every iteration with the legacy SG PageRank; it should stay at 1.0 if it's working properly.

There was an issue with netscience due to the personalization node initialization in Python. Fixed it.

@BradReesWork merged commit 337a2f6 into rapidsai:branch-0.17 on Nov 20, 2020