[REVIEW] dask personalization, fix df query #1237
Please update the changelog in order to start CI tests. View the gpuCI docs here.
vertex_partition_offsets,
alpha,
max_iter,
tol,
I think we need to discuss tol further.
The code below is what NetworkX does (it compares err with N * tol). This requires setting tol to a smaller value as N gets large (tol here is effectively the tolerance for a single PageRank value, not for the entire PageRank vector).
    err = sum([abs(x[n] - xlast[n]) for n in x])
    if err < N * tol:
        return x
The new PageRank algorithm adopts this NetworkX logic to determine convergence, so should we follow NetworkX (and set tol to a smaller value for larger N), or should we remove N * from the new PageRank code?
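To make the two options concrete, here is a minimal pure-Python sketch of both convergence checks. The helper names (l1_error, has_converged), the scale_by_n flag, and the sample iterates are hypothetical and are not taken from the cuGraph or NetworkX sources; only the err < N * tol comparison comes from the snippet quoted above.

```python
def l1_error(x, xlast):
    """Sum of absolute per-vertex changes between two PageRank iterates."""
    return sum(abs(x[n] - xlast[n]) for n in x)


def has_converged(x, xlast, tol, scale_by_n=True):
    """Hypothetical helper comparing the two checks under discussion.

    scale_by_n=True  -> NetworkX-style check: err < N * tol
    scale_by_n=False -> whole-vector check:   err < tol
    """
    threshold = len(x) * tol if scale_by_n else tol
    return l1_error(x, xlast) < threshold


# Made-up iterates: each of N vertices moved by 5e-7 in the last iteration.
N = 1000
xlast = {n: 1.0 / N for n in range(N)}
x = {n: 1.0 / N + 5e-7 for n in range(N)}

tol = 1e-6
scaled = has_converged(x, xlast, tol, scale_by_n=True)     # threshold is N * tol
unscaled = has_converged(x, xlast, tol, scale_by_n=False)  # threshold is tol
print(scaled, unscaled)
```

With the same tol, the scaled check accepts these iterates while the unscaled check does not, which is why tol would have to shrink as N grows if the N * factor is kept and a fixed whole-vector accuracy is wanted.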
Please check that the personalization vector is normalized. The new MNMG PageRank code does not require this, but the old SG PageRank code seems to; this leads to a test failure with netscience.csv.
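For illustration, a test could normalize the personalization values before passing them to either implementation. The sketch below uses a plain dict as a stand-in for whatever structure the test actually builds; normalize_personalization and the sample values are hypothetical.

```python
def normalize_personalization(personalization):
    """Scale personalization weights so they sum to 1.0.

    `personalization` maps vertex id -> non-negative weight. This dict is
    a stand-in for the real test input, not the cuGraph data structure.
    """
    total = sum(personalization.values())
    if total <= 0:
        raise ValueError("personalization weights must have a positive sum")
    return {v: w / total for v, w in personalization.items()}


# Made-up weights that do not sum to 1.0.
raw = {0: 2.0, 5: 1.0, 9: 1.0}
normalized = normalize_personalization(raw)
print(normalized, sum(normalized.values()))
```

Feeding the same normalized values to both the SG and MNMG code paths would remove normalization as a variable when comparing their results.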
Codecov Report
@@               Coverage Diff               @@
##           branch-0.17    #1237      +/-   ##
===============================================
- Coverage        57.84%   57.65%   -0.19%
===============================================
  Files               63       62       -1
  Lines             2780     2657     -123
===============================================
- Hits              1608     1532      -76
+ Misses            1172     1125      -47
Continue to review full report at Codecov.
rerun tests
k-core currently doesn't work on asymmetric Graphs
rerun tests
Have you tested this with netscience.csv? The legacy SG personalized PageRank expects personalization values to be normalized, and it seems our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.
The legacy SG personalized PageRank normalizes internally and doesn't require the user to send in normalized values. However, netscience.csv fails in the test. We need to look deeper into what is happening.
When I tested, the sum of PageRank values decreased in every iteration with the legacy SG PageRank; it should stay at 1.0 if it's working properly.
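The sum-stays-at-1.0 property can be checked with a small reference iteration. This is a pure-Python sketch, not the SG or MNMG implementation: personalized_pagerank_sums, the toy graph, and the choice to redistribute dangling mass according to the personalization vector are all assumptions made for illustration.

```python
def personalized_pagerank_sums(out_edges, personalization, alpha=0.85, iters=20):
    """Run a fixed number of personalized PageRank iterations and record
    the sum of the PageRank vector after each one.

    out_edges maps every vertex to its list of out-neighbors (each
    neighbor must also be a key). Dangling mass is redistributed
    according to the personalization vector (an assumption here).
    """
    nodes = list(out_edges)
    total = sum(personalization.values())
    # Normalize the personalization weights so they sum to 1.0.
    p = {n: personalization.get(n, 0.0) / total for n in nodes}
    x = dict(p)  # start from the personalization distribution
    sums = []
    for _ in range(iters):
        dangling = sum(x[n] for n in nodes if not out_edges[n])
        # Teleport term plus redistributed dangling mass.
        nxt = {n: (1.0 - alpha) * p[n] + alpha * dangling * p[n] for n in nodes}
        for n in nodes:
            if out_edges[n]:
                share = alpha * x[n] / len(out_edges[n])
                for m in out_edges[n]:
                    nxt[m] += share
        x = nxt
        sums.append(sum(x.values()))
    return x, sums


# Toy directed graph; vertex 3 is dangling. Weights are deliberately unnormalized.
graph = {0: [1, 2], 1: [2], 2: [0], 3: []}
ranks, sums = personalized_pagerank_sums(graph, {0: 3.0, 3: 1.0})
print(max(abs(s - 1.0) for s in sums))
```

Logging the per-iteration sum in the same way for the implementation under test is a quick diagnostic: a sum that drifts away from 1.0 points at mass being lost or added somewhere in the update.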
looks good
let's try and squeeze the notebook fix in this one
p2p now defaults to True, so the notebooks work without setting it explicitly. However, I edited the notebooks anyway so users are aware that p2p is set to True.
There was an issue with netscience due to how the personalization nodes were initialized in Python. Fixed it.