
[REVIEW] dask personalization, fix df query #1237

Merged: 16 commits merged into rapidsai:branch-0.17 on Nov 20, 2020

Conversation

@Iroy30 (Contributor) commented Oct 20, 2020

No description provided.

@Iroy30 requested a review from a team as a code owner on October 20, 2020 at 22:03
@GPUtester (Contributor) commented:

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

Review comment from a contributor on the diff hunk (pagerank call arguments):

    vertex_partition_offsets,
    alpha,
    max_iter,
    tol,

I think we need to discuss tol a bit more.

The code below is what NetworkX does (it compares err with N * tol). This requires setting tol to a smaller value as N gets large (tol here is essentially the tolerance for a single PageRank value, not for the entire PageRank vector).

https://github.com/networkx/networkx/blob/master/networkx/algorithms/link_analysis/pagerank_alg.py#L155

err = sum([abs(x[n] - xlast[n]) for n in x])
if err < N * tol:
    return x

The new PageRank algorithm adopts this NetworkX logic to determine convergence. Should we follow it here as well (and set tol to a smaller value for larger N), or would it be better to remove the N * factor from the new PageRank code?
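
For clarity, a minimal sketch of the two interpretations of tol being discussed (plain Python with hypothetical helper names, not cuGraph code):

def converged_networkx_style(x, xlast, tol):
    # NetworkX compares the total L1 change against N * tol,
    # so tol effectively bounds the change of a single PageRank value.
    err = sum(abs(x[n] - xlast[n]) for n in x)
    return err < len(x) * tol

def converged_total_style(x, xlast, tol):
    # Alternative: compare the total L1 change against tol directly,
    # so tol bounds the change of the entire PageRank vector.
    err = sum(abs(x[n] - xlast[n]) for n in x)
    return err < tol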

@seunghwak (Contributor) left a comment

Please check that the personalization vector is normalized. The new MNMG PageRank code does not require this, but it seems like the old SG PageRank code does, and this leads to a test failure with netscience.csv.
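
For illustration, a sketch of how a test could normalize the personalization values before calling the legacy SG PageRank; the 'vertex'/'values' column names and the example data are assumptions here, not taken from the PR:

import cudf

# Hypothetical, unnormalized personalization vector.
personalization = cudf.DataFrame(
    {"vertex": [0, 5, 17], "values": [3.0, 2.0, 5.0]}
)

# Normalize so the values sum to 1.0, which the legacy SG PageRank appears to require.
personalization["values"] = personalization["values"] / personalization["values"].sum()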

@codecov-io commented Oct 26, 2020

Codecov Report

Merging #1237 (73b42e4) into branch-0.17 (1be056b) will decrease coverage by 0.18%.
The diff coverage is 33.33%.


@@               Coverage Diff               @@
##           branch-0.17    #1237      +/-   ##
===============================================
- Coverage        57.84%   57.65%   -0.19%     
===============================================
  Files               63       62       -1     
  Lines             2780     2657     -123     
===============================================
- Hits              1608     1532      -76     
+ Misses            1172     1125      -47     
Impacted Files Coverage Δ
python/cugraph/dask/link_analysis/pagerank.py 25.80% <0.00%> (ø)
python/cugraph/structure/graph.py 66.60% <0.00%> (ø)
python/cugraph/cores/k_core.py 85.71% <75.00%> (-1.79%) ⬇️
python/cugraph/utilities/utils.py 69.23% <0.00%> (-3.39%) ⬇️
python/cugraph/dask/__init__.py 100.00% <0.00%> (ø)
python/cugraph/utilities/__init__.py 100.00% <0.00%> (ø)
python/cugraph/structure/symmetrize.py 70.73% <0.00%> (ø)
python/cugraph/dask/centrality/katz_centrality.py
python/cugraph/comms/comms.py 35.36% <0.00%> (+0.84%) ⬆️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1be056b...2e88e12.

@BradReesWork added the 'feature request' label (New feature or request) on Oct 27, 2020
@Iroy30 changed the title from '[WIP] dask personalization, fix df query' to '[REVIEW] dask personalization, fix df query' on Oct 30, 2020
@Iroy30 (Contributor, Author) commented Nov 2, 2020

rerun tests

k-core currently doesn't work on asymmetric Graphs
@Iroy30 (Contributor, Author) commented Nov 9, 2020

rerun tests

@seunghwak (Contributor) commented:

Have you tested this with netscience.csv? The legacy SG personalized PageRank expects the personalization values to be normalized, and it seems like our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.

@Iroy30 (Contributor, Author) commented Nov 13, 2020

> Have you tested this with netscience.csv? The legacy SG personalized PageRank expects the personalization values to be normalized, and it seems like our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.

The legacy SG personalized PageRank normalizes internally and doesn't require the user to pass in normalized values. However, netscience.csv still fails in the test. I need to look deeper into what is happening.

@seunghwak (Contributor) commented Nov 13, 2020

> Have you tested this with netscience.csv? The legacy SG personalized PageRank expects the personalization values to be normalized, and it seems like our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.
>
> The legacy SG personalized PageRank normalizes internally and doesn't require the user to pass in normalized values. However, netscience.csv still fails in the test. I need to look deeper into what is happening.

When I tested, the sum of PageRank values decreases in every iteration with the legacy SG PageRank; it should stay at 1.0 if it's working properly.
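
A small sanity-check sketch of that invariant; the 'pagerank' result column name and helper name are assumptions here, not code from the PR:

def check_pagerank_mass(result, atol=1e-6):
    # PageRank values form a probability distribution, so their sum
    # should stay (close to) 1.0; a shrinking sum indicates leaked mass.
    total = float(result["pagerank"].sum())
    assert abs(total - 1.0) < atol, f"PageRank sum drifted to {total}"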

@afender (Member) left a comment

Looks good.
Let's try and squeeze the notebook fix into this one.

@Iroy30 (Contributor, Author) commented Nov 20, 2020

> Looks good. Let's try and squeeze the notebook fix into this one.

p2p now defaults to True, so the notebooks work without explicitly setting it. However, I edited the notebooks regardless so users are aware that p2p is set to True.
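
For reference, a rough sketch of the MNMG setup the notebooks walk through; the exact module paths and the Comms initialization signature in this cuGraph version are assumptions here:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from cugraph.comms import comms as Comms  # module path assumed

cluster = LocalCUDACluster()
client = Client(cluster)

# p2p now defaults to True, so passing it explicitly is optional;
# the notebooks keep it visible so users know the setting is on.
Comms.initialize(p2p=True)

# ... build the distributed graph and run the dask cugraph algorithms here ...

Comms.destroy()
client.close()
cluster.close()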

@Iroy30 (Contributor, Author) commented Nov 20, 2020

> Have you tested this with netscience.csv? The legacy SG personalized PageRank expects the personalization values to be normalized, and it seems like our test code does not normalize them; this led to a test failure even though (I believe) the new MNMG personalized PageRank returns correct values.
>
> The legacy SG personalized PageRank normalizes internally and doesn't require the user to pass in normalized values. However, netscience.csv still fails in the test. I need to look deeper into what is happening.
>
> When I tested, the sum of PageRank values decreases in every iteration with the legacy SG PageRank; it should stay at 1.0 if it's working properly.

There was an issue with netscience due to the personalization node initialization in Python. Fixed it.

@BradReesWork merged commit 337a2f6 into rapidsai:branch-0.17 on Nov 20, 2020