[REVIEW] Pattern accelerator based implementation of PageRank, Katz Centrality, BFS, & SSSP #838
…o fea_pattern_acc
…o fea_pattern_acc
…hanges to enable optimization
…nalytics functions
…e more performance optimizations in accelerator API implementations
Please update the changelog in order to start CI tests. View the gpuCI docs here.
I've been using these changes and they seem fine.
I assume more changes will be in a different PR (e.g. implementations for is_multi_gpu reductions)
That's a lot of very useful code 🎉
rerun tests
Codecov Report
@@ Coverage Diff @@
## branch-0.16   #838   +/- ##
Coverage   73.44%   73.44%
Files          60       60
Lines        2335     2335
Hits         1715     1715
Misses        620      620
rerun tests
rerun tests
Just some early feedback on memory usage. I tested BFS with some small graphs, and it produces the correct output. However, I wasn't able to run it on twitter (5.6 GB in CSR) on a 12 GB GPU, whereas the current implementation works fine for that graph. The graph is already in CSR here, so there is no additional memory usage beyond the CSR graph, the distance array, and whatever BFS itself uses. That is, I'm calling BFS on
The last two calls in the back trace are
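For readers following the memory discussion, here is a minimal CPU-side sketch of a level-synchronous BFS over a CSR graph. It is not the implementation in this PR; the function name `bfs_csr` and the toy graph are made up for illustration. It only shows which arrays such a BFS typically holds: the CSR offsets and indices, the distance array, and the current and next frontiers.

```python
def bfs_csr(offsets, indices, source):
    """Level-synchronous BFS on a CSR graph (illustrative sketch only).

    offsets: list of length num_vertices + 1
    indices: list of length num_edges (destination vertex of each edge)
    Returns hop distances from source; -1 marks unreachable vertices.
    """
    num_vertices = len(offsets) - 1
    unreached = -1

    # Working memory beyond the CSR arrays: one distance per vertex
    # plus the current and next frontier.
    distances = [unreached] * num_vertices
    distances[source] = 0
    frontier = [source]
    level = 0

    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:
            # Edges of u occupy indices[offsets[u] : offsets[u + 1]].
            for e in range(offsets[u], offsets[u + 1]):
                v = indices[e]
                if distances[v] == unreached:
                    distances[v] = level
                    next_frontier.append(v)
        frontier = next_frontier

    return distances


# Toy graph (hypothetical): 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
offsets = [0, 2, 3, 4, 4, 4]
indices = [1, 2, 3, 3]
result = bfs_csr(offsets, indices, 0)
print(result)
```

In a sketch like this, anything allocated beyond the distance array and the frontiers (for example, per-edge scratch buffers) is the kind of extra footprint the comment above is asking about.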
rerun tests
Thanks for testing this. BFS and SSSP in this PR are not yet optimized for performance or memory footprint (if you read the PR code, you will find several FIXMEs about reducing the memory footprint). That work will happen in future PRs, and I will make sure the memory requirement of the final version is smaller than that of the previous implementation.
OK, I will try to merge this and plan to address multi-GPU extensions and performance tuning in separate PRs.
This PR is already very large, and several other pieces of work depend on it, so I think this approach works better. This code is not yet linked to any Python user code, so there is little risk in merging early.
This API aims to achieve