Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Pattern accelerator based implementation of PageRank, Katz Centrality, BFS, & SSSP #838

Merged
merged 182 commits into from
Sep 25, 2020

Conversation

seunghwak
Copy link
Contributor

@seunghwak seunghwak commented Apr 28, 2020

OK, I will try to merge this and plan to address multi-GPU extensions & performance tuning in separate PRs.

This PR is already very large and also there are multiple works dependent on this, so I think this works better (and this code is not linked to any python user code yet, so there isn't much risk in premature merging).

This API aims to achieve

  1. thrust-like API for graph algorithms
  2. Abstract out implementation issues in different target systems (Single GPU, multi-GPU, ...) inside the pattern accelerator API, Graph, and Handle; Same analytics code will be used for different target systems.
  3. Minimize redundancy in cuGraph codebase and better enforce consistency.

@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

Copy link
Collaborator

@ChuckHastings ChuckHastings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been using these changes and they seem fine.

I assume more changes will be in a different PR (e.g. implementations for is_multi_gpu reductions)

Copy link
Member

@afender afender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of very useful code 🎉

@BradReesWork
Copy link
Member

rerun tests

@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2020

Codecov Report

Merging #838 into branch-0.16 will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.16     #838   +/-   ##
============================================
  Coverage        73.44%   73.44%           
============================================
  Files               60       60           
  Lines             2335     2335           
============================================
  Hits              1715     1715           
  Misses             620      620           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85ec558...c50f85c. Read the comment docs.

@BradReesWork
Copy link
Member

rerun tests

@BradReesWork
Copy link
Member

rerun tests

@pgera
Copy link
Contributor

pgera commented Sep 25, 2020

Just some early feedback on memory usage. I tested BFS with some small graphs, and it produces the correct output. I wasn't able to test twitter (5.6 GB in CSR) on a 12 GB GPU whereas the current implementation works fine for that. The graph is already in CSR here. So there is no additional memory usage beyond the CSR graph, distance array, and whatever BFS uses. That is, I'm calling BFS on

cugraph::experimental::graph_view_t<VT, ET, float, false, false> G(            
      handle,                                                                   
      thrust::raw_pointer_cast(t_vlist.data()),                                 
      thrust::raw_pointer_cast(t_elist.data()),                                                                                              
      nullptr,                                                                  
      std::vector<VT> (),                                                       
      vertex_cnt,                                                               
      edge_cnt,                                                                 
      cugraph::experimental::graph_properties_t(),                              
      false,                                                                    
      false);

cugraph::experimental::bfs<VT, ET, float, false>(                           
      handle,                                                                   
      G,                                                                        
      thrust::raw_pointer_cast(t_distances.data()),                             
      static_cast<VT*>(nullptr),                                                
      static_cast<VT>(src),                                                     
      false,                                                                    
      std::numeric_limits<VT>::max(),                                           
      false);

The last two calls in the back trace are

#6  0x0000555555567878 in rmm::mr::cuda_memory_resource::do_allocate(unsigned long, CUstream_st*) ()
#7  0x00007fffedd46940 in void cugraph::experimental::update_frontier_v_push_if_out_nbr<cugraph::experimental::graph_view_t<int, int, float, false, false, void>

@BradReesWork
Copy link
Member

rerun tests

@seunghwak
Copy link
Contributor Author

Just some early feedback on memory usage. I tested BFS with some small graphs, and it produces the correct output. I wasn't able to test twitter (5.6 GB in CSR) on a 12 GB GPU whereas the current implementation works fine for that. The graph is already in CSR here. So there is no additional memory usage beyond the CSR graph, distance array, and whatever BFS uses. That is, I'm calling BFS on

cugraph::experimental::graph_view_t<VT, ET, float, false, false> G(            
      handle,                                                                   
      thrust::raw_pointer_cast(t_vlist.data()),                                 
      thrust::raw_pointer_cast(t_elist.data()),                                                                                              
      nullptr,                                                                  
      std::vector<VT> (),                                                       
      vertex_cnt,                                                               
      edge_cnt,                                                                 
      cugraph::experimental::graph_properties_t(),                              
      false,                                                                    
      false);

cugraph::experimental::bfs<VT, ET, float, false>(                           
      handle,                                                                   
      G,                                                                        
      thrust::raw_pointer_cast(t_distances.data()),                             
      static_cast<VT*>(nullptr),                                                
      static_cast<VT>(src),                                                     
      false,                                                                    
      std::numeric_limits<VT>::max(),                                           
      false);

The last two calls in the back trace are

#6  0x0000555555567878 in rmm::mr::cuda_memory_resource::do_allocate(unsigned long, CUstream_st*) ()
#7  0x00007fffedd46940 in void cugraph::experimental::update_frontier_v_push_if_out_nbr<cugraph::experimental::graph_view_t<int, int, float, false, false, void>

Thanks for testing this, and BFS & SSSP in this PR is not optimized for performance & memory footprint (if you actually read the PR code, you may find several FIXMEs to reduce memory footprint). It will happen in future PRs, and I will make sure memory requirement is actually smaller than the previous one in the final version.

@BradReesWork BradReesWork merged commit b353e1e into rapidsai:branch-0.16 Sep 25, 2020
@seunghwak seunghwak deleted the fea_pattern_acc branch October 3, 2020 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants