[REVIEW] OPG degree #840

afender · 2020-04-28T23:38:35Z

close #810

Added temporary comm class (to be replaced by RAFT). It is a lightweight stopgap that should be easy to replace.
Added a path to set the communicator in GraphBase (might move to the Handle once we have it)
OPG edge list host partitioner (1D) for testing
C++ OPG Out degree implementation and test
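
For readers unfamiliar with the idea, here is a minimal host-only sketch of a 1D edge-list partition followed by an out-degree count. It is not the code from this PR: `edge_range_1d`, `local_out_degree`, and `global_out_degree` are hypothetical names, the contiguous near-equal split is an assumed partitioning scheme, and the loop over ranks stands in for the sum allreduce that a real multi-process run would perform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: half-open range [first, last) of edge indices owned by
// `rank` when `num_edges` edges are split into contiguous, near-equal chunks.
inline std::pair<std::size_t, std::size_t> edge_range_1d(std::size_t num_edges,
                                                         int rank,
                                                         int num_ranks)
{
  assert(num_ranks > 0 && rank >= 0 && rank < num_ranks);
  std::size_t n     = static_cast<std::size_t>(num_ranks);
  std::size_t r     = static_cast<std::size_t>(rank);
  std::size_t base  = num_edges / n;
  std::size_t extra = num_edges % n;  // the first `extra` ranks get one more edge
  std::size_t first = r * base + std::min(r, extra);
  std::size_t last  = first + base + (r < extra ? 1 : 0);
  return {first, last};
}

// Out-degree contribution of one rank: count source ids over its local edges.
// Vertex ids are global, so every rank produces a full-length degree array.
inline std::vector<int> local_out_degree(const std::vector<int>& src,
                                         std::size_t first,
                                         std::size_t last,
                                         int num_vertices)
{
  std::vector<int> degree(static_cast<std::size_t>(num_vertices), 0);
  for (std::size_t e = first; e < last; ++e) { ++degree[src[e]]; }
  return degree;
}

// Stand-in for the allreduce: sum the per-rank arrays element by element.
inline std::vector<int> global_out_degree(const std::vector<int>& src,
                                          int num_vertices,
                                          int num_ranks)
{
  std::vector<int> total(static_cast<std::size_t>(num_vertices), 0);
  for (int rank = 0; rank < num_ranks; ++rank) {
    auto range = edge_range_1d(src.size(), rank, num_ranks);
    auto local = local_out_degree(src, range.first, range.second, num_vertices);
    for (std::size_t v = 0; v < total.size(); ++v) { total[v] += local[v]; }
  }
  return total;
}
```

Under these assumptions the summed result does not depend on the number of ranks, which is the property a smoke test comparing `np 1` against a multi-rank run would check.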

up

afender · 2020-04-28T23:47:30Z

Expected output

cpp/include/graph.hpp

cpp/src/comms/mpi/comms_mpi.cpp

cpp/src/comms/mpi/comms_mpi.hpp

cpp/src/structure/graph.cu

codecov-io · 2020-05-01T23:59:36Z

Codecov Report

Merging #840 into branch-0.14 will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff              @@
##           branch-0.14     #840   +/-   ##
============================================
  Coverage        47.92%   47.92%           
============================================
  Files               44       44           
  Lines             1327     1327           
============================================
  Hits               636      636           
  Misses             691      691

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 631bbaa...07feb33.

BradReesWork · 2020-05-05T14:42:44Z

cpp/CMakeLists.txt

@@ -330,6 +330,7 @@ link_directories(
    "${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES}")

 add_library(cugraph SHARED
+    src/comms/mpi/comms_mpi.cpp


Should this also be in a "BUILD_MPI" block? There is no need to include the file if MPI is not going to be built.

When BUILD_MPI is OFF, the communicator member functions become no-ops. This allows running through the existing path without contaminating the whole code base with #ifdef ENABLE_OPG outside of the communicator class.
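
A minimal sketch of that pattern, for illustration only. `CommSketch` and its member names are hypothetical and are not the class added in this PR. The only assumption taken from the comment above is that `ENABLE_OPG` is the preprocessor symbol defined when MPI is built, and the MPI branch assumes `MPI_Init` has already been called.

```cpp
#include <cassert>
#include <cstddef>
#ifdef ENABLE_OPG
#include <mpi.h>
#endif

// Hypothetical stand-in for a communicator whose members compile to no-ops
// when ENABLE_OPG is not defined. Callers never need their own #ifdef.
class CommSketch {
 public:
  CommSketch()
  {
#ifdef ENABLE_OPG
    // Assumes MPI_Init was called earlier by the application.
    MPI_Comm_rank(MPI_COMM_WORLD, &rank_);
    MPI_Comm_size(MPI_COMM_WORLD, &size_);
#endif
  }

  int rank() const { return rank_; }
  int size() const { return size_; }

  void barrier() const
  {
#ifdef ENABLE_OPG
    MPI_Barrier(MPI_COMM_WORLD);
#endif
  }

  // In-place sum across ranks. In a build without ENABLE_OPG the values are
  // left untouched, which is already the correct result for a single process.
  void allreduce_sum(int* values, std::size_t count) const
  {
    assert(values != nullptr || count == 0);
#ifdef ENABLE_OPG
    MPI_Allreduce(MPI_IN_PLACE, values, static_cast<int>(count), MPI_INT, MPI_SUM, MPI_COMM_WORLD);
#else
    (void)values;
    (void)count;
#endif
  }

 private:
  int rank_{0};  // single-process defaults, used when ENABLE_OPG is off
  int size_{1};
};
```

With this shape, calling code can invoke `comm.allreduce_sum(degree, n)` unconditionally, and the conditional compilation stays inside the one class.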

cpp/include/comms_mpi.hpp

BradReesWork · 2020-05-05T15:12:54Z

cpp/src/comms/mpi/comms_mpi.cpp

+  // CUDA
+
+  CUDA_TRY(cudaGetDeviceCount(&_device_count));
+  _device_id = _rank % _device_count; // FixMe : assumes each node has the same number of GPUs


This seems like it could be an issue. We should discuss configuration and how to handle it.
You also only set one device and not one per rank.

This seems like it could be an issue. We should discuss configuration and how to handle it.

I think it is fine. We expect Python to set the device and this code to be removed. This is just for C++ smoke tests for now so that I can validate the progress on the C++ part.

You also only set one device and not one per rank.

No, it does set one device per rank. In an OPG environment, this section of the code is traversed by all ranks; each one retrieves and sets its own GPU based on its rank.
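
To make the mapping concrete, here is a small host-only sketch of the `_rank % _device_count` rule from the diff. `device_for_rank` and `devices_for_all_ranks` are hypothetical helpers written for this illustration. In the real code each rank would evaluate the expression only for itself and then select that GPU (for example with `cudaSetDevice`); the loop below just tabulates what every rank would pick.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring `_device_id = _rank % _device_count`.
// Each rank evaluates this independently with its own rank value.
inline int device_for_rank(int rank, int device_count)
{
  assert(rank >= 0 && device_count > 0);
  return rank % device_count;
}

// Tabulate the device every rank would select, for inspection on the host.
inline std::vector<int> devices_for_all_ranks(int num_ranks, int device_count)
{
  std::vector<int> ids;
  ids.reserve(static_cast<std::size_t>(num_ranks));
  for (int rank = 0; rank < num_ranks; ++rank) {
    ids.push_back(device_for_rank(rank, device_count));
  }
  return ids;
}
```

The rule gives each rank on a node a distinct GPU only if every node reports the same device count (the FixMe in the diff) and ranks are placed node by node. If ranks were instead distributed round-robin across two 4-GPU nodes, ranks 0 and 4 would land on the same node and both pick device 0.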

cpp/src/comms/mpi/comms_mpi.cpp

BradReesWork · 2020-05-05T15:27:41Z

cpp/src/structure/graph.cu

-    degree_from_vertex_ids(GraphBase<VT,ET,WT>::number_of_edges, src_indices, degree, stream);
+    if (GraphBase<VT,ET,WT>::comm.get_p()) // FixMe retrieve global source indexing for the allreduce work
+      CUGRAPH_FAIL("OPG degree not implemented for OUT degree");
+    degree_from_vertex_ids(GraphBase<VT,ET,WT>::comm, GraphBase<VT,ET,WT>::number_of_vertices, GraphBase<VT,ET,WT>::number_of_edges, src_indices, degree, stream);


Is it GraphBase or the new GraphBaseView? PR #799 might need to be merged first.

We would need to reconcile either way.

Iroy30

LGTM

Iroy30 · 2020-05-05T18:02:30Z

It would be good to be able to run these changes on CI with the MPI build ON.

afender added 9 commits April 13, 2020 10:45

Merge pull request #40 from rapidsai/branch-0.14

9173acb

up

Added NCCL_TRY macro for throwing errors

012465e

wip comm

8aa34bc

checkpoint

b23b5e5

builds

a89328e

test checkpoint

f5bc959

checkpoint np 1 passes

dd29ec9

added edge list partitioning of test input and fixes

60cc7ee

more fixes and cleanup

a703325

afender requested a review from Iroy30 April 28, 2020 23:38

afender requested review from a team as code owners April 28, 2020 23:38

afender added 3 commits April 28, 2020 18:39

changelog

60e9ee8

Merge branch 'branch-0.14' into opg_degree

d639939

fixed comment

18094e1

seunghwak reviewed Apr 29, 2020

afender added 2 commits April 29, 2020 12:51

fixes

008327f

non-mpi path

40b448f

BradReesWork assigned afender Apr 29, 2020

BradReesWork added the 3 - Ready for Review label Apr 29, 2020

BradReesWork added this to the 0.14 milestone Apr 29, 2020

afender added 3 commits April 30, 2020 17:01

headers reorg for comms deployment

1c6b267

constructor for python and fixes

1643eee

naming

7ac19b1

BradReesWork requested changes May 5, 2020

Iroy30 approved these changes May 5, 2020

afender added 5 commits May 5, 2020 15:20

Clang formatting

5b45a5c

fixmes and copyright

f4c407c

clang2

01ba016

fix for header issue showing up on CI

4309688

Merge branch 'branch-0.14' into opg_degree

07feb33

seunghwak approved these changes May 6, 2020

afender requested a review from BradReesWork May 6, 2020 14:41

afender added the 5 - Ready to Merge label May 6, 2020

BradReesWork approved these changes May 6, 2020

BradReesWork merged commit 3f78243 into rapidsai:branch-0.14 May 6, 2020

Iroy30 mentioned this pull request May 6, 2020

[WIP][skip-ci] Opg python changes #846

Closed

afender mentioned this pull request May 28, 2020

[FEA] communicator abstraction #814

Closed

afender deleted the opg_degree branch April 5, 2021 18:55
