Fix broadcast root during the replication call #3655
Conversation
@@ -102,8 +104,20 @@ def create(cls, data, client=None, batch_enabled=False):
        else:
            raise TypeError("Graph data must be dask-cudf dataframe")

        broadcast_worker = None
        if batch_enabled:
            worker_ranks = client.run(_get_nvml_device_index)
I think this will fail on MNMG setups. The worker_rank here gets the nvml_device_index, which will be zero for two devices (the first device on each node). I think we should figure out a way to get this from raft instead.
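To make the concern concrete, here is a hypothetical illustration (the cluster layout and worker addresses are invented, not taken from the PR): on a two-node cluster with one GPU per node, every worker sees its local device as NVML index 0, so the mapping returned by client.run(_get_nvml_device_index) cannot single out a unique root.

# Hypothetical illustration of the MNMG concern (addresses are invented).
# client.run(fn) returns {worker_address: fn's return value} for every worker.
worker_ranks = {
    "tcp://10.0.0.1:40001": 0,  # only visible device on node 1 -> NVML index 0
    "tcp://10.0.0.2:40001": 0,  # only visible device on node 2 -> NVML index 0
}

# "The worker with index 0" now matches two workers, so the chosen broadcast
# root may not coincide with raft's rank-0 worker.
candidates = [addr for addr, idx in worker_ranks.items() if idx == 0]
assert len(candidates) == 2  # ambiguous on this hypothetical layout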
How about we directly fetch from the upstream raft_dask library to prevent breaking in the future too?

Suggested change:
-            worker_ranks = client.run(_get_nvml_device_index)
+            from raft_dask.common.comms import _func_worker_ranks
+            worker_ranks = _func_worker_ranks(client)
That PR is not merged yet. That is why I added a FIXME in line 286
> That PR is not merged yet. That is why I added a FIXME in line 286

No need to wait for your PR to be merged, because _func_worker_ranks already exists. Its implementation will just change once your PR is merged.
> I think this will fail on MNMG setups

I don't think this can fail, because the address of rank 0 in this PR always matches the raft one. In fact, client.run(_get_nvml_device_index) is deterministic and always returns the results in increasing order of network ID or IP address (the keys of the dictionary). On an MNMG run, even though there are multiple pairs with rank 0, this PR will always pick the one on the first node, which is consistent with the raft one. This is true because the raft PR only applies the rank offsets to the second node and above. Furthermore, I extensively tested this PR on Friday on both 2 and 4 nodes, and all runs passed.

> How about we directly fetch from the upstream raft_dask library to prevent breaking in the future too?

Right, and this is what I had in mind when I added the FIXME. This will avoid code duplication and prevent breaking changes.
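For readers following the thread, here is a minimal sketch of the root selection being described, assuming a Dask `client` and the PR's `_get_nvml_device_index` helper are already in scope; the actual cuGraph replication code may differ in detail.

# Sketch of the root-selection logic described above (illustrative only).
# `client` and `_get_nvml_device_index` are assumed to exist as in the PR.
worker_ranks = client.run(_get_nvml_device_index)  # {worker_address: index}

# Per the discussion above, client.run() returns workers in a deterministic,
# address-ordered mapping, so the first entry whose index is 0 is the worker
# on the first node, matching raft's rank-0 worker.
broadcast_worker = next(
    addr for addr, rank in worker_ranks.items() if rank == 0
)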
> On an MNMG run, even though there are multiple pairs with rank 0, this PR will always pick the one on the first node, which is consistent with the raft one

Does this always hold true? Maybe I am missing the logic here.
As we discussed, getting the ranks that were established after the comms initialization guarantees consistency.
def _get_nvml_device_index():
    """
    Return NVML device index based on environment variable
    'CUDA_VISIBLE_DEVICES'.
    """
    # FIXME: Leverage the one from raft instead.
    CUDA_VISIBLE_DEVICES = os.getenv("CUDA_VISIBLE_DEVICES")
    return nvml_device_index(0, CUDA_VISIBLE_DEVICES)
I think you can remove it now.

Suggested change: delete the _get_nvml_device_index helper above.
Ya, I was pushing this in another commit just now.
import os
from dask_cuda.utils import nvml_device_index
You can remove this now.
Suggested change: delete these two import lines.
Approved
LGTM
A raft PR (rapidsai/raft#1573) assigning deterministic ranks to dask workers was merged, breaking batch algorithms like batch_edge_betweenness_centrality by picking the wrong worker as the root for the broadcast operation.
This PR ensures that the worker with rank = 0 is the root of the broadcast operation.