[RLlib] Decentralized multi-agent learning; PR #01 #21421
Conversation
…ntralized_multi_agent_learning_01
# Conflicts:
#   rllib/agents/trainer.py
#   rllib/evaluation/rollout_worker.py
#   rllib/policy/policy.py
a few questions and comments. thanks.
@@ -31,6 +35,7 @@
         "worker_side_prioritization": True,
         "min_iter_time_s": 30,
     },
+    _allow_unknown_configs=True,
why do we need this?
We are using Trainer's merge utility. It requires that you set this to True if the second config (APEX-DDPG's) contains new keys.
Otherwise, it would complain about the new key (e.g. ) not being found in the first config (DDPG's).
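For illustration, here is a minimal sketch of the merge behavior being described; this is not RLlib's actual merge utility, and the config keys/values are just examples based on the diff above:

```python
# Illustrative sketch only (not RLlib's actual merge utility): a merge helper
# that rejects keys missing from the base config unless explicitly allowed.
def merge_configs(base, overrides, allow_unknown_configs=False):
    merged = dict(base)
    for key, value in overrides.items():
        if key not in base and not allow_unknown_configs:
            raise KeyError(
                f"Unknown config key '{key}' not found in the base config!")
        merged[key] = value
    return merged


# Example values; "optimizer" is a key the base (DDPG-style) config does not have.
ddpg_config = {"worker_side_prioritization": False, "min_iter_time_s": 1}
apex_ddpg_overrides = {
    "worker_side_prioritization": True,
    "min_iter_time_s": 30,
    "optimizer": {"num_replay_buffer_shards": 4},
}

# Without allow_unknown_configs=True this raises; with it, the new key is added.
merged = merge_configs(ddpg_config, apex_ddpg_overrides, allow_unknown_configs=True)
```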
👌 👌
rllib/agents/dqn/apex.py
Outdated
@@ -129,7 +136,8 @@ def execution_plan(workers: WorkerSet, config: dict,
     # Create a number of replay buffer actors.
     num_replay_buffer_shards = config["optimizer"][
         "num_replay_buffer_shards"]
-    replay_actors = create_colocated(ReplayActor, [
+    args = [
naming this replay_actor_args may be clearer.
done
rllib/utils/actors.py
Outdated
 def drop_colocated(actors):
     colocated, non_colocated = split_colocated(actors)
     for a in colocated:
         a.__ray_terminate__.remote()
     return non_colocated


-def split_colocated(actors):
-    localhost = platform.node()
+def split_colocated(actors, node=None):
do we really need to allow the node param to be None?
the only user of this util provides the node parameter when calling it.
also not sure it's intuitive that split_colocated would split based on the node of the first actor if the node param is not specified.
this behavior feels a bit random?
Fixed this. Should behave backward-compatibly now, with node="localhost" being the default (same behavior as before, where the node arg didn't exist and we always tried to co-locate on localhost).
Also added docstrings.
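For reference, a rough sketch of what such a split could look like; the get_host method on the actors is an assumption for illustration, not the actual RLlib code:

```python
import platform
from typing import List, Optional, Tuple

import ray
from ray.actor import ActorHandle


def split_colocated(
        actors: List[ActorHandle],
        node: Optional[str] = "localhost",
) -> Tuple[List[ActorHandle], List[ActorHandle]]:
    """Splits `actors` into those running on `node` and those on other nodes.

    node="localhost" resolves to the driver's own hostname (the old, implicit
    behavior); pass an explicit hostname to split against that node instead.
    """
    if node == "localhost":
        node = platform.node()
    # Assumption: each actor exposes a remote method returning its hostname.
    hosts = ray.get([a.get_host.remote() for a in actors])
    co_located, non_co_located = [], []
    for actor, host in zip(actors, hosts):
        (co_located if host == node else non_co_located).append(actor)
    return co_located, non_co_located
```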
def try_create_colocated(cls, args, count, kwargs=None, node=None):
    kwargs = kwargs or {}
why don't we make {} the default value for the argument?
Dangerous, as Python will keep that {} dict around, so if you change it in one function call (add key/value pairs to it), the next time you call the function without providing the arg, the function will use the altered dict (the one with the added key/value pairs).
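For anyone reading along, a minimal self-contained sketch of that pitfall (function names are hypothetical):

```python
def bad_update(key, cfg={}):   # this {} is created only once, at function definition time
    cfg[key] = True
    return cfg

print(bad_update("a"))   # {'a': True}
print(bad_update("b"))   # {'a': True, 'b': True}  <- 'a' leaked from the previous call


def good_update(key, cfg=None):
    cfg = cfg or {}            # fresh dict on every call without an explicit cfg
    cfg[key] = True
    return cfg

print(good_update("a"))  # {'a': True}
print(good_update("b"))  # {'b': True}
```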
oh right, I forgot {} is banned as defaults actually. thanks.
rllib/utils/actors.py
Outdated
    return ok


def try_create_colocated(cls, args, count, kwargs=None, node=None):
Can we add a comment here about node being None? Seems it's a pretty important detail that if node is None, it means we don't care which node these actors are co-located on, as long as they are together.
Yes, I'll add (better) docstrings to all these.
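For example, such a docstring could spell out the node=None semantics roughly like this (wording is a sketch, not necessarily what landed in the PR):

```python
def try_create_colocated(cls, args, count, kwargs=None, node=None):
    """Tries to create `count` actors of type `cls`, all on the same node.

    Args:
        node: The node to co-locate the actors on. If None, we do not care
            on which node the actors end up, as long as all of them end up
            on the same (one) node together.
    """
```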
done
# Whether all shards of the replay buffer must be co-located
# with the learner process (running the execution plan).
# If False, replay shards may be created on different node(s).
"replay_buffer_shards_colocated_with_driver": True,
actually I wonder, why do they need to be on the same node?
For APEX, the local node is the learner, so data (once in the buffer shards) never has to travel again. I think that's the sole intention here.
I see I see. to be honest, this doesn't feel like a requirement to me, more like an optimization.
since we don't have a viability guarantee from Ray core, if it were up to me, I would do this as a best-effort thing:
try to colocate everything, and if that fails, schedule the remaining RB shards anywhere.
then we don't need the while loop, and this scheduling can finish in at most 2 steps.
it is obviously too big of a change though. maybe just add a note/todo somewhere?
as written, I am a little worried a stack may fail with a mysterious error message like "fail to schedule RB actors" even though there are enough CPUs, just a small head node.
Just a note: This is nothing new that I introduced here for APEX. We have always forced all replay shards to be located on the driver. This change actually allows users (via setting this new flag to False) to relax this constraint.
I can add a comment to explain this more. ...
done
appreciate!
rllib/utils/actors.py
Outdated
    # Maps types to lists of already co-located actors.
    ok = [[] for _ in range(len(actor_specs))]
    attempt = 1
    while attempt < max_attempts:
if these actors don't fit on the same node the first time, why would they fit when we try the second time?
this is a case where we need PACK scheduling, it seems.
We do use "PACK" strategy by default, so this should be ok. But it's a good question: Could still be that ray places the actor on a different node (bundle), no? And then we have to try again. I would love to use a ray tool to force placing an actor on a given node, but I don't think this exists.
yeah, I would love to show core folks this as an example use case when this PR is in.
rllib/execution/rollout_ops.py
Outdated
    assert len(remote_kwargs) == len(actors)

    # Create a map inside Trainer instance that maps actorss to sets of open
    # requests (object refs). This way, we keep track, of which actorss have
are you intentionally writing actorss here and above?
or are they typos?
:D definitely typos
done
rllib/execution/rollout_ops.py
Outdated
    # already been sent how many requests
    # (`max_remote_requests_in_flight_per_actor` arg).
    if not hasattr(trainer, "_remote_requests_in_flight"):
        trainer._remote_requests_in_flight = defaultdict(set)
initialize this in the trainer's init() or setup()?
it is confusing if some other instance creates private (underscore-prefixed) members on another class instance.
also, instead of passing the entire trainer, why not pass the _remote_requests_in_flight dict into this function? that way this rollout op doesn't have access to everything the trainer has.
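A rough sketch of the suggested shape (function and attribute names here are assumptions, not the PR's final code): the Trainer owns the bookkeeping dict and the rollout op only receives that dict:

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Set


def asynchronous_parallel_requests(
        remote_requests_in_flight: DefaultDict[Any, Set],
        actors: List[Any],
        max_remote_requests_in_flight_per_actor: int = 2) -> None:
    # Only send a new sample request to actors that still have a free slot.
    for actor in actors:
        if (len(remote_requests_in_flight[actor])
                < max_remote_requests_in_flight_per_actor):
            ref = actor.sample.remote()
            remote_requests_in_flight[actor].add(ref)


# Owned by the Trainer (set up once in Trainer.setup(), not lazily via hasattr()):
requests_in_flight = defaultdict(set)
```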
+1 Makes all sense. I'll clean up.
done
This is only needed in the next PR (where we introduce the AlphaStar agent). In there, I'll set this property up in setup(). 👍
rllib/execution/rollout_ops.py
Outdated

    # Return None if nothing ready after the timeout.
    if not ready:
        return
should we return [] or at least None?
return is the same as return None, no? I can add the None to make it more explicit.
done
…ntralized_multi_agent_learning_01
thanks for all the updates, looks great now.
one more minor suggestion: delete the util func that is not used anymore.
also thanks for digging into the logic for colocated actor scheduling. it's not pretty for sure.
# Whether all shards of the replay buffer must be co-located
# with the learner process (running the execution plan).
# If False, replay shards may be created on different node(s).
"replay_buffer_shards_colocated_with_driver": True,
appreciate!
co_located, non_co_located = split_colocated(actors, node=node)
logger.info("Got {} colocated actors of {}".format(len(co_located), count))
for a in non_co_located:
    a.__ray_terminate__.remote()
yeah, I double-checked with Clark; they have an open issue to make __ray_terminate__ a public API.
@Deprecated(error=False)
def drop_colocated(actors: List[ActorHandle]) -> List[ActorHandle]:
delete this util? it's not used anywhere.
Maybe some user is using it somewhere :D
I marked it @Deprecated.
Decentralized multi-agent learning; preparatory PR.
- synchronous_parallel_sampling (already used by PGTrainer).
- create_colocated replaced via the new, more generic create_colocated_actors utility function. This allows users to co-locate any (different) types of actors on the same node.

Why are these changes needed?
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.
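For context, a rough usage sketch of the create_colocated_actors utility described above; the import path is inferred from the rllib/utils/actors.py file touched in this PR, and the exact signature/return shape are assumptions, not taken verbatim from this PR:

```python
import ray
# Import path inferred from the rllib/utils/actors.py file touched in this PR.
from ray.rllib.utils.actors import create_colocated_actors

ray.init()


@ray.remote(num_cpus=1)
class ReplayShard:
    def size(self):
        return 0


@ray.remote(num_cpus=1)
class MetricsActor:
    def ping(self):
        return "ok"


# Assumed spec format: one (actor_class, args, kwargs, count) tuple per actor
# type; all created actors are co-located on one node (here: the driver's).
groups = create_colocated_actors(
    actor_specs=[
        (ReplayShard, None, None, 4),
        (MetricsActor, None, None, 1),
    ],
    node="localhost",
)
# Assumed to return one list of actor handles per spec.
replay_shards, metrics_actors = groups
```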