[RLlib] Decentralized multi-agent learning; PR #01 #21421
Conversation
…ntralized_multi_agent_learning_01
# Conflicts:
#   rllib/agents/trainer.py
#   rllib/evaluation/rollout_worker.py
#   rllib/policy/policy.py
a few questions and comments. thanks.
@@ -31,6 +35,7 @@
         "worker_side_prioritization": True,
         "min_iter_time_s": 30,
     },
+    _allow_unknown_configs=True,
why do we need this?
We are using Trainer's merge utility. It requires that you set this to True if the second config (APEX-DDPG's) contains new keys.
Otherwise, it would complain about the new key (e.g. ) not being found in the first config (DDPG's).
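For illustration, here is a minimal sketch of the merge behavior being described; this is not RLlib's actual merge utility, and the config keys/values are just examples based on the diff above:

```python
# Illustrative sketch only (not RLlib's actual merge utility): a merge helper
# that rejects keys missing from the base config unless explicitly allowed.
def merge_configs(base, overrides, allow_unknown_configs=False):
    merged = dict(base)
    for key, value in overrides.items():
        if key not in base and not allow_unknown_configs:
            raise KeyError(
                f"Unknown config key '{key}' not found in the base config!")
        merged[key] = value
    return merged


# Example values; "optimizer" is a key the base (DDPG-style) config does not have.
ddpg_config = {"worker_side_prioritization": False, "min_iter_time_s": 1}
apex_ddpg_overrides = {
    "worker_side_prioritization": True,
    "min_iter_time_s": 30,
    "optimizer": {"num_replay_buffer_shards": 4},
}

# Without allow_unknown_configs=True this raises; with it, the new key is added.
merged = merge_configs(ddpg_config, apex_ddpg_overrides, allow_unknown_configs=True)
```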
👌 👌
rllib/agents/dqn/apex.py
Outdated
@@ -129,7 +136,8 @@ def execution_plan(workers: WorkerSet, config: dict,
     # Create a number of replay buffer actors.
     num_replay_buffer_shards = config["optimizer"][
         "num_replay_buffer_shards"]
-    replay_actors = create_colocated(ReplayActor, [
+    args = [
naming this replay_actor_args may be clearer.
done
rllib/utils/actors.py
Outdated
 def drop_colocated(actors):
     colocated, non_colocated = split_colocated(actors)
     for a in colocated:
         a.__ray_terminate__.remote()
     return non_colocated


-def split_colocated(actors):
-    localhost = platform.node()
+def split_colocated(actors, node=None):
do we really need to allow the node param to be None?
the only user of this util provides the node parameter when calling it.
also not sure it's intuitive that split_colocated would split based on the node of the first actor if the node param is not specified.
this behavior feels a bit random?
Fixed this. Should behave backward-compatibly now, with node="localhost" being the default (same behavior as before, where the node arg didn't exist and we always tried to co-locate on localhost).
Also added docstrings.
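For reference, a rough sketch of what such a split could look like; the get_host method on the actors is an assumption for illustration, not the actual RLlib code:

```python
import platform
from typing import List, Optional, Tuple

import ray
from ray.actor import ActorHandle


def split_colocated(
        actors: List[ActorHandle],
        node: Optional[str] = "localhost",
) -> Tuple[List[ActorHandle], List[ActorHandle]]:
    """Splits `actors` into those running on `node` and those on other nodes.

    node="localhost" resolves to the driver's own hostname (the old, implicit
    behavior); pass an explicit hostname to split against that node instead.
    """
    if node == "localhost":
        node = platform.node()
    # Assumption: each actor exposes a remote method returning its hostname.
    hosts = ray.get([a.get_host.remote() for a in actors])
    co_located, non_co_located = [], []
    for actor, host in zip(actors, hosts):
        (co_located if host == node else non_co_located).append(actor)
    return co_located, non_co_located
```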
def try_create_colocated(cls, args, count, kwargs=None, node=None):
    kwargs = kwargs or {}
why don't we make {} the default value for the argument?
Dangerous, as Python will keep that {} dict around, so if you change it in one function call (add key/value pairs to it), the next time you call the function without providing the arg, the function will use the altered dict (the one with the added key/value pairs).
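For anyone reading along, a minimal self-contained sketch of that pitfall (function names are hypothetical):

```python
def bad_update(key, cfg={}):   # this {} is created only once, at function definition time
    cfg[key] = True
    return cfg

print(bad_update("a"))   # {'a': True}
print(bad_update("b"))   # {'a': True, 'b': True}  <- 'a' leaked from the previous call


def good_update(key, cfg=None):
    cfg = cfg or {}            # fresh dict on every call without an explicit cfg
    cfg[key] = True
    return cfg

print(good_update("a"))  # {'a': True}
print(good_update("b"))  # {'b': True}
```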
oh right, I forgot {} is banned as defaults actually. thanks.
rllib/utils/actors.py
Outdated
    return ok


def try_create_colocated(cls, args, count, kwargs=None, node=None):
Can we add a comment here about node being None? Seems it's a pretty important detail that if node is None, it means we don't care which node these actors are co-located on, as long as they are together.
Yes, I'll add (better) docstrings to all these.
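For example, such a docstring could spell out the node=None semantics roughly like this (wording is a sketch, not necessarily what landed in the PR):

```python
def try_create_colocated(cls, args, count, kwargs=None, node=None):
    """Tries to create `count` actors of type `cls`, all on the same node.

    Args:
        node: The node to co-locate the actors on. If None, we do not care
            on which node the actors end up, as long as all of them end up
            on the same (one) node together.
    """
```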
done
# Whether all shards of the replay buffer must be co-located
# with the learner process (running the execution plan).
# If False, replay shards may be created on different node(s).
"replay_buffer_shards_colocated_with_driver": True,
actually I wonder, why do they need to be on the same node?
For APEX, the local node is the learner, so data (once in the buffer shards) never has to travel again. I think that's the sole intention here.
I see I see. to be honest, this doesn't feel like a requirement to me, more like an optimization.
since we don't have a viability guarantee from Ray core, if it were up to me, I would do this as a best-effort thing:
try to colocate everything, and if that fails, schedule the remaining RB shards anywhere.
then we don't need the while loop, and this scheduling can finish in at most 2 steps.
it is obviously too big of a change though. maybe just add a note/todo somewhere?
as written, I am a little worried a stack may fail with a mysterious error message like "fail to schedule RB actors" even though there are enough CPUs, just a small head node.
Just a note: This is nothing new that I introduced here for APEX. We have always forced all replay shards to be located on the driver. This change actually allows users (via setting this new flag to False) to relax this constraint.
I can add a comment to explain this more. ...
done
appreciate!
rllib/utils/actors.py
Outdated
    # Maps types to lists of already co-located actors.
    ok = [[] for _ in range(len(actor_specs))]
    attempt = 1
    while attempt < max_attempts:
if these actors don't fit on the same node the first time, why would they fit when we try the second time?
this is a case where we need PACK scheduling, it seems.
We do use "PACK" strategy by default, so this should be ok. But it's a good question: Could still be that ray places the actor on a different node (bundle), no? And then we have to try again. I would love to use a ray tool to force placing an actor on a given node, but I don't think this exists.
yeah, I would love to show core folks this as an example use case when this PR is in.
rllib/execution/rollout_ops.py
Outdated
    assert len(remote_kwargs) == len(actors)

    # Create a map inside Trainer instance that maps actorss to sets of open
    # requests (object refs). This way, we keep track, of which actorss have
are you intentionally writing actorss here and above?
or are they typos?
:D definitely typos
done
rllib/execution/rollout_ops.py
Outdated
    # already been sent how many requests
    # (`max_remote_requests_in_flight_per_actor` arg).
    if not hasattr(trainer, "_remote_requests_in_flight"):
        trainer._remote_requests_in_flight = defaultdict(set)
initialize this in the trainer's init() or setup()?
it is confusing if some other instance creates private (underscore-prefixed) members on another class instance.
also, instead of passing the entire trainer, why not pass the _remote_requests_in_flight dict into this function? that way this rollout op doesn't have access to everything the trainer has.
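A rough sketch of the suggested shape (function and attribute names here are assumptions, not the PR's final code): the Trainer owns the bookkeeping dict and the rollout op only receives that dict:

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Set


def asynchronous_parallel_requests(
        remote_requests_in_flight: DefaultDict[Any, Set],
        actors: List[Any],
        max_remote_requests_in_flight_per_actor: int = 2) -> None:
    # Only send a new sample request to actors that still have a free slot.
    for actor in actors:
        if (len(remote_requests_in_flight[actor])
                < max_remote_requests_in_flight_per_actor):
            ref = actor.sample.remote()
            remote_requests_in_flight[actor].add(ref)


# Owned by the Trainer (set up once in Trainer.setup(), not lazily via hasattr()):
requests_in_flight = defaultdict(set)
```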
+1 Makes all sense. I'll clean up.
done
This is only needed in the next PR (where we introduce the AlphaStar agent). In there, I'll set this property up in setup(). 👍
rllib/execution/rollout_ops.py
Outdated

    # Return None if nothing ready after the timeout.
    if not ready:
        return
should we return [] or at least None?
return is the same as return None, no? I can add the None to make it more explicit.
done
…ntralized_multi_agent_learning_01
thanks for all the updates, looks great now.
one more minor suggestion: delete the util func that is not used anymore.
also thanks for digging into the logic for colocated actor scheduling. it's not pretty for sure.
# Whether all shards of the replay buffer must be co-located
# with the learner process (running the execution plan).
# If False, replay shards may be created on different node(s).
"replay_buffer_shards_colocated_with_driver": True,
appreciate!
co_located, non_co_located = split_colocated(actors, node=node)
logger.info("Got {} colocated actors of {}".format(len(co_located), count))
for a in non_co_located:
    a.__ray_terminate__.remote()
yeah, I double-checked with Clark; they have an open issue to make __ray_terminate__ a public API.
@Deprecated(error=False)
def drop_colocated(actors: List[ActorHandle]) -> List[ActorHandle]:
delete this util? it's not used anywhere.
Maybe some user is using it somewhere :D
I marked it @Deprecated.
Decentralized multi-agent learning; preparatory PR.
- synchronous_parallel_sampling (already used by PGTrainer).
- create_colocated replaced via the new, more generic create_colocated_actors utility function. This allows users to co-locate any (different) types of actors on the same node.

Why are these changes needed?
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.
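For context, a rough usage sketch of the create_colocated_actors utility described above; the import path is inferred from the rllib/utils/actors.py file touched in this PR, and the exact signature/return shape are assumptions, not taken verbatim from this PR:

```python
import ray
# Import path inferred from the rllib/utils/actors.py file touched in this PR.
from ray.rllib.utils.actors import create_colocated_actors

ray.init()


@ray.remote(num_cpus=1)
class ReplayShard:
    def size(self):
        return 0


@ray.remote(num_cpus=1)
class MetricsActor:
    def ping(self):
        return "ok"


# Assumed spec format: one (actor_class, args, kwargs, count) tuple per actor
# type; all created actors are co-located on one node (here: the driver's).
groups = create_colocated_actors(
    actor_specs=[
        (ReplayShard, None, None, 4),
        (MetricsActor, None, None, 1),
    ],
    node="localhost",
)
# Assumed to return one list of actor handles per spec.
replay_shards, metrics_actors = groups
```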