[RLlib] Decentralized multi-agent learning; PR #01 #21421
Changes from 8 commits
@@ -23,6 +23,10 @@
         "buffer_size": 2000000,
         # TODO(jungong) : update once Apex supports replay_buffer_config.
         "replay_buffer_config": None,
+        # Whether all shards of the replay buffer must be co-located
+        # with the learner process (running the execution plan).
+        # If False, replay shards may be created on different node(s).
+        "replay_buffer_shards_colocated_with_driver": True,
         "learning_starts": 50000,
         "train_batch_size": 512,
         "rollout_fragment_length": 50,
@@ -31,6 +35,7 @@
         "worker_side_prioritization": True,
         "min_iter_time_s": 30,
     },
+    _allow_unknown_configs=True,
 )

Review comment: Why is this needed?

Reply: We are using Trainer's merge utility. It requires _allow_unknown_configs=True whenever the second config (APEX-DDPG's) contains keys that are not part of the base config.

Reply: 👌 👌
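For context on the exchange above, here is a minimal, hedged sketch of the merge behavior being described. It assumes RLlib's Trainer.merge_trainer_configs classmethod and the DDPG default-config import path as they existed around the time of this PR; the override dict is purely illustrative and not taken from the diff:

# Hedged illustration (not from the PR): merging a base trainer config with
# an override dict that introduces a brand-new key.
from ray.rllib.agents.ddpg.ddpg import DEFAULT_CONFIG as DDPG_CONFIG
from ray.rllib.agents.trainer import Trainer

overrides = {
    # Key that does not exist in the base DDPG config:
    "replay_buffer_shards_colocated_with_driver": True,
}

# Without _allow_unknown_configs=True, the merge utility rejects unknown
# top-level keys; with it, the new key is simply carried into the result.
merged = Trainer.merge_trainer_configs(
    DDPG_CONFIG, overrides, _allow_unknown_configs=True)
assert merged["replay_buffer_shards_colocated_with_driver"] is True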
@@ -14,6 +14,7 @@
 import collections
 import copy
 import platform
 from typing import Tuple

 import ray

@@ -32,7 +33,7 @@
 from ray.rllib.execution.rollout_ops import ParallelRollouts
 from ray.rllib.execution.train_ops import UpdateTargetNetwork
 from ray.rllib.utils import merge_dicts
-from ray.rllib.utils.actors import create_colocated
+from ray.rllib.utils.actors import create_colocated_actors
 from ray.rllib.utils.annotations import override
 from ray.rllib.utils.metrics.learner_info import LEARNER_INFO
 from ray.rllib.utils.typing import SampleBatchType, TrainerConfigDict
@@ -55,10 +56,16 @@
         "n_step": 3,
         "num_gpus": 1,
         "num_workers": 32,

         "buffer_size": 2000000,
         # TODO(jungong) : add proper replay_buffer_config after
         # DistributedReplayBuffer type is supported.
         "replay_buffer_config": None,
+        # Whether all shards of the replay buffer must be co-located
+        # with the learner process (running the execution plan).
+        # If False, replay shards may be created on different node(s).
+        "replay_buffer_shards_colocated_with_driver": True,

         "learning_starts": 50000,
         "train_batch_size": 512,
         "rollout_fragment_length": 50,
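As a hedged usage sketch (hypothetical config values, and assuming the ApexTrainer import path of RLlib at the time of this PR), relaxing the new co-location constraint from the user side might look like this:

import ray
from ray.rllib.agents.dqn.apex import ApexTrainer

ray.init()
trainer = ApexTrainer(
    env="CartPole-v0",
    config={
        "num_gpus": 0,   # override the APEX default of 1 GPU
        "num_workers": 4,
        # Use two replay buffer shards.
        "optimizer": {"num_replay_buffer_shards": 2},
        # New flag from this diff: allow replay shards on any node(s)
        # instead of forcing them onto the driver's node.
        "replay_buffer_shards_colocated_with_driver": False,
    },
)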
@@ -129,7 +136,8 @@ def execution_plan(workers: WorkerSet, config: dict,
         # Create a number of replay buffer actors.
         num_replay_buffer_shards = config["optimizer"][
             "num_replay_buffer_shards"]
-        replay_actors = create_colocated(ReplayActor, [
+        args = [
             num_replay_buffer_shards,
             config["learning_starts"],
             config["buffer_size"],

Review comment (on the new args list): Naming this replay_actor_args may be clearer.

Reply: done

@@ -139,7 +147,22 @@ def execution_plan(workers: WorkerSet, config: dict,
             config["prioritized_replay_eps"],
             config["multiagent"]["replay_mode"],
             config.get("replay_sequence_length", 1),
-        ], num_replay_buffer_shards)
+        ]
+        # Place all replay buffer shards on the same node as the learner
+        # (driver process that runs this execution plan).
+        if config["replay_buffer_shards_colocated_with_driver"]:
+            replay_actors = create_colocated_actors(
+                actor_specs=[
+                    # (class, args, kwargs={}, count)
+                    (ReplayActor, args, {}, num_replay_buffer_shards)  # [0]
+                ],
+                node=platform.node(),  # localhost
+            )[0]
+        # Place replay buffer shards on any node(s).
+        else:
+            replay_actors = [
+                ReplayActor(*args) for _ in range(num_replay_buffer_shards)
+            ]

         # Start the learner thread.
         learner_thread = LearnerThread(workers.local_worker())
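For readers new to the renamed utility, here is a small, hedged sketch of the call pattern the diff relies on. It assumes, as the inline comments suggest, that create_colocated_actors takes a list of (class, args, kwargs, count) specs plus a node name and returns one list of actor handles per spec, which is why the diff indexes the result with [0]; the DummyShard actor is a made-up stand-in for ReplayActor:

import platform

import ray
from ray.rllib.utils.actors import create_colocated_actors


@ray.remote(num_cpus=0)
class DummyShard:
    """Stand-in for ReplayActor; reports which node it landed on."""

    def node(self):
        return platform.node()


ray.init()
# One spec: two DummyShard actors, all pinned to the driver's node.
shards = create_colocated_actors(
    actor_specs=[(DummyShard, [], {}, 2)],
    node=platform.node(),  # localhost
)[0]  # one list of handles per spec; [0] picks the first (and only) spec
print(ray.get([s.node.remote() for s in shards]))  # both equal the driver node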
Review comment: Actually, I wonder: why do they need to be on the same node?

Reply: For APEX, the local node is the learner, so data (once in the buffer shards) never has to travel again. I think that's the sole intention here.

Reply: I see. To be honest, this doesn't feel like a requirement to me, more like an optimization. Since Ray core gives us no guarantee that colocation will succeed, if it were up to me I would do this as a best-effort thing: try to colocate everything, and if that fails, schedule the remaining replay buffer shards anywhere. Then we don't need the while loop, and the scheduling can finish in at most two steps. That is obviously too big a change for this PR; maybe just add a note/TODO somewhere? As written, I am a little worried a stack may fail with a mysterious error like "failed to schedule RB actors" even though there are enough CPUs overall, just a small head node.

Reply: Just a note: this is nothing new that I introduced here for APEX. We have always forced all replay shards to be located on the driver. This change actually allows users (by setting this new flag to False) to relax that constraint.

Reply: I can add a comment to explain this more. ...

Reply: done

Reply: Appreciate it!
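A hedged sketch of the best-effort behavior suggested in this thread (not part of the PR): the helper name is hypothetical, the ReplayActor import path is assumed from this version of RLlib, and the exact failure mode of create_colocated_actors when the node cannot host all shards is assumed to surface as an exception:

import platform

from ray.rllib.execution.replay_buffer import ReplayActor
from ray.rllib.utils.actors import create_colocated_actors


def create_replay_shards_best_effort(replay_actor_args, num_shards):
    """Try strict co-location with the driver; otherwise place shards anywhere."""
    try:
        # Strict attempt: all shards on the driver's node.
        return create_colocated_actors(
            actor_specs=[(ReplayActor, replay_actor_args, {}, num_shards)],
            node=platform.node(),
        )[0]
    except Exception:
        # Fallback: let Ray core schedule the shards on any node(s),
        # e.g. when the head node is too small to host all of them.
        return [
            ReplayActor.remote(*replay_actor_args)
            for _ in range(num_shards)
        ]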