[RLlib] Add metrics to buffers. #49822
Conversation
…odeReplayBuffer'. Signed-off-by: simonsays1980 <[email protected]>
…chanism in 'DQN'. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
…ermore, added a further key argument for the initialization of the buffer to get the number of iterations for smoothing. Signed-off-by: simonsays1980 <[email protected]>
@@ -51,6 +51,7 @@
     NUM_ENV_STEPS_SAMPLED_LIFETIME,
     NUM_TARGET_UPDATES,
     REPLAY_BUFFER_ADD_DATA_TIMER,
+    REPLAY_BUFFER_RESULTS,
Cool!
@@ -18,11 +18,15 @@
         lr=0.0005 * (args.num_learners or 1) ** 0.5,
         train_batch_size_per_learner=32,
         replay_buffer_config={
-            "type": "PrioritizedEpisodeReplayBuffer",
+            "type": "EpisodeReplayBuffer",
is this just for testing?
Yeah, I wanted to check with you whether we proceed like this; then all buffers get the metrics and I can test with any of them.
@@ -660,6 +661,11 @@ def _training_step_new_api_stack(self):
            sample_episodes=True,
        )

+       replay_buffer_results = self.local_replay_buffer.get_metrics()
Nice. Unified API name get_metrics, analogous to EnvRunners.
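The unified accessor discussed above can be sketched as a plain-Python toy. Everything below is illustrative: the class name and the counter keys (num_episodes_added, num_timesteps_sampled) are stand-ins, not RLlib's actual metric constants.

```python
class ToyEpisodeBuffer:
    """Minimal stand-in for a replay buffer that tracks its own metrics."""

    def __init__(self):
        self._episodes = []
        self._num_episodes_added = 0
        self._num_timesteps_sampled = 0

    def add(self, episode):
        self._episodes.append(episode)
        self._num_episodes_added += 1

    def sample(self, num_timesteps):
        # Actual sampling logic elided; only the metric update is shown.
        self._num_timesteps_sampled += num_timesteps

    def get_metrics(self):
        # Same accessor name as on EnvRunners, so the algorithm's
        # training step can treat all subcomponents uniformly.
        return {
            "num_episodes_added": self._num_episodes_added,
            "num_timesteps_sampled": self._num_timesteps_sampled,
        }


buf = ToyEpisodeBuffer()
buf.add(["obs_0", "obs_1"])
buf.sample(32)
print(buf.get_metrics())  # {'num_episodes_added': 1, 'num_timesteps_sampled': 32}
```

The design point is the shared method name: the training step can call get_metrics() on any subcomponent without caring whether it is a buffer or an EnvRunner.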
@@ -660,6 +661,11 @@ def _training_step_new_api_stack(self):
            sample_episodes=True,
        )

+       replay_buffer_results = self.local_replay_buffer.get_metrics()
+       self.metrics.merge_and_log_n_dicts(
Yeah, I wonder why log_dict doesn't work here. It should be the better choice b/c we don't have more than one buffer:

self.metrics.log_dict(
    replay_buffer_results,
    key=REPLAY_BUFFER_RESULTS,
)

Maybe b/c replay_buffer_results already contains Stats objects with their individual settings? ...
I need to check it.
So, basically, the lifetime metrics are somehow wrongly accumulated and grow exponentially. They probably need to be reduced before being passed to the log_dict method.
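The exponential growth described here can be reproduced with a toy calculation. The names and numbers below are illustrative, not RLlib's: the point is that if the incoming dict carries a *lifetime* (cumulative) value that already includes everything logged so far, and the logger sums it into its own running total instead of reducing first, the total roughly doubles every iteration.

```python
# Toy reproduction of double-counting a lifetime counter.
logger_total = 0  # the logger's running lifetime sum
history = []

for it in range(5):
    delta = 100  # new timesteps added this iteration
    # BUG variant: the incoming value is itself a lifetime total that
    # already contains everything the logger has accumulated, so summing
    # it back in adds the logger's own total a second time.
    incoming_lifetime = logger_total + delta
    logger_total += incoming_lifetime
    history.append(logger_total)

print(history)  # [100, 300, 700, 1500, 3100] -- roughly doubles each step

# Reducing first (logging only the per-iteration delta) gives the
# expected linear growth: 100, 200, 300, 400, 500.
```

This matches the symptom in the comment above: reducing the Stats before handing them to log_dict would break the feedback loop.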
@@ -646,6 +896,10 @@ def get_added_timesteps(self) -> int:
        """Returns number of timesteps that have been added in buffer's lifetime."""
        return self._num_timesteps_added

+   def get_metrics(self) -> ResultDict:
nit: Add docstring.
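One possible shape for the requested docstring, as a sketch only: the stub MetricsLogger below stands in for RLlib's real one, and the exact wording, the ResultDict contents, and the use of reduce() are assumptions, not the PR's actual implementation.

```python
class _StubMetricsLogger:
    """Stand-in for RLlib's MetricsLogger (assumption for this sketch)."""

    def __init__(self):
        self._data = {"num_episodes_added": 3}

    def reduce(self):
        # Return a plain-dict snapshot of the reduced metric values.
        return dict(self._data)


class BufferSketch:
    def __init__(self):
        self.metrics = _StubMetricsLogger()

    def get_metrics(self):
        """Returns the metrics logged in this buffer.

        Collects this buffer's MetricsLogger state (e.g. counts of added
        and sampled episodes/timesteps), analogous to
        `EnvRunner.get_metrics()`.

        Returns:
            A ResultDict with the reduced metric values of this buffer.
        """
        return self.metrics.reduce()


print(BufferSketch().get_metrics())  # {'num_episodes_added': 3}
```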
LGTM! Thanks for this really cool PR. A handful of nits and one important question on the usage of log_dict vs merge_and_log_n_dicts. Let's take a look at log_dict and figure out why it doesn't work here (according to our offline discussion). log_dict should be the better choice here, b/c we are NOT merging > 1 dicts from parallel subcomponents.
…ics to the 'PrioritizedEpisodeReplayBuffer'. Added also docstrings. Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: simonsays1980 <[email protected]>
Signed-off-by: Anson Qian <[email protected]>
Signed-off-by: Puyuan Yao <[email protected]>
Why are these changes needed?
This PR proposes the following changes:
- Adds a MetricsLogger to the EpisodeReplayBuffers.
- Adds new metrics keys to ray.rllib.utils.metrics.__init__.
- Records metrics in EpisodeReplayBuffers during add and sample operations.
- Wires these metrics into the off-policy algorithms in RLlib, namely DQN and SAC.
.Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I'm introducing a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.