Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Add metrics to buffers. #49822

Merged

Conversation

simonsays1980
Copy link
Collaborator

@simonsays1980 simonsays1980 commented Jan 14, 2025

Why are these changes needed?

This PR proposes the following changes:

  • Add a MetricsLogger to the EpisodeReplayBuffer's.
  • Add multiple additional metrics to the ray.rllib.utils.metrics.__init__.
  • Log values for the added metrics in the EpisodeReplayBuffers during add and sample operations.
  • Collect these metrics in the off-policy algorithms in RLlib, namely, DQN and SAC.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: simonsays1980 <[email protected]>
…ermore, added a further key argument for the initialization of the buffer to get the number of iterations for smoothing.

Signed-off-by: simonsays1980 <[email protected]>
@simonsays1980 simonsays1980 marked this pull request as ready for review January 14, 2025 08:49
@@ -51,6 +51,7 @@
NUM_ENV_STEPS_SAMPLED_LIFETIME,
NUM_TARGET_UPDATES,
REPLAY_BUFFER_ADD_DATA_TIMER,
REPLAY_BUFFER_RESULTS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@@ -18,11 +18,15 @@
lr=0.0005 * (args.num_learners or 1) ** 0.5,
train_batch_size_per_learner=32,
replay_buffer_config={
"type": "PrioritizedEpisodeReplayBuffer",
"type": "EpisodeReplayBuffer",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just for testing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wanted to check with you, if we proceed like this and then all buffers get the metrics. Then I can test with any of them.

@@ -660,6 +661,11 @@ def _training_step_new_api_stack(self):
sample_episodes=True,
)

replay_buffer_results = self.local_replay_buffer.get_metrics()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. Unified API names get_metrics. Analogous to EnvRunners.

@@ -660,6 +661,11 @@ def _training_step_new_api_stack(self):
sample_episodes=True,
)

replay_buffer_results = self.local_replay_buffer.get_metrics()
self.metrics.merge_and_log_n_dicts(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wonder why log_dict doesn't work here. It should be the better choice here b/c we don't have more than one buffer.

self.metrics.log_dict(
    replay_buffer_results,
    key=REPLAY_BUFFER_RESULTS,
)

Maybe b/c in replay_buffer_results there are already Stats objects with their individual settings? ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, basically, the lifetime metrics are somehow wrongly accumulated and grow exponentially. They probably need to be reduced before given to the log_dict method.

@@ -646,6 +896,10 @@ def get_added_timesteps(self) -> int:
"""Returns number of timesteps that have been added in buffer's lifetime."""
return self._num_timesteps_added

def get_metrics(self) -> ResultDict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add docstring.

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for this really cool PR. A handful of nits and one important question on the usage of log_dict vs merge_and_log_n_dicts. Let's take a look at log_dicts and figure out why it doesn't work here (according to our offline discussion). log_dicts should be the better choice here, b/c we are NOT merging > 1 dicts from parallel subcomponents.

@sven1977 sven1977 changed the title [RLlib] - Add metrics to buffers [RLlib] Add metrics to buffers. Jan 15, 2025
@sven1977 sven1977 enabled auto-merge (squash) January 15, 2025 11:48
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Jan 15, 2025
@github-actions github-actions bot disabled auto-merge January 16, 2025 15:47
@sven1977 sven1977 merged commit 3d0f4ac into ray-project:master Jan 20, 2025
5 checks passed
anson627 pushed a commit to anson627/ray that referenced this pull request Jan 31, 2025
srinathk10 pushed a commit that referenced this pull request Feb 2, 2025
anyadontfly pushed a commit to anyadontfly/ray that referenced this pull request Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants