
[RLlib] Add metrics to buffers. #49822

Merged
7 changes: 7 additions & 0 deletions rllib/algorithms/algorithm.py
@@ -4002,6 +4002,13 @@ def _create_local_replay_buffer_if_necessary(
):
return

# Add metrics parameters to the buffer config, if necessary.
if config["replay_buffer_config"]["type"] == "EpisodeReplayBuffer":
# TODO (simon): If all episode buffers have metrics, check for subclassing.
config["replay_buffer_config"][
"metrics_num_episodes_for_smoothing"
] = self.config.metrics_num_episodes_for_smoothing

return from_config(ReplayBuffer, config["replay_buffer_config"])

@OldAPIStack
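
A minimal sketch of the resulting flow, with illustrative values (capacity, smoothing window) that are assumptions rather than values taken from the PR: the algorithm injects metrics_num_episodes_for_smoothing into the buffer config and then builds the buffer via from_config().

from ray.rllib.utils.from_config import from_config
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer

# Buffer config as it looks after the injection above (values are illustrative).
replay_buffer_config = {
    "type": "EpisodeReplayBuffer",
    "capacity": 50000,
    # Injected so the buffer's own metrics use the same smoothing window as the
    # algorithm's other metrics (the buffer accepting this kwarg is what this PR adds).
    "metrics_num_episodes_for_smoothing": 100,
}

local_replay_buffer = from_config(ReplayBuffer, replay_buffer_config)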
6 changes: 6 additions & 0 deletions rllib/algorithms/dqn/dqn.py
@@ -51,6 +51,7 @@
NUM_ENV_STEPS_SAMPLED_LIFETIME,
NUM_TARGET_UPDATES,
REPLAY_BUFFER_ADD_DATA_TIMER,
REPLAY_BUFFER_RESULTS,
Contributor

Cool!

REPLAY_BUFFER_SAMPLE_TIMER,
REPLAY_BUFFER_UPDATE_PRIOS_TIMER,
SAMPLE_TIMER,
@@ -660,6 +661,11 @@ def _training_step_new_api_stack(self):
sample_episodes=True,
)

replay_buffer_results = self.local_replay_buffer.get_metrics()
Contributor

Nice. Unified API name get_metrics, analogous to the EnvRunners.
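
A rough sketch of the pattern praised here, under the assumption (not shown in this diff) that the buffer keeps its own MetricsLogger and exposes the same get_metrics() entry point as the EnvRunners; the metric names and reduce settings below are illustrative only.

from ray.rllib.utils.metrics.metrics_logger import MetricsLogger


class EpisodeBufferMetricsSketch:
    """Hypothetical buffer-side counterpart to EnvRunner.get_metrics()."""

    def __init__(self, metrics_num_episodes_for_smoothing=100):
        # Internal logger; the smoothing window comes from the algorithm config
        # (see the algorithm.py change above).
        self.metrics = MetricsLogger()
        self._window = metrics_num_episodes_for_smoothing

    def add(self, episodes):
        # Lifetime counter: keeps growing across reductions.
        self.metrics.log_value("num_episodes_added_lifetime", len(episodes), reduce="sum")
        # Smoothed per-episode stat over the configured window.
        for episode in episodes:
            self.metrics.log_value("episode_len_mean", len(episode), window=self._window)

    def get_metrics(self):
        # Reduce and hand the results to the Algorithm, like EnvRunner.get_metrics().
        return self.metrics.reduce()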

self.metrics.merge_and_log_n_dicts(
Contributor

Yeah, I wonder why log_dict doesn't work here. It should be the better choice because we don't have more than one buffer:

self.metrics.log_dict(
    replay_buffer_results,
    key=REPLAY_BUFFER_RESULTS,
)

Maybe because replay_buffer_results already contains Stats objects with their individual settings? ...

Collaborator Author

I need to check it.

Collaborator Author

So, basically, the lifetime metrics are somehow accumulated incorrectly and grow exponentially. They probably need to be reduced before being passed to the log_dict method.
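
A small standalone sketch of the two logging paths being discussed; the separate buffer-side logger, the metric name, and the plain-string key are assumptions for illustration (the diff itself uses the REPLAY_BUFFER_RESULTS constant), not the PR's implementation.

from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

buffer_metrics = MetricsLogger()  # stands in for the buffer's internal logger
algo_metrics = MetricsLogger()    # stands in for self.metrics in the Algorithm

# The buffer keeps summing a lifetime counter while episodes are added.
buffer_metrics.log_value("num_env_steps_added_lifetime", 32, reduce="sum")
buffer_metrics.log_value("num_env_steps_added_lifetime", 32, reduce="sum")

# get_metrics() hands back the buffer's already-reduced Stats.
replay_buffer_results = buffer_metrics.reduce()

# Path taken in this diff: treat the result as one of n parallel stats dicts
# and merge it under a fixed key.
algo_metrics.merge_and_log_n_dicts([replay_buffer_results], key="replay_buffer_results")

# Alternative raised above. The concern is that re-logging the already-summed
# lifetime value on every training step piles it on top of the previous total:
# algo_metrics.log_dict(replay_buffer_results, key="replay_buffer_results")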

[replay_buffer_results], key=REPLAY_BUFFER_RESULTS
)

# Perform an update on the buffer-sampled train batch.
with self.metrics.log_time((TIMERS, LEARNER_UPDATE_TIMER)):
learner_results = self.learner_group.update_from_episodes(
10 changes: 7 additions & 3 deletions rllib/tuned_examples/dqn/cartpole_dqn.py
@@ -18,11 +18,15 @@
lr=0.0005 * (args.num_learners or 1) ** 0.5,
train_batch_size_per_learner=32,
replay_buffer_config={
"type": "PrioritizedEpisodeReplayBuffer",
"type": "EpisodeReplayBuffer",
Contributor

is this just for testing?

Collaborator Author

Yeah, I wanted to check with you whether we proceed like this and all buffers then get the metrics. If so, I can test with any of them.

"capacity": 50000,
"alpha": 0.6,
"beta": 0.4,
},
# replay_buffer_config={
# "type": "PrioritizedEpisodeReplayBuffer",
# "capacity": 50000,
# "alpha": 0.6,
# "beta": 0.4,
# },
n_step=(2, 5),
double_q=True,
dueling=True,
28 changes: 28 additions & 0 deletions rllib/utils/metrics/__init__.py
@@ -33,6 +33,34 @@
NUM_MODULE_STEPS_SAMPLED = "num_module_steps_sampled"
NUM_MODULE_STEPS_SAMPLED_LIFETIME = "num_module_steps_sampled_lifetime"

# Counters for adding and evicting in replay buffers.
# TODO (Simon): Check if we should prefix these with 'REPLAY_BUFFER'.
NUM_ENV_STEPS = "num_env_steps"
NUM_ENV_STEPS_ADDED = "num_env_steps_added"
NUM_ENV_STEPS_ADDED_LIFETIME = "num_env_steps_added_lifetime"
NUM_ENV_STEPS_EVICTED = "num_env_steps_evicted"
NUM_ENV_STEPS_EVICTED_LIFETIME = "num_env_steps_evicted_lifetime"
NUM_AGENT_EPISODES = "num_agent_episodes"
NUM_AGENT_EPISODES_ADDED = "num_agent_episodes_added"
NUM_AGENT_EPISODES_ADDED_LIFETIME = "num_agent_episodes_added_lifetime"
NUM_AGENT_EPISODES_EVICTED = "num_agent_episodes_evicted"
NUM_AGENT_EPISODES_EVICTED_LIFETIME = "num_agent_episodes_evicted_lifetime"
NUM_AGENT_EPISODES_PER_SAMPLE = "num_agent_episodes_per_sample"
NUM_AGENT_STEPS = "num_agent_steps"
NUM_AGENT_STEPS_ADDED = "num_agent_steps_added"
NUM_AGENT_STEPS_ADDED_LIFETIME = "num_agent_steps_added_lifetime"
NUM_AGENT_STEPS_EVICTED = "num_agent_steps_evicted"
NUM_AGENT_STEPS_EVICTED_LIFETIME = "num_agent_steps_evicted_lifetime"
NUM_EPISODES = "num_episodes"
NUM_EPISODES_ADDED = "num_episodes_added"
NUM_EPISODES_ADDED_LIFETIME = "num_episodes_added_lifetime"
NUM_EPISODES_EVICTED = "num_episodes_evicted"
NUM_EPISODES_EVICTED_LIFETIME = "num_episodes_evicted_lifetime"
NUM_EPISODES_PER_SAMPLE = "num_episodes_per_sample"
# If some requirements (like length) are not met, we resample.
NUM_MISSED_SAMPLES = "num_missed_samples"

EPISODE_DURATION_SEC_MEAN = "episode_duration_sec_mean"
EPISODE_LEN_MEAN = "episode_len_mean"
EPISODE_LEN_MAX = "episode_len_max"
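
A brief sketch of how a buffer could log a few of these counters on its add() path; the per-iteration vs. lifetime split and the clear_on_reduce settings are assumptions for illustration, not the PR's implementation.

from ray.rllib.utils.metrics import (
    NUM_ENV_STEPS_ADDED,
    NUM_ENV_STEPS_ADDED_LIFETIME,
    NUM_EPISODES_ADDED,
)
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

metrics = MetricsLogger()


def record_add(episodes):
    # Per-iteration counts: reset whenever the metrics are reduced/reported.
    metrics.log_value(NUM_EPISODES_ADDED, len(episodes), reduce="sum", clear_on_reduce=True)
    env_steps = sum(len(e) for e in episodes)  # episodes are assumed to support len()
    metrics.log_value(NUM_ENV_STEPS_ADDED, env_steps, reduce="sum", clear_on_reduce=True)
    # Lifetime count: keeps accumulating across reductions.
    metrics.log_value(NUM_ENV_STEPS_ADDED_LIFETIME, env_steps, reduce="sum")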