
[RLlib] Add metrics to buffers. #49822

Merged
10 changes: 10 additions & 0 deletions rllib/algorithms/algorithm.py
@@ -3948,6 +3948,16 @@ def _create_local_replay_buffer_if_necessary(
):
return

# Add parameters, if necessary.
if config["replay_buffer_config"]["type"] in [
    "EpisodeReplayBuffer",
    "PrioritizedEpisodeReplayBuffer",
]:
    # TODO (simon): If all episode buffers have metrics, check for subclassing.
    config["replay_buffer_config"][
        "metrics_num_episodes_for_smoothing"
    ] = self.config.metrics_num_episodes_for_smoothing

return from_config(ReplayBuffer, config["replay_buffer_config"])

@OldAPIStack
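The TODO above suggests replacing the string whitelist with a subclass check once every episode buffer exposes metrics. A minimal sketch of that idea, assuming RLlib's EpisodeReplayBuffer import path; buffer_supports_metrics is a hypothetical helper, not part of this PR:

from ray.rllib.utils.replay_buffers.episode_replay_buffer import EpisodeReplayBuffer

def buffer_supports_metrics(buffer_type) -> bool:
    # Hypothetical helper: the configured "type" may be a class or a string.
    if isinstance(buffer_type, type):
        # One check covers EpisodeReplayBuffer and all of its subclasses,
        # e.g. PrioritizedEpisodeReplayBuffer.
        return issubclass(buffer_type, EpisodeReplayBuffer)
    # Otherwise fall back to the string whitelist the PR uses.
    return buffer_type in ("EpisodeReplayBuffer", "PrioritizedEpisodeReplayBuffer")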
7 changes: 7 additions & 0 deletions rllib/algorithms/dqn/dqn.py
@@ -51,6 +51,7 @@
NUM_ENV_STEPS_SAMPLED_LIFETIME,
NUM_TARGET_UPDATES,
REPLAY_BUFFER_ADD_DATA_TIMER,
REPLAY_BUFFER_RESULTS,
Contributor:

Cool!

REPLAY_BUFFER_SAMPLE_TIMER,
REPLAY_BUFFER_UPDATE_PRIOS_TIMER,
SAMPLE_TIMER,
@@ -689,6 +690,12 @@ def _training_step_new_api_stack(self):
sample_episodes=True,
)

# Get the replay buffer metrics.
replay_buffer_results = self.local_replay_buffer.get_metrics()
Contributor:

Nice. Unified API name get_metrics, analogous to the EnvRunners.

self.metrics.merge_and_log_n_dicts(
Contributor:

Yeah, I wonder why log_dict doesn't work here. It should be the better choice, b/c we don't have more than one buffer:

self.metrics.log_dict(
    replay_buffer_results,
    key=REPLAY_BUFFER_RESULTS,
)

Maybe b/c in replay_buffer_results there are already Stats objects with their individual settings? ...

Collaborator (author):

I need to check it.

Collaborator (author):

So, basically, the lifetime metrics are somehow wrongly accumulated and grow exponentially. They probably need to be reduced before being passed to the log_dict method.

    [replay_buffer_results], key=REPLAY_BUFFER_RESULTS
)

# Perform an update on the buffer-sampled train batch.
with self.metrics.log_time((TIMERS, LEARNER_UPDATE_TIMER)):
    learner_results = self.learner_group.update_from_episodes(
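A plain-Python analogue of the compounding described in the thread above (a sketch, not RLlib code): if a buffer reports a lifetime running total each iteration and the logger sum-reduces those reports again, the logged value grows much faster than the true total, which is the suspected reason log_dict misbehaved here.

true_total = 0
naive_logged = 0
for iteration in range(1, 4):
    true_total += 100                   # the buffer adds 100 steps per iteration ...
    reported_lifetime = true_total      # ... and reports its lifetime total
    naive_logged += reported_lifetime   # re-accumulating a running total compounds it
print(true_total)    # 300
print(naive_logged)  # 600 -- inflated, and the gap widens every iteration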
34 changes: 34 additions & 0 deletions rllib/utils/metrics/__init__.py
@@ -36,6 +36,40 @@
ENV_TO_MODULE_SUM_EPISODES_LENGTH_IN = "env_to_module_sum_episodes_length_in"
ENV_TO_MODULE_SUM_EPISODES_LENGTH_OUT = "env_to_module_sum_episodes_length_out"

# Counters for adding and evicting in replay buffers.
ACTUAL_N_STEP = "actual_n_step"
AGENT_ACTUAL_N_STEP = "agent_actual_n_step"
AGENT_STEP_UTILIZATION = "agent_step_utilization"
ENV_STEP_UTILIZATION = "env_step_utilization"
NUM_AGENT_EPISODES_STORED = "num_agent_episodes"
NUM_AGENT_EPISODES_ADDED = "num_agent_episodes_added"
NUM_AGENT_EPISODES_ADDED_LIFETIME = "num_agent_episodes_added_lifetime"
NUM_AGENT_EPISODES_EVICTED = "num_agent_episodes_evicted"
NUM_AGENT_EPISODES_EVICTED_LIFETIME = "num_agent_episodes_evicted_lifetime"
NUM_AGENT_EPISODES_PER_SAMPLE = "num_agent_episodes_per_sample"
NUM_AGENT_RESAMPLES = "num_agent_resamples"
NUM_AGENT_STEPS_ADDED = "num_agent_steps_added"
NUM_AGENT_STEPS_ADDED_LIFETIME = "num_agent_steps_added_lifetime"
NUM_AGENT_STEPS_EVICTED = "num_agent_steps_evicted"
NUM_AGENT_STEPS_EVICTED_LIFETIME = "num_agent_steps_evicted_lifetime"
NUM_AGENT_STEPS_PER_SAMPLE = "num_agent_steps_per_sample"
NUM_AGENT_STEPS_PER_SAMPLE_LIFETIME = "num_agent_steps_per_sample_lifetime"
NUM_AGENT_STEPS_STORED = "num_agent_steps"
NUM_ENV_STEPS_STORED = "num_env_steps"
NUM_ENV_STEPS_ADDED = "num_env_steps_added"
NUM_ENV_STEPS_ADDED_LIFETIME = "num_env_steps_added_lifetime"
NUM_ENV_STEPS_EVICTED = "num_env_steps_evicted"
NUM_ENV_STEPS_EVICTED_LIFETIME = "num_env_steps_evicted_lifetime"
NUM_ENV_STEPS_PER_SAMPLE = "num_env_steps_per_sample"
NUM_ENV_STEPS_PER_SAMPLE_LIFETIME = "num_env_steps_per_sample_lifetime"
NUM_EPISODES_STORED = "num_episodes"
NUM_EPISODES_ADDED = "num_episodes_added"
NUM_EPISODES_ADDED_LIFETIME = "num_episodes_added_lifetime"
NUM_EPISODES_EVICTED = "num_episodes_evicted"
NUM_EPISODES_EVICTED_LIFETIME = "num_episodes_evicted_lifetime"
NUM_EPISODES_PER_SAMPLE = "num_episodes_per_sample"
NUM_RESAMPLES = "num_resamples"

EPISODE_DURATION_SEC_MEAN = "episode_duration_sec_mean"
EPISODE_LEN_MEAN = "episode_len_mean"
EPISODE_LEN_MAX = "episode_len_max"
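As a rough usage sketch of the new constants, assuming RLlib's MetricsLogger API (the exact reduce settings the buffers use are not shown in this diff): a per-iteration counter is typically summed and cleared on each reduce, while its _LIFETIME counterpart keeps a running sum.

from ray.rllib.utils.metrics import (
    NUM_ENV_STEPS_ADDED,
    NUM_ENV_STEPS_ADDED_LIFETIME,
)
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

metrics = MetricsLogger()
# Per-iteration counter: summed, then reset whenever the logger is reduced.
metrics.log_value(NUM_ENV_STEPS_ADDED, 128, reduce="sum", clear_on_reduce=True)
# Lifetime counter: a running sum that survives reduces.
metrics.log_value(NUM_ENV_STEPS_ADDED_LIFETIME, 128, reduce="sum")
results = metrics.reduce()  # e.g. {"num_env_steps_added": 128, "num_env_steps_added_lifetime": 128}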