Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] New ConnectorV2 API #06: Changes in SingleAgentEpisode & SingleAgentEnvRunner. #42296

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Jan 10, 2024

This PR adds some changes to SingleAgentEpisode & SingleAgentEnvRunner:

  • SingleAgentEnvRunner now utilizes the user-configured EnvToModule and ModuleToEnv connector pipelines.

  • Hence, SingleAgentEnvRunner does NOT anymore:

    • Have to deal with internal (RNN) states.
    • Action sampling, logp, probs computations as these are all done automatically in the default ModuleToEnv pipeline.
  • Add set APIs to SingleAgentEpisode, such that custom connectors are able to manipulate an episode's data, e.g. for observation framestacking, reward clipping, etc..

  • New set API had to also be supported then by InfiniteLookbackBuffer, which sits at the core of all episode classes.

  • Updated test cases and added new ones for set APIs.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
@@ -623,40 +559,6 @@ def stop(self):
# Close our env object via gymnasium's API.
self.env.close()

# TODO (sven): Replace by default "to-env" connector.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary anymore here in EnvRunner.

This is default ModuleToEnv connector behavior now.

@sven1977 sven1977 changed the title [RLlib] EnvRunners support ConnectorV2 API: Changes in SingleAgentEpisode & SingleAgentEnvRunner. [RLlib] New ConnectorV2 API #06: Changes in SingleAgentEpisode & SingleAgentEnvRunner. Jan 10, 2024
…runner_support_connectors_06_small_changes_on_env_runner_and_episode
…runner_support_connectors_06_small_changes_on_env_runner_and_episode
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
Comment on lines +164 to +172
# TODO (sven): Convert data to proper tensor formats, depending on framework
# used by the RLModule. We cannot do this right now as the RLModule does NOT
# know its own device. Only the Learner knows the device. Also, on the
# EnvRunner side, we assume that it's always the CPU (even though one could
# imagine a GPU-based EnvRunner + RLModule for sampling).
# if rl_module.framework == "torch":
# data = convert_to_torch_tensor(data, device=??)
# elif rl_module.framework == "tf2":
# data =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncomment or remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a TODO on an open question with the possible code-solution commented out. I'll leave this in. We need to unify this behavior (numpy to tensor) for all connector types in the near future to not cause user confusion.
The blocker right now is the fact that an RLModule does not know its own device today (only Learners do (GPU or CPU) and EnvRunners assume they are always on the CPU). Thus, connectors have to means to perform this conversion step properly.

rl_module=self.module,
episodes=self._episodes,
explore=explore,
# persistent_data=None, #TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a TODO here? What is the todo exactly?

data=to_env,
episodes=self._episodes,
explore=explore,
# persistent_data=None, #TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing

Signed-off-by: sven1977 <[email protected]>
…runner_support_connectors_06_small_changes_on_env_runner_and_episode
Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
…runner_support_connectors_06_small_changes_on_env_runner_and_episode
Signed-off-by: sven1977 <[email protected]>
@sven1977 sven1977 merged commit 6da2636 into ray-project:master Jan 12, 2024
9 checks passed
vickytsang pushed a commit to ROCm/ray that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rllib RLlib related issues rllib-newstack
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants