[RLlib] New ConnectorV2 API #06: Changes in SingleAgentEpisode & SingleAgentEnvRunner. #42296
Signed-off-by: sven1977 <[email protected]>
@@ -623,40 +559,6 @@ def stop(self):
        # Close our env object via gymnasium's API.
        self.env.close()

        # TODO (sven): Replace by default "to-env" connector.
This is no longer necessary here in the EnvRunner. It is now the default ModuleToEnv connector behavior.
…runner_support_connectors_06_small_changes_on_env_runner_and_episode
# TODO (sven): Convert data to proper tensor formats, depending on framework
# used by the RLModule. We cannot do this right now as the RLModule does NOT
# know its own device. Only the Learner knows the device. Also, on the
# EnvRunner side, we assume that it's always the CPU (even though one could
# imagine a GPU-based EnvRunner + RLModule for sampling).
# if rl_module.framework == "torch":
#     data = convert_to_torch_tensor(data, device=??)
# elif rl_module.framework == "tf2":
#     data =
uncomment or remove?
It's a TODO on an open question with the possible code-solution commented out. I'll leave this in. We need to unify this behavior (numpy to tensor) for all connector types in the near future to not cause user confusion.
The blocker right now is that an RLModule does not know its own device. Only Learners know their device (GPU or CPU), and EnvRunners assume they are always on the CPU. Thus, connectors have no means to perform this conversion step properly.
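For illustration only, here is a minimal, framework-agnostic sketch of the numpy-to-tensor step discussed in this thread. The helper name `map_nested` is hypothetical and not part of RLlib. The caller supplies the actual conversion as a callable, so the device question stays with whoever knows the device.

```python
from typing import Any, Callable


def map_nested(batch: Any, convert: Callable[[Any], Any]) -> Any:
    """Apply `convert` to every leaf of a nested dict/list/tuple structure.

    Hypothetical helper for illustration; not an RLlib API.
    """
    if isinstance(batch, dict):
        return {key: map_nested(value, convert) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        # Rebuild the same container type (plain list or tuple) around the converted items.
        return type(batch)(map_nested(item, convert) for item in batch)
    # Anything else is treated as a leaf and handed to the converter.
    return convert(batch)


# Assumed usage with torch, where the caller knows the device and the
# leaves are numpy arrays:
#   data = map_nested(data, lambda x: torch.from_numpy(x).to(device))
```

With a helper of this shape, a connector would only need to receive the converter (or the device) from its owner, which is exactly the piece of information that is missing today.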
rl_module=self.module,
episodes=self._episodes,
explore=explore,
# persistent_data=None, #TODO
Is there a TODO here? What is the todo exactly?
data=to_env,
episodes=self._episodes,
explore=explore,
# persistent_data=None, #TODO
same thing
…ode & SingleAgentEnvRunner. (ray-project#42296)
This PR adds some changes to SingleAgentEpisode & SingleAgentEnvRunner:
- SingleAgentEnvRunner now utilizes the user-configured EnvToModule and ModuleToEnv connector pipelines. Hence, SingleAgentEnvRunner does NOT anymore:
- Add `set` APIs to SingleAgentEpisode, such that custom connectors are able to manipulate an episode's data, e.g. for observation framestacking, reward clipping, etc.
- The new `set` API then also had to be supported by InfiniteLookbackBuffer, which sits at the core of all episode classes.
- Updated test cases and added new ones for the `set` APIs.

Why are these changes needed?
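To illustrate why custom connectors benefit from the `set` APIs described above, here is a toy sketch of reward clipping on a buffer with a lookback region. `ToyLookbackBuffer` and `clip_rewards` are hypothetical names invented for this example. This is not RLlib's InfiniteLookbackBuffer, and it only supports non-negative indices.

```python
from typing import Any, List


class ToyLookbackBuffer:
    """Toy buffer for illustration: index 0 is the first item after the lookback region."""

    def __init__(self, data: List[Any], lookback: int = 0) -> None:
        self.data = list(data)
        self.lookback = lookback

    def __len__(self) -> int:
        # The lookback items do not count toward the buffer's length.
        return len(self.data) - self.lookback

    def _check(self, index: int) -> None:
        if not 0 <= index < len(self):
            raise IndexError(f"index {index} out of range")

    def get(self, index: int) -> Any:
        self._check(index)
        return self.data[self.lookback + index]

    def set(self, new_data: Any, index: int) -> None:
        # Overwrite one item in place; the lookback region is never touched.
        self._check(index)
        self.data[self.lookback + index] = new_data


def clip_rewards(buffer: ToyLookbackBuffer, low: float = -1.0, high: float = 1.0) -> None:
    """Clip every non-lookback reward into [low, high] using get/set."""
    for t in range(len(buffer)):
        buffer.set(max(low, min(high, buffer.get(t))), t)
```

Without a `set` method, a connector could only read episode data. In-place operations such as clipping would then require rebuilding the buffer from scratch.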