New Step API with terminated, truncated bools instead of done #2752
Merged
52 commits
e6b0a40
New Step API with terminated, truncated bools instead of done
arjun-kg 6618da5
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg a0c4475
Setting return_two_dones=False as default
arjun-kg 2aabc30
update warnings
arjun-kg 1babe4e
pytest - ignore deprecation warnings
arjun-kg c9c6add
Only ignore step api deprecation warnings
arjun-kg c5fe53c
fix duplicate wrapping bug in vector envs
arjun-kg f88927d
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg 7c1e9c7
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg 6af7182
edit docstrings, comments, warnings
arjun-kg 22c1cc7
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg 68ef969
step compatibility for wrappers, vectors
arjun-kg f06343b
reset tests back to old api
arjun-kg 794737b
fix circular import
arjun-kg f89e5da
merge tests with master
arjun-kg 8b518bb
existing code, tests work
arjun-kg 9a2a9af
fix compat at registration, tests
arjun-kg 29eafe5
docstrings, tests passing
arjun-kg 63fc044
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg 97f36d3
dealing with conflicts
arjun-kg 63d3d19
update wrapper class to use step compatibility
arjun-kg 492c6e1
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg 9ce03cb
add warning for play
arjun-kg f93295f
add todo
arjun-kg 1940494
replace 'closing' with 'final'
arjun-kg f12b5fb
fix pre-commit
arjun-kg aa5a071
remove previously missed `done` references
arjun-kg e135b9e
fix step compat in atari wrapper reset
arjun-kg 2bb742a
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg 1f11077
fix tests with step returning np.bool_
arjun-kg e861fbc
remove warning for using new api
arjun-kg fe04e7c
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg 8e56f45
pre-commit fixes
arjun-kg 4491d9a
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg be947e3
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg 5e8f085
new API does not include 'TimeLimit.truncated' in info
arjun-kg cdb3516
fix checks, tests
arjun-kg 8cc2074
vector info mask - fix wrong underscore
arjun-kg 2f83d55
dont remove from info
arjun-kg 57e839c
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg b1660cf
edit definitions
arjun-kg ea10e7a
remove whitespaces :/
arjun-kg bffa257
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg d7dff2c
update tests
arjun-kg b2c10a4
fix pattern
arjun-kg 6553bed
restructure warnings
arjun-kg 50d367e
fix incorrect warning
arjun-kg d71836f
fix incorrect warnings (properly)
arjun-kg 78a507e
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg a747625
add warning to env checker
arjun-kg 28c7b36
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg d65d21b
Merge branch 'master' of https://github.com/openai/gym into done_term…
arjun-kg
@@ -61,12 +61,17 @@ def np_random(self, value: RandomNumberGenerator):
         self._np_random = value
 
     @abstractmethod
-    def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
+    def step(
+        self, action: ActType
+    ) -> Union[
+        Tuple[ObsType, float, bool, bool, dict], Tuple[ObsType, float, bool, dict]
+    ]:
         """Run one timestep of the environment's dynamics. When end of
         episode is reached, you are responsible for calling :meth:`reset`
         to reset this environment's state.
 
-        Accepts an action and returns a tuple (observation, reward, done, info).
+        Accepts an action and returns either a tuple (observation, reward, terminated, truncated, info) or a tuple
+        (observation, reward, done, info). The latter is deprecated and will be removed in future versions.
 
         Args:
             action (object): an action provided by the agent
@@ -76,13 +81,17 @@ def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
         Returns:
             observation (object): agent's observation of the current environment. This will be an element of the environment's :attr:`observation_space`. This may, for instance, be a numpy array containing the positions and velocities of certain objects.
             reward (float) : amount of reward returned after previous action
-            done (bool): whether the episode has ended, in which case further :meth:`step` calls will return undefined results. A done signal may be emitted for different reasons: Maybe the task underlying the environment was solved successfully, a certain timelimit was exceeded, or the physics simulation has entered an invalid state. ``info`` may contain additional information regarding the reason for a ``done`` signal.
+            terminated (bool): whether the episode has ended due to a termination, in which case further step() calls will return undefined results
+            truncated (bool): whether the episode has ended due to a truncation, in which case further step() calls will return undefined results
             info (dict): contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain:
 
                 - metrics that describe the agent's performance or
                 - state variables that are hidden from observations or
+                - information that distinguishes truncation and termination or
                 - individual reward terms that are combined to produce the total reward
 
+            (deprecated)
+            done (bool): whether the episode has ended due to any reason, in which case further step() calls will return undefined results
         """
         raise NotImplementedError
 

Review comment on the `terminated` line: I would rephrase "termination" to something like "reaching a terminal state", or otherwise to indicate that it's about the intrinsic properties of the environment.
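The split between `terminated` and `truncated` described in the docstring matters mostly when a learner bootstraps from the next state's value. The following is a minimal sketch under stated assumptions, not code from this PR: `td_target` and `episode_over` are hypothetical helper names, and a one-step TD target with discount `gamma` is assumed.

```python
def td_target(
    reward: float,
    next_value: float,
    terminated: bool,
    truncated: bool,
    gamma: float = 0.99,
) -> float:
    """One-step TD target under the 5-tuple step API (hypothetical helper)."""
    # A terminal state has no future return, so there is nothing to bootstrap from.
    if terminated:
        return reward
    # Truncation (for example a time limit) stops the episode, but the next
    # state still has value, so `truncated` does not change the target.
    return reward + gamma * next_value


def episode_over(terminated: bool, truncated: bool) -> bool:
    """Either flag means reset() must be called before the next step()."""
    return terminated or truncated
```

With the old single `done` flag these two cases cannot be told apart without inspecting `info`.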
@@ -290,7 +299,11 @@ def metadata(self) -> dict:
     def metadata(self, value):
         self._metadata = value
 
-    def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
+    def step(
+        self, action: ActType
+    ) -> Union[
+        Tuple[ObsType, float, bool, bool, dict], Tuple[ObsType, float, bool, dict]
+    ]:
         return self.env.step(action)
 
     def reset(self, **kwargs) -> Union[ObsType, tuple[ObsType, dict]]:
@@ -325,8 +338,13 @@ def reset(self, **kwargs):
         return self.observation(self.env.reset(**kwargs))
 
     def step(self, action):
-        observation, reward, done, info = self.env.step(action)
-        return self.observation(observation), reward, done, info
+        step_returns = self.env.step(action)
+        if len(step_returns) == 5:
+            observation, reward, terminated, truncated, info = step_returns
+            return self.observation(observation), reward, terminated, truncated, info
+        else:
+            observation, reward, done, info = step_returns
+            return self.observation(observation), reward, done, info
 
     @abstractmethod
     def observation(self, observation):

arjun-kg marked this conversation as resolved.
@@ -338,8 +356,13 @@ def reset(self, **kwargs):
         return self.env.reset(**kwargs)
 
     def step(self, action):
-        observation, reward, done, info = self.env.step(action)
-        return observation, self.reward(reward), done, info
+        step_returns = self.env.step(action)
+        if len(step_returns) == 5:
+            observation, reward, terminated, truncated, info = step_returns
+            return observation, self.reward(reward), terminated, truncated, info
+        else:
+            observation, reward, done, info = step_returns
+            return observation, self.reward(reward), done, info
 
     @abstractmethod
     def reward(self, reward):

arjun-kg marked this conversation as resolved.
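Both wrapper hunks repeat the same length check on the step return. As a sketch only, the branching could be factored into one helper that transforms the first two fields and passes the rest through; `apply_to_step`, `obs_fn` and `reward_fn` are hypothetical names for illustration and do not appear in this PR.

```python
from typing import Any, Callable, Tuple


def apply_to_step(
    step_returns: Tuple[Any, ...],
    obs_fn: Callable[[Any], Any] = lambda obs: obs,
    reward_fn: Callable[[float], float] = lambda r: r,
) -> Tuple[Any, ...]:
    """Transform observation and reward; keep the remaining fields unchanged.

    Works for both the 4-tuple (obs, reward, done, info) and the
    5-tuple (obs, reward, terminated, truncated, info).
    """
    if len(step_returns) not in (4, 5):
        raise ValueError(f"expected 4 or 5 step values, got {len(step_returns)}")
    observation, reward, *rest = step_returns
    return (obs_fn(observation), reward_fn(reward), *rest)
```

An observation wrapper would then call something like `apply_to_step(self.env.step(action), obs_fn=self.observation)`, and a reward wrapper would pass `reward_fn=self.reward`.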
I don't know if I like this approach to backwards compatibility. If this is the official state of (for example) 0.24.0, then you can't reliably write an algorithm that will work for all valid 0.24.0 environments. I think we should just say that an environment should have the signature of (ObsType, float, bool, bool, dict), and then provide a wrapper-like compatibility layer that can convert an old-style environment to a new-style environment.
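A rough sketch of the kind of compatibility layer suggested in this comment, written without gym so that it runs standalone. `OldStyleEnv` and `NewStepAPI` are hypothetical names, and reading truncation from `info["TimeLimit.truncated"]` is an assumption based on the key named in the commit list; the conversion the PR actually ships may differ.

```python
class OldStyleEnv:
    """Toy environment with the old 4-tuple API (illustrative only)."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.limit
        # Assumption: a time-limit cutoff is reported through the info dict.
        info = {"TimeLimit.truncated": True} if done else {}
        return self.t, 1.0, done, info


class NewStepAPI:
    """Wraps an environment so that step() always returns five values."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        returns = self.env.step(action)
        if len(returns) == 5:
            return returns  # already new-style, pass through
        observation, reward, done, info = returns
        truncated = bool(info.get("TimeLimit.truncated", False))
        # Treat any `done` that is not a truncation as a true termination.
        terminated = bool(done) and not truncated
        return observation, reward, terminated, truncated, info
```

With a layer like this applied once, algorithm code only ever has to handle the 5-tuple signature, which addresses the concern that mixed return shapes make it hard to write code that works for every valid environment.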