Release v1.8.0 #1434

Merged: 1 commit, Apr 8, 2023
22 changes: 19 additions & 3 deletions docs/misc/changelog.rst
@@ -4,9 +4,11 @@ Changelog
==========


-Release 1.8.0a14 (WIP)
+Release 1.8.0 (2023-04-07)
--------------------------

**Multi-env HerReplayBuffer, Open RL Benchmark, Improved env checker**

.. warning::

Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
@@ -31,23 +33,37 @@ New Features:
- Added support for dict/tuple observations spaces for ``VecCheckNan``, the check is now active in the ``env_checker()`` (@DavyMorgan)
- Added multiprocessing support for ``HerReplayBuffer``
- ``HerReplayBuffer`` now supports all datatypes supported by ``ReplayBuffer``
-- Provide more helpful failure messages when validating the ``observation_space`` of custom gym environments using ``check_env``` (@FieteO)
+- Provide more helpful failure messages when validating the ``observation_space`` of custom gym environments using ``check_env`` (@FieteO)
- Added ``stats_window_size`` argument to control smoothing in rollout logging (@jonasreiher)


`SB3-Contrib`_
^^^^^^^^^^^^^^
- Added warning about potential crashes caused by ``check_env`` in the ``MaskablePPO`` docs (@AlexPasqua)
- Fixed ``sb3_contrib/qrdqn/*.py`` type hints
- Removed shared layers in ``mlp_extractor`` (@AlexPasqua)

`RL Zoo`_
^^^^^^^^^
- `Open RL Benchmark <https://github.com/openrlbenchmark/openrlbenchmark/issues/7>`_
- Upgraded to new `HerReplayBuffer` implementation that supports multiple envs
- Removed `TimeFeatureWrapper` for Panda and Fetch envs, as the new replay buffer should handle timeout.
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on read the doc
- Removed `use_auth_token` for push to hub util
- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see https://github.com/openai/gym/pull/1304)
- Fixed `gym-minigrid` policy (from `MlpPolicy` to `MultiInputPolicy`)
- Replaced deprecated `optuna.suggest_loguniform(...)` by `optuna.suggest_float(..., log=True)`
- Switched to `ruff` and `pyproject.toml`
- Removed `online_sampling` and `max_episode_length` argument when using `HerReplayBuffer`

Bug Fixes:
^^^^^^^^^^
- Fixed Atari wrapper that missed the reset condition (@luizapozzobon)
- Added the argument ``dtype`` (default to ``float32``) to the noise for consistency with gym action (@sidney-tio)
- Fixed PPO train/n_updates metric not accounting for early stopping (@adamfrly)
- Fixed loading of normalized image-based environments
-- Fixed `DictRolloutBuffer.add` with multidimensional action space (@younik)
+- Fixed ``DictRolloutBuffer.add`` with multidimensional action space (@younik)

Deprecations:
^^^^^^^^^^^^^
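
Note on the ``stats_window_size`` entry in the changelog hunk above: the sketch below illustrates what a logging window of this kind does, assuming the logged value is the mean episode return over the most recent N episodes. The class `RollingEpisodeStats` and its methods are hypothetical and are not part of SB3; only the idea of a fixed-size smoothing window comes from the changelog.

```python
from collections import deque
from statistics import fmean


class RollingEpisodeStats:
    """Hypothetical helper: keeps the last `stats_window_size` episode returns."""

    def __init__(self, stats_window_size: int = 100) -> None:
        # A bounded deque silently drops the oldest entry once it is full.
        self.returns = deque(maxlen=stats_window_size)

    def add(self, episode_return: float) -> None:
        self.returns.append(episode_return)

    def mean_return(self) -> float:
        # Report NaN rather than raising before the first episode ends.
        return fmean(self.returns) if self.returns else float("nan")


stats = RollingEpisodeStats(stats_window_size=3)
for episode_return in [1.0, 2.0, 3.0, 10.0]:
    stats.add(episode_return)

# Only the last three returns (2.0, 3.0, 10.0) remain in the window.
print(stats.mean_return())  # 5.0
```

A smaller window reacts faster to recent episodes, while a larger one gives a smoother logged curve.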
4 changes: 2 additions & 2 deletions setup.py
@@ -90,7 +90,7 @@

extra_packages = extra_no_roms + [ # noqa: RUF005
# For atari roms,
-"autorom[accept-rom-license]~=0.5.5",
+"autorom[accept-rom-license]~=0.6.0",
]


@@ -138,7 +138,7 @@
# For spelling
"sphinxcontrib.spelling",
# Type hints support
-"sphinx-autodoc-typehints==1.21.1", # TODO: remove version constraint, see #1290
+"sphinx-autodoc-typehints",
# Copy button for code snippets
"sphinx_copybutton",
],
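
Note on the `autorom` pin in the first `setup.py` hunk: `~=` is the PEP 440 "compatible release" operator, so `~=0.6.0` is equivalent to `>=0.6.0, ==0.6.*`. The sketch below shows that rule with the standard library only. The helpers `parse_release` and `compatible_release` are hypothetical, handle only plain numeric versions, and assume the candidate and the spec have the same number of segments; real tooling should rely on a proper version parser.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Split a plain numeric version such as '0.6.1' into integers."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Hypothetical helper: does `candidate` satisfy `~=spec`?"""
    cand, base = parse_release(candidate), parse_release(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")
    prefix = base[:-1]
    # ~=0.6.0 means: >=0.6.0 and ==0.6.*
    return cand >= base and cand[: len(prefix)] == prefix


print(compatible_release("0.5.5", "0.6.0"))  # False
print(compatible_release("0.6.1", "0.6.0"))  # True
print(compatible_release("0.7.0", "0.6.0"))  # False
```

Under this rule the new pin accepts later 0.6.x patch releases but not the 0.5.x series that the old pin allowed.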
1 change: 0 additions & 1 deletion stable_baselines3/common/distributions.py
@@ -617,7 +617,6 @@ class TanhBijector:
"""
Bijective transformation of a probability distribution
using a squashing function (tanh)
-TODO: use Pyro instead (https://pyro.ai/)

:param epsilon: small value to avoid NaN due to numerical imprecision.
"""
5 changes: 0 additions & 5 deletions stable_baselines3/common/policies.py
@@ -337,11 +337,6 @@ def predict(
:return: the model's action and the next hidden state
(used in recurrent policies)
"""
-# TODO (GH/1): add support for RNN policies
-# if state is None:
-# state = self.initial_state
-# if episode_start is None:
-# episode_start = [False for _ in range(self.n_envs)]
# Switch to eval mode (this affects batch norm / dropout)
self.set_training_mode(False)

2 changes: 1 addition & 1 deletion stable_baselines3/version.txt
@@ -1 +1 @@
-1.8.0a14
+1.8.0
2 changes: 1 addition & 1 deletion tests/test_her.py
@@ -260,7 +260,7 @@ def env_fn():
del model.replay_buffer

with pytest.raises(AttributeError):
-model.replay_buffer
+model.replay_buffer # noqa: B018

# Check that there is no warning
assert len(recwarn) == 0
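
Note on the `# noqa: B018` comment in the hunk above: B018 is the flake8-bugbear "useless expression" rule (also implemented by ruff). The bare attribute access is intentional here, because the expression is evaluated only so that it raises. The sketch below reproduces the pattern with the standard library; `DummyModel` is a hypothetical stand-in for the real model, and a plain `try`/`except` replaces `pytest.raises` so that the snippet has no third-party dependencies.

```python
class DummyModel:
    """Hypothetical stand-in for a model that owns a replay buffer."""

    def __init__(self) -> None:
        self.replay_buffer = []


model = DummyModel()
del model.replay_buffer

raised = False
try:
    # Bare attribute access: evaluated only for its side effect (the exception).
    model.replay_buffer  # noqa: B018
except AttributeError:
    raised = True

print(raised)  # True
```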