Add Gymnasium support (#1327)

* Fix failing set_env test
* Fix test failing due to deprecation of env.seed
* Adjust mean reward threshold in failing test
* Fix her test failing due to rng
* Change seed and revert reward threshold to 90
* Pin gym version
* Make VecEnv compatible with gym seeding change
* Revert change to VecEnv reset signature
* Change subprocenv seed cmd to call reset instead
* Fix type check
* Add backward compat
* Add `compat_gym_seed` helper
* Add goal env checks in env_checker
* Add docs on HER requirements for envs
* Capture user warning in test with inverted box space
* Update ale-py version
* Fix randint
* Allow noop_max to be zero
* Update changelog
* Update docker image
* Update doc conda env and dockerfile
* Custom envs should not have any warnings
* Fix test for numpy >= 1.21
* Add check for vectorized compute reward
* Bump to gym 0.24
* Fix gym default step docstring
* Test downgrading gym
* Revert "Test downgrading gym"
  This reverts commit 0072b77.
* Fix protobuf error
* Fix in dependencies
* Fix protobuf dep
* Use newest version of cartpole
* Update gym
* Fix warning
* Loosen required scipy version
* Scipy no longer needed
* Try gym 0.25
* Silence warnings from gym
* Filter warnings during tests
* Update doc
* Update requirements
* Add gym 26 compat in vec env
* Fixes in envs and tests for gym 0.26+
* Enforce gym 0.26 api
* format
* Fix formatting
* Fix dependencies
* Fix syntax
* Cleanup doc and warnings
* Faster tests
* Higher budget for HER perf test (revert prev change)
* Fixes and update doc
* Fix doc build
* Fix breaking change
* Fixes for rendering
* Rename variables in monitor
* update render method for gym 0.26 API
  backwards compatible (mode argument is allowed) while using the gym 0.26 API (render mode is determined at environment creation)
* update tests and docs to new gym render API
* undo removal of render modes metadata check
* set rgb_array as default render mode for gym.make
* undo changes & raise warning if not 'rgb_array'
* Fix type check
* Remove recursion and fix type checking
* Remove hacks for protobuf and gym 0.24
* Fix type annotations
* reuse existing render_mode attribute
* return tiled images for 'human' render mode
* Allow to use opencv for human render, fix typos
* Add warning when using non-zero start with Discrete (fixes #1197)
* Fix type checking
* Bug fixes and handle more cases
* Throw proper warnings
* Update test
* Fix new metadata name
* Ignore numpy warnings
* Fixes in vec recorder
* Global ignore
* Filter local warning too
* Monkey patch not needed for gym 26
* Add doc of VecEnv vs Gym API
* Add render test
* Fix return type
* Update VecEnv vs Gym API doc
* Fix for custom render mode
* Fix return type
* Fix type checking
* check test env test_buffer
* skip render check
* check env test_dict_env
* test_env test_gae
* check envs in remaining tests
* Update tests
* Add warning for Discrete action space with non-zero (#1295)
* Fix atari annotation
* ignore get_action_meanings [attr-defined]
* Fix mypy issues
* Add patch for gym/gymnasium transition
* Switch to gymnasium
* Rely on signature instead of version
* More patches
* Type ignore because of Farama-Foundation/Gymnasium#39
* Fix doc build
* Fix pytype errors
* Fix atari requirement
* Update env checker due to change in dtype for Discrete
* Fix type hint
* Convert spaces for saved models
* Ignore pytype
* Remove gitlab CI
* Disable pytype for convert space
* Fix undefined info
* Fix undefined info
* Upgrade shimmy
* Fix wrappers type annotation (need PR from Gymnasium)
* Fix gymnasium dependency
* Fix dependency declaration
* Cap pygame version for python 3.7
* Point to master branch (v0.28.0)
* Fix: use main not master branch
* Rename done to terminated
* Fix pygame dependency for python 3.7
* Rename gym to gymnasium
* Update Gymnasium
* Fix test
* Fix tests
* Forks don't have access to private variables
* Fix linter warnings
* Update read the doc env
* Fix env checker for GoalEnv
* Fix import
* Update env checker (more info) and fix dtype
* Use micromamba for Docker
* Update dependencies
* Clarify VecEnv doc
* Fix Gymnasium version
* Copy file only after mamba install
* [ci skip] Update docker doc
* Polish code
* Reformat
* Remove deprecated features
* Ignore warning
* Update doc
* Update examples and changelog
* Fix type annotation bundle (SAC, TD3, A2C, PPO, base class) (#1436)
* Fix SAC type hints, improve DQN ones
* Fix A2C and TD3 type hints
* Fix PPO type hints
* Fix on-policy type hints
* Fix base class type annotation, do not use defaults
* Update version
* Disable mypy for python 3.7
* Rename Gym26StepReturn
* Update continuous critic type annotation
* Fix pytype complaint

---------

Co-authored-by: Carlos Luis <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Thomas Lips <[email protected]>
Co-authored-by: tlips <[email protected]>
Co-authored-by: tlpss <[email protected]>
Co-authored-by: Quentin GALLOUÉDEC <[email protected]>
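
The "Rely on signature instead of version" item in the message above refers to telling old-style and new-style environments apart by inspecting `reset` rather than comparing package versions. A minimal sketch of that idea, assuming the only distinguishing feature is a `seed` parameter; `accepts_seed`, `OldStyleEnv` and `NewStyleEnv` are hypothetical names for illustration and are not taken from this commit:

```python
import inspect
from typing import Any, Callable


def accepts_seed(reset_fn: Callable[..., Any]) -> bool:
    """Return True if ``reset_fn`` declares a ``seed`` parameter (hypothetical helper)."""
    return "seed" in inspect.signature(reset_fn).parameters


class OldStyleEnv:
    # Pre-0.26 style: reset takes no seed and returns only the observation.
    def reset(self):
        return 0.0


class NewStyleEnv:
    # Gymnasium style: reset takes seed/options and returns (observation, info).
    def reset(self, *, seed=None, options=None):
        return 0.0, {}


for env in (OldStyleEnv(), NewStyleEnv()):
    kind = "new-style" if accepts_seed(env.reset) else "old-style"
    print(type(env).__name__, "->", kind)
```

Checking the signature keeps the decision local to the environment object, so it still works when an old-style environment is used with a newer package installed.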
DLR-RM · Apr 14, 2023 · 40e0b9d
1 parent 15c9daa
commit 40e0b9d
Showing 94 changed files with 1,333 additions and 733 deletions.
diff --git a/.github/ISSUE_TEMPLATE/custom_env.yml b/.github/ISSUE_TEMPLATE/custom_env.yml
@@ -49,15 +49,16 @@ body:
                 self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(14,))
                 self.action_space = spaces.Box(low=-1, high=1, shape=(6,))
 
-            def reset(self):
-                return self.observation_space.sample()
+            def reset(self, seed=None):
+                return self.observation_space.sample(), {}
 
             def step(self, action):
                 obs = self.observation_space.sample()
                 reward = 1.0
-                done = False
+                terminated = False
+                truncated = False
                 info = {}
-                return obs, reward, done, info
+                return obs, reward, terminated, truncated, info
 
         env = CustomEnv()
         check_env(env)
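
The hunk above replaces the single `done` flag with `terminated` and `truncated`. As a rough illustration of how the two relate, the sketch below splits an old-style `done` using the `"TimeLimit.truncated"` info key that older gym `TimeLimit` wrappers set. `split_done` and `merge_done` are hypothetical helpers written for this note, not functions from the commit:

```python
from typing import Any, Dict, Tuple


def split_done(done: bool, info: Dict[str, Any]) -> Tuple[bool, bool]:
    """Split an old-style ``done`` into ``(terminated, truncated)``.

    Assumes a time-limit cut-off is signalled by info["TimeLimit.truncated"].
    """
    truncated = bool(done and info.get("TimeLimit.truncated", False))
    terminated = bool(done and not truncated)
    return terminated, truncated


def merge_done(terminated: bool, truncated: bool) -> bool:
    """Recover the old-style flag: the episode is over in either case."""
    return terminated or truncated


# Episode ended because the task finished.
print(split_done(True, {}))
# Episode ended only because the time limit was reached.
print(split_done(True, {"TimeLimit.truncated": True}))
```

The distinction matters for bootstrapping: a truncated episode has not reached a terminal state, so value estimates can still bootstrap from its last observation.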

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -55,6 +55,8 @@ jobs:
     - name: Type check
       run: |
         make type
+      # skip mypy type check for python3.7 (result is different to all other versions)
+      if: "!(matrix.python-version == '3.7')"
     - name: Test with pytest
       run: |
         make pytest
diff --git a/Dockerfile b/Dockerfile
@@ -1,41 +1,25 @@
 ARG PARENT_IMAGE
 FROM $PARENT_IMAGE
 ARG PYTORCH_DEPS=cpuonly
-ARG PYTHON_VERSION=3.7
+ARG PYTHON_VERSION=3.8
+ARG MAMBA_DOCKERFILE_ACTIVATE=1  # (otherwise python will not be found)
 
-RUN apt-get update && apt-get install -y --no-install-recommends \
-         build-essential \
-         cmake \
-         git \
-         curl \
-         ca-certificates \
-         libjpeg-dev \
-         libpng-dev \
-         libglib2.0-0 && \
-     rm -rf /var/lib/apt/lists/*
+# Install micromamba env and dependencies
+RUN micromamba install -n base -y python=$PYTHON_VERSION \
+    pytorch $PYTORCH_DEPS -c conda-forge -c pytorch -c nvidia && \
+    micromamba clean --all --yes
 
-# Install Anaconda and dependencies
-RUN curl -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
-     chmod +x ~/miniconda.sh && \
-     ~/miniconda.sh -b -p /opt/conda && \
-     rm ~/miniconda.sh && \
-     /opt/conda/bin/conda install -y python=$PYTHON_VERSION numpy pyyaml scipy ipython mkl mkl-include && \
-     /opt/conda/bin/conda install -y pytorch $PYTORCH_DEPS -c pytorch && \
-     /opt/conda/bin/conda clean -ya
-ENV PATH /opt/conda/bin:$PATH
-
-ENV CODE_DIR /root/code
+ENV CODE_DIR /home/$MAMBA_USER
 
 # Copy setup file only to install dependencies
-COPY ./setup.py ${CODE_DIR}/stable-baselines3/setup.py
-COPY ./stable_baselines3/version.txt ${CODE_DIR}/stable-baselines3/stable_baselines3/version.txt
+COPY --chown=$MAMBA_USER:$MAMBA_USER ./setup.py ${CODE_DIR}/stable-baselines3/setup.py
+COPY --chown=$MAMBA_USER:$MAMBA_USER ./stable_baselines3/version.txt ${CODE_DIR}/stable-baselines3/stable_baselines3/version.txt
 
-RUN \
-    cd ${CODE_DIR}/stable-baselines3 3&& \
+RUN cd ${CODE_DIR}/stable-baselines3 && \
     pip install -e .[extra,tests,docs] && \
     # Use headless version for docker
     pip uninstall -y opencv-python && \
     pip install opencv-python-headless && \
-    rm -rf $HOME/.cache/pip
+    pip cache purge
 
 CMD /bin/bash
diff --git a/Makefile b/Makefile
@@ -10,6 +10,12 @@ pytype:
 mypy:
 	mypy ${LINT_PATHS}
 
+missing-annotations:
+	mypy --disallow-untyped-calls --disallow-untyped-defs --ignore-missing-imports stable_baselines3
+
+# missing docstrings
+# pylint -d R,C,W,E -e C0116 stable_baselines3 -j 4
+
 type: pytype mypy
 
 lint:

diff --git a/docs/conda_env.yml b/docs/conda_env.yml
@@ -4,11 +4,11 @@ channels:
   - defaults
 dependencies:
   - cpuonly=1.0=0
-  - pip=21.1
+  - pip=22.3.1
   - python=3.7
-  - pytorch=1.11=py3.7_cpu_0
+  - pytorch=1.11.0=py3.7_cpu_0
   - pip:
-    - gym==0.21
+    - gymnasium
     - cloudpickle
     - opencv-python-headless
     - pandas

diff --git a/docs/guide/callbacks.rst b/docs/guide/callbacks.rst
@@ -210,7 +210,7 @@ It will save the best model if ``best_model_save_path`` folder is specified and
 
 .. code-block:: python
 
-    import gym
+    import gymnasium as gym
 
     from stable_baselines3 import SAC
     from stable_baselines3.common.callbacks import EvalCallback
@@ -260,7 +260,7 @@ Alternatively, you can pass directly a list of callbacks to the ``learn()`` meth
 
 .. code-block:: python
 
-    import gym
+    import gymnasium as gym
 
     from stable_baselines3 import SAC
     from stable_baselines3.common.callbacks import CallbackList, CheckpointCallback, EvalCallback
@@ -290,7 +290,7 @@ It must be used with the :ref:`EvalCallback` and use the event triggered by a ne
 
 .. code-block:: python
 
-    import gym
+    import gymnasium as gym
 
     from stable_baselines3 import SAC
     from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold
@@ -322,7 +322,7 @@ An :ref:`EventCallback` that will trigger its child callback every ``n_steps`` t
 
 .. code-block:: python
 
-  import gym
+  import gymnasium as gym
 
   from stable_baselines3 import PPO
   from stable_baselines3.common.callbacks import CheckpointCallback, EveryNTimesteps
@@ -379,7 +379,7 @@ It must be used with the :ref:`EvalCallback` and use the event triggered after e
 
 .. code-block:: python
 
-    import gym
+    import gymnasium as gym
 
     from stable_baselines3 import SAC
     from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnNoModelImprovement

diff --git a/docs/guide/checking_nan.rst b/docs/guide/checking_nan.rst
@@ -100,8 +100,8 @@ It will monitor the actions, observations, and rewards, indicating what action o
 
 .. code-block:: python
 
-  import gym
-  from gym import spaces
+  import gymnasium as gym
+  from gymnasium import spaces
   import numpy as np
 
   from stable_baselines3 import PPO
@@ -129,7 +129,7 @@ It will monitor the actions, observations, and rewards, indicating what action o
       def reset(self):
           return [0.0]
 
-      def render(self, mode="human", close=False):
+      def render(self, close=False):
           pass
 
   # Create environment

diff --git a/docs/guide/custom_env.rst b/docs/guide/custom_env.rst
@@ -26,9 +26,9 @@ That is to say, your environment must implement the following methods (and inher
 
 .. code-block:: python
 
-  import gym
+  import gymnasium as gym
   import numpy as np
-  from gym import spaces
+  from gymnasium import spaces
 
 
   class CustomEnv(gym.Env):
@@ -54,7 +54,7 @@ That is to say, your environment must implement the following methods (and inher
           ...
           return observation  # reward, done, info can't be included
 
-      def render(self, mode="human"):
+      def render(self):
           ...
 
       def close(self):
@@ -91,7 +91,7 @@ Optionally, you can also register the environment with gym, that will allow you
 
 .. code-block:: python
 
-	from gym.envs.registration import register
+	from gymnasium.envs.registration import register
 	# Example for the CartPole environment
 	register(
 	    # unique identifier for the env `name-version`
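
The documentation hunks above drop the `mode` argument from `render()`, because the render mode is now chosen when the environment is created. The sketch below shows that pattern together with the new `reset` and `step` return shapes. To stay dependency-free it does not inherit from `gym.Env`; `TinyEnv` and its dynamics are made up for illustration only:

```python
import random
from typing import Any, Dict, List, Optional, Tuple


class TinyEnv:
    """Stand-in that mirrors the Gymnasium method signatures (hypothetical)."""

    metadata = {"render_modes": ["rgb_array"]}

    def __init__(self, render_mode: Optional[str] = None) -> None:
        if render_mode is not None and render_mode not in self.metadata["render_modes"]:
            raise ValueError(f"Unsupported render_mode: {render_mode!r}")
        # The render mode is fixed at creation time, not passed to render().
        self.render_mode = render_mode
        self._rng = random.Random()
        self._state = 0.0

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[float, Dict[str, Any]]:
        # Seeding happens through reset(); there is no separate seed() method.
        if seed is not None:
            self._rng = random.Random(seed)
        self._state = self._rng.uniform(-1.0, 1.0)
        return self._state, {}

    def step(self, action: float) -> Tuple[float, float, bool, bool, Dict[str, Any]]:
        self._state += action
        terminated = abs(self._state) > 10.0  # task-defined end of episode
        truncated = False  # would be True for a time-limit cut-off
        return self._state, 1.0, terminated, truncated, {}

    def render(self) -> Optional[List[List[int]]]:
        if self.render_mode == "rgb_array":
            return [[0, 0, 0]]  # placeholder standing in for an image array
        return None


env = TinyEnv(render_mode="rgb_array")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(0.5)
frame = env.render()
```

A real environment would subclass `gymnasium.Env`, define `observation_space` and `action_space`, and could then be validated with `check_env` as in the issue template.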

diff --git a/docs/guide/custom_policy.rst b/docs/guide/custom_policy.rst
@@ -101,7 +101,7 @@ using ``policy_kwargs`` parameter:
 
 .. code-block:: python
 
-  import gym
+  import gymnasium as gym
   import torch as th
 
   from stable_baselines3 import PPO
@@ -143,7 +143,7 @@ that derives from ``BaseFeaturesExtractor`` and then pass it to the model when t
 
   import torch as th
   import torch.nn as nn
-  from gym import spaces
+  from gymnasium import spaces
 
   from stable_baselines3 import PPO
   from stable_baselines3.common.torch_layers import BaseFeaturesExtractor
@@ -208,7 +208,7 @@ downsampling and "vector" with a single linear layer.
 
 .. code-block:: python
 
-  import gym
+  import gymnasium as gym
   import torch as th
   from torch import nn
 
@@ -308,7 +308,7 @@ If your task requires even more granular control over the policy/value architect
 
   from typing import Callable, Dict, List, Optional, Tuple, Type, Union
 
-  from gym import spaces
+  from gymnasium import spaces
   import torch as th
   from torch import nn