docs: fix documentation (#377)
This commit fixes several incorrect run commands found in the
documentation.
rickstaa authored Jan 19, 2024
1 parent 0e8c661 commit c3b47c5
Showing 8 changed files with 37 additions and 24 deletions.
6 changes: 3 additions & 3 deletions docs/source/dev/contributing.rst
@@ -1,6 +1,6 @@
========================
Contribute to stable-gym
========================
=====================================
Contribute to stable-learning-control
=====================================

.. contents:: Table of Contents

7 changes: 2 additions & 5 deletions docs/source/usage/algorithms.rst
@@ -4,14 +4,11 @@
Available Agents
================

The SLC package contains several stable RL algorithms together with their unstable baselines.
All these algorithms are implemented with `MLP`_ (non-recurrent) actor-critics, making them
suitable for fully-observed, non-image-based RL environments, e.g., the `gymnasium Mujoco`_
environments. They are implemented in a modular way, allowing for easy extension to other
types of environments and/or neural network architectures.
The SLC package includes a collection of robust RL algorithms accompanied by their less stable baselines. These algorithms are designed with non-recurrent `MLP`_ actor-critic models, making them well-suited for fully observable RL environments that do not rely on image data, such as the `gymnasium Mujoco`_ and `stable-gym`_ environments. The implementation follows a modular approach, allowing for seamless adaptation to different types of environments and neural network architectures.

.. _`MLP`: https://en.wikipedia.org/wiki/Multilayer_perceptron
.. _`gymnasium Mujoco`: https://gymnasium.farama.org/environments/mujoco/
.. _`stable-gym`: https://rickstaa.dev/stable-gym/

Stable Agents
-------------
6 changes: 3 additions & 3 deletions docs/source/usage/hyperparameter_tuning.rst
@@ -15,14 +15,14 @@ You can utilize this utility in two ways: by supplying the :ref:`CLI <runner>` w
(refer to :ref:`Running Experiments <running_experiments>`), or by directly employing the
:class:`~stable_learning_control.utils.run_utils.ExperimentGrid` class (see :ref:`running_multiple_experiments`). These
methods facilitate running numerous experiments with distinct hyperparameter combinations, enabling a hyperparameter grid search
to identify the optimal parameter setting for your task. For instance, to execute the LAC algorithm on the `CartPoleCost-v0`_
to identify the optimal parameter setting for your task. For instance, to execute the LAC algorithm on the `CartPoleCost-v1`_
environment with various values for actor and critic learning rates using the :ref:`CLI <runner>`, employ the following command:

.. code-block:: bash
python -m stable_learning_control.examples.pytorch.run lac --env CartPoleCost-v0 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1
python -m stable_learning_control.run lac --env CartPoleCost-v1 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1
.. _`CartPoleCost-v0`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html
.. _`CartPoleCost-v1`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html
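
The same grid search can also be set up programmatically with the :class:`~stable_learning_control.utils.run_utils.ExperimentGrid` class mentioned above. The snippet below is only an illustrative sketch: it assumes a Spinning Up-style ``add``/``run`` interface and an import path for the LAC training function that is not confirmed by this diff.

.. code-block:: python

    # Illustrative sketch only: the ExperimentGrid method signatures (Spinning Up style)
    # and the import path of the LAC training function are assumptions.
    from stable_learning_control.utils.run_utils import ExperimentGrid
    from stable_learning_control.algos.pytorch.lac.lac import lac  # assumed import path

    eg = ExperimentGrid(name="lac-cartpole-grid")
    eg.add("env_name", "CartPoleCost-v1", "", True)  # include the env name in the run name
    eg.add("lr_a", [1e-3, 1e-2, 1e-1], "lr_a")  # actor learning rates
    eg.add("lr_c", [1e-3, 1e-2, 1e-1], "lr_c")  # critic learning rates
    eg.run(lac)  # launches one run per hyperparameter combination (9 in total)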

.. tip::
You can enable logging of TensorBoard and Weights & Biases by adding the ``--use_tensorboard`` and ``--use_wandb`` flags to the
2 changes: 1 addition & 1 deletion docs/source/usage/installation.rst
@@ -185,7 +185,7 @@ the :ref:`LAC <lac>` algorithm on the `CartPoleCost-v1`_ environment of the

.. code-block:: bash
python -m stable_learning_control.run lac --env_name stable_gym:CartPole-v0
python -m stable_learning_control.run lac --env_name stable_gym:CartPole-v1
.. _`Han et al. 2020`: https://arxiv.org/abs/2004.14288
.. _`CartPoleCost-v1`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html
33 changes: 23 additions & 10 deletions docs/source/usage/running.rst
@@ -14,6 +14,19 @@ or through function calls in scripts.
Launching from the Command Line
===============================

.. important::

**Important Note:** To run the examples in this section, you need to install the `Gymnasium Mujoco environments`_ package, including all its necessary dependencies. To do so, execute the following command:

.. code-block:: bash
pip install stable-learning-control[mujoco]
For more detailed information about the `Gymnasium Mujoco environments`_ package, please consult the documentation available `here <here_mujoco_>`_.

.. _`Gymnasium Mujoco environments`: https://gymnasium.farama.org/environments/mujoco/
.. _`here_mujoco`: https://gymnasium.farama.org/environments/mujoco/

SLC ships with a convenient :ref:`command line interface (CLI) <runner>` that lets you
quickly launch any algorithm (with any choices of hyperparameters) from the command line.
It also serves as a thin wrapper over the utilities for watching/evaluating the trained
@@ -31,7 +44,7 @@ eg:

.. parsed-literal::
python -m stable_learning_control.run sac --env Walker2d-v2 --exp_name walker
python -m stable_learning_control.run sac --env Walker2d-v4 --exp_name walker
.. admonition:: You Should Know

@@ -46,11 +59,11 @@ eg:

.. parsed-literal::
python -m stable_learning_control.run sac --exp_name sac_ant --env Ant-v2 --clip_ratio 0.1 0.2
python -m stable_learning_control.run sac --exp_name sac_ant --env Ant-v4 --clip_ratio 0.1 0.2
--hid[h] [32,32] [64,32] --act torch.nn.Tanh --seed 0 10 20 --dt
--data_dir path/to/data
runs SAC in the ``Ant-v2`` gymnasium environment, with various settings controlled by the flags.
runs SAC in the ``Ant-v4`` gymnasium environment, with various settings controlled by the flags.

By default, the PyTorch version will run. You can, however, substitute ``sac`` with
``sac_tf2`` for the TensorFlow version.
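
For example, the TensorFlow variant of the run above would only swap the algorithm name (shown here purely for illustration; the other flags stay the same):

.. parsed-literal::

    python -m stable_learning_control.run sac_tf2 --exp_name sac_ant --env Ant-v4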
@@ -133,7 +146,7 @@ to see a readout of the docstring.

.. parsed-literal::
python -m stable_learning_control.run SAC --env Walker2d-v2 --exp_name walker --act torch.nn.ReLU
python -m stable_learning_control.run SAC --env Walker2d-v4 --exp_name walker --act torch.nn.ReLU
sets ``torch.nn.ReLU`` as the activation function. (TensorFlow equivalent: run ``sac_tf`` with ``--act tf.nn.relu``.)

@@ -166,7 +179,7 @@ For example, to launch otherwise-equivalent runs with different random seeds (0,

.. parsed-literal::
python -m stable_learning_control.run sac --env Walker2d-v2 --exp_name walker --seed 0 10 20
python -m stable_learning_control.run sac --env Walker2d-v4 --exp_name walker --seed 0 10 20
Experiments don't launch in parallel because they soak up enough resources that executing several
simultaneously wouldn't get a speedup.
@@ -196,10 +209,10 @@ Environment Flags

:obj:`object`. Additional keyword arguments you want to pass to the gym environment. If
you, for example, want to change the forward reward weight and healthy reward of the
`Walker2d-v2`_ environment, you can do so by passing ``--env_kwargs "{'forward_reward_weight': 0.5, 'healthy_reward': 0.5}"``
`Walker2d-v4`_ environment, you can do so by passing ``--env_kwargs "{'forward_reward_weight': 0.5, 'healthy_reward': 0.5}"``
to the run command.

.. _`Walker2d-v2`: https://mgoulao.github.io/gym-docs/environments/mujoco/walker2d/
.. _`Walker2d-v4`: https://gymnasium.farama.org/environments/mujoco/walker2d/
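
For intuition, the keyword arguments supplied through ``--env_kwargs`` are forwarded to the environment constructor, so the flag above is roughly equivalent to the following (a sketch that only assumes the standard ``gymnasium.make`` interface):

.. code-block:: python

    # Rough equivalent of --env_kwargs "{'forward_reward_weight': 0.5, 'healthy_reward': 0.5}".
    import gymnasium as gym

    env = gym.make("Walker2d-v4", forward_reward_weight=0.5, healthy_reward=0.5)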

.. _alg_flags:

@@ -411,7 +424,7 @@ For example, consider:

.. parsed-literal::
python -m stable_learning_control.run sac_tf --env Hopper-v2 --hid[h] [300] [128,128] --act tf.nn.tanh tf.nn.relu
python -m stable_learning_control.run sac_tf --env Hopper-v4 --hid[h] [300] [128,128] --act tf.nn.tanh tf.nn.relu
Here, the ``--hid`` flag is given a **user-supplied shorthand**, ``h``. The user does not provide the ``--act``
flag with a shorthand, so one will be constructed for it automatically.
@@ -470,7 +483,7 @@ can be done by adding the following lines to your environment file:
from gymnasium.envs.registration import register
register(
id='CustomEnv-v0',
id='CustomEnv-v1',
entry_point='path.to.your.env:CustomEnv',
)
@@ -480,7 +493,7 @@ the file ``custom_env_module.py``, you can run the SLC package with your environ

.. parsed-literal::
python -m stable_learning_control.run sac --env custom_env_module:CustomEnv-v0
python -m stable_learning_control.run sac --env custom_env_module:CustomEnv-v1
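
To make the ``custom_env_module`` example concrete, a minimal self-contained environment file could look as follows. This is a hypothetical sketch: the dynamics, spaces, and cost signal are illustrative, and only the standard ``gymnasium.Env`` interface is assumed.

.. code-block:: python

    # custom_env_module.py -- hypothetical minimal example; dynamics and names are illustrative.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces
    from gymnasium.envs.registration import register


    class CustomEnv(gym.Env):
        """Toy environment with a 2-D observation and a 1-D action."""

        def __init__(self):
            self.observation_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
            self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
            self._state = np.zeros(2, dtype=np.float32)

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self._state = self.np_random.uniform(-0.1, 0.1, size=2).astype(np.float32)
            return self._state, {}

        def step(self, action):
            action = np.asarray(action, dtype=np.float32)
            self._state = np.clip(self._state + 0.1 * action, -1.0, 1.0)
            cost = float(np.sum(self._state ** 2))  # simple quadratic cost signal
            terminated = bool(cost < 1e-3)
            return self._state, cost, terminated, False, {}


    register(
        id="CustomEnv-v1",
        entry_point="custom_env_module:CustomEnv",
    )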
Launching from Scripts
======================
@@ -1,6 +1,6 @@
alg_name: sac
exp_name: sac_hopper_haarnoja_2019_exp
env_name: "Hopper-v2"
env_name: "Hopper-v4"
opt_type: "maximize"
ac_kwargs:
hidden_sizes:
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -80,6 +80,9 @@ docs = [
"myst-parser>=1.0.0",
"sphinx-autoapi>=2.1.1"
]
mujoco = [
"gymnasium[mujoco]>=0.29.1",
]

[project.urls]
repository = "https://github.com/rickstaa/stable-learning-control"
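
With the ``mujoco`` dependency group added above, the Mujoco requirements can be installed together with the package, e.g. from a source checkout (usage sketch, not part of this diff):

.. code-block:: bash

    # Editable install of the package together with the new optional "mujoco" extra.
    pip install -e ".[mujoco]"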
2 changes: 1 addition & 1 deletion stable_learning_control/run.py
@@ -574,7 +574,7 @@ def run(input_args):
FYI: When running an algorithm, any keyword argument to the
algorithm function can be used as a flag, eg
\tpython -m stable_learning_control.run sac --env HalfCheetah-v2 --clip_ratio 0.1
\tpython -m stable_learning_control.run sac --env HalfCheetah-v4 --clip_ratio 0.1
If you need a quick refresher on valid kwargs, get the docstring
with
