docs: fix documentation (#377)
This commit fixes several incorrect run commands found in the
documentation.
rickstaa authored Jan 19, 2024
1 parent 0e8c661 commit c3b47c5
Showing 8 changed files with 37 additions and 24 deletions.
6 changes: 3 additions & 3 deletions docs/source/dev/contributing.rst
@@ -1,6 +1,6 @@
========================
Contribute to stable-gym
========================
=====================================
Contribute to stable-learning-control
=====================================

.. contents:: Table of Contents

7 changes: 2 additions & 5 deletions docs/source/usage/algorithms.rst
@@ -4,14 +4,11 @@
Available Agents
================

The SLC package contains several stable RL algorithms together with their unstable baselines.
All these algorithms are implemented with `MLP`_ (non-recurrent) actor-critics, making them
suitable for fully-observed, non-image-based RL environments, e.g., the `gymnasium Mujoco`_
environments. They are implemented in a modular way, allowing for easy extension to other
types of environments and/or neural network architectures.
The SLC package includes a collection of robust RL algorithms accompanied by their less stable baselines. These algorithms are designed with non-recurrent `MLP`_ actor-critic models, making them well-suited for fully observable RL environments that do not rely on image data, such as the `gymnasium Mujoco`_ and `stable-gym`_ environments. The implementation follows a modular approach, allowing for seamless adaptation to different types of environments and neural network architectures.

.. _`MLP`: https://en.wikipedia.org/wiki/Multilayer_perceptron
.. _`gymnasium Mujoco`: https://gymnasium.farama.org/environments/mujoco/
.. _`stable-gym`: https://rickstaa.dev/stable-gym/

Stable Agents
-------------
6 changes: 3 additions & 3 deletions docs/source/usage/hyperparameter_tuning.rst
@@ -15,14 +15,14 @@ You can utilize this utility in two ways: by supplying the :ref:`CLI <runner>` w
(refer to :ref:`Running Experiments <running_experiments>`), or by directly employing the
:class:`~stable_learning_control.utils.run_utils.ExperimentGrid` class (see :ref:`running_multiple_experiments`). These
methods facilitate running numerous experiments with distinct hyperparameter combinations, enabling a hyperparameter grid search
to identify the optimal parameter setting for your task. For instance, to execute the LAC algorithm on the `CartPoleCost-v0`_
to identify the optimal parameter setting for your task. For instance, to execute the LAC algorithm on the `CartPoleCost-v1`_
environment with various values for actor and critic learning rates using the :ref:`CLI <runner>`, employ the following command:

.. code-block:: bash
python -m stable_learning_control.examples.pytorch.run lac --env CartPoleCost-v0 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1
python -m stable_learning_control.run lac --env CartPoleCost-v1 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1
.. _`CartPoleCost-v0`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html
.. _`CartPoleCost-v1`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html
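
The same grid search can also be set up programmatically with the :class:`~stable_learning_control.utils.run_utils.ExperimentGrid` class mentioned above. The snippet below is only an illustrative sketch: it assumes a Spinning Up-style ``add``/``run`` interface and an import path for the LAC training function that is not confirmed by this diff.

.. code-block:: python

    # Illustrative sketch only: the ExperimentGrid method signatures (Spinning Up style)
    # and the import path of the LAC training function are assumptions.
    from stable_learning_control.utils.run_utils import ExperimentGrid
    from stable_learning_control.algos.pytorch.lac.lac import lac  # assumed import path

    eg = ExperimentGrid(name="lac-cartpole-grid")
    eg.add("env_name", "CartPoleCost-v1", "", True)  # include the env name in the run name
    eg.add("lr_a", [1e-3, 1e-2, 1e-1], "lr_a")  # actor learning rates
    eg.add("lr_c", [1e-3, 1e-2, 1e-1], "lr_c")  # critic learning rates
    eg.run(lac)  # launches one run per hyperparameter combination (9 in total)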

.. tip::
You can enable logging of TensorBoard and Weights & Biases by adding the ``--use_tensorboard`` and ``--use_wandb`` flags to the
2 changes: 1 addition & 1 deletion docs/source/usage/installation.rst
@@ -185,7 +185,7 @@ the :ref:`LAC <lac>` algorithm on the `CartPoleCost-v1`_ environment of the

.. code-block:: bash
python -m stable_learning_control.run lac --env_name stable_gym:CartPole-v0
python -m stable_learning_control.run lac --env_name stable_gym:CartPole-v1
.. _`Han et al. 2020`: https://arxiv.org/abs/2004.14288
.. _`CartPoleCost-v1`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html
33 changes: 23 additions & 10 deletions docs/source/usage/running.rst
@@ -14,6 +14,19 @@ or through function calls in scripts.
Launching from the Command Line
===============================

.. important::

**Important Note:** To run the examples in this section, you need to install the `Gymnasium Mujoco environments`_ package, including all its necessary dependencies. To do so, execute the following command:

.. code-block:: bash
pip install stable-learning-control[mujoco]
For more detailed information about the `Gymnasium Mujoco environments`_ package, please consult the documentation available `here <here_mujoco_>`_.

.. _`Gymnasium Mujoco environments`: https://gymnasium.farama.org/environments/mujoco/
.. _`here_mujoco`: https://gymnasium.farama.org/environments/mujoco/

SLC ships with a convenient :ref:`command line interface (CLI) <runner>` that lets you
quickly launch any algorithm (with any choices of hyperparameters) from the command line.
It also serves as a thin wrapper over the utilities for watching/evaluating the trained
@@ -31,7 +44,7 @@ eg:

.. parsed-literal::
python -m stable_learning_control.run sac --env Walker2d-v2 --exp_name walker
python -m stable_learning_control.run sac --env Walker2d-v4 --exp_name walker
.. admonition:: You Should Know

@@ -46,11 +59,11 @@ eg:

.. parsed-literal::
python -m stable_learning_control.run sac --exp_name sac_ant --env Ant-v2 --clip_ratio 0.1 0.2
python -m stable_learning_control.run sac --exp_name sac_ant --env Ant-v4 --clip_ratio 0.1 0.2
--hid[h] [32,32] [64,32] --act torch.nn.Tanh --seed 0 10 20 --dt
--data_dir path/to/data
runs SAC in the ``Ant-v2`` gymnasium environment, with various settings controlled by the flags.
runs SAC in the ``Ant-v4`` gymnasium environment, with various settings controlled by the flags.

By default, the PyTorch version will run. You can, however, substitute ``sac`` with
``sac_tf2`` for the TensorFlow version.
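
For example, the TensorFlow variant of the run above would only swap the algorithm name (shown here purely for illustration; the other flags stay the same):

.. parsed-literal::

    python -m stable_learning_control.run sac_tf2 --exp_name sac_ant --env Ant-v4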
@@ -133,7 +146,7 @@ to see a readout of the docstring.

.. parsed-literal::
python -m stable_learning_control.run SAC --env Walker2d-v2 --exp_name walker --act torch.nn.ReLU
python -m stable_learning_control.run SAC --env Walker2d-v4 --exp_name walker --act torch.nn.ReLU
sets ``torch.nn.ReLU`` as the activation function. (TensorFlow equivalent: run ``sac_tf`` with ``--act tf.nn.relu``.)

@@ -166,7 +179,7 @@ For example, to launch otherwise-equivalent runs with different random seeds (0,

.. parsed-literal::
python -m stable_learning_control.run sac --env Walker2d-v2 --exp_name walker --seed 0 10 20
python -m stable_learning_control.run sac --env Walker2d-v4 --exp_name walker --seed 0 10 20
Experiments don't launch in parallel because they soak up enough resources that executing several
simultaneously wouldn't get a speedup.
@@ -196,10 +209,10 @@ Environment Flags

:obj:`object`. Additional keyword arguments you want to pass to the gym environment. If
you, for example, want to change the forward reward weight and healthy reward of the
`Walker2d-v2`_ environment, you can do so by passing ``--env_kwargs "{'forward_reward_weight': 0.5, 'healthy_reward': 0.5}"``
`Walker2d-v4`_ environment, you can do so by passing ``--env_kwargs "{'forward_reward_weight': 0.5, 'healthy_reward': 0.5}"``
to the run command.

.. _`Walker2d-v2`: https://mgoulao.github.io/gym-docs/environments/mujoco/walker2d/
.. _`Walker2d-v4`: https://gymnasium.farama.org/environments/mujoco/walker2d/
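
For intuition, the keyword arguments supplied through ``--env_kwargs`` are forwarded to the environment constructor, so the flag above is roughly equivalent to the following (a sketch that only assumes the standard ``gymnasium.make`` interface):

.. code-block:: python

    # Rough equivalent of --env_kwargs "{'forward_reward_weight': 0.5, 'healthy_reward': 0.5}".
    import gymnasium as gym

    env = gym.make("Walker2d-v4", forward_reward_weight=0.5, healthy_reward=0.5)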

.. _alg_flags:

@@ -411,7 +424,7 @@ For example, consider:

.. parsed-literal::
python -m stable_learning_control.run sac_tf --env Hopper-v2 --hid[h] [300] [128,128] --act tf.nn.tanh tf.nn.relu
python -m stable_learning_control.run sac_tf --env Hopper-v4 --hid[h] [300] [128,128] --act tf.nn.tanh tf.nn.relu
Here, the ``--hid`` flag is given a **user-supplied shorthand**, ``h``. The user does not provide the ``--act``
flag with a shorthand, so one will be constructed for it automatically.
@@ -470,7 +483,7 @@ can be done by adding the following lines to your environment file:
from gymnasium.envs.registration import register
register(
id='CustomEnv-v0',
id='CustomEnv-v1',
entry_point='path.to.your.env:CustomEnv',
)
@@ -480,7 +493,7 @@ the file ``custom_env_module.py``, you can run the SLC package with your environ

.. parsed-literal::
python -m stable_learning_control.run sac --env custom_env_module:CustomEnv-v0
python -m stable_learning_control.run sac --env custom_env_module:CustomEnv-v1
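
To make the ``custom_env_module`` example concrete, a minimal self-contained environment file could look as follows. This is a hypothetical sketch: the dynamics, spaces, and cost signal are illustrative, and only the standard ``gymnasium.Env`` interface is assumed.

.. code-block:: python

    # custom_env_module.py -- hypothetical minimal example; dynamics and names are illustrative.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces
    from gymnasium.envs.registration import register


    class CustomEnv(gym.Env):
        """Toy environment with a 2-D observation and a 1-D action."""

        def __init__(self):
            self.observation_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
            self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
            self._state = np.zeros(2, dtype=np.float32)

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self._state = self.np_random.uniform(-0.1, 0.1, size=2).astype(np.float32)
            return self._state, {}

        def step(self, action):
            action = np.asarray(action, dtype=np.float32)
            self._state = np.clip(self._state + 0.1 * action, -1.0, 1.0)
            cost = float(np.sum(self._state ** 2))  # simple quadratic cost signal
            terminated = bool(cost < 1e-3)
            return self._state, cost, terminated, False, {}


    register(
        id="CustomEnv-v1",
        entry_point="custom_env_module:CustomEnv",
    )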
Launching from Scripts
======================
@@ -1,6 +1,6 @@
alg_name: sac
exp_name: sac_hopper_haarnoja_2019_exp
env_name: "Hopper-v2"
env_name: "Hopper-v4"
opt_type: "maximize"
ac_kwargs:
hidden_sizes:
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -80,6 +80,9 @@ docs = [
"myst-parser>=1.0.0",
"sphinx-autoapi>=2.1.1"
]
mujoco = [
"gymnasium[mujoco]>=0.29.1",
]

[project.urls]
repository = "https://github.com/rickstaa/stable-learning-control"
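
With the ``mujoco`` dependency group added above, the Mujoco requirements can be installed together with the package, e.g. from a source checkout (usage sketch, not part of this diff):

.. code-block:: bash

    # Editable install of the package together with the new optional "mujoco" extra.
    pip install -e ".[mujoco]"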
2 changes: 1 addition & 1 deletion stable_learning_control/run.py
@@ -574,7 +574,7 @@ def run(input_args):
FYI: When running an algorithm, any keyword argument to the
algorithm function can be used as a flag, eg
\tpython -m stable_learning_control.run sac --env HalfCheetah-v2 --clip_ratio 0.1
\tpython -m stable_learning_control.run sac --env HalfCheetah-v4 --clip_ratio 0.1
If you need a quick refresher on valid kwargs, get the docstring
with
