feat: improve hyperparameter tuning, logger and add W&B logging (#314)
This commit:

  - Adds [Weights & Biases](https://wandb.ai/site) logging to the train and eval
utilities.
  - Cleans up the logger.
  - Cleans up the Hyperparameter pipeline and documentation.
  - Cleans up the code.
rickstaa authored Aug 8, 2023
1 parent 6788535 commit 74afd65
Showing 48 changed files with 1,959 additions and 1,177 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@ pytest/
*.pytest_cache/
.coverage
pytest-coverage.txt
wandb/

# cache
__pycache__/
1 change: 1 addition & 0 deletions README.md
@@ -6,6 +6,7 @@
[![codecov](https://codecov.io/gh/rickstaa/stable-learning-control/branch/main/graph/badge.svg?token=4SAME74CJ7)](https://codecov.io/gh/rickstaa/stable-learning-control)
[![Contributions](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![DOI](https://zenodo.org/badge/271989240.svg)](https://zenodo.org/badge/latestdoi/271989240)
[![Weights & Biases dashboard](https://img.shields.io/badge/Weights_&_Biases-FFCC33?style=flat\&logo=WeightsAndBiases\&logoColor=black)](https://wandb.ai/rickstaa/stable-learning-control)

## Package Overview

1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -159,6 +159,7 @@ def __getattr__(cls, name):
"gymnasium": ("https://gymnasium.farama.org/%s", None),
"tf2": ("https://www.tensorflow.org/api_docs/python/tf/%s", None),
"tb": ("https://www.tensorflow.org/tensorboard/%s", None),
"wandb": ("https://docs.wandb.ai/%s", None),
}


1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -37,6 +37,7 @@ ready-to-use compatible environments can be found in the :stable-gym:`stable-gy
usage/installation
usage/algorithms
usage/running
usage/hyperparameter_tuning
usage/saving_and_loading
usage/plotting
usage/eval_robustness
38 changes: 31 additions & 7 deletions docs/source/usage/eval_robustness.rst
@@ -8,13 +8,11 @@ SLC ships with a handy utility for evaluating the policy's robustness. This is d

.. parsed-literal::
python -m stable_learning_control.run eval_robustness [path/to/output_directory] [disturber] [-h] [--list_disturbers] [--disturber_config DISTURBER_CONFIG] [--data_dir DATA_DIR] [--itr ITR] [--len LEN] [--episodes EPISODES] [--render] [--deterministic]
[--disable_baseline] [--observations [OBSERVATIONS [OBSERVATIONS ...]]] [--references [REFERENCES [REFERENCES ...]]]
[--reference_errors [REFERENCE_ERRORS [REFERENCE_ERRORS ...]]] [--absolute_reference_errors] [--merge_reference_errors] [--use_subplots] [--use_time] [--save_result]
[--save_plots] [--figs_fmt FIGS_FMT] [--font_scale FONT_SCALE] [--use_wandb] [--wandb_job_type WANDB_JOB_TYPE] [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP]
[--wandb_run_name WANDB_RUN_NAME]
The most important input arguments are:

@@ -37,6 +35,18 @@ The most important input arguments are:
For more information about all the input arguments available for the ``eval_robustness`` tool you can use the ``--help`` flag or check the :ref:`robustness evaluation utility <eval_robustness>`
documentation or :ref:`the API reference <autoapi>`.

Robustness eval configuration file (yaml)
-----------------------------------------

The SLC CLI comes with a handy configuration file loader that can be used to load `YAML`_ configuration files.
These configuration files provide a convenient way to store your robustness evaluation parameters such that results
can be reproduced. You can supply the CLI with an experiment configuration file using the ``--eval_cfg`` flag. The
configuration file format equals the format expected by the :ref:`--exp_cfg <exp_cfg>` flag of the :ref:`run experiments <running_experiments>` utility.

.. option:: --eval_cfg

:obj:`path str`. Sets the path to the ``yml`` config file used for loading the experiment hyperparameters.

Available disturbers
====================

@@ -80,6 +90,11 @@ The robustness evaluation tool can save several files to disk that contain infor

These files will be saved in the ``eval`` directory inside the output directory.

.. tip::

You can also log these results to Weights & Biases by adding the ``--use_wandb`` flag to the
CLI command (see :ref:`eval_robustness` for more information).

Plots
-----

@@ -157,6 +172,15 @@ by specifying the module containing your disturber and the disturber class name.
python -m stable_learning_control.run eval_robustness [path/to/output_directory] --disturber "my_module.MyDisturber"
Special attributes
------------------

The SLC package looks for several attributes on the disturber class to get information that can be used during the robustness evaluation. These attributes are:

.. describe:: disturbance_label

:obj:`str`. Can be used to set the label of the disturber in the plots. If not present, the :ref:`robustness evaluation utility <eval_robustness>` will generate a label based on the disturber configuration.
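
A minimal sketch of such a custom disturber is shown below. It assumes disturbers are implemented as Gymnasium wrappers; the base class and interface SLC expects are assumptions here, so check the built-in disturbers (``--list_disturbers``) for the actual contract.

.. code-block:: python

    import gymnasium as gym
    import numpy as np


    class MyDisturber(gym.Wrapper):
        """Hypothetical disturber that perturbs the agent's actions with
        Gaussian noise before they reach the environment.
        """

        # Optional special attribute used in the robustness evaluation plots.
        disturbance_label = "Gaussian action noise (std=0.1)"

        def __init__(self, env, noise_std=0.1):
            super().__init__(env)
            self.noise_std = noise_std

        def step(self, action):
            # Add zero-mean Gaussian noise to the requested action.
            disturbed_action = action + np.random.normal(
                0.0, self.noise_std, size=np.shape(action)
            )
            return self.env.step(disturbed_action)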

Manual robustness evaluation
============================

72 changes: 72 additions & 0 deletions docs/source/usage/hyperparameter_tuning.rst
@@ -0,0 +1,72 @@
=====================
Hyperparameter Tuning
=====================

Hyperparameter tuning is crucial in RL as it directly impacts the agent's performance and stability. Properly selected
hyperparameters can lead to faster convergence, improved overall task performance and generalizability. Because of this,
the SLC package provides several tools to help with hyperparameter tuning.

Use the ExperimentGrid utility
------------------------------

As outlined in the :ref:`Running Experiments <running_multiple_experiments>` section, the SLC package includes a utility
class called :ref:`ExperimentGrid <exp_grid_utility>`, which enables the execution of multiple experiments **sequentially**.
You can utilize this utility in two ways: by supplying the :ref:`CLI <runner>` with more than one value for a specific argument
(refer to :ref:`Running Experiments <running_experiments>`), or by directly employing the
:class:`~stable_learning_control.utils.run_utils.ExperimentGrid` class (see :ref:`running_multiple_experiments`). These
methods facilitate running numerous experiments with distinct hyperparameter combinations, enabling a hyperparameter grid search
to identify the optimal parameter setting for your task. For instance, to execute the LAC algorithm on the `CartPoleCost-v0`_
environment with various values for actor and critic learning rates using the :ref:`CLI <runner>`, employ the following command:

.. code-block:: bash
python -m stable_learning_control.run lac --env CartPoleCost-v0 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1
.. _`CartPoleCost-v0`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html

.. tip::

You can enable TensorBoard and Weights & Biases logging by adding the ``--use_tensorboard`` and ``--use_wandb`` flags to the
above command. These tools allow you to track the performance of your experiments and compare the results of
different hyperparameter combinations. For more information on how to use these logging utilities, see :ref:`loggers`.
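
If you prefer to stay in Python, the same grid search can be performed with the :class:`~stable_learning_control.utils.run_utils.ExperimentGrid` class directly. The sketch below is a minimal example; the ``lac`` import path and argument names are assumptions based on the ``sac_exp_grid_search.py`` example, so check that file for the exact setup.

.. code-block:: python

    # Minimal ExperimentGrid sketch (assumed import paths; see
    # examples/pytorch/sac_exp_grid_search.py for the exact setup).
    from stable_learning_control.algos.pytorch.lac import lac
    from stable_learning_control.utils.run_utils import ExperimentGrid

    eg = ExperimentGrid(name="lac-cartpole-grid")
    eg.add("env_name", "CartPoleCost-v0")  # environment to train on
    eg.add("lr_a", [1e-3, 1e-2, 1e-1], "lr_a")  # actor learning rates
    eg.add("lr_c", [1e-3, 1e-2, 1e-1], "lr_c")  # critic learning rates

    # Runs all 3 x 3 = 9 experiments sequentially.
    eg.run(lac, num_cpu=1)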

Use the Ray tuning package
--------------------------

The SLC package can also be used with more advanced tuning libraries like `Ray Tune`_, which uses cutting-edge optimization algorithms to
find the best hyperparameters for your model faster. An example of how to use SLC with the Ray Tune package can be found in
``stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py`` and
``stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py``. The requirements for this example can be
installed using the following command:

.. code-block:: bash
pip install .[tuning]
Consider the example in ``stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py``:

.. literalinclude:: /../../examples/pytorch/sac_ray_hyper_parameter_tuning.py
:language: python
:linenos:
:lines: 32-
:emphasize-lines: 12, 15-30, 38-49, 56, 59-66, 70-95, 98

In this example, the boolean on line ``12`` enables Weights & Biases logging. On lines ``15-30``, we first create a small wrapper
function that ensures the Ray Tuner serves the hyperparameters in the format the SLC algorithm expects. Lines ``38-49`` then set up
a Weights & Biases callback if the ``USE_WANDB`` constant is set to ``True``. On line ``56``, we set the starting point for
several hyperparameters used in the hyperparameter search. Next, we define the hyperparameter search space on lines ``59-66``
and initialise the Ray Tuner instance on lines ``70-95``. Lastly, we start the hyperparameter search by calling the
Tuner's ``fit`` method on line ``98``.

When running the script, the Ray tuner will search for the best hyperparameter combination. While doing so, it will print
the results to ``stdout``, a TensorBoard log file, and the Weights & Biases portal. You can check the TensorBoard logs using the
``tensorboard --logdir ./data/ray_results`` command and the Weights & Biases results on `the Weights & Biases website`_. For more information on how the `Ray Tune`_ package works, see
the `Ray Tune documentation`_.
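
For a quick feel of the workflow, the self-contained sketch below mirrors the example's structure with a toy objective in place of an SLC training run. The metric name, search space, and W&B project name are placeholders, and the ``run_config`` line can be dropped if you don't use Weights & Biases.

.. code-block:: python

    from ray import air, tune
    from ray.air.integrations.wandb import WandbLoggerCallback


    def train_fn(config):
        # Toy stand-in for the SLC wrapper function; the returned dict is
        # reported as the trial's final metrics.
        return {"mean_cost": (config["lr_a"] - 1e-3) ** 2}


    tuner = tune.Tuner(
        train_fn,
        param_space={
            "lr_a": tune.loguniform(1e-4, 1e-2),  # actor learning rate
            "gamma": tune.uniform(0.9, 0.999),  # discount factor
        },
        tune_config=tune.TuneConfig(metric="mean_cost", mode="min", num_samples=10),
        run_config=air.RunConfig(
            callbacks=[WandbLoggerCallback(project="stable_learning_control")],
        ),
    )
    results = tuner.fit()
    print(results.get_best_result().config)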

.. _`Ray Tune`: https://docs.ray.io/en/latest/tune/index.html
.. _`the Weights & Biases website`: https://wandb.ai
.. _`Ray Tune documentation`: https://docs.ray.io/en/latest/tune/index.html

.. note::

An equivalent TensorFlow example is available in ``stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py``.
7 changes: 5 additions & 2 deletions docs/source/usage/installation.rst
@@ -117,8 +117,11 @@ This command will install the SLC package with the :torch:`Pytorch <>` backend i

.. important::

If you are using Conda, you may run into issues while installing or using the SLC package,
such as installation errors or scripts freezing. To resolve these problems, it is
recommended to install the `mpi4py`_ package from within Conda instead of using pip. This can
be accomplished by executing the following command:

.. code-block:: bash
5 changes: 5 additions & 0 deletions docs/source/usage/plotting.rst
@@ -20,3 +20,8 @@ SLC ships with a simple plotting utility that can be used to plot diagnostics fr
:align: center

Example plot that displays the performance of the LAC algorithm.

.. tip::

The SLC package also supports TensorBoard and Weights & Biases logging, which allow you to inspect your experiments'
results during training and compare the performance of different algorithms more interactively. See :ref:`loggers` for more information.
81 changes: 37 additions & 44 deletions docs/source/usage/running.rst
@@ -1,3 +1,5 @@
.. _running_experiments:

===================
Running Experiments
===================
@@ -152,6 +154,8 @@ to see a readout of the docstring.
to get the same result.

.. _running_multiple_experiments:

Launching Multiple Experiments at Once
--------------------------------------

@@ -206,9 +210,9 @@ Algorithm Flags
General Flags
~~~~~~~~~~~~~

.. option:: --save_cps, --save_checkpoints, default: False

:obj:`bool`. Only the most recent state of the agent and environment is saved by default. When the
``--save_checkpoints`` flag is supplied, a snapshot (checkpoint) of the agent and
environment will be saved at each epoch. These snapshots are saved in a ``checkpoints``
folder inside the Logger output directory (for more information, see
@@ -291,15 +295,31 @@ Logger Flags
The CLI also contains several (shortcut) flags that can be used to change the behaviour of the
:class:`stable_learning_control.utils.log_utils.logx.EpochLogger`.

.. option:: --use_tb, --logger_kwargs:use_tensorboard, default=False

:obj:`bool`. Enables TensorBoard logging.

.. option:: --tb_log_freq, --logger_kwargs:tb_log_freq, default='low'

:obj:`str`. The TensorBoard log frequency. Options are ``low`` (Recommended: logs at every epoch) and
``high`` (logs at every SGD update batch). Defaults to ``low`` since this is less resource intensive.

.. option:: --use_wandb, --logger_kwargs:use_wandb, default=False

:obj:`bool`. Enables Weights & Biases logging.

.. option:: --wandb_job_type, --logger_kwargs:wandb_job_type, default='train'

:obj:`str`. The Weights & Biases job type.

.. option:: --wandb_project, --logger_kwargs:wandb_project, default='stable_learning_control'

:obj:`str`. The Weights & Biases project name.

.. option:: --wandb_group, --logger_kwargs:wandb_group, default=None

:obj:`str`. The Weights & Biases group name.

.. option:: --quiet, --logger_kwargs:quiet, default=False

:obj:`bool`. Suppress logging of diagnostics to stdout.
@@ -318,6 +338,8 @@ The CLI also contains several (shortcut) flags that can be used to change the be
The verbose_vars list should be supplied as a list that can be evaluated in Python (e.g.
``--verbose_vars ["Lr_a", "Lr_c"]``).
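
The sketch below shows how these shortcut flags might translate into direct logger usage. It assumes the ``--logger_kwargs:<name>`` flags map one-to-one onto :class:`~stable_learning_control.utils.log_utils.logx.EpochLogger` keyword arguments; check the logger's API reference to confirm the exact signature.

.. code-block:: python

    from stable_learning_control.utils.log_utils.logx import EpochLogger

    # Assumption: the --logger_kwargs:<name> shortcut flags map one-to-one
    # onto EpochLogger keyword arguments.
    logger = EpochLogger(
        output_dir="./data/wandb_example",  # hypothetical output directory
        use_wandb=True,  # enable Weights & Biases logging
        wandb_project="stable_learning_control",  # W&B project (default)
        wandb_job_type="train",  # W&B job type (default)
        wandb_group=None,  # optional W&B group name
    )
    logger.log("Diagnostics are now also logged to Weights & Biases.")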

.. _exp_cfg:

Using experiment configuration files (yaml)
---------------------------------------------

Expand Down Expand Up @@ -354,8 +376,11 @@ by `Haarnoja et al., 2019`_.
critic: [256, 256, 16]
lr_a: "1e-4, 1e-3, 1e-2"
Additionally, if you want to specify an `on/off`_ flag, you can supply an empty key.

.. _`YAML`: https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html
.. _`Haarnoja et al., 2019`: https://arxiv.org/abs/1801.01290
.. _`on/off`: https://docs.python.org/dev/library/argparse.html#core-functionality

Where Results are Saved
-----------------------
@@ -419,7 +444,7 @@ Extra
``stable_learning_control/run.py``.

Use transfer learning
---------------------

The ``start_policy`` command-line flag allows you to use an already trained algorithm as the starting point for
your new algorithm:
@@ -431,7 +456,7 @@ your new algorithm:
where the already trained policy is found.

Using custom environments
-------------------------

The SLC package can be used with any :gymnasium:`Gymnasium-based <>` environment. To use a custom environment, you need
to ensure it inherits from the :class:`gym.Env` class and implements the following methods:
@@ -498,10 +523,8 @@ Consider the example in ``stable_learning_control/examples/pytorch/sac_exp_grid_
.. literalinclude:: /../../examples/pytorch/sac_exp_grid_search.py
:language: python
:linenos:
:lines: 16-
:emphasize-lines: 22-28, 31

After making the ExperimentGrid object, parameters are added to it with

@@ -525,36 +548,6 @@ Except for the absence of shortcut kwargs (you can't use ``hid`` for ``ac_kwargs
basic behaviour of ``ExperimentGrid`` is the same as running things from the command line.
(In fact, ``stable_learning_control.run`` uses an ``ExperimentGrid`` under the hood.)

.. note::

An equivalent TensorFlow example is available in ``stable_learning_control/examples/tf2/sac_exp_grid_search.py``.