feat: improve hyperparameter tuning, logger and add W&B logging (#314)
This commit:

  - Adds [Weights & Biases](https://wandb.ai/site) logging to the train and eval
utilities.
  - Cleans up the logger.
  - Cleans up the Hyperparameter pipeline and documentation.
  - Cleans up the code.
rickstaa authored Aug 8, 2023
1 parent 6788535 commit 74afd65
Showing 48 changed files with 1,959 additions and 1,177 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@ pytest/
*.pytest_cache/
.coverage
pytest-coverage.txt
wandb/

# cache
__pycache__/
1 change: 1 addition & 0 deletions README.md
@@ -6,6 +6,7 @@
[![codecov](https://codecov.io/gh/rickstaa/stable-learning-control/branch/main/graph/badge.svg?token=4SAME74CJ7)](https://codecov.io/gh/rickstaa/stable-learning-control)
[![Contributions](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![DOI](https://zenodo.org/badge/271989240.svg)](https://zenodo.org/badge/latestdoi/271989240)
[![Weights & Biases dashboard](https://img.shields.io/badge/Weights_&_Biases-FFCC33?style=flat\&logo=WeightsAndBiases\&logoColor=black)](https://wandb.ai/rickstaa/stable-learning-control)

## Package Overview

1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -159,6 +159,7 @@ def __getattr__(cls, name):
"gymnasium": ("https://gymnasium.farama.org/%s", None),
"tf2": ("https://www.tensorflow.org/api_docs/python/tf/%s", None),
"tb": ("https://www.tensorflow.org/tensorboard/%s", None),
"wandb": ("https://docs.wandb.ai/%s", None),
}


1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -37,6 +37,7 @@ ready-to-use compatible environments can be found in the :stable-gym:`stable-gy
usage/installation
usage/algorithms
usage/running
usage/hyperparameter_tuning
usage/saving_and_loading
usage/plotting
usage/eval_robustness
38 changes: 31 additions & 7 deletions docs/source/usage/eval_robustness.rst
@@ -8,13 +8,11 @@ SLC ships with a handy utility for evaluating the policy's robustness. This is d

.. parsed-literal::
python -m stable_learning_control.run eval_robustness [path/to/output_directory] [disturber] [-h] [--list_disturbers] [--disturber_config DISTURBER_CONFIG] [--data_dir DATA_DIR] [--itr ITR] [--len LEN] [--episodes EPISODES] [--render] [--deterministic]
[--disable_baseline] [--observations [OBSERVATIONS [OBSERVATIONS ...]]] [--references [REFERENCES [REFERENCES ...]]]
[--reference_errors [REFERENCE_ERRORS [REFERENCE_ERRORS ...]]] [--absolute_reference_errors] [--merge_reference_errors] [--use_subplots] [--use_time] [--save_result]
[--save_plots] [--figs_fmt FIGS_FMT] [--font_scale FONT_SCALE] [--use_wandb] [--wandb_job_type WANDB_JOB_TYPE] [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP]
[--wandb_run_name WANDB_RUN_NAME]
The most important input arguments are:

@@ -37,6 +35,18 @@ The most important input arguments are:
For more information about all the input arguments available for the ``eval_robustness`` tool you can use the ``--help`` flag or check the :ref:`robustness evaluation utility <eval_robustness>`
documentation or :ref:`the API reference <autoapi>`.

Robustness eval configuration file (yaml)
-----------------------------------------

The SLC CLI comes with a handy configuration file loader that can be used to load `YAML`_ configuration files.
These configuration files provide a convenient way to store your robustness evaluation parameters such that results
can be reproduced. You can supply the CLI with an experiment configuration file using the ``--eval_cfg`` flag. The
configuration file format equals the format expected by the :ref:`--exp_cfg <exp_cfg>` flag of the :ref:`run experiments <running_experiments>` utility.

.. option:: --eval_cfg

:obj:`path str`. Sets the path to the ``yml`` config file used for loading the experiment hyperparameters.

Available disturbers
====================

@@ -80,6 +90,11 @@ The robustness evaluation tool can save several files to disk that contain infor

These files will be saved in the ``eval`` directory inside the output directory.

.. tip::

You can also log these results to Weights & Biases by adding the ``--use_wandb`` flag to the
CLI command (see :ref:`eval_robustness` for more information).

Plots
-----

@@ -157,6 +172,15 @@ by specifying the module containing your disturber and the disturber class name.
python -m stable_learning_control.run eval_robustness [path/to/output_directory] --disturber "my_module.MyDisturber"
Special attributes
------------------

The SLC package looks for several attributes on the disturber class to get information that can be used during the robustness evaluation. These attributes are:

.. describe:: disturbance_label

:obj:`str`. Can be used to set the label of the disturber in the plots. If not present, the :ref:`robustness evaluation utility <eval_robustness>` will generate a label based on the disturber configuration.
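
A minimal sketch of such a custom disturber is shown below. It assumes disturbers are implemented as Gymnasium wrappers; the base class and interface SLC expects are assumptions here, so check the built-in disturbers (``--list_disturbers``) for the actual contract.

.. code-block:: python

    import gymnasium as gym
    import numpy as np


    class MyDisturber(gym.Wrapper):
        """Hypothetical disturber that perturbs the agent's actions with
        Gaussian noise before they reach the environment.
        """

        # Optional special attribute used in the robustness evaluation plots.
        disturbance_label = "Gaussian action noise (std=0.1)"

        def __init__(self, env, noise_std=0.1):
            super().__init__(env)
            self.noise_std = noise_std

        def step(self, action):
            # Add zero-mean Gaussian noise to the requested action.
            disturbed_action = action + np.random.normal(
                0.0, self.noise_std, size=np.shape(action)
            )
            return self.env.step(disturbed_action)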

Manual robustness evaluation
============================

72 changes: 72 additions & 0 deletions docs/source/usage/hyperparameter_tuning.rst
@@ -0,0 +1,72 @@
=====================
Hyperparameter Tuning
=====================

Hyperparameter tuning is crucial in RL as it directly impacts the agent's performance and stability. Properly selected
hyperparameters can lead to faster convergence, improved overall task performance and generalizability. Because of this,
the SLC package provides several tools to help with hyperparameter tuning.

Use the ExperimentGrid utility
------------------------------

As outlined in the :ref:`Running Experiments <running_multiple_experiments>` section, the SLC package includes a utility
class called :ref:`ExperimentGrid <exp_grid_utility>`, which enables the execution of multiple experiments **sequentially**.
You can utilize this utility in two ways: by supplying the :ref:`CLI <runner>` with more than one value for a specific argument
(refer to :ref:`Running Experiments <running_experiments>`), or by directly employing the
:class:`~stable_learning_control.utils.run_utils.ExperimentGrid` class (see :ref:`running_multiple_experiments`). These
methods facilitate running numerous experiments with distinct hyperparameter combinations, enabling a hyperparameter grid search
to identify the optimal parameter setting for your task. For instance, to execute the LAC algorithm on the `CartPoleCost-v0`_
environment with various values for actor and critic learning rates using the :ref:`CLI <runner>`, employ the following command:

.. code-block:: bash
python -m stable_learning_control.run lac --env CartPoleCost-v0 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1
.. _`CartPoleCost-v0`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html

.. tip::

You can enable TensorBoard and Weights & Biases logging by adding the ``--use_tensorboard`` and ``--use_wandb`` flags to the
above command. These tools allow you to track the performance of your experiments and compare the results of
different hyperparameter combinations. For more information on how to use these logging utilities, see :ref:`loggers`.
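
If you prefer to stay in Python, the same grid search can be performed with the :class:`~stable_learning_control.utils.run_utils.ExperimentGrid` class directly. The sketch below is a minimal example; the ``lac`` import path and argument names are assumptions based on the ``sac_exp_grid_search.py`` example, so check that file for the exact setup.

.. code-block:: python

    # Minimal ExperimentGrid sketch (assumed import paths; see
    # examples/pytorch/sac_exp_grid_search.py for the exact setup).
    from stable_learning_control.algos.pytorch.lac import lac
    from stable_learning_control.utils.run_utils import ExperimentGrid

    eg = ExperimentGrid(name="lac-cartpole-grid")
    eg.add("env_name", "CartPoleCost-v0")  # environment to train on
    eg.add("lr_a", [1e-3, 1e-2, 1e-1], "lr_a")  # actor learning rates
    eg.add("lr_c", [1e-3, 1e-2, 1e-1], "lr_c")  # critic learning rates

    # Runs all 3 x 3 = 9 experiments sequentially.
    eg.run(lac, num_cpu=1)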

Use the Ray tuning package
--------------------------

The SLC package can also be used with more advanced tuning libraries like `Ray Tune`_, which uses cutting-edge optimization algorithms to
find the best hyperparameters for your model faster. An example of how to use SLC with the Ray Tune package can be found in
``stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py`` and
``stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py``. The requirements for this example can be
installed using the following command:

.. code-block:: bash
pip install .[tuning]
Consider the example in ``stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py``:

.. literalinclude:: /../../examples/pytorch/sac_ray_hyper_parameter_tuning.py
:language: python
:linenos:
:lines: 32-
:emphasize-lines: 12, 15-30, 38-49, 56, 59-66, 70-95, 98

In this example, the boolean on line ``12`` enables Weights & Biases logging. On lines ``15-30``, we first create a small wrapper
function that ensures the Ray Tuner serves the hyperparameters in the format the SLC algorithm expects. Lines ``38-49`` then set up
a Weights & Biases callback if the ``USE_WANDB`` constant is set to ``True``. On line ``56``, we set the starting point for
several hyperparameters used in the hyperparameter search. Next, we define the hyperparameter search space on lines ``59-66``
and initialise the Ray Tuner instance on lines ``70-95``. Lastly, we start the hyperparameter search by calling the
Tuner's ``fit`` method on line ``98``.

When running the script, the Ray tuner will search for the best hyperparameter combination. While doing so, it will print
the results to ``stdout``, a TensorBoard log file, and the Weights & Biases portal. You can check the TensorBoard logs using the
``tensorboard --logdir ./data/ray_results`` command and the Weights & Biases results on `the Weights & Biases website`_. For more information on how the `Ray Tune`_ package works, see
the `Ray Tune documentation`_.
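
For a quick feel of the workflow, the self-contained sketch below mirrors the example's structure with a toy objective in place of an SLC training run. The metric name, search space, and W&B project name are placeholders, and the ``run_config`` line can be dropped if you don't use Weights & Biases.

.. code-block:: python

    from ray import air, tune
    from ray.air.integrations.wandb import WandbLoggerCallback


    def train_fn(config):
        # Toy stand-in for the SLC wrapper function; the returned dict is
        # reported as the trial's final metrics.
        return {"mean_cost": (config["lr_a"] - 1e-3) ** 2}


    tuner = tune.Tuner(
        train_fn,
        param_space={
            "lr_a": tune.loguniform(1e-4, 1e-2),  # actor learning rate
            "gamma": tune.uniform(0.9, 0.999),  # discount factor
        },
        tune_config=tune.TuneConfig(metric="mean_cost", mode="min", num_samples=10),
        run_config=air.RunConfig(
            callbacks=[WandbLoggerCallback(project="stable_learning_control")],
        ),
    )
    results = tuner.fit()
    print(results.get_best_result().config)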

.. _`Ray Tune`: https://docs.ray.io/en/latest/tune/index.html
.. _`the Weights & Biases website`: https://wandb.ai
.. _`Ray Tune documentation`: https://docs.ray.io/en/latest/tune/index.html

.. note::

An equivalent TensorFlow example is available in ``stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py``.
7 changes: 5 additions & 2 deletions docs/source/usage/installation.rst
@@ -117,8 +117,11 @@ This command will install the SLC package with the :torch:`Pytorch <>` backend i

.. important::

If you are using Conda, you may run into issues while installing or using the SLC package,
such as installation errors or scripts freezing. To resolve these problems, it is
recommended to install the `mpi4py`_ package from within Conda instead of using pip. This can
be accomplished by executing the following command:

.. code-block:: bash
5 changes: 5 additions & 0 deletions docs/source/usage/plotting.rst
@@ -20,3 +20,8 @@ SLC ships with a simple plotting utility that can be used to plot diagnostics fr
:align: center

Example plot that displays the performance of the LAC algorithm.

.. tip::

The SLC package also supports TensorBoard and Weights & Biases logging, which allow you to inspect your experiments'
results during training and compare the performance of different algorithms more interactively. See :ref:`loggers` for more information.
81 changes: 37 additions & 44 deletions docs/source/usage/running.rst
@@ -1,3 +1,5 @@
.. _running_experiments:

===================
Running Experiments
===================
@@ -152,6 +154,8 @@ to see a readout of the docstring.
to get the same result.

.. _running_multiple_experiments:

Launching Multiple Experiments at Once
--------------------------------------

@@ -206,9 +210,9 @@ Algorithm Flags
General Flags
~~~~~~~~~~~~~

.. option:: --save_cps, --save_checkpoints, default: False

:obj:`bool`. Only the most recent state of the agent and environment is saved by default. When the
``--save_checkpoints`` flag is supplied, a snapshot (checkpoint) of the agent and
environment will be saved at each epoch. These snapshots are saved in a ``checkpoints``
folder inside the Logger output directory (for more information, see
@@ -291,15 +295,31 @@ Logger Flags
The CLI also contains several (shortcut) flags that can be used to change the behaviour of the
:class:`stable_learning_control.utils.log_utils.logx.EpochLogger`.

.. option:: --use_tb, --logger_kwargs:use_tensorboard, default=False

:obj:`bool`. Enables TensorBoard logging.

.. option:: --tb_log_freq, --logger_kwargs:tb_log_freq, default='low'

:obj:`str`. The TensorBoard log frequency. Options are ``low`` (Recommended: logs at every epoch) and
``high`` (logs at every SGD update batch). Defaults to ``low`` since this is less resource intensive.

.. option:: --use_wandb, --logger_kwargs:use_wandb, default=False

:obj:`bool`. Enables Weights & Biases logging.

.. option:: --wandb_job_type, --logger_kwargs:wandb_job_type, default='train'

:obj:`str`. The Weights & Biases job type.

.. option:: --wandb_project, --logger_kwargs:wandb_project, default='stable_learning_control'

:obj:`str`. The Weights & Biases project name.

.. option:: --wandb_group, --logger_kwargs:wandb_group, default=None

:obj:`str`. The Weights & Biases group name.

.. option:: --quiet, --logger_kwargs:quiet, default=False

:obj:`bool`. Suppress logging of diagnostics to stdout.
@@ -318,6 +338,8 @@ The CLI also contains several (shortcut) flags that can be used to change the be
The verbose_vars list should be supplied as a list that can be evaluated in Python (e.g.
``--verbose_vars ["Lr_a", "Lr_c"]``).
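
The sketch below shows how these shortcut flags might translate into direct logger usage. It assumes the ``--logger_kwargs:<name>`` flags map one-to-one onto :class:`~stable_learning_control.utils.log_utils.logx.EpochLogger` keyword arguments; check the logger's API reference to confirm the exact signature.

.. code-block:: python

    from stable_learning_control.utils.log_utils.logx import EpochLogger

    # Assumption: the --logger_kwargs:<name> shortcut flags map one-to-one
    # onto EpochLogger keyword arguments.
    logger = EpochLogger(
        output_dir="./data/wandb_example",  # hypothetical output directory
        use_wandb=True,  # enable Weights & Biases logging
        wandb_project="stable_learning_control",  # W&B project (default)
        wandb_job_type="train",  # W&B job type (default)
        wandb_group=None,  # optional W&B group name
    )
    logger.log("Diagnostics are now also logged to Weights & Biases.")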

.. _exp_cfg:

Using experiment configuration files (yaml)
---------------------------------------------

Expand Down Expand Up @@ -354,8 +376,11 @@ by `Haarnoja et al., 2019`_.
critic: [256, 256, 16]
lr_a: "1e-4, 1e-3, 1e-2"
Additionally, if you want to specify an `on/off`_ flag, you can supply an empty key.

.. _`YAML`: https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html
.. _`Haarnoja et al., 2019`: https://arxiv.org/abs/1801.01290
.. _`on/off`: https://docs.python.org/dev/library/argparse.html#core-functionality

Where Results are Saved
-----------------------
@@ -419,7 +444,7 @@ Extra
``stable_learning_control/run.py``.

Use transfer learning
---------------------

The ``start_policy`` command-line flag allows you to use an already trained algorithm as the starting point for
your new algorithm:
@@ -431,7 +456,7 @@ your new algorithm:
where the already trained policy is found.

Using custom environments
-------------------------

The SLC package can be used with any :gymnasium:`Gymnasium-based <>` environment. To use a custom environment, you need
to ensure it inherits from the :class:`gym.Env` class and implements the following methods:
@@ -498,10 +523,8 @@ Consider the example in ``stable_learning_control/examples/pytorch/sac_exp_grid_
.. literalinclude:: /../../examples/pytorch/sac_exp_grid_search.py
:language: python
:linenos:
:lines: 16-
:emphasize-lines: 22-28, 31

After making the ExperimentGrid object, parameters are added to it with

@@ -525,36 +548,6 @@ Except for the absence of shortcut kwargs (you can't use ``hid`` for ``ac_kwargs
basic behaviour of ``ExperimentGrid`` is the same as running things from the command line.
(In fact, ``stable_learning_control.run`` uses an ``ExperimentGrid`` under the hood.)

.. note::

An equivalent TensorFlow example is available in ``stable_learning_control/examples/tf2/sac_exp_grid_search.py``.