feat(latc): add LATC algorithm
This commit adds the LATC algorithm. The Lyapunov Actor-Twin Critic
(LATC) algorithm is a successor to the LAC algorithm. In
contrast to its predecessor, the LATC algorithm employs a dual-critic approach,
aligning it more closely with the SAC algorithm upon which LAC was built initially.
rickstaa committed Aug 11, 2023
1 parent dfc239b commit 004b7ec
Showing 39 changed files with 1,624 additions and 104 deletions.
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -72,6 +72,7 @@ def __getattr__(cls, name):
\usepackage{amsmath}
\usepackage{cancel}
\usepackage{physics}
\usepackage{bm}
"""
latex_macros = r"""
\newcommand{\E}{{\mathrm E}}
2 changes: 0 additions & 2 deletions docs/source/dev/doc_dev.rst
@@ -34,12 +34,10 @@ the ``docs/build/html`` directory. If the documentation is successfully created,
``make linkcheck`` command to check for broken links.

.. attention::

Ensure you are in the Conda environment where you installed the :slc:`stable_learning_control <>`
package with its dependencies.

.. note::

Sometimes the ``make linkcheck`` command doesn't show the results on the stdout. You can also find the results
in the ``docs/build/linkcheck`` folder.

1 change: 0 additions & 1 deletion docs/source/index.rst
@@ -19,7 +19,6 @@ ready-to-use compatible environments can be found in the :stable-gym:`stable-gy
:ros-gazebo-gym:`Ros Gazebo Gym <>` packages.

.. note::

This framework was built upon the `SpinningUp`_ educational resource. By doing this, we
hope to make it easier for new researchers to start with our Algorithms. If you are new
to RL, check out the SpinningUp documentation and play with it before diving into our
2 changes: 1 addition & 1 deletion docs/source/usage/algorithms.rst
@@ -17,7 +17,6 @@ Stable Agents
-------------

.. important::

As explained in the :ref:`installation section <gym_envs_install>` of the documentation,
although the ``opt_type`` algorithm variable can be used to train on standard
:gymnasium:`gymnasium <>` environments, the stable RL agents require a positive definite
@@ -31,6 +30,7 @@ The SLC package currently contains the following theoretically stable RL algorit
:maxdepth: 1

algorithms/lac
algorithms/latc

Unstable Agents
---------------
22 changes: 10 additions & 12 deletions docs/source/usage/algorithms/lac.rst
@@ -1,13 +1,12 @@
.. _lac:

=====================
Lyapunov Actor-Critic
=====================
===========================
Lyapunov Actor-Critic (LAC)
===========================

.. contents:: Table of Contents

.. seealso::

This document assumes you are familiar with the `Soft Actor-Critic (SAC)`_ algorithm.
It is not meant to be a comprehensive guide but mainly depicts the difference between
the :ref:`SAC <sac>` and `Lyapunov Actor-Critic (LAC)`_ algorithms. For more information,
@@ -21,7 +20,6 @@ Lyapunov Actor-Critic


.. important::

The LAC algorithm only guarantees stability in **mean cost** when trained on environments
with a positive definite cost function (i.e. environments in which the cost is minimized).
The ``opt_type`` argument can be set to ``maximize`` when training in environments where
@@ -120,10 +118,15 @@ Where :math:`L_{target}` is the approximation target received from the `infinite
and :math:`\mathcal{D}` the set of collected transition pairs.

.. note::

As explained by `Han et al., 2020`_ the sum of cost over a finite time horizon can also be used as the
approximation target. This version still needs to be implemented in the SLC framework.
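
As a hedged illustration of how this target enters the critic update (all attribute and argument names are assumptions mirroring the MSBE form referenced above, not the exact SLC code):

.. code-block:: python

    import torch

    def lyapunov_critic_loss(ac, ac_targ, obs, act, cost, obs_next, gamma):
        # L_target = c + gamma * L'(s', f(epsilon, s')), computed without gradients.
        with torch.no_grad():
            act_next, _ = ac.pi(obs_next)
            l_target = cost + gamma * ac_targ.L(obs_next, act_next)

        # Mean-squared error between the critic estimate and the bootstrap target.
        return ((ac.L(obs, act) - l_target) ** 2).mean()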

.. seealso::
The SLC package also contains a LAC implementation with a double Q-Critic (i.e., the :ref:`Lyapunov Twin Critic <latc>`).
For more information about this version, see the :ref:`LAC Twin Critic <latc>` documentation. This version can be used
by specifying the ``latc`` algorithm in the CLI, by supplying the :meth:`~stable_learning_control.algos.pytorch.lac.lac.lac` function with the ``actor_critic=LyapunovActorTwinCritic``
argument, or by directly calling the :meth:`~stable_learning_control.algos.pytorch.lac.latc.latc` function.
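
A hedged usage sketch of the last option (the import path is assumed from the references above; ``CartPoleCost-v0`` is just an example environment):

.. code-block:: python

    import gymnasium as gym

    from stable_learning_control.algos.pytorch.latc import latc

    # Same effect as the CLI call: python -m stable_learning_control.run latc --env CartPoleCost-v0
    latc(lambda: gym.make("CartPoleCost-v0"))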

.. _`mean-squared Bellman error (MSBE) minimisation`: https://spinningup.openai.com/en/latest/algorithms/ddpg.html?highlight=msbe#the-q-learning-side-of-ddpg
.. _`infinite-horizon discounted return value function`: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#value-functions
.. _`Belman equation`: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#bellman-equations
@@ -197,7 +200,6 @@ For more information on the LAC algorithm, please check out the original paper o

.. _`Han et al., 2020`: https://arxiv.org/abs/2004.14288


Pseudocode
----------

@@ -222,9 +224,6 @@ Pseudocode
\end{algorithmic}
\end{algorithm}
.. _`11 of Han et al., 2020`: https://arxiv.org/pdf/2004.14288.pdf
.. _`eq. (7) and (14) from Han et al., 2020`: https://arxiv.org/pdf/2004.14288.pdf

Implementation
==============

@@ -251,7 +250,6 @@ Algorithm: TensorFlow Version
-----------------------------

.. attention::

The TensorFlow version is still experimental. It is not guaranteed to work, and it is not
guaranteed to be up-to-date with the PyTorch version.

@@ -260,7 +258,7 @@ Algorithm: TensorFlow Version
Saved Model Contents: TensorFlow Version
----------------------------------------

The TensorFlow version of the SAC algorithm is implemented by subclassing the :class:`tf.nn.Model` class. As a result, both the
The TensorFlow version of the LAC algorithm is implemented by subclassing the :class:`tf.nn.Model` class. As a result, both the
full model and the current model weights are saved. The complete model can be found in the ``saved_model.pb`` file, while the current
weights checkpoints are found in the ``tf_safe/weights_checkpoint*`` file. For an example of using these two methods, see :ref:`saving_and_loading`
or the :tensorflow:`TensorFlow documentation <tutorials/keras/save_and_load>`.
141 changes: 141 additions & 0 deletions docs/source/usage/algorithms/latc.rst
@@ -0,0 +1,141 @@
.. _latc:

=================================
Lyapunov Actor-Twin Critic (LATC)
=================================

.. contents:: Table of Contents

.. seealso::
This document assumes you are familiar with the :ref:`Lyapunov Actor-Critic (LAC) <lac>` algorithm.
It is not a comprehensive guide but mainly depicts the difference between the
:ref:`Lyapunov Actor-Twin Critic <latc>` and :ref:`Lyapunov Actor-Critic (LAC) <lac>` algorithms. It
is therefore meant to complement the :ref:`LAC <lac>` algorithm documentation.

.. important::
Like the LAC algorithm, the LATC algorithm only guarantees stability in **mean cost** when trained on
environments with a positive definite cost function (i.e. environments in which the cost is minimised).
The ``opt_type`` argument can be set to ``maximise`` when training in environments where the reward is
maximised. However, because `Lyapunov's stability conditions`_ are not satisfied, the LATC algorithm
no longer guarantees stability in **mean** cost.

.. _`Lyapunov's stability conditions`: https://www.cds.caltech.edu/~murray/courses/cds101/fa02/caltech/mls93-lyap.pdf

Background
==========
The Lyapunov Actor-Twin Critic (LATC) algorithm is a successor to the :ref:`LAC <lac>` algorithm. In contrast
to its predecessor, the LATC algorithm employs a dual-critic approach, aligning it more closely with the
:ref:`SAC <sac>` algorithm upon which LAC was initially built. In the SAC framework, these dual critics
serve to counteract overestimation bias by selecting the minimum value of both critics for the actor updates.
In our case, we employ the maximum to minimise the cost, thus addressing potential underestimation bias in
Lyapunov values. For a deeper exploration of this concept, refer to the research paper by `Haarnoja et al., 2019`_.
For more information on the inner workings of the LAC algorithm, refer to the :ref:`LAC <lac>` algorithm
documentation. Below, only the differences between the LAC and LATC algorithms are discussed.

.. _`Haarnoja et al., 2019`: https://arxiv.org/abs/1801.01290

Differences with the LAC algorithm
----------------------------------
Like its direct predecessor, the LATC algorithm also uses **entropy regularisation** to increase exploration and a
Gaussian actor and value critic to determine the best action. The main difference lies in the fact that the
:class:`~stable_learning_control.algos.pytorch.policies.lyapunov_actor_twin_critic.LyapunovActorTwinCritic`
contains two critics instead of one. These critics are identical to the critic used in the :ref:`LAC <lac>`
algorithm but are trained separately. Their maximum is subsequently used to update the actor. Because of this,
the policy is optimised according to

.. math::
    :label: latc_policy_update

    \min_{\theta} \underE{s \sim \mathcal{D} \\ \xi \sim \mathcal{N}}{\lambda(\bm{L_{c_{max}}}(s^{'}, f_{\theta}(\epsilon, s^{'}))-L_{c}(s, a) + \alpha_{3}c) + \mathcal{\alpha}\log \pi_{\theta}(f_{\theta}(\epsilon, s)|s) + \mathcal{H}_{t}}

Where :math:`L_{c_{max}}` now represents the maximum of the two critics. The rest of the algorithm remains the same.
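
To make this update rule concrete, below is a minimal PyTorch sketch of the resulting actor loss. The actor-critic attribute names (``pi``, ``L``, ``L2``) and the multiplier names (``labda``, ``alpha``, ``alpha3``) are illustrative assumptions, not the exact SLC API:

.. code-block:: python

    import torch

    def latc_actor_loss(ac, obs, act, obs_next, cost, labda, alpha, alpha3):
        # a' = f_theta(epsilon, s'): reparameterised action for the next observation.
        act_next, _ = ac.pi(obs_next)

        # Evaluate both Lyapunov critics and take their maximum (L_c_max) to
        # counteract underestimation bias (SAC takes the minimum instead).
        l_max = torch.max(ac.L(obs_next, act_next), ac.L2(obs_next, act_next))

        # L_c(s, a) of the stored transition acts as a constant in the actor update.
        l_now = ac.L(obs, act).detach()

        # Entropy-regularisation term: log pi_theta(f_theta(epsilon, s)|s).
        _, logp = ac.pi(obs)

        return (labda * (l_max - l_now + alpha3 * cost) + alpha * logp).mean()
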
.. important::
Because the LATC and LAC algorithms are so similar, the :meth:`~stable_learning_control.algos.pytorch.latc.latc` function is
implemented as a wrapper around the :meth:`~stable_learning_control.algos.pytorch.lac.lac` function. This wrapper
only changes the actor-critic architecture to :class:`~stable_learning_control.algos.pytorch.policies.lyapunov_actor_twin_critic.LyapunovActorTwinCritic`,
as sketched below. To prevent code duplication, the :class:`stable_learning_control.algos.pytorch.policies.lyapunov_actor_critic.LyapunovActorCritic` class
was modified to use the maximum of the two critics when the :class:`~stable_learning_control.algos.pytorch.policies.lyapunov_actor_twin_critic.LyapunovActorTwinCritic`
class is set as the actor-critic architecture.
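
Conceptually, the wrapper therefore reduces to something like the following sketch (function signature assumed; the actual SLC implementation may differ):

.. code-block:: python

    from stable_learning_control.algos.pytorch.lac.lac import lac
    from stable_learning_control.algos.pytorch.policies.lyapunov_actor_twin_critic import (
        LyapunovActorTwinCritic,
    )

    def latc(env_fn, **kwargs):
        """LAC with a twin Lyapunov critic: only the actor-critic class changes."""
        return lac(env_fn, actor_critic=LyapunovActorTwinCritic, **kwargs)
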
Quick Facts
-----------

* LATC is an off-policy algorithm.
* It is guaranteed to be stable in mean cost.
* The version of LATC implemented here can only be used for environments with continuous action spaces.
* An alternate version of LATC, which slightly changes the policy update rule, can be implemented to handle discrete action spaces.
* The SLC implementation of LATC does not support parallelisation.

Further Reading
---------------

For more information on the LATC algorithm, please check out the :ref:`LAC <lac>` documentation and the original paper of `Han et al., 2020`_.

.. _`Han et al., 2020`: https://arxiv.org/abs/2004.14288

Pseudocode
----------
.. math::
    :nowrap:

    \begin{algorithm}[H]
        \caption{Lyapunov-based Actor-Twin Critic (LATC)}
        \label{alg1}
        \begin{algorithmic}[1]
            \REQUIRE Maximum episode length $N$ and maximum update steps $M$
            \REPEAT
                \STATE Sample $s_{0}$ according to $\rho$
                \FOR{$t=1$ to $N$}
                    \STATE Sample $a$ from $\pi(a|s)$ and step forward
                    \STATE Observe $s'$, $c$ and store ($s$, $a$, $c$, $s'$) in $\mathcal{D}$
                \ENDFOR
                \FOR{$i=1$ to $M$}
                    \STATE Sample mini-batches of transitions from $\mathcal{D}$ and update $L_{c}$, $L2_{c}$, $\pi$ and the Lagrange multipliers with eq. (7) and (14) of Han et al., 2020 and the new actor update rule described above
                \ENDFOR
            \UNTIL{eq. 11 of Han et al., 2020 is satisfied}
        \end{algorithmic}
    \end{algorithm}

Implementation
==============
.. admonition:: You Should Know
In what follows, we give documentation for the PyTorch and TensorFlow implementations of LATC in SLC.
They have nearly identical function calls and docstrings, except for details relating to model construction.
However, we include both full docstrings for completeness.

Algorithm: PyTorch Version
--------------------------

.. autofunction:: stable_learning_control.algos.pytorch.latc.latc

Saved Model Contents: PyTorch Version
-------------------------------------

The PyTorch version of the LATC algorithm is implemented by subclassing the :class:`torch.nn.Module` class. Because of this, and because
the LATC algorithm is implemented as a wrapper around the LAC algorithm, the model weights are saved using the ``model_state`` dictionary (:attr:`~stable_learning_control.algos.pytorch.lac.LAC.state_dict`).
These saved weights can be found in the ``torch_save/model_state.pt`` file. For an example of how to load a model using
this file, see :ref:`saving_and_loading` or the :torch:`PyTorch documentation <tutorials/beginner/saving_loading_models.html>`.
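
For illustration, a minimal restore sketch (the run directory and the ``LAC`` constructor arguments are assumptions; see :ref:`saving_and_loading` for the canonical procedure):

.. code-block:: python

    import gymnasium as gym
    import torch

    from stable_learning_control.algos.pytorch.lac import LAC

    env = gym.make("CartPoleCost-v0")  # example environment
    model = LAC(env.observation_space, env.action_space)  # same architecture as during training
    model.load_state_dict(torch.load("path/to/output_directory/torch_save/model_state.pt"))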

Algorithm: TensorFlow Version
-----------------------------

.. attention::
The TensorFlow version is still experimental. It is not guaranteed to work, and it is not
guaranteed to be up-to-date with the PyTorch version.

.. autofunction:: stable_learning_control.algos.tf2.latc.latc

Saved Model Contents: TensorFlow Version
----------------------------------------

The TensorFlow version of the LATC algorithm is implemented by subclassing the :class:`tf.nn.Model` class. As a result, both the
full model and the current model weights are saved. The complete model can be found in the ``saved_model.pb`` file, while the current
weights checkpoints are found in the ``tf_safe/weights_checkpoint*`` file. For an example of using these two methods, see :ref:`saving_and_loading`
or the :tensorflow:`TensorFlow documentation <tutorials/keras/save_and_load>`.
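
As an illustration, both restore routes might look as follows (paths are assumptions based on the files mentioned above):

.. code-block:: python

    import tensorflow as tf

    # Option 1: load the complete serialised model (the directory containing ``saved_model.pb``).
    model = tf.keras.models.load_model("path/to/output_directory")

    # Option 2: restore the latest weights checkpoint into a model with the same architecture.
    model.load_weights(tf.train.latest_checkpoint("path/to/output_directory/tf_safe"))
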
2 changes: 0 additions & 2 deletions docs/source/usage/algorithms/sac.rst
@@ -7,7 +7,6 @@ Soft Actor-Critic
.. contents:: Table of Contents

.. important::

The SAC algorithm has no stability guarantees. Please use the :ref:`LAC <lac>` algorithm if
you require stability guarantees.

@@ -88,7 +87,6 @@ Algorithm: TensorFlow Version
---------------------------------

.. attention::

The TensorFlow version is still experimental. It is not guaranteed to work, and it is not
guaranteed to be up-to-date with the PyTorch version.

2 changes: 0 additions & 2 deletions docs/source/usage/eval_robustness.rst
@@ -31,7 +31,6 @@ The most important input arguments are:
:class:`~stable_learning_control.disturbers.ObservationRandomNoiseDisturber` disturber).

.. note::

For more information about all the input arguments available for the ``eval_robustness`` tool you can use the ``--help`` flag or check the :ref:`robustness evaluation utility <eval_robustness>`
documentation or :ref:`the API reference <autoapi>`.

@@ -91,7 +90,6 @@ The robustness evaluation tool can save several files to disk that contain infor
These files will be saved inside the ``eval`` directory inside the output directory.

.. tip::

You can also log these results to Weights & Biases by adding the ``--use_wandb`` flag to the
CLI command (see :ref:`eval_robustness` for more information).

1 change: 0 additions & 1 deletion docs/source/usage/hyperparameter_tuning.rst
@@ -25,7 +25,6 @@ environment with various values for actor and critic learning rates using the :r
.. _`CartPoleCost-v0`: https://rickstaa.dev/stable-gym/envs/classic_control/cartpole_cost.html

.. tip::

You can enable logging of TensorBoard and Weights & Biases by adding the ``--use_tensorboard`` and ``--use_wandb`` flags to the
above command. These tools will allow you to track the performance of your experiments and compare the results of
different hyperparameter combinations. For more information on how to use these logging utilities, see :ref:`loggers`.
6 changes: 0 additions & 6 deletions docs/source/usage/installation.rst
@@ -28,7 +28,6 @@ For mac this command can be used:
brew install openmpi
.. note::

The `Microsoft MPI`_ package can be used for Windows.

.. attention::
@@ -95,7 +94,6 @@ this environment. The SLC has two versions you can install:
of the RL algorithms.

.. note::

We choose PyTorch as the default backend as it is easier to work with than TensorFlow. However, at the
time of writing, it is slightly slower than the TensorFlow backend. This is caused because the agents
used in the SLC package use components not yet supported by :torch:`TorchScript <docs/stable/jit.html>`
@@ -116,8 +114,6 @@ you can install the SLC package using the following bash command:
This command will install the SLC package with the :torch:`Pytorch <>` backend in your Conda environment.

.. important::


If you are using Conda, you may come across issues while installing or utilizing the SLC package,
such as installation errors or script freezing. To effectively resolve these problems, it is
recommended to install the mpi4py_ package from within Conda instead of using pip. This can
@@ -133,7 +129,6 @@ Install the TensorFlow version
------------------------------

.. attention::

As stated above, the Pytorch version was used during our experiments. As a result, the
TensorFlow version is less well-tested than the Pytorch version and has limited support.
It should therefore be considered experimental, as no guarantees can be given about the
@@ -147,7 +142,6 @@ package with the following command:
pip install -e .[tf2]
.. warning::

If you want to use the GPU version of TensorFlow, you must ensure you performed all
the steps described in the `TensorFlow installation guide`_. It is also essential to
know that depending on the version of TensorFlow and PyTorch you use, you might have
2 changes: 0 additions & 2 deletions docs/source/usage/plotting.rst
@@ -13,7 +13,6 @@ SLC ships with a simple plotting utility that can be used to plot diagnostics fr
[--select [SELECT [SELECT ...]]] [--exclude [EXCLUDE [EXCLUDE ...]]] [--est EST]
.. seealso::

For more information on this utility, see the :ref:`plot utility <plot>` documentation or code :ref:`the API reference <autoapi>`.

.. figure:: ../images/plots/lac/example_lac_performance_plot.svg
@@ -22,6 +21,5 @@ SLC ships with a simple plotting utility that can be used to plot diagnostics fr
Example plot that displays the performance of the LAC algorithm.

.. tip::

The SLC package also supports TensorBoard and Weights & Biases logging. See :ref:`loggers` for more information. This allows you
to inspect your experiments' results during training and compare the performance of different algorithms more interactively.
3 changes: 0 additions & 3 deletions docs/source/usage/running.rst
@@ -106,7 +106,6 @@ the runner will look in ``stable_learning_control/user_config.py`` for which ver
default to that algorithm.

.. attention::

The TensorFlow version is still experimental. It is not guaranteed to work, and it is not
guaranteed to be up-to-date with the PyTorch version.

@@ -334,7 +333,6 @@ The CLI also contains several (shortcut) flags that can be used to change the be
:obj:`list`. A list of variables you want to log to the stdout when ``quiet`` is ``False``. The default :obj:`None` means that all variables are logged.

.. important::

The verbose_vars list should be supplied as a list that can be evaluated in Python (e.g.
``--verbose_vars ["Lr_a", "Lr_c"]``).

@@ -360,7 +358,6 @@ by `Haarnoja et al., 2019`_.
.. important::

Please note that if you want to run multiple hyperparameter variants, for example, multiple seeds or
learning rates, you have to use comma/space-separated strings in your configuration file:

2 changes: 0 additions & 2 deletions docs/source/usage/saving_and_loading.rst
@@ -154,7 +154,6 @@ is successfully saved alongside the agent, you can watch the trained agent act i
python -m stable_learning_control.run test_policy [path/to/output_directory]
.. seealso::

For more information on using this utility, see the :ref:`test_policy` documentation or the code :ref:`the API reference <autoapi>`.

.. _manual_policy_testing:
@@ -318,7 +317,6 @@ Deploy PyTorch Algorithms
-------------------------

.. attention::

PyTorch provides multiple ways to deploy trained models to hardware (see the :torch:`PyTorch serving documentation <blog/model-serving-in-pyorch>`).
Unfortunately, at the time of writing, these methods currently do not support the agents used in the SLC package. For more information, see
`this issue`_.
