fix(oscillator): fix minor oscillator environment bugs (#360)

This commit fixes several errors in the translation of the oscillator environments from the paper of [Han et al. 2020](https://arxiv.org/abs/2004.14288):

- Action gain was increased to adhere to the article.
- Action space was decreased to adhere to the article.
- Observation space was increased to adhere to the article.
- Reward range was made variable.
rickstaa · Feb 1, 2024 · 1f928d7
1 parent df9d1b1

Showing 6 changed files with 230 additions and 160 deletions.
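
The combined effect of the action gain and action space changes can be illustrated with a minimal sketch. It assumes that the environment clips each action to the action space bounds and then multiplies it by the input gain `b`; that formula is not shown in this diff, and `clip` and `effective_input` are hypothetical helpers, not part of the environment.

```python
def clip(value, low, high):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))


def effective_input(action, gain, low, high):
    """Clip a raw action to the action-space bounds, then scale it by the gain.

    ASSUMPTION: the control input enters the dynamics as gain * clipped action.
    """
    return gain * clip(action, low, high)


raw_actions = (-7.0, 0.5, 7.0)

# Before this commit: action bounds [-5, 5], gain b = 1.0.
old = [effective_input(a, 1.0, -5.0, 5.0) for a in raw_actions]

# After this commit: action bounds [0, 1], gain b = 5.0.
new = [effective_input(a, 5.0, 0.0, 1.0) for a in raw_actions]

print(old)  # [-5.0, 0.5, 5.0]
print(new)  # [0.0, 2.5, 5.0]
```

Under this assumption the upper limit of the effective input stays at 5, while negative inputs are no longer possible.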
diff --git a/stable_gym/envs/biological/oscillator/README.md b/stable_gym/envs/biological/oscillator/README.md
@@ -1,6 +1,13 @@
 # Oscillator gymnasium environment
 
-A gymnasium environment for a synthetic oscillatory network of transcriptional regulators called a repressilator. A repressilator is a three-gene regulatory network where the dynamics of mRNA and proteins follow an oscillatory behaviour. First presented by [Han et al. 2020](https://arxiv.org/abs/2004.14288).
+A gymnasium environment for a synthetic oscillatory network of transcriptional regulators called a repressilator. A repressilator is a three-gene regulatory network where the dynamics of mRNA and proteins follow an oscillatory behaviour. First presented by [Han et al. 2020](https://arxiv.org/abs/2004.14288). Compared to the original implementation, our version introduces several enhancements to the environment, making it more flexible and user-friendly:
+
+* We've added environment arguments that allow for the modification of reference signal parameters.
+* System parameters can now be individually tailored for each protein, instead of applying a uniform set of parameters across all proteins.
+* The reference can now be excluded from the observation if desired.
+* The reference error can be included in the 'info' dictionary for additional context.
+* The observation space was expanded to accurately reproduce the plots presented in [Han et al. 2020](https://arxiv.org/abs/2004.14288), which was not possible with the original code's observation space.
+* Introduced an adjustable `max_cost` variable for terminating episodes, defaulting to $100$ for consistency with the original environment.
 
 ## Observation space
 

diff --git a/stable_gym/envs/biological/oscillator/oscillator.py b/stable_gym/envs/biological/oscillator/oscillator.py
@@ -10,7 +10,7 @@
 
 # TODO: Update solving criteria after training.
 class Oscillator(gym.Env):
-    """Synthetic oscillatory network environment.
+    r"""Synthetic oscillatory network environment.
 
     .. Note::
         Can also be used in a vectorized manner. See the
@@ -23,7 +23,20 @@ class Oscillator(gym.Env):
 
     Source:
         This environment corresponds to the Oscillator environment used in the paper
-        `Han et al. 2020`_.
+        `Han et al. 2020`_. In our implementation several additional features were added
+        to the environment to make it more flexible and easier to use:
+
+            - Environment arguments now allow for modification of the reference signal
+              parameters.
+            - System parameters can now be individually adjusted for each protein,
+              rather than applying the same parameters across all proteins.
+            - The reference can be omitted from the observation.
+            - Reference error can be included in the info dictionary.
+            - The observation space was expanded to accurately reproduce the plots
+              presented in `Han et al. 2020`_, which was not possible with the original
+              code's observation space.
+            - Added an adjustable ``max_cost`` threshold for episode termination,
+              defaulting to $100$ to match the original environment.
 
     .. _`Han et al. 2020`: https://arxiv.org/abs/2004.14288
 
@@ -33,24 +46,24 @@ class Oscillator(gym.Env):
         +-----+-----------------------------------------------+-------------------+-------------------+
         | Num | Observation                                   | Min               | Max               |
         +=====+===============================================+===================+===================+
-        | 0   | Lacl mRNA transcripts concentration           | 0                 | 100               |
+        | 0   | Lacl mRNA transcripts concentration           | 0                 | $\infty$          |
         +-----+-----------------------------------------------+-------------------+-------------------+
-        | 1   | tetR mRNA transcripts concentration           | 0                 | 100               |
+        | 1   | tetR mRNA transcripts concentration           | 0                 | $\infty$          |
         +-----+-----------------------------------------------+-------------------+-------------------+
-        | 2   | CI mRNA transcripts concentration             | 0                 | 100               |
+        | 2   | CI mRNA transcripts concentration             | 0                 | $\infty$          |
         +-----+-----------------------------------------------+-------------------+-------------------+
-        | 3   || lacI (repressor) protein concentration       | 0                 | 100               |
+        | 3   || lacI (repressor) protein concentration       | 0                 | $\infty$          |
         |     || (Inhibits transcription of the tetR gene)    |                   |                   |
         +-----+-----------------------------------------------+-------------------+-------------------+
-        | 4   || tetR (repressor) protein concentration       | 0                 | 100               |
+        | 4   || tetR (repressor) protein concentration       | 0                 | $\infty$          |
         |     || (Inhibits transcription of CI gene)          |                   |                   |
         +-----+-----------------------------------------------+-------------------+-------------------+
-        | 5   || CI (repressor) protein concentration         | 0                 | 100               |
+        | 5   || CI (repressor) protein concentration         | 0                 | $\infty$          |
         |     || (Inhibits transcription of lacI gene)        |                   |                   |
         +-----+-----------------------------------------------+-------------------+-------------------+
-        | 6   | The reference we want to follow               | 0                 | 100               |
+        | 6   | The reference we want to follow               | 0                 | $\infty$          |
         +-----+-----------------------------------------------+-------------------+-------------------+
-        | (7) || **Optional** - The error between the current | -100              | 100               |
+        | (7) || **Optional** - The error between the current | $-\infty$         | $\infty$          |
         |     || value of protein 1 and the reference         |                   |                   |
         +-----+-----------------------------------------------+-------------------+-------------------+
 
@@ -60,13 +73,13 @@ class Oscillator(gym.Env):
         +-----+------------------------------------------------------------+---------+---------+
         | Num | Action                                                     | Min     |   Max   |
         +=====+============================================================+=========+=========+
-        | 0   || Relative intensity of light signal that induce the        | -5      | 5       |
+        | 0   || Relative intensity of light signal that induce the        | 0       | 1       |
         |     || expression of the Lacl mRNA gene.                         |         |         |
         +-----+------------------------------------------------------------+---------+---------+
-        | 1   || Relative intensity of light signal that induce the        | -5      | 5       |
+        | 1   || Relative intensity of light signal that induce the        | 0       | 1       |
         |     || expression of the tetR mRNA gene.                         |         |         |
         +-----+------------------------------------------------------------+---------+---------+
-        | 2   || Relative intensity of light signal that induce the        | -5      | 5       |
+        | 2   || Relative intensity of light signal that induce the        | 0       | 1       |
         |     || expression of the CI mRNA gene.                           |         |         |
         +-----+------------------------------------------------------------+---------+---------+
 
@@ -81,8 +94,8 @@ class Oscillator(gym.Env):
         All observations are assigned a uniform random value in ``[0..5]``
 
     Episode Termination:
-        -   An episode is terminated when the maximum step limit is reached.
-        -   The step cost is greater than 100.
+        - An episode is terminated when the maximum step limit is reached.
+        - The step cost exceeds a threshold (default is $100$). This threshold can be adjusted using the ``max_cost`` environment argument.
 
     Solved Requirements:
         Considered solved when the average cost is lower than 300.
@@ -102,18 +115,21 @@ class Oscillator(gym.Env):
         t (float): The current time step.
         dt (float): The environment step size. Also available as :attr:`.tau`.
         sigma (float): The variance of the system noise.
+        max_cost (float): The maximum cost allowed before the episode is terminated.
     """  # noqa: E501
 
     def __init__(
         self,
         render_mode=None,
+        # NOTE: Custom environment arguments.
+        max_cost=100.0,
         reference_target_position=8.0,
         reference_amplitude=7.0,
         reference_frequency=(1 / 200),  # NOTE: Han et al. 2020 uses a period of 200.
         reference_phase_shift=0.0,
         clip_action=True,
         exclude_reference_from_observation=False,
-        exclude_reference_error_from_observation=True,
+        exclude_reference_error_from_observation=False,
         action_space_dtype=np.float64,
         observation_space_dtype=np.float64,
     ):
@@ -122,6 +138,8 @@ def __init__(
         Args:
             render_mode (str, optional): The render mode you want to use. Defaults to
                 ``None``. Not used in this environment.
+            max_cost (float, optional): The maximum cost allowed before the episode is
+                terminated. Defaults to ``100.0``.
             reference_target_position: The reference target position, by default
                 ``8.0`` (i.e. the mean of the reference signal).
             reference_amplitude: The reference amplitude, by default ``7.0``.
@@ -132,13 +150,15 @@ def __init__(
             exclude_reference_from_observation (bool, optional): Whether the reference
                 should be excluded from the observation. Defaults to ``False``.
             exclude_reference_error_from_observation (bool, optional): Whether the error
-                should be excluded from the observation. Defaults to ``True``.
+                should be excluded from the observation. Defaults to ``False``.
             action_space_dtype (union[numpy.dtype, str], optional): The data type of the
                 action space. Defaults to ``np.float64``.
             observation_space_dtype (union[numpy.dtype, str], optional): The data type
                 of the observation space. Defaults to ``np.float64``.
         """
         super().__init__()
+        assert max_cost > 0, "The maximum cost must be greater than 0."
+        self.max_cost = max_cost
         self._action_clip_warning = False
         self._clip_action = clip_action
         self._exclude_reference_from_observation = exclude_reference_from_observation
@@ -187,9 +207,9 @@ def __init__(
         self.c1 = 0.06  # Protein degradation rate p1.
         self.c2 = 0.06  # Protein degradation rate p2.
         self.c3 = 0.06  # Protein degradation rate p3.
-        self.b1 = 1.0  # Control input gain u1.
-        self.b2 = 1.0  # Control input gain u2.
-        self.b3 = 1.0  # Control input gain u3.
+        self.b1 = 5.0  # Control input gain u1.
+        self.b2 = 5.0  # Control input gain u2.
+        self.b3 = 5.0  # Control input gain u3.
 
         # Set noise parameters.
         # NOTE: Zero during training.
@@ -200,23 +220,31 @@ def __init__(
         self.delta5 = 0.0  # p2 noise.
         self.delta6 = 0.0  # p3 noise.
 
-        obs_low = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
-        obs_high = np.array([100.0, 100.0, 100.0, 100.0, 100.0, 100.0])
+        # NOTE: Observation space was changed compared to the original codebase of
+        # Han et al. 2020 to match paper's plots.
+        obs_low = np.array(
+            [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+        )  # NOTE:  Han's original code used -1.0.
+        obs_high = np.array(
+            [np.inf, np.inf, np.inf, np.inf, np.inf, np.inf]
+        )  # NOTE:  Han's original code used 1.0.
         if not self._exclude_reference_from_observation:
             obs_low = np.append(obs_low, 0.0)
-            obs_high = np.append(obs_high, 100.0)
+            obs_high = np.append(obs_high, np.inf)
         if not self._exclude_reference_error_from_observation:
-            obs_low = np.append(obs_low, -100.0)
-            obs_high = np.append(obs_high, 100.0)
+            obs_low = np.append(obs_low, -np.inf)
+            obs_high = np.append(obs_high, np.inf)
+        # NOTE: Han et al. 2020 did not clearly detail the action space in their paper.
+        # As a result the action space from their original code is used.
         self.action_space = spaces.Box(
-            low=np.array([-5.0, -5.0, -5.0]),
-            high=np.array([5.0, 5.0, 5.0]),
+            low=np.array([0.0, 0.0, 0.0]),
+            high=np.array([1.0, 1.0, 1.0]),
             dtype=self._action_space_dtype,
         )
         self.observation_space = spaces.Box(
             obs_low, obs_high, dtype=self._observation_space_dtype
         )
-        self.reward_range = (0.0, 100.0)
+        self.reward_range = (0.0, self.max_cost)
 
         self.viewer = None
         self.state = None
@@ -445,8 +473,8 @@ def reset(
             else self._init_state
         )
         self.t = 0.0
-        _, _, _, p1, _, _ = self.state.astype(self._observation_space_dtype)
         obs = self.state.astype(self._observation_space_dtype)
+        p1 = obs[3]
         r1 = self.reference(self.t).astype(self._observation_space_dtype)
         if not self._exclude_reference_from_observation:
             obs = np.append(obs, r1)
@@ -539,7 +567,7 @@ def physics_time(self):
     reference.append(info["reference"])
     print(f"\nPerforming '{EPISODES}' in the 'Oscillator' environment...\n")
     print(f"Episode: {episode}")
-    while episode <= EPISODES:
+    while episode + 1 <= EPISODES:
         action = (
             env.action_space.sample()
             if RANDOM_STEP

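The `max_cost` termination threshold added in `oscillator.py` can be pictured with the following minimal sketch. It assumes the step cost is the squared error between protein 1 and the reference, which is not shown in this diff; `step_outcome` is a hypothetical helper, not part of the environment.

```python
def step_outcome(p1, r1, max_cost=100.0):
    """Return (cost, terminated) for one step.

    ASSUMPTION: cost is the squared tracking error of protein 1.
    """
    assert max_cost > 0, "The maximum cost must be greater than 0."
    cost = (p1 - r1) ** 2
    terminated = cost > max_cost  # The episode ends once the cost exceeds the threshold.
    return cost, terminated


print(step_outcome(8.0, 8.0))                   # (0.0, False)
print(step_outcome(20.0, 8.0))                  # (144.0, True)
print(step_outcome(20.0, 8.0, max_cost=200.0))  # (144.0, False)
```

Raising `max_cost` lets an episode continue through larger tracking errors, and the reward range upper bound follows the same value.
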
diff --git a/stable_gym/envs/biological/oscillator_complicated/README.md b/stable_gym/envs/biological/oscillator_complicated/README.md
@@ -1,6 +1,13 @@
 # Oscillator Complicated gymnasium environment
 
-A more challenging (i.e. complicated) version of the [Oscillator environment](https://rickstaa.dev/stable-gym/envs/biological/oscillator.html). This version adds an extra 4th protein and its accompanying mRNA transcription concentration to the environment. The light signal of an additional action input induces the mRNA transcription of this extra protein.
+A more challenging (i.e. complicated) version of the [Oscillator environment](https://rickstaa.dev/stable-gym/envs/biological/oscillator.html). This version adds an extra 4th protein and its accompanying mRNA transcription concentration to the environment. The light signal of an additional action input induces the mRNA transcription of this extra protein. First presented by [Han et al. 2020](https://arxiv.org/abs/2004.14288). Compared to the original implementation, our version introduces several enhancements to the environment, making it more flexible and user-friendly:
+
+* We've added environment arguments that allow for the modification of reference signal parameters.
+* System parameters can now be individually tailored for each protein, instead of applying a uniform set of parameters across all proteins.
+* The reference can now be excluded from the observation if desired.
+* The reference error can be included in the 'info' dictionary for additional context.
+* The observation space was expanded to accurately reproduce the plots presented in [Han et al. 2020](https://arxiv.org/abs/2004.14288), which was not possible with the original code's observation space.
+* Introduced an adjustable `max_cost` threshold for terminating episodes, defaulting to $\infty$ for consistency with the original environment.
 
 ## Observation space