Commit

fix(oscillator): fix minor oscillator environment bugs (#360)
This commit fixes several errors found in the translation of the oscillator environments from the paper by [Han et al. 2020](https://arxiv.org/abs/2004.14288):
- Action gain was increased (from 1.0 to 5.0) to adhere to the article.
- Action space was narrowed (from [-5, 5] to [0, 1]) to adhere to the article.
- Observation space was expanded (bounds of 100 raised to infinity) to adhere to the article.
- Reward range was made variable through the new `max_cost` argument.
rickstaa authored Feb 1, 2024
1 parent df9d1b1 commit 1f928d7
Showing 6 changed files with 230 additions and 160 deletions.
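The bound changes listed in the commit message can be summarised with a short NumPy-only sketch that mirrors the `obs_low`/`obs_high`, action and `reward_range` values set in `Oscillator.__init__` after this commit. The helper name `build_bounds` is hypothetical, and it returns plain arrays instead of `gymnasium.spaces.Box` objects so that it runs without Gymnasium installed.

```python
import numpy as np


def build_bounds(
    exclude_reference_from_observation=False,
    exclude_reference_error_from_observation=False,
    max_cost=100.0,
):
    """Hypothetical helper mirroring the post-commit bounds in Oscillator.__init__."""
    assert max_cost > 0, "The maximum cost must be greater than 0."

    # Six states (3 mRNA + 3 protein concentrations): non-negative, unbounded above.
    obs_low = np.zeros(6)
    obs_high = np.full(6, np.inf)

    # Optional reference entry: same bounds as the concentrations.
    if not exclude_reference_from_observation:
        obs_low = np.append(obs_low, 0.0)
        obs_high = np.append(obs_high, np.inf)

    # Optional reference-error entry: may be negative or positive.
    if not exclude_reference_error_from_observation:
        obs_low = np.append(obs_low, -np.inf)
        obs_high = np.append(obs_high, np.inf)

    # Three light-intensity actions, now in [0, 1] instead of [-5, 5].
    action_low = np.zeros(3)
    action_high = np.ones(3)

    # The reward range now follows max_cost instead of a fixed 100.
    reward_range = (0.0, max_cost)
    return obs_low, obs_high, action_low, action_high, reward_range


obs_low, obs_high, action_low, action_high, reward_range = build_bounds(max_cost=50.0)
print(obs_low.shape, reward_range)  # (8,) (0.0, 50.0)
```

With both optional entries included (the new defaults), the observation has eight elements; excluding both leaves only the six concentrations.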
9 changes: 8 additions & 1 deletion stable_gym/envs/biological/oscillator/README.md
@@ -1,6 +1,13 @@
 # Oscillator gymnasium environment

-A gymnasium environment for a synthetic oscillatory network of transcriptional regulators called a repressilator. A repressilator is a three-gene regulatory network where the dynamics of mRNA and proteins follow an oscillatory behaviour. First presented by [Han et al. 2020](https://arxiv.org/abs/2004.14288).
+A gymnasium environment for a synthetic oscillatory network of transcriptional regulators called a repressilator. A repressilator is a three-gene regulatory network where the dynamics of mRNA and proteins follow an oscillatory behaviour. First presented by [Han et al. 2020](https://arxiv.org/abs/2004.14288). Compared to the original implementation, our version introduces several enhancements to the environment, making it more flexible and user-friendly:
+
+* We've added environment arguments that allow for the modification of reference signal parameters.
+* System parameters can now be individually tailored for each protein, instead of applying a uniform set of parameters across all proteins.
+* The reference can now be excluded from the observation if desired.
+* The reference error can be included in the 'info' dictionary for additional context.
+* The observation space was expanded to accurately reproduce the plots presented in [Han et al. 2020](https://arxiv.org/abs/2004.14288), which was not possible with the original code's observation space.
+* Introduced an adjustable `max_cost` variable for terminating episodes, defaulting to $100$ for consistency with the original environment.

 ## Observation space

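The reference signal itself is not part of this diff. The sketch below assumes a sinusoid of the form `r(t) = c + A * sin(2 * pi * f * t + phi)`, with `c`, `A`, `f` and `phi` taken from `reference_target_position`, `reference_amplitude`, `reference_frequency` and `reference_phase_shift`; this form is consistent with the defaults (mean 8.0, amplitude 7.0, period 200) but is an assumption. The `reference` function here is a hypothetical stand-in, not the environment's own method.

```python
import numpy as np


def reference(t, target_position=8.0, amplitude=7.0, frequency=1 / 200, phase_shift=0.0):
    """Hypothetical sinusoidal reference; argument names mirror the env's kwargs."""
    return target_position + amplitude * np.sin(2 * np.pi * frequency * t + phase_shift)


# Two full periods sampled once per time unit.
t = np.arange(0.0, 400.0, 1.0)
r = reference(t)
print(round(float(r.min()), 6), round(float(r.max()), 6))  # 1.0 15.0
```

Changing `amplitude` or `target_position` shifts these extremes, which is the kind of adjustment the new environment arguments are meant to allow.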
88 changes: 58 additions & 30 deletions stable_gym/envs/biological/oscillator/oscillator.py
@@ -10,7 +10,7 @@

 # TODO: Update solving criteria after training.
 class Oscillator(gym.Env):
-"""Synthetic oscillatory network environment.
+r"""Synthetic oscillatory network environment.
 .. Note::
 Can also be used in a vectorized manner. See the
@@ -23,7 +23,20 @@ class Oscillator(gym.Env):
 Source:
 This environment corresponds to the Oscillator environment used in the paper
-`Han et al. 2020`_.
+`Han et al. 2020`_. In our implementation several additional features were added
+to the environment to make it more flexible and easier to use:
+- Environment arguments now allow for modification of the reference signal
+parameters.
+- System parameters can now be individually adjusted for each protein,
+rather than applying the same parameters across all proteins.
+- The reference can be omitted from the observation.
+- Reference error can be included in the info dictionary.
+- The observation space was expanded to accurately reproduce the plots
+presented in `Han et al. 2020`_, which was not possible with the original
+code's observation space.
+- Added an adjustable ``max_cost`` threshold for episode termination,
+defaulting to $100$ to match the original environment.
 .. _`Han et al. 2020`: https://arxiv.org/abs/2004.14288
@@ -33,24 +46,24 @@ class Oscillator(gym.Env):
 +-----+-----------------------------------------------+-------------------+-------------------+
 | Num | Observation                                   | Min               | Max               |
 +=====+===============================================+===================+===================+
-| 0   | Lacl mRNA transcripts concentration           | 0                 | 100               |
+| 0   | Lacl mRNA transcripts concentration           | 0                 | $\infty$          |
 +-----+-----------------------------------------------+-------------------+-------------------+
-| 1   | tetR mRNA transcripts concentration           | 0                 | 100               |
+| 1   | tetR mRNA transcripts concentration           | 0                 | $\infty$          |
 +-----+-----------------------------------------------+-------------------+-------------------+
-| 2   | CI mRNA transcripts concentration             | 0                 | 100               |
+| 2   | CI mRNA transcripts concentration             | 0                 | $\infty$          |
 +-----+-----------------------------------------------+-------------------+-------------------+
-| 3   || lacI (repressor) protein concentration       | 0                 | 100               |
+| 3   || lacI (repressor) protein concentration       | 0                 | $\infty$          |
 |     || (Inhibits transcription of the tetR gene)    |                   |                   |
 +-----+-----------------------------------------------+-------------------+-------------------+
-| 4   || tetR (repressor) protein concentration       | 0                 | 100               |
+| 4   || tetR (repressor) protein concentration       | 0                 | $\infty$          |
 |     || (Inhibits transcription of CI gene)          |                   |                   |
 +-----+-----------------------------------------------+-------------------+-------------------+
-| 5   || CI (repressor) protein concentration         | 0                 | 100               |
+| 5   || CI (repressor) protein concentration         | 0                 | $\infty$          |
 |     || (Inhibits transcription of lacI gene)        |                   |                   |
 +-----+-----------------------------------------------+-------------------+-------------------+
-| 6   | The reference we want to follow               | 0                 | 100               |
+| 6   | The reference we want to follow               | 0                 | $\infty$          |
 +-----+-----------------------------------------------+-------------------+-------------------+
-| (7) || **Optional** - The error between the current | -100              | 100               |
+| (7) || **Optional** - The error between the current | $-\infty$         | $\infty$          |
 |     || value of protein 1 and the reference          |                   |                   |
 +-----+-----------------------------------------------+-------------------+-------------------+
@@ -60,13 +73,13 @@ class Oscillator(gym.Env):
 +-----+------------------------------------------------------------+---------+---------+
 | Num | Action                                                     | Min     | Max     |
 +=====+============================================================+=========+=========+
-| 0   || Relative intensity of light signal that induce the         | -5      | 5       |
+| 0   || Relative intensity of light signal that induce the         | 0       | 1       |
 |     || expression of the Lacl mRNA gene.                          |         |         |
 +-----+------------------------------------------------------------+---------+---------+
-| 1   || Relative intensity of light signal that induce the         | -5      | 5       |
+| 1   || Relative intensity of light signal that induce the         | 0       | 1       |
 |     || expression of the tetR mRNA gene.                          |         |         |
 +-----+------------------------------------------------------------+---------+---------+
-| 2   || Relative intensity of light signal that induce the         | -5      | 5       |
+| 2   || Relative intensity of light signal that induce the         | 0       | 1       |
 |     || expression of the CI mRNA gene.                            |         |         |
 +-----+------------------------------------------------------------+---------+---------+
@@ -81,8 +94,8 @@ class Oscillator(gym.Env):
 All observations are assigned a uniform random value in ``[0..5]``
 Episode Termination:
-- An episode is terminated when the maximum step limit is reached.
-- The step cost is greater than 100.
+- An episode is terminated when the maximum step limit is reached.
+- The step exceeds a threshold (default is $100$). This threshold can be adjusted using the `max_cost` environment argument.
 Solved Requirements:
 Considered solved when the average cost is lower than 300.
@@ -102,18 +115,21 @@ class Oscillator(gym.Env):
 t (float): The current time step.
 dt (float): The environment step size. Also available as :attr:`.tau`.
 sigma (float): The variance of the system noise.
+max_cost (float): The maximum cost allowed before the episode is terminated.
 """ # noqa: E501

 def __init__(
     self,
     render_mode=None,
     # NOTE: Custom environment arguments.
+    max_cost=100.0,
     reference_target_position=8.0,
     reference_amplitude=7.0,
     reference_frequency=(1 / 200), # NOTE: Han et al. 2020 uses a period of 200.
     reference_phase_shift=0.0,
     clip_action=True,
     exclude_reference_from_observation=False,
-    exclude_reference_error_from_observation=True,
+    exclude_reference_error_from_observation=False,
     action_space_dtype=np.float64,
     observation_space_dtype=np.float64,
 ):
@@ -122,6 +138,8 @@ def __init__(
 Args:
 render_mode (str, optional): The render mode you want to use. Defaults to
 ``None``. Not used in this environment.
+max_cost (float, optional): The maximum cost allowed before the episode is
+terminated. Defaults to ``100.0``.
 reference_target_position: The reference target position, by default
 ``8.0`` (i.e. the mean of the reference signal).
 reference_amplitude: The reference amplitude, by default ``7.0``.
@@ -132,13 +150,15 @@ def __init__(
 exclude_reference_from_observation (bool, optional): Whether the reference
 should be excluded from the observation. Defaults to ``False``.
 exclude_reference_error_from_observation (bool, optional): Whether the error
-should be excluded from the observation. Defaults to ``True``.
+should be excluded from the observation. Defaults to ``False``.
 action_space_dtype (union[numpy.dtype, str], optional): The data type of the
 action space. Defaults to ``np.float64``.
 observation_space_dtype (union[numpy.dtype, str], optional): The data type
 of the observation space. Defaults to ``np.float64``.
 """
 super().__init__()
+assert max_cost > 0, "The maximum cost must be greater than 0."
+self.max_cost = max_cost
 self._action_clip_warning = False
 self._clip_action = clip_action
 self._exclude_reference_from_observation = exclude_reference_from_observation
@@ -187,9 +207,9 @@ def __init__(
 self.c1 = 0.06 # Protein degradation rate p1.
 self.c2 = 0.06 # Protein degradation rate p2.
 self.c3 = 0.06 # Protein degradation rate p3.
-self.b1 = 1.0 # Control input gain u1.
-self.b2 = 1.0 # Control input gain u2.
-self.b3 = 1.0 # Control input gain u3.
+self.b1 = 5.0 # Control input gain u1.
+self.b2 = 5.0 # Control input gain u2.
+self.b3 = 5.0 # Control input gain u3.

 # Set noise parameters.
 # NOTE: Zero during training.
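Taken together, the gain and action-range changes keep the magnitude of the light input comparable while removing negative intensities. Assuming the input enters the mRNA dynamics as the product `b_i * u_i` (the dynamics are not shown in this diff), the old setup (gain 1.0, actions in [-5, 5]) allowed `b_i * u_i` in [-5, 5], whereas the new one (gain 5.0, actions in [0, 1]) allows [0, 5]. The sketch also assumes that `clip_action=True` clips actions to the action-space bounds in a way equivalent to `np.clip`; `effective_input` is a hypothetical helper.

```python
import numpy as np

# Post-commit values taken from the diff.
b = np.array([5.0, 5.0, 5.0])  # control input gains b1..b3
action_low = np.zeros(3)       # new action-space lower bound
action_high = np.ones(3)       # new action-space upper bound


def effective_input(action, gain=b, low=action_low, high=action_high):
    """Hypothetical helper: clip the action to the box, then apply the gain."""
    clipped = np.clip(action, low, high)
    return gain * clipped


u = np.array([-0.5, 0.4, 2.0])  # first and last entries lie outside [0, 1]
print(effective_input(u).tolist())  # [0.0, 2.0, 5.0]
```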
@@ -200,23 +220,31 @@ def __init__(
 self.delta5 = 0.0 # p2 noise.
 self.delta6 = 0.0 # p3 noise.

-obs_low = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
-obs_high = np.array([100.0, 100.0, 100.0, 100.0, 100.0, 100.0])
+# NOTE: Observation space was changed compared to the original codebase of
+# Han et al. 2020 to match paper's plots.
+obs_low = np.array(
+    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
+) # NOTE: Han's original code used -1.0.
+obs_high = np.array(
+    [np.inf, np.inf, np.inf, np.inf, np.inf, np.inf]
+) # NOTE: Han's original code used 1.0.
 if not self._exclude_reference_from_observation:
     obs_low = np.append(obs_low, 0.0)
-    obs_high = np.append(obs_high, 100.0)
+    obs_high = np.append(obs_high, np.inf)
 if not self._exclude_reference_error_from_observation:
-    obs_low = np.append(obs_low, -100.0)
-    obs_high = np.append(obs_high, 100.0)
+    obs_low = np.append(obs_low, -np.inf)
+    obs_high = np.append(obs_high, np.inf)
+# NOTE: Han et al. 2020 did not clearly detail the action space in their paper.
+# As a result the action space from their original code is used.
 self.action_space = spaces.Box(
-    low=np.array([-5.0, -5.0, -5.0]),
-    high=np.array([5.0, 5.0, 5.0]),
+    low=np.array([0.0, 0.0, 0.0]),
+    high=np.array([1.0, 1.0, 1.0]),
     dtype=self._action_space_dtype,
 )
 self.observation_space = spaces.Box(
     obs_low, obs_high, dtype=self._observation_space_dtype
 )
-self.reward_range = (0.0, 100.0)
+self.reward_range = (0.0, self.max_cost)

 self.viewer = None
 self.state = None
@@ -445,8 +473,8 @@ def reset(
 else self._init_state
 )
 self.t = 0.0
-_, _, _, p1, _, _ = self.state.astype(self._observation_space_dtype)
 obs = self.state.astype(self._observation_space_dtype)
+p1 = obs[3]
 r1 = self.reference(self.t).astype(self._observation_space_dtype)
 if not self._exclude_reference_from_observation:
     obs = np.append(obs, r1)
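The `reset` hunk above builds the observation from the six-element state, reads protein 1 at index 3, and appends the reference unless it is excluded. A minimal sketch of the full assembly follows; the error entry is assumed to be `p1 - r1` (the observation table describes it only as the error between protein 1 and the reference, so the sign convention and the exact code are assumptions), and `build_observation` is a hypothetical helper.

```python
import numpy as np


def build_observation(state, r1, exclude_reference=False, exclude_error=False, dtype=np.float64):
    """Hypothetical helper mirroring the observation layout in the table above."""
    obs = np.asarray(state).astype(dtype)
    p1 = obs[3]  # lacI protein concentration (index 3 in the observation table)
    if not exclude_reference:
        obs = np.append(obs, r1)
    if not exclude_error:
        obs = np.append(obs, p1 - r1)  # assumed sign convention for the error
    return obs


state = np.array([1.0, 2.0, 3.0, 10.0, 5.0, 6.0])
print(build_observation(state, r1=8.0).tolist())  # [1.0, 2.0, 3.0, 10.0, 5.0, 6.0, 8.0, 2.0]
```

Reading `p1` from the already-cast `obs` (as the new `p1 = obs[3]` line does) avoids casting the state twice.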
@@ -539,7 +567,7 @@ def physics_time(self):
 reference.append(info["reference"])
 print(f"\nPerforming '{EPISODES}' in the 'Oscillator' environment...\n")
 print(f"Episode: {episode}")
-while episode <= EPISODES:
+while episode + 1 <= EPISODES:
     action = (
         env.action_space.sample()
         if RANDOM_STEP
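According to the updated docstring, an episode is also terminated once the step cost exceeds `max_cost`, and `reward_range` becomes `(0.0, max_cost)`. The cost itself is not part of this diff; the sketch assumes a squared tracking error between protein 1 and the reference, so both that cost formula and the helper name `step_cost_and_done` are hypothetical.

```python
import numpy as np


def step_cost_and_done(p1, r1, max_cost=100.0):
    """Hypothetical helper: assumed squared-error cost plus the max_cost check."""
    assert max_cost > 0, "The maximum cost must be greater than 0."
    cost = float(np.square(p1 - r1))  # assumed cost definition
    terminated = cost > max_cost      # threshold check described in the docstring
    return cost, terminated


print(step_cost_and_done(12.0, 8.0))                    # (16.0, False)
print(step_cost_and_done(20.0, 8.0))                    # (144.0, True)
print(step_cost_and_done(20.0, 8.0, max_cost=np.inf))   # (144.0, False)
```

Passing `max_cost=np.inf` disables the threshold, which corresponds to the $\infty$ default stated in the Oscillator Complicated README below.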
9 changes: 8 additions & 1 deletion stable_gym/envs/biological/oscillator_complicated/README.md
@@ -1,6 +1,13 @@
 # Oscillator Complicated gymnasium environment

-A more challenging (i.e. complicated) version of the [Oscillator environment](https://rickstaa.dev/stable-gym/envs/biological/oscillator.html). This version adds an extra 4th protein and its accompanying mRNA transcription concentration to the environment. The light signal of an additional action input induces the mRNA transcription of this extra protein.
+A more challenging (i.e. complicated) version of the [Oscillator environment](https://rickstaa.dev/stable-gym/envs/biological/oscillator.html). This version adds an extra 4th protein and its accompanying mRNA transcription concentration to the environment. The light signal of an additional action input induces the mRNA transcription of this extra protein. First presented by [Han et al. 2020](https://arxiv.org/abs/2004.14288). Compared to the original implementation, our version introduces several enhancements to the environment, making it more flexible and user-friendly:
+
+* We've added environment arguments that allow for the modification of reference signal parameters.
+* System parameters can now be individually tailored for each protein, instead of applying a uniform set of parameters across all proteins.
+* The reference can now be excluded from the observation if desired.
+* The reference error can be included in the 'info' dictionary for additional context.
+* The observation space was expanded to accurately reproduce the plots presented in [Han et al. 2020](https://arxiv.org/abs/2004.14288), which was not possible with the original code's observation space.
+* Introduced an adjustable `max_cost` threshold for terminating episodes, defaulting to $\infty$ for consistency with the original environment.

 ## Observation space
