A Gymnasium environment modelling Probabilistic Boolean Networks and Probabilistic Boolean Control Networks.
Probabilistic Boolean (Control) Networks are Boolean Networks where the logic functions for each node are switched stochastically according to a probability distribution. They were introduced by Shmulevich, Ilya, et al., 2002 and are used primarily to model Gene Regulatory Networks. As such, their control has applications in therapeutics, and specifically cancer treatment.
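To make the stochastic switching concrete, here is a minimal sketch of one synchronous update of a toy two-node PBN. It does not use this library: the `functions` table and the `step` helper are invented for illustration, and the real environments may organise the update differently.

```python
import random

# Illustrative toy PBN (not this library's API): each node has a list of
# (boolean function, selection probability) pairs.
functions = [
    # Node 0: copies node 1 with probability 0.7, negates it with probability 0.3.
    [(lambda s: s[1], 0.7), (lambda s: not s[1], 0.3)],
    # Node 1: always the AND of both nodes.
    [(lambda s: s[0] and s[1], 1.0)],
]


def step(state, functions, rng):
    """One synchronous update: pick a function per node, then apply it."""
    next_state = []
    for candidates in functions:
        funcs = [f for f, _ in candidates]
        weights = [p for _, p in candidates]
        # Switch between the node's logic functions according to their probabilities.
        chosen = rng.choices(funcs, weights=weights, k=1)[0]
        next_state.append(bool(chosen(state)))
    return tuple(next_state)


rng = random.Random(0)
state = (True, False)
for _ in range(5):
    state = step(state, functions, rng)
```

Every new state is computed from the previous state only, which is what lets the control problem be posed as an MDP.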
The control of Probabilistic Boolean (Control) Networks is a well-studied problem in control theory. Recently, however, Reinforcement Learning has also shown promise for driving such networks to certain attractor states (Papagiannis, Georgios, et al., 2019).
This repository contains code used in our IEEE TCNS paper on control of large-scale PBNs and this Elsevier Information Sciences paper.
The point of this library is to provide accessible PB(C)N control MDPs as Gymnasium environments. The environments provided are fully Stable Baselines3-compatible.
- gym-PBN/PBN-v0: The base Probabilistic Boolean Network environment. An action either does nothing or "flips" the value of the node at the provided index.
- gym-PBN/PBCN-v0: The base Probabilistic Boolean Control Network environment. An action sets the control nodes to a certain value.
- gym-PBN/PBN-target-v0: The base environment for so-called "target" control. This is the SSD-based control objective in our IEEE TCNS paper, where the goal is to shift the environment's steady-state distribution (SSD) to a more favourable one w.r.t. the expression of given nodes, by perturbing a subset of the nodes (a single node in our case).
- gym-PBN/Bittner-X-v0, with X being either 28, 70, 100 or 200: Instantiations of the PBN-target-v0 environment using gene data from Bittner et al. to infer a PBN of size N = 28, 70, 100 or 200 respectively.
- gym-PBN/PBN-sampled-data-v0: A so-called "sampled-data control" (temporally abstract actions in conventional RL terms) problem setting. The agent takes an action constituting a tuple: the actual action, and the duration for this action in integer time-steps.
- gym-PBN/PBCN-sampled-data-v0: The same as above but with a PBCN instead, so the actual action is a tuple of values to set the control nodes to.
- gym-PBN/PBN-self-triggering-v0: Same as the sampled-data setting, except the duration is replaced by a termination probability value. Thus, the action duration is stochastic. This is perhaps more in line with the conventional options framework in RL.
- gym-PBN/PBCN-self-triggering-v0: Same as above except the network is a PBCN.
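The difference between the sampled-data and self-triggering variants comes down to how long an action is held. The sketch below illustrates that difference in isolation. `fixed_duration` and `sample_duration` are hypothetical helpers, and the reading that termination is checked once per time-step (which gives a geometrically distributed duration) is an assumption made for illustration, not something taken from the environment's code.

```python
import random


def fixed_duration(action):
    """Sampled-data style: the action tuple carries an integer duration."""
    _, duration = action
    return int(duration)


def sample_duration(action, rng, max_steps=1000):
    """Self-triggering style: the tuple carries a termination probability.

    Assumption: termination is checked after every time-step, so the
    duration is at least 1 and geometrically distributed (capped at max_steps).
    """
    _, termination_prob = action
    steps = 1
    while steps < max_steps and rng.random() >= termination_prob:
        steps += 1
    return steps


rng = random.Random(0)
print(fixed_duration((3, 4)))           # hold action 3 for exactly 4 time-steps
print(sample_duration((3, 0.25), rng))  # random duration, 4 steps on average
```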
The environments provide the framework for such networks. They need to be instantiated with appropriate node data.
Requirements: Python 3.9+.
PIP: python -m pip install gym-PBN
Custom network environments need to be parameterised with one of the following two configurations:
- PBN_data: list of tuples containing node information. Each tuple should contain the following five variables, in order:
  - input_mask: a boolean mask, as a NumPy array, that indicates which nodes affect the node's value.
  - function: a truth table representing a boolean function over the nodes singled out by the input_mask. The truth table should have a tree-like shape of [2] * sum(input_mask), and the item at position pos should indicate the probability of the node taking the value True given pos as the state of the input nodes (which are singled out by input_mask).
  - i: the position of the node in the network's list of vertices.
  - name: string representing the name of the node. Could be None.
  - is_control: boolean flag on whether or not this is a control node (for PBCNs).
- logic_func_data: Think setting all the previous information manually is a pain? We do too, so this is the more sane configuration option. Through logic_func_data, you can pass in just the names of the nodes and the associated logic functions (with their probability of activating), and the constructor will do the rest. logic_func_data is a tuple. The tuple contains:
  - node_names: a list of string literals representing each node in the network. Make sure to put control nodes at the start of the list.
  - logic_funcs: a list of logic functions for each node. Each inner list (associated with the node in the corresponding position in the node names) contains tuples describing logic functions and probabilities of activating for this node. Each tuple is modelled as follows:
    - logic_expr: the logic expression representing the logic function for the tuple. You can use literals that appear in the node_names list and the Python boolean operators and, not, or.
    - probability: a float representing the probability of this function activating.
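As a rough illustration of the logic_func_data configuration, the sketch below builds the tuple for a made-up three-node network without control nodes and runs two sanity checks on it. The node names, the expressions and the `check_logic_func_data` helper are all invented for this example. The check that each node's probabilities sum to 1 is an assumption about what the constructor expects, and the commented-out `gym.make` call assumes the package is importable as `gym_PBN`.

```python
node_names = ["x1", "x2", "x3"]
logic_funcs = [
    [("x2 and not x3", 0.6), ("x2 or x3", 0.4)],  # candidate functions for x1
    [("not x1", 1.0)],                            # x2 has a single function
    [("x1 or x2", 0.5), ("x3", 0.5)],             # candidate functions for x3
]
logic_func_data = (node_names, logic_funcs)


def check_logic_func_data(data, tol=1e-9):
    """Hypothetical helper: catch common mistakes before building the env."""
    names, funcs = data
    if len(names) != len(funcs):
        raise ValueError("need one list of logic functions per node")
    probe = {name: False for name in names}
    for name, candidates in zip(names, funcs):
        total = sum(prob for _, prob in candidates)
        if abs(total - 1.0) > tol:
            raise ValueError(f"probabilities for {name} sum to {total}")
        for expr, _ in candidates:
            # Evaluate with only the node names in scope, so that a typo in
            # an expression raises NameError here rather than later.
            eval(expr, {"__builtins__": {}}, probe)
    return True


check_logic_func_data(logic_func_data)

# Assumed usage, not verified here:
# import gymnasium as gym
# import gym_PBN
# env = gym.make("gym-PBN/PBN-v0", logic_func_data=logic_func_data)
```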
You can view an example of this second configuration over at example.py.
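The PBN_data configuration is easier to follow once the truth-table indexing is spelled out. The sketch below describes a hypothetical two-node network and looks up the probability of each node becoming True. Plain nested lists are used to keep it dependency-free; since input_mask is described above as a NumPy array, the data would be wrapped with `np.array(...)` before being handed to an environment. The `prob_true` helper, the convention that index 1 means True, and the axis order (inputs in node order) are assumptions.

```python
from functools import reduce

# Hypothetical network:
#   x0 becomes True only if x0 and x1 are both True (deterministic AND).
#   x1 becomes True with probability 0.3 if x0 is False, 0.9 if x0 is True.
PBN_data = [
    # (input_mask, function, i, name, is_control)
    ([True, True], [[0.0, 0.0], [0.0, 1.0]], 0, "x0", False),  # shape [2, 2]
    ([True, False], [0.3, 0.9], 1, "x1", False),               # shape [2]
]


def prob_true(node, state):
    """Look up P(node becomes True) for the given network state."""
    input_mask, function, _, _, _ = node
    # Keep only the inputs singled out by the mask, in node order.
    pos = [int(value) for value, used in zip(state, input_mask) if used]
    # Index one axis of the truth table per input node.
    return reduce(lambda table, bit: table[bit], pos, function)


state = (True, True)
print([prob_true(node, state) for node in PBN_data])  # [1.0, 0.9]
```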
Another thing you can configure for your own network is the actual control target. When not provided explicitly, the environment calculates the attractors for the environment and selects the last one as the target. However, especially for PBCNs, we encourage you to provide it explicitly. To do so, provide a goal_config argument to the environment instantiation, with the following information:
- "all_attractors": list of all the attractors present in the PB(C)N. This should be a list of sets. Each set should contain tuples that represent the attractor states if it's a cyclic attractor, or a single tuple representing the equilibrium point if it's a single-state attractor.
- "target": the target attractor, out of the list of attractors. Naturally this is a set of tuples, or just one tuple in the equilibrium point case.
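A goal_config for a made-up three-node network might look like the sketch below. The attractors are invented, and two choices are assumptions rather than documented behaviour: states are written as tuples of 0/1, and the equilibrium point is wrapped in a one-element set. `validate_goal_config` is a hypothetical helper, not part of the library.

```python
# Hypothetical attractors for a three-node network.
fixed_point = {(0, 0, 0)}                  # single-state attractor
cycle = {(1, 0, 1), (0, 1, 1), (1, 1, 0)}  # cyclic attractor

goal_config = {
    "all_attractors": [fixed_point, cycle],
    "target": cycle,
}


def validate_goal_config(cfg, n_nodes):
    """Check that the target is a known attractor and states have the right size."""
    attractors = cfg["all_attractors"]
    if cfg["target"] not in attractors:
        raise ValueError("target must be one of all_attractors")
    for attractor in attractors:
        for state in attractor:
            if len(state) != n_nodes:
                raise ValueError(f"state {state} does not have {n_nodes} entries")
    return True


validate_goal_config(goal_config, n_nodes=3)
```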
The final thing you can configure without modifying the environment is the actual reward (and cost) values for the reward function. Simply provide a reward_config argument to the environment instantiation, with the following information:
- "successful_reward": integer reward given for actions that transition into the target attractor. Defaults to 5. Recommended: > 2.
- "wrong_attractor_cost": integer cost associated with actions that transition into an undesired attractor. Defaults to 2. This is applied for every attractor that the new state hits (sometimes it's multiple).
- "action_cost": integer cost associated with taking an action. Defaults to 1, to discourage the agent from intervening too often.
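One plausible way these three values combine is sketched below. This is a reading of the descriptions above, not the environment's actual reward function: `sketch_reward` and its arguments are hypothetical, and the exact conditions under which each term applies may differ in the library.

```python
reward_config = {
    "successful_reward": 5,     # default value
    "wrong_attractor_cost": 2,  # default value
    "action_cost": 1,           # default value
}


def sketch_reward(next_state, intervened, target, all_attractors, cfg):
    """Assumed combination of the reward terms for a single transition."""
    reward = 0
    if intervened:
        # Charge for intervening at all.
        reward -= cfg["action_cost"]
    if next_state in target:
        reward += cfg["successful_reward"]
    # Charge once per undesired attractor that contains the new state.
    wrong_hits = sum(
        1 for attractor in all_attractors
        if attractor != target and next_state in attractor
    )
    reward -= cfg["wrong_attractor_cost"] * wrong_hits
    return reward


target = {(1, 1)}
all_attractors = [{(0, 0)}, target]
print(sketch_reward((1, 1), True, target, all_attractors, reward_config))   # 4
print(sketch_reward((0, 0), False, target, all_attractors, reward_config))  # -2
```

With the defaults, reaching the target through an intervention still nets a positive reward (5 - 1), which is consistent with the recommendation to keep the successful reward above 2.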
The majority of the work for the implementation of Probabilistic Boolean Networks in Python can be attributed to Vytenis Šliogeris and his PBN_env package. In fact, he implemented the prototype version of gym-PBN some time ago.
Evangelos Chatzaroulas finished the adaptation to Gymnasium and implemented PB(C)N support. He is currently the primary maintainer.