feat: replace OpenAI Gym with Gymnasium (#255)
This commit replaces the (unmaintained) [OpenAI Gym](https://github.com/openai/gym) package with the
new [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) package.

BREAKING CHANGE: This package now depends on Gymnasium instead of Gym. It also requires
Gymnasium>=0.26 (see https://gymnasium.farama.org/content/migration-guide/ for more information).
rickstaa authored Jun 22, 2023
1 parent 2a71272 commit 9873a03
Showing 38 changed files with 274 additions and 274 deletions.
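
For readers migrating their own environments, the sketch below (not part of the commit; the environment name is only an example) summarizes the Gymnasium API conventions that the changed files below assume:

```python
import gymnasium as gym

# Gymnasium-style API, as assumed throughout the updated code below.
env = gym.make("Pendulum-v1")  # example environment, not one used by this repository

# Seeding is done through `reset`, which returns (obs, info) instead of just obs;
# the old `env.seed(seed)` call no longer exists.
obs, info = env.reset(seed=0)

# `step` returns five values: the old `done` flag is split into `terminated`
# (true terminal state) and `truncated` (e.g. time-limit cut-off).
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
done = terminated or truncated

env.close()
```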
4 changes: 2 additions & 2 deletions README.md
@@ -10,8 +10,8 @@

Welcome to the Bayesian Learning Control (BLC) framework! The Bayesian Learning Control framework enables you to automatically create, train and deploy various safe (stable and robust) Reinforcement Learning (RL) and Imitation learning (IL) control algorithms directly from real-world data. This framework is made up of four main modules:

* [Modeling](./bayesian_learning_control/modeling): Module that uses state of the art System Identification and State Estimation techniques to create an Openai gym environment out of real data.
* [Control](./bayesian_learning_control/control): Module used to train several [Bayesian Learning Control](https://rickstaa.github.io/bayesian-learning-control/control/control.html) RL/IL agents on the built gym environments.
* [Modeling](./bayesian_learning_control/modeling): Module that uses state of the art System Identification and State Estimation techniques to create a [gymnasium environment](https://gymnasium.farama.org/) out of real data.
* [Control](./bayesian_learning_control/control): Module used to train several [Bayesian Learning Control](https://rickstaa.github.io/bayesian-learning-control/control/control.html) RL/IL agents on the built [gymnasium](https://gymnasium.farama.org/) environments.
* [Hardware](./bayesian_learning_control/hardware): Module that can be used to deploy the trained RL/IL agents onto the hardware of your choice.

This framework follows a code structure similar to the [Spinningup](https://spinningup.openai.com/en/latest/) educational package. By doing this, we hope to make it easier for new researchers to get started with our algorithms. If you are new to RL, you are therefore highly encouraged to first check out the SpinningUp documentation and play with it before diving into our codebase. Our implementation sometimes deviates from the [Spinningup](https://spinningup.openai.com/en/latest/) version to increase code maintainability, extensibility and readability.
2 changes: 1 addition & 1 deletion bayesian_learning_control/control/README.md
@@ -5,7 +5,7 @@ The following algorithms are implemented in the Bayesian Learning Control packag
* [Soft Actor-Critic (SAC)](https://rickstaa.github.io/bayesian-learning-control/control/algorithms/sac.html)
* [Lyapunov Actor-Critic (LAC)](https://rickstaa.github.io/bayesian-learning-control/control/algorithms/lac.html)

They are all implemented with [MLP](https://en.wikipedia.org/wiki/Multilayer_perceptron) (non-recurrent) actor-critics, making them suitable for fully-observed, non-image-based RL environments, e.g. the [Gym Mujoco](https://gym.openai.com/envs/#mujoco) environments.
They are all implemented with [MLP](https://en.wikipedia.org/wiki/Multilayer_perceptron) (non-recurrent) actor-critics, making them suitable for fully-observed, non-image-based RL environments, e.g. the [gymnasium Mujoco](https://gymnasium.farama.org/environments/mujoco/) environments.

Bayesian Learning Control has two implementations for each algorithm: one that uses [PyTorch](https://pytorch.org/) as the neural network library, and one that uses [Tensorflow v2](https://www.tensorflow.org/). The default backend is [PyTorch](https://pytorch.org). Please run the `pip install .[tf]` command if you want to use
the [Tensorflow v2](https://www.tensorflow.org/) implementations.
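
As a rough usage sketch (assumptions: the module path mirrors the repository layout shown below, and the `stable_gym` package providing `Oscillator-v1` is installed), training the PyTorch LAC implementation could look like this:

```python
import gymnasium as gym

# Assumed import path, derived from
# bayesian_learning_control/control/algos/pytorch/lac/lac.py.
from bayesian_learning_control.control.algos.pytorch.lac.lac import lac

# `lac` expects a function that constructs the environment. The "module:EnvId"
# form makes gymnasium import `stable_gym` before looking up the environment.
# All other hyperparameters are left at their defaults in this sketch.
lac(env_fn=lambda: gym.make("stable_gym:Oscillator-v1"))
```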
78 changes: 41 additions & 37 deletions bayesian_learning_control/control/algos/pytorch/lac/lac.py
@@ -28,7 +28,7 @@
from copy import deepcopy
from pathlib import Path

import gym
import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn
@@ -121,10 +121,10 @@ def __init__( # noqa: C901
"""Lyapunov (soft) Actor-Critic (LAC)
Args:
env (:obj:`gym.env`): The gym environment the LAC is training in. This is
env (:obj:`gym.env`): The gymnasium environment the LAC is training in. This is
used to retrieve the activation and observation space dimensions. This
is used while creating the network sizes. The environment must satisfy
the OpenAI Gym API.
the gymnasium API.
actor_critic (torch.nn.Module, optional): The constructor method for a
Torch Module with an ``act`` method, a ``pi`` module and several
``Q`` or ``L`` modules. The ``act`` method and ``pi`` module should
@@ -210,10 +210,10 @@ def __init__( # noqa: C901
k: v for k, v in locals().items() if k not in ["self", "__class__", "env"]
}

# Validate gym env
# Validate gymnasium env
# NOTE: The current implementation only works with continuous spaces.
if not is_gym_env(env):
raise ValueError("Env must be a valid Gym environment.")
raise ValueError("Env must be a valid gymnasium environment.")
if is_discrete_space(env.action_space) or is_discrete_space(
env.observation_space
):
@@ -829,7 +829,7 @@ def lac( # noqa: C901
Args:
env_fn: A function which creates a copy of the environment.
The environment must satisfy the OpenAI Gym API.
The environment must satisfy the gymnasium API.
actor_critic (torch.nn.Module, optional): The constructor method for a
Torch Module with an ``act`` method, a ``pi`` module and several
``Q`` or ``L`` modules. The ``act`` method and ``pi`` module should
@@ -956,6 +956,32 @@ def lac( # noqa: C901

validate_args(**locals())

env = env_fn()

# Validate gymnasium env
# NOTE: The current implementation only works with continuous spaces.
if not is_gym_env(env):
raise ValueError("Env must be a valid gymnasium environment.")
if is_discrete_space(env.action_space) or is_discrete_space(env.observation_space):
raise NotImplementedError(
"The LAC algorithm does not yet support discrete observation/action "
"spaces. Please open a feature/pull request on "
"https://github.com/rickstaa/bayesian-learning-control/issues if you "
"need this."
)

env = gym.wrappers.FlattenObservation(
env
) # NOTE: Done to make sure the alg works with dict observation spaces
if num_test_episodes != 0:
test_env = env_fn()
test_env = gym.wrappers.FlattenObservation(test_env)
obs_dim = env.observation_space.shape
act_dim = env.action_space.shape[0]
rew_dim = (
env.reward_range.shape[0] if isinstance(env.reward_range, gym.spaces.Box) else 1
)

logger_kwargs["verbose_vars"] = (
logger_kwargs["verbose_vars"]
if (
@@ -980,19 +1006,6 @@
} # Retrieve hyperparameters (Ignore logger object)
logger.save_config(hyper_paramet_dict) # Write hyperparameters to logger

env = env_fn()
env = gym.wrappers.FlattenObservation(
env
) # NOTE: Done to make sure the alg works with dict observation spaces
if num_test_episodes != 0:
test_env = env_fn()
test_env = gym.wrappers.FlattenObservation(test_env)
obs_dim = env.observation_space.shape
act_dim = env.action_space.shape[0]
rew_dim = (
env.reward_range.shape[0] if isinstance(env.reward_range, gym.spaces.Box) else 1
)

# Retrieve max episode length
if max_ep_len is None:
max_ep_len = env.env._max_episode_steps
@@ -1020,9 +1033,6 @@ def lac( # noqa: C901
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
env.seed(seed)
if num_test_episodes != 0:
test_env.seed(seed)

policy = LAC(
env,
@@ -1134,7 +1144,8 @@ def lac( # noqa: C901

# Main loop: collect experience in env and update/log each epoch
start_time = time.time()
o, ep_ret, ep_len = env.reset(), 0, 0
o, _ = env.reset()
ep_ret, ep_len = 0, 0
for t in range(total_steps):
# Until start_steps have elapsed, randomly sample actions
# from a uniform distribution for better exploration. Afterwards,
@@ -1145,24 +1156,20 @@
a = env.action_space.sample()

# Take step in the env
o_, r, d, _ = env.step(a)
o_, r, d, truncated, _ = env.step(a)
ep_ret += r
ep_len += 1

# Ignore the "done" signal if it comes from hitting the time
# horizon (that is, when it's an artificial terminal signal
# that isn't based on the agent's state)
d = False if ep_len == max_ep_len else d

replay_buffer.store(o, a, r, o_, d)

# Make sure to update most recent observation!
o = o_

# End of trajectory handling
if d or (ep_len == max_ep_len):
if d or truncated:
logger.store(EpRet=ep_ret, EpLen=ep_len)
o, ep_ret, ep_len = env.reset(), 0, 0
o, _ = env.reset()
ep_ret, ep_len = 0, 0

# Update handling
if (t + 1) >= update_after and ((t + 1) - update_after) % update_every == 0:
@@ -1316,18 +1323,15 @@ def lac( # noqa: C901


if __name__ == "__main__":
# NOTE: You can import your custom gym environment here.
# import stable_gym # noqa: F401

parser = argparse.ArgumentParser(
description="Trains a LAC agent in a given environment."
)
parser.add_argument(
"--env",
type=str,
default="Oscillator-v1",
help="the gym env (default: Oscillator-v1)",
)
default="stable_gym:Oscillator-v1",
help="the gymnasium env (default: stable_gym:Oscillator-v1)",
) # NOTE: Ensure the environment is installed in the current python environment.
parser.add_argument(
"--hid_a",
type=int,
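
The `gym.wrappers.FlattenObservation` call added in the hunk above keeps the algorithm usable with dictionary observation spaces. A small self-contained sketch of what the wrapper does (toy environment for illustration, not part of this commit):

```python
import gymnasium as gym
from gymnasium import spaces
from gymnasium.wrappers import FlattenObservation


class DictObsEnv(gym.Env):
    """Toy environment with a dictionary observation space."""

    observation_space = spaces.Dict(
        {
            "pos": spaces.Box(-1.0, 1.0, shape=(2,)),
            "vel": spaces.Box(-1.0, 1.0, shape=(3,)),
        }
    )
    action_space = spaces.Box(-1.0, 1.0, shape=(1,))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}


env = FlattenObservation(DictObsEnv())
print(env.observation_space.shape)  # (5,) -> usable as `obs_dim` for the MLP networks
```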
Original file line number Diff line number Diff line change
@@ -48,8 +48,8 @@ def __init__(
network object.
Args:
observation_space (:obj:`gym.space.box.Box`): A gym observation space.
action_space (:obj:`gym.space.box.Box`): A gym action space.
observation_space (:obj:`gym.space.box.Box`): A gymnasium observation space.
action_space (:obj:`gym.space.box.Box`): A gymnasium action space.
hidden_sizes (Union[dict, tuple, list], optional): Sizes of the hidden
layers for the actor. Defaults to ``(256, 256)``.
activation (Union[:obj:`dict`, :obj:`torch.nn.modules.activation`], optional):
Original file line number Diff line number Diff line change
@@ -45,8 +45,8 @@ def __init__(
object.
Args:
observation_space (:obj:`gym.space.box.Box`): A gym observation space.
action_space (:obj:`gym.space.box.Box`): A gym action space.
observation_space (:obj:`gym.space.box.Box`): A gymnasium observation space.
action_space (:obj:`gym.space.box.Box`): A gymnasium action space.
hidden_sizes (Union[dict, tuple, list], optional): Sizes of the hidden
layers for the actor. Defaults to ``(256, 256)``.
activation (Union[:obj:`dict`, :obj:`torch.nn.modules.activation`], optional):
78 changes: 41 additions & 37 deletions bayesian_learning_control/control/algos/pytorch/sac/sac.py
@@ -30,7 +30,7 @@
from copy import deepcopy
from pathlib import Path

import gym
import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn
@@ -118,10 +118,10 @@ def __init__( # noqa: C901
"""Soft Actor-Critic (SAC)
Args:
env (:obj:`gym.env`): The gym environment the SAC is training in. This is
env (:obj:`gym.env`): The gymnasium environment the SAC is training in. This is
used to retrieve the activation and observation space dimensions. This
is used while creating the network sizes. The environment must satisfy
the OpenAI Gym API.
the gymnasium API.
actor_critic (torch.nn.Module, optional): The constructor method for a
Torch Module with an ``act`` method, a ``pi`` module and several
``Q`` or ``L`` modules. The ``act`` method and ``pi`` module should
@@ -203,10 +203,10 @@ def __init__( # noqa: C901
k: v for k, v in locals().items() if k not in ["self", "__class__", "env"]
}

# Validate gym env
# Validate gymnasium env
# NOTE: The current implementation only works with continuous spaces.
if not is_gym_env(env):
raise ValueError("Env must be a valid Gym environment.")
raise ValueError("Env must be a valid gymnasium environment.")
if is_discrete_space(env.action_space) or is_discrete_space(
env.observation_space
):
@@ -770,7 +770,7 @@ def sac( # noqa: C901
Args:
env_fn: A function which creates a copy of the environment.
The environment must satisfy the OpenAI Gym API.
The environment must satisfy the gymnasium API.
actor_critic (torch.nn.Module, optional): The constructor method for a
Torch Module with an ``act`` method, a ``pi`` module and several
``Q`` or ``L`` modules. The ``act`` method and ``pi`` module should
@@ -893,6 +893,32 @@ def sac( # noqa: C901

validate_args(**locals())

env = env_fn()

# Validate gymnasium env
# NOTE: The current implementation only works with continuous spaces.
if not is_gym_env(env):
raise ValueError("Env must be a valid gymnasium environment.")
if is_discrete_space(env.action_space) or is_discrete_space(env.observation_space):
raise NotImplementedError(
"The SAC algorithm does not yet support discrete observation/action "
"spaces. Please open a feature/pull request on "
"https://github.com/rickstaa/bayesian-learning-control/issues if you "
"need this."
)

env = gym.wrappers.FlattenObservation(
env
) # NOTE: Done to make sure the alg works with dict observation spaces
if num_test_episodes != 0:
test_env = env_fn()
test_env = gym.wrappers.FlattenObservation(test_env)
obs_dim = env.observation_space.shape
act_dim = env.action_space.shape[0]
rew_dim = (
env.reward_range.shape[0] if isinstance(env.reward_range, gym.spaces.Box) else 1
)

logger_kwargs["verbose_vars"] = (
logger_kwargs["verbose_vars"]
if (
@@ -917,19 +943,6 @@ def sac( # noqa: C901
} # Retrieve hyperparameters (Ignore logger object)
logger.save_config(hyper_paramet_dict) # Write hyperparameters to logger

env = env_fn()
env = gym.wrappers.FlattenObservation(
env
) # NOTE: Done to make sure the alg works with dict observation spaces
if num_test_episodes != 0:
test_env = env_fn()
test_env = gym.wrappers.FlattenObservation(test_env)
obs_dim = env.observation_space.shape
act_dim = env.action_space.shape[0]
rew_dim = (
env.reward_range.shape[0] if isinstance(env.reward_range, gym.spaces.Box) else 1
)

# Retrieve max episode length
if max_ep_len is None:
max_ep_len = env.env._max_episode_steps
@@ -957,9 +970,6 @@ def sac( # noqa: C901
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
env.seed(seed)
if num_test_episodes != 0:
test_env.seed(seed)

policy = SAC(
env,
@@ -1052,7 +1062,8 @@ def sac( # noqa: C901

# Main loop: collect experience in env and update/log each epoch
start_time = time.time()
o, ep_ret, ep_len = env.reset(), 0, 0
o, _ = env.reset()
ep_ret, ep_len = 0, 0
for t in range(total_steps):
# Until start_steps have elapsed, randomly sample actions
# from a uniform distribution for better exploration. Afterwards,
@@ -1063,24 +1074,20 @@
a = env.action_space.sample()

# Take step in the env
o_, r, d, _ = env.step(a)
o_, r, d, truncated, _ = env.step(a)
ep_ret += r
ep_len += 1

# Ignore the "done" signal if it comes from hitting the time
# horizon (that is, when it's an artificial terminal signal
# that isn't based on the agent's state)
d = False if ep_len == max_ep_len else d

replay_buffer.store(o, a, r, o_, d)

# Make sure to update most recent observation!
o = o_

# End of trajectory handling
if d or (ep_len == max_ep_len):
if d or truncated:
logger.store(EpRet=ep_ret, EpLen=ep_len)
o, ep_ret, ep_len = env.reset(), 0, 0
o, _ = env.reset()
ep_ret, ep_len = 0, 0

# Update handling
if (t + 1) >= update_after and ((t + 1) - update_after) % update_every == 0:
@@ -1219,18 +1226,15 @@ def sac( # noqa: C901


if __name__ == "__main__":
# NOTE: You can import your custom gym environment here.
# import stable_gym # noqa: F401

parser = argparse.ArgumentParser(
description="Trains a SAC agent in a given environment."
)
parser.add_argument(
"--env",
type=str,
default="Oscillator-v1",
help="the gym env (default: Oscillator-v1)",
)
default="stable_gym:Oscillator-v1",
help="the gymnasium env (default: stable_gym:Oscillator-v1)",
) # NOTE: Ensure the environment is installed in the current python environment.
parser.add_argument(
"--hid_a",
type=int,
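
In both the LAC and SAC training loops above, the manual time-horizon trick (`d = False if ep_len == max_ep_len else d`) is dropped in favour of Gymnasium's explicit `truncated` signal. A minimal, runnable sketch of that pattern (illustrative only; a random policy and a plain deque stand in for the package's agent and replay buffer):

```python
from collections import deque

import gymnasium as gym

env = gym.make("Pendulum-v1")  # example environment, not one used by this repository
replay_buffer = deque(maxlen=10_000)
total_steps = 1_000

obs, _ = env.reset(seed=0)
ep_ret, ep_len = 0, 0
for t in range(total_steps):
    action = env.action_space.sample()
    next_obs, reward, terminated, truncated, _ = env.step(action)
    ep_ret += reward
    ep_len += 1

    # Only `terminated` marks a real terminal state; time-limit cut-offs are
    # reported separately via `truncated`, so transitions are stored with the
    # terminal flag only and no manual bootstrapping trick is needed.
    replay_buffer.append((obs, action, reward, next_obs, terminated))
    obs = next_obs

    if terminated or truncated:
        obs, _ = env.reset()
        ep_ret, ep_len = 0, 0
env.close()
```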
