Implementations of reinforcement learning algorithms, including Deep Q-Learning
Note: this project is in development and lacks full documentation. More algorithms will be added in the future, and they will be presented pedagogically. Fuller READMEs will also be provided.
- `toy_text` contains warm-up reinforcement learning environments. The small state spaces allow classic Q-Learning algorithms to succeed in solving the environments.
- `classic_control` contains environments with (in principle) continuous state spaces and/or action spaces, requiring the use of more sophisticated methods to solve.
  - `cartpole` features a continuous state space and a discrete (indeed, two-valued) action space. To handle the continuous state space, we use a Deep Q-Network (Mnih et al. (2015)) featuring an experience replay memory. To improve training stability, we use both a policy network and a target network. Instead of copying the policy network to the target network after a fixed number of steps, we have the target network track the policy network by a factor of $\tau = 0.005$ (Lillicrap et al. (2016)); a sketch of this soft update is given after this list.
  - `acrobot` is similar to `cartpole`, and the same DQN suffices to solve the environment (albeit requiring about half as many training episodes).
  - `mountain_car_continuous` is significantly more challenging than both `cartpole` and `acrobot`, as it features both a continuous state space and a continuous action space. Solving this environment requires actor-critic methods in addition to the techniques above (Lillicrap et al. (2016), Silver et al. (2014)); see the actor-critic sketch below.
  - `mountain_car` is like `mountain_car_continuous`, except that it has a discrete action space. The continuous state space can be reasonably approximated using discretization, allowing us to use traditional Q-learning methods; a discretization sketch is also given below.
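As a point of reference, the soft target-network update used for `cartpole` and `acrobot` can be sketched as follows. This is a minimal illustration assuming PyTorch; the function and variable names are hypothetical rather than the ones used in this repository.

```python
import torch
import torch.nn as nn

TAU = 0.005  # tracking factor for the target network (Lillicrap et al. (2016))

def soft_update(policy_net: nn.Module, target_net: nn.Module, tau: float = TAU) -> None:
    """Nudge the target network toward the policy network:
    target <- tau * policy + (1 - tau) * target."""
    with torch.no_grad():
        for target_param, policy_param in zip(target_net.parameters(),
                                              policy_net.parameters()):
            target_param.mul_(1.0 - tau).add_(tau * policy_param)
```

Calling such an update after every optimization step makes the target network a slowly moving average of the policy network, which stabilizes the bootstrapped Q-value targets.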
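For `mountain_car_continuous`, the actor-critic approach referenced above (Lillicrap et al. (2016), Silver et al. (2014)) can be sketched roughly as below. This is not the repository's implementation; the network sizes, learning rates, and function names are illustrative assumptions, and the replay memory and exploration noise are omitted.

```python
import torch
import torch.nn as nn
import torch.optim as optim

STATE_DIM, ACTION_DIM = 2, 1   # MountainCarContinuous-v0: (position, velocity) -> 1-D force
GAMMA, TAU = 0.99, 0.005       # discount and target-tracking factors (assumed values)

class Actor(nn.Module):
    """Deterministic policy mu(s) -> a, squashed into [-1, 1] with tanh."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACTION_DIM), nn.Tanh())

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a); the action is concatenated to the state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

actor, critic = Actor(), Critic()
target_actor, target_critic = Actor(), Critic()
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())
actor_opt = optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_step(state, action, reward, next_state, done):
    """One update on a batch of transitions (each argument a tensor of shape [batch, dim])."""
    # Critic: regress Q(s, a) toward r + gamma * Q'(s', mu'(s')) computed with target networks.
    with torch.no_grad():
        target_q = reward + GAMMA * (1 - done) * target_critic(next_state,
                                                               target_actor(next_state))
    critic_loss = nn.functional.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, mu(s)) (deterministic policy gradient).
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

After each such step, both target networks would be softly updated toward their counterparts with the same $\tau$-tracking rule sketched above.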
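For `mountain_car`, the discretization mentioned above can be sketched as follows: the continuous state is mapped to a tuple of bin indices, which then serves as an index into an ordinary Q-table. The bin counts and state bounds here are illustrative assumptions, not the repository's actual settings.

```python
import numpy as np

N_BINS = (20, 20)                        # assumed number of bins per state dimension
STATE_LOW = np.array([-1.2, -0.07])      # MountainCar-v0 state bounds: (position, velocity)
STATE_HIGH = np.array([0.6, 0.07])
N_ACTIONS = 3                            # push left, no push, push right

def discretize(state) -> tuple:
    """Map a continuous state to a tuple of bin indices usable as a Q-table key."""
    ratios = (np.asarray(state) - STATE_LOW) / (STATE_HIGH - STATE_LOW)
    indices = np.floor(ratios * N_BINS).astype(int)
    return tuple(np.clip(indices, 0, np.array(N_BINS) - 1))

# Tabular Q-function indexed by (position bin, velocity bin, action).
q_table = np.zeros(N_BINS + (N_ACTIONS,))

# Example tabular Q-learning update for one transition (alpha and gamma assumed):
# q_table[discretize(s) + (a,)] += alpha * (r + gamma * q_table[discretize(s2)].max()
#                                           - q_table[discretize(s) + (a,)])
```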
To run one of the agents, first run the `env_main.py` file to train the agent, then run the `env_testing.py` file to observe the agent's performance. For example, run `cartpole_main.py`, then `cartpole_testing.py`. The `env_main.py` file saves the trained model in the corresponding `saved_models` directory.