Implementation of Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Another branch, SAC_V1, has been added for Soft Actor-Critic Algorithms and Applications.
Soft Q-Learning uses the following objective function instead of the conventional expected cumulative return:
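Written out in the notation of the Haarnoja et al. papers referenced below (where α is the entropy temperature, ρ_π the state-action distribution induced by the policy, and H the entropy), this maximum-entropy objective is:

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(\mathbf{s}_t, \mathbf{a}_t) \sim \rho_\pi}
\Bigl[ r(\mathbf{s}_t, \mathbf{a}_t) + \alpha \, \mathcal{H}\bigl(\pi(\cdot \mid \mathbf{s}_t)\bigr) \Bigr]
```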
The entropy term is also maximized, which has two major benefits (a minimal sketch of how this term enters the actor update follows the list):
- Exploration is tuned intelligently and used only as much as needed, so the exploration/exploitation trade-off is handled well.
- It prevents the learning procedure from getting stuck in a local optimum, which would otherwise lead to a suboptimal policy.
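As a concrete illustration (not this repository's exact code), the entropy term appears in the SAC actor update as an α-weighted log-probability penalty. The tensors below are random placeholders standing in for quantities that would come from the policy and Q-networks:

```python
import torch

# Placeholders: log pi(a|s) for actions sampled from the current policy,
# and the soft Q-network's estimate Q(s, a) for those actions.
log_prob = torch.randn(64, 1, requires_grad=True)
q_value = torch.randn(64, 1)
alpha = 0.2  # entropy temperature (fixed here; it can also be learned)

# Maximizing E[Q(s, a) + alpha * H(pi(.|s))] is equivalent to minimizing
# E[alpha * log pi(a|s) - Q(s, a)], which is the SAC policy loss.
policy_loss = (alpha * log_prob - q_value).mean()
policy_loss.backward()
```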
Demos and result plots for Humanoid-v2, Walker2d-v2, and Hopper-v2.
- gym == 0.17.2
- mujoco-py == 2.0.2.13
- numpy == 1.19.1
- psutil == 5.4.2
- torch == 1.4.0
pip3 install -r requirements.txt
python3 main.py
- You may use the `Train` flag to specify whether to train the agent (when it is `True`) or test it (when it is `False`).
- There are some pre-trained weights in the pre-trained models dir; to test the agent with them, put them in the root folder of the project and set the `Train` flag to `False`.
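For reference, loading such a checkpoint in PyTorch amounts to the minimal sketch below; the actual file name and the network it belongs to are defined by this repository's code, and `pretrained_weights.pth` here is only a placeholder:

```python
import torch

# Placeholder path: the real checkpoint in the pre-trained models dir
# has a different name.
checkpoint_path = "pretrained_weights.pth"

# Load the parameters on CPU so the example also works without a GPU.
state_dict = torch.load(checkpoint_path, map_location="cpu")

# The state dict would then be restored into the corresponding network, e.g.:
# policy_network.load_state_dict(state_dict)
print(f"Loaded {len(state_dict)} parameter tensors")
```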
- Humanoid-v2
- Hopper-v2
- Walker2d-v2
- HalfCheetah-v2
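As a quick sanity check that these MuJoCo environments are available (this assumes mujoco-py and a working MuJoCo installation are already set up), any of them can be instantiated through gym:

```python
import gym

# Instantiate one of the supported environments and inspect its spaces.
env = gym.make("Humanoid-v2")
obs = env.reset()
print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape)
env.close()
```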
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018
- Soft Actor-Critic Algorithms and Applications, Haarnoja et al., 2018
All credit goes to @pranz24 for his brilliant PyTorch implementation of SAC.
Special thanks to @p-christ for SAC.py