MEow in CleanRL Implementation


This folder contains the CleanRL implementation of MEow. CleanRL is a library that features single-file implementations of several widely used online reinforcement learning algorithms. More details about CleanRL are provided in its documentation.



Install Dependencies

  • Launch a Docker container using the following commands:
# assume the current directory is the root of this repository
docker run --rm -it --gpus all --ipc=host -v ${PWD}:/app nvcr.io/nvidia/pytorch:20.12-py3
# inside the docker container, run:
cd /app
  • Install the dependencies using the following commands:
# install the default dependencies
pip install --ignore-installed PyYAML
pip install -r requirements/requirements.txt --use-feature=2020-resolver
pip install -U gymnasium --use-feature=2020-resolver
pip install -r requirements/requirements-mujoco.txt

# install the ray tune package for parallelizable training
pip install ray[tune]
  • The following error messages during installation can be safely ignored:
ERROR: torchvision 0.9.0a0 requires torch==1.8.0a0+1606899, but you'll have torch 1.12.1 which is incompatible.
ERROR: stable-baselines3 2.0.0 requires gymnasium==0.28.1, but you'll have gymnasium 0.29.1 which is incompatible.

NOTE: For more details or assistance with installation, please refer to CleanRL's documentation.
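As a quick sanity check after installation, a short snippet like the one below (not part of this repository) can confirm that PyTorch and Gymnasium import correctly and that a GPU is visible inside the container:

import gymnasium as gym
import torch

# Print the installed versions and check whether CUDA is visible.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("gymnasium:", gym.__version__)

# Build a simple environment to confirm that Gymnasium works end to end.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
print("observation shape:", obs.shape)
env.close()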


Training Commands

(Optional) Example

Use the following command to train PPO on the CartPole-v1 environment:

python cleanrl/ppo.py --seed 1 --env-id CartPole-v1 --total-timesteps 50000

NOTE: The results will be saved in the ./runs directory.
NOTE: Launch TensorBoard on the ./runs directory to verify that the training was successful.
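If you prefer to inspect the logged returns programmatically instead of through the TensorBoard UI, a minimal sketch like the one below reads the event files under ./runs. The run directory name is hypothetical, and the scalar tag charts/episodic_return is an assumption based on CleanRL's usual logging convention:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Path to one run directory under ./runs (hypothetical name, for illustration only).
run_dir = "runs/CartPole-v1__ppo__1__1700000000"

ea = EventAccumulator(run_dir)
ea.Reload()  # parse the event file(s) found in the run directory

# "charts/episodic_return" is the tag CleanRL scripts typically log; adjust if needed.
for event in ea.Scalars("charts/episodic_return"):
    print(event.step, event.value)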

MEow

Single Run on a GPU

# Hopper-v4
python cleanrl/meow_continuous_action.py --seed 1 --env-id Hopper-v4 --total-timesteps 1500000 --tau 0.005 --alpha 0.25 --learning-starts 5000 --sigma-max -0.3 --sigma-min -5.0 --deterministic_action
# HalfCheetah-v4
python cleanrl/meow_continuous_action.py --seed 1 --env-id HalfCheetah-v4 --total-timesteps 1500000 --tau 0.003 --alpha 0.25 --learning_starts 10000 --sigma_max 2.0 --sigma_min -5.0 --deterministic_action
# Walker2d-v4
python cleanrl/meow_continuous_action.py --seed 1 --env-id Walker2d-v4 --total-timesteps 4000000 --tau 0.005 --alpha 0.1 --learning-starts 10000 --sigma-max -0.3 --sigma-min -5.0 --deterministic_action
# Ant-v4
python cleanrl/meow_continuous_action.py --seed 1 --env-id Ant-v4 --total-timesteps 4000000 --tau 0.0001 --alpha 0.05 --learning_starts 5000 --sigma_max -0.3 --sigma_min -5.0 --deterministic_action
# Humanoid-v4
python cleanrl/meow_continuous_action.py --seed 1 --env-id Humanoid-v4 --total-timesteps 5000000 --tau 0.0005 --alpha 0.125 --learning_starts 5000 --sigma_max -0.3 --sigma_min -5.0 --deterministic_action
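To launch several seeds of a single configuration back to back without Ray Tune, a small wrapper along the following lines can reuse the commands above. This is only a sketch; the flags mirror the Hopper-v4 example and should be adjusted per environment:

import subprocess

# Sequentially run the Hopper-v4 configuration shown above with several seeds.
for seed in [1, 2, 3]:
    subprocess.run(
        [
            "python", "cleanrl/meow_continuous_action.py",
            "--seed", str(seed),
            "--env-id", "Hopper-v4",
            "--total-timesteps", "1500000",
            "--tau", "0.005",
            "--alpha", "0.25",
            "--learning-starts", "5000",
            "--sigma-max", "-0.3",
            "--sigma-min", "-5.0",
            "--deterministic_action",
        ],
        check=True,  # raise an error if a run exits unsuccessfully
    )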

Parallelizable Training using Ray Tune

# Hopper-v4
python tuner_meow_hopper.py
# HalfCheetah-v4
python tuner_meow_halfcheetah.py
# Walker2d-v4
python tuner_meow_walker.py
# Ant-v4
python tuner_meow_ant.py
# Humanoid-v4
python tuner_meow_humanoid.py

NOTE: Adjust the gpu setting in resources_per_trial on line 63 of the tuner code to modify the throughput. For example, setting 'gpu': 0.25 allows four runs to be trained simultaneously on a single GPU.
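For reference, the snippet below is a minimal sketch (not the actual tuner script) of how resources_per_trial controls GPU sharing in Ray Tune; train_meow is a hypothetical stand-in for the trainable defined in the tuner files:

from ray import tune

def train_meow(config):
    # Hypothetical trainable: in the real tuner scripts, this launches
    # meow_continuous_action.py with the hyperparameters given in `config`.
    pass

# 'gpu': 0.25 lets four trials share one GPU; change the value to adjust throughput.
tune.run(
    train_meow,
    config={"seed": tune.grid_search([1, 2, 3, 4])},
    resources_per_trial={"cpu": 1, "gpu": 0.25},
)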


Evaluation Results

Most of the evaluation results presented in the paper can be found here. Read our instructions for reproducing the plots presented in our paper. The results of the baseline algorithms (i.e., SAC, DDPG, TD3, and PPO) can be reproduced using Stable-Baselines3 with its refined hyperparameters (i.e., RL Baselines3 Zoo). The results of the SQL algorithm can be reproduced via ChienFeng-hub/softqlearning-pytorch.


References

This implementation was developed based on the following repositories:


Cite this Repository

If you find this repository useful, please consider citing our paper:

@inproceedings{chao2024maximum,
    title={Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow},
    author={Chao, Chen-Hao and Feng, Chien and Sun, Wei-Fang and Lee, Cheng-Kuang and See, Simon and Lee, Chun-Yi},
    booktitle={Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS)},
    year={2024}
}

Contributors of the Code Implementation

[Contributor profile images]

Visit our GitHub pages by clicking the images above.