From ec07a5fce86edb4014dbc9a73387b12d9e895591 Mon Sep 17 00:00:00 2001
From: Anji Liu
Date: Mon, 18 Nov 2019 14:18:19 -0800
Subject: [PATCH] Update README.md

---
 README.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index c73cd29..fdf967a 100644
--- a/README.md
+++ b/README.md
@@ -49,19 +49,19 @@ The breakdown of time consumption (tested with 16 expansion and simulation worke
 1. Download or clone the repository.
 2. Run with the default settings:
 ```
-python3 main.py --model P-UCT
+python3 main.py --model WU-UCT
 ```
-3. For additional hyperparameters please have a look at [main.py](https://github.com/liuanji/P-UCT/tree/master/main.py) (they are also listed below), where descriptions are also included. For example, if you want to run the game PongNoFrameskip-v0 with 200 MCTS rollouts, simply run:
+3. For additional hyperparameters, please have a look at [main.py](https://github.com/liuanji/WU-UCT/tree/master/main.py) (they are also listed below, with descriptions). For example, if you want to run the game PongNoFrameskip-v0 with 200 MCTS rollouts, simply run:
 ```
-python3 main.py --model P-UCT --env-name PongNoFrameskip-v0 --MCTS-max-steps 200
+python3 main.py --model WU-UCT --env-name PongNoFrameskip-v0 --MCTS-max-steps 200
 ```
 or if you want to record the video of gameplay, run:
 ```
-python3 main.py --model P-UCT --env-name PongNoFrameskip-v0 --record-video
+python3 main.py --model WU-UCT --env-name PongNoFrameskip-v0 --record-video
 ```
 * A full list of parameters
-  * --model: MCTS model to use (currently support P-UCT and UCT).
+  * --model: MCTS model to use (currently supports WU-UCT and UCT).
   * --env-name: name of the environment.
   * --MCTS-max-steps: number of simulation steps in the planning phase.
   * --MCTS-max-depth: maximum planning depth.
@@ -77,24 +77,24 @@ or if you want to record the video of gameplay, run:
   * --mode: MCTS or Distill, see [Planning with prior policy](#Planning-with-prior-policy).
 
 ### Planning with prior policy
-The code currently support three default policies (policy used to perform simulation): *Random*, *PPO*, *DistillPPO* (to use them, change the “--policy” parameter). To use the *PPO* and *DistillPPO* policy, corresponding policy files need to be put in [./Policy/PPO/PolicyFiles](https://github.com/liuanji/P-UCT/tree/master/Policy/PPO/PolicyFiles). PPO policy files can be generated by [Atari_PPO_training](https://github.com/liuanji/P-UCT/tree/master/Utils/Atari_PPO_training). For example, by running
+The code currently supports three default policies (the policy used to perform simulations): *Random*, *PPO*, and *DistillPPO* (to use them, change the "--policy" parameter). To use the *PPO* and *DistillPPO* policies, the corresponding policy files need to be put in [./Policy/PPO/PolicyFiles](https://github.com/liuanji/WU-UCT/tree/master/Policy/PPO/PolicyFiles). PPO policy files can be generated with [Atari_PPO_training](https://github.com/liuanji/WU-UCT/tree/master/Utils/Atari_PPO_training). For example, by running
 ```
 cd Utils/Atari_PPO_training
 python3 main.py PongNoFrameskip-v0
 ```
-a policy file will be generated in [./Utils/Atari_PPO_training/save](https://github.com/liuanji/P-UCT/tree/master/Utils/Atari_PPO_training/save). To run DistillPPO, we have to run the distill training process by
+a policy file will be generated in [./Utils/Atari_PPO_training/save](https://github.com/liuanji/WU-UCT/tree/master/Utils/Atari_PPO_training/save). To run DistillPPO, first run the distillation training process with
 ```
 python3 main.py --mode Distill --env-name PongNoFrameskip-v0
 ```
 
 ## Run on your own environments
-We kindly provide an [environment wrapper](https://github.com/liuanji/P-UCT/tree/master/Env/EnvWrapper.py) and a [policy wrapper](https://github.com/liuanji/P-UCT/tree/master/Policy/PolicyWrapper.py) to make easy extensions to other environments. All you need is to modify [./Env/EnvWrapper.py](https://github.com/liuanji/P-UCT/tree/master/Env/EnvWrapper.py) and [./Policy/PolicyWrapper.py](https://github.com/liuanji/P-UCT/tree/master/Policy/PolicyWrapper.py), and fit in your own environment. Please follow the below instructions.
+We kindly provide an [environment wrapper](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py) and a [policy wrapper](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py) to make it easy to extend to other environments. All you need to do is modify [./Env/EnvWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py) and [./Policy/PolicyWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py) to fit your own environment. Please follow the instructions below.
 
-1. Edit the class EnvWrapper in [./Env/EnvWrapper.py](https://github.com/liuanji/P-UCT/tree/master/Env/EnvWrapper.py).
+1. Edit the class EnvWrapper in [./Env/EnvWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py).
 
    Nest your environment into the wrapper by providing specific functionality in each of the member functions of EnvWrapper. There are currently four input arguments to EnvWrapper: *env_name*, *max_episode_length*, *enable_record*, and *record_path*. If additional information needs to be imported, you may first consider adding it to *env_name*.
 
-2. Edit the class PolicyWrapper in [./Policy/PolicyWrapper.py](https://github.com/liuanji/P-UCT/tree/master/Policy/PolicyWrapper.py).
+2. Edit the class PolicyWrapper in [./Policy/PolicyWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py).
 
    Similarly, nest your default policy in PolicyWrapper, and pass the corresponding method using --policy. You will need to rewrite the three member functions *get_action*, *get_value*, and *get_prior_prob*.
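The two wrapper interfaces referenced in the patched README steps above can be sketched as follows. This is a minimal, hypothetical illustration, not the repository's actual code: the class names, the constructor arguments of EnvWrapper (*env_name*, *max_episode_length*, *enable_record*, *record_path*), and the three member functions (*get_action*, *get_value*, *get_prior_prob*) come from the README text, while the toy counting environment, its step/reset behavior, and the uniform random policy are assumptions for demonstration only.

```python
import random

class EnvWrapper:
    """Illustrative stand-in for ./Env/EnvWrapper.py with the four
    documented constructor arguments wrapped around a toy environment."""

    def __init__(self, env_name, max_episode_length=100,
                 enable_record=False, record_path=""):
        self.env_name = env_name
        self.max_episode_length = max_episode_length
        self.enable_record = enable_record
        self.record_path = record_path
        self.steps = 0
        self.state = 0

    def reset(self):
        self.steps = 0
        self.state = 0
        return self.state

    def step(self, action):
        # Toy dynamics: the state accumulates actions; the episode
        # terminates when the length cap is reached.
        self.steps += 1
        self.state += action
        reward = float(action)
        done = self.steps >= self.max_episode_length
        return self.state, reward, done


class PolicyWrapper:
    """Illustrative stand-in for ./Policy/PolicyWrapper.py implementing
    the three documented member functions with a uniform random policy."""

    def __init__(self, num_actions=2):
        self.num_actions = num_actions

    def get_action(self, state):
        # Sample an action uniformly at random.
        return random.randrange(self.num_actions)

    def get_value(self, state):
        return 0.0  # placeholder state-value estimate

    def get_prior_prob(self, state):
        # Uniform prior over actions.
        return [1.0 / self.num_actions] * self.num_actions


# Minimal rollout loop exercising both wrappers together.
env = EnvWrapper("ToyCounter-v0", max_episode_length=5)
policy = PolicyWrapper(num_actions=2)
state, done, total_reward = env.reset(), False, 0.0
while not done:
    state, reward, done = env.step(policy.get_action(state))
    total_reward += reward
```

A real adaptation would replace the toy dynamics with calls into your own simulator and the uniform policy with a learned model, keeping the same method signatures so the planner can drive them unchanged.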