Merge branch 'master' of https://github.com/liuanji/P-UCT
liuanji committed Nov 18, 2019
2 parents 3943e7d + ec07a5f, commit fe91eee
Showing 1 changed file (README.md) with 10 additions and 10 deletions.
1. Download or clone the repository.
2. Run with the default settings:
```
python3 main.py --model WU-UCT
```
3. For additional hyperparameters, please have a look at [main.py](https://github.com/liuanji/WU-UCT/tree/master/main.py); they are also listed below, together with their descriptions. For example, if you want to run the game PongNoFrameskip-v0 with 200 MCTS rollouts, simply run:
```
python3 main.py --model WU-UCT --env-name PongNoFrameskip-v0 --MCTS-max-steps 200
```
or, if you want to record a video of the gameplay, run:
```
python3 main.py --model WU-UCT --env-name PongNoFrameskip-v0 --record-video
```

* A full list of parameters (an example command combining several of them follows this list):
* --model: MCTS model to use (currently supports WU-UCT and UCT).
* --env-name: name of the environment.
* --MCTS-max-steps: number of simulation steps in the planning phase.
* --MCTS-max-depth: maximum planning depth.
* ... (additional parameters not shown here; see main.py for the full list)
* --mode: MCTS or Distill, see [Planning with prior policy](#Planning-with-prior-policy).
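
For illustration, a command combining several of these flags might look like the following sketch. The depth value and the choice of the PPO simulation policy are example settings, not repository defaults; please check main.py for the accepted values.
```
python3 main.py --model WU-UCT --env-name PongNoFrameskip-v0 \
    --MCTS-max-steps 200 --MCTS-max-depth 100 --policy PPO
```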

### Planning with prior policy
The code currently supports three default policies (the policy used to perform simulations): *Random*, *PPO*, and *DistillPPO* (to use them, change the "--policy" parameter). To use the *PPO* and *DistillPPO* policies, the corresponding policy files need to be placed in [./Policy/PPO/PolicyFiles](https://github.com/liuanji/WU-UCT/tree/master/Policy/PPO/PolicyFiles). PPO policy files can be generated with [Atari_PPO_training](https://github.com/liuanji/WU-UCT/tree/master/Utils/Atari_PPO_training). For example, by running
```
cd Utils/Atari_PPO_training
python3 main.py PongNoFrameskip-v0
```
a policy file will be generated in [./Utils/Atari_PPO_training/save](https://github.com/liuanji/WU-UCT/tree/master/Utils/Atari_PPO_training/save). To run DistillPPO, we first have to run the distillation training process:
```
python3 main.py --mode Distill --env-name PongNoFrameskip-v0
```

## Run on your own environments
We kindly provide an [environment wrapper](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py) and a [policy wrapper](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py) to make it easy to extend to other environments. All you need to do is modify [./Env/EnvWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py) and [./Policy/PolicyWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py) to fit your own environment. Please follow the instructions below.

1. Edit the class EnvWrapper in [./Env/EnvWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py).

Nest your environment into the wrapper by providing the specific functionality in each of the member functions of EnvWrapper. There are currently four input arguments to EnvWrapper: *env_name*, *max_episode_length*, *enable_record*, and *record_path*. If additional information needs to be passed in, you may first consider encoding it in *env_name*, as sketched below.
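
As a concrete illustration, here is a minimal, hypothetical sketch of nesting a Gym environment into the wrapper. The member functions shown (*reset*, *step*) and their return values are assumptions made for illustration; they must be matched to the actual methods defined in ./Env/EnvWrapper.py.
```
# Hypothetical sketch: nesting a Gym environment into EnvWrapper.
# Method names and return values are illustrative assumptions; match them
# to the actual member functions in ./Env/EnvWrapper.py.
import gym

class EnvWrapper:
    def __init__(self, env_name, max_episode_length = 0,
                 enable_record = False, record_path = ""):
        # Parse any extra options encoded in env_name, then build the environment.
        self.env = gym.make(env_name)
        self.max_episode_length = max_episode_length
        self.enable_record = enable_record
        self.record_path = record_path
        self.step_count = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.step_count = 0
        return self.env.reset()

    def step(self, action):
        # Advance one step; report the observation, reward, and termination flag.
        observation, reward, done, _ = self.env.step(action)
        self.step_count += 1
        if 0 < self.max_episode_length <= self.step_count:
            done = True
        return observation, reward, done
```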

2. Edit the class PolicyWrapper in [./Policy/PolicyWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py).

Similarly, nest your default policy in PolicyWrapper, and select it with the --policy argument. You will need to rewrite the three member functions *get_action*, *get_value*, and *get_prior_prob*.
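
For illustration, a minimal sketch of a uniform-random default policy nested into the wrapper is shown below. Only the three member function names come from the description above; the constructor arguments and attributes are hypothetical and should be adapted to the actual class in ./Policy/PolicyWrapper.py.
```
# Hypothetical sketch: a uniform-random default policy for PolicyWrapper.
# Constructor arguments are illustrative; only get_action, get_value, and
# get_prior_prob are named in the README.
import numpy as np

class PolicyWrapper:
    def __init__(self, policy_name, action_n):
        self.policy_name = policy_name  # value selected via --policy
        self.action_n = action_n        # number of discrete actions

    def get_action(self, state):
        # Action used when performing simulations (rollouts) from a state.
        return np.random.randint(self.action_n)

    def get_value(self, state):
        # Value estimate of the given state; 0.0 for an uninformed random policy.
        return 0.0

    def get_prior_prob(self, state):
        # Prior probability over actions, used to guide tree expansion.
        return np.ones(self.action_n, dtype=np.float32) / self.action_n
```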
