Merge branch 'master' of https://github.com/liuanji/P-UCT
liuanji committed Nov 18, 2019
2 parents 3943e7d + ec07a5f, commit fe91eee
Showing 1 changed file (README.md) with 10 additions and 10 deletions.
1. Download or clone the repository.
2. Run with the default settings:
```
python3 main.py --model WU-UCT
```
3. For additional hyperparameters, please have a look at [main.py](https://github.com/liuanji/WU-UCT/tree/master/main.py); they are also listed below, together with their descriptions. For example, if you want to run the game PongNoFrameskip-v0 with 200 MCTS rollouts, simply run:
```
python3 main.py --model WU-UCT --env-name PongNoFrameskip-v0 --MCTS-max-steps 200
```
or, if you want to record a video of the gameplay, run:
```
python3 main.py --model WU-UCT --env-name PongNoFrameskip-v0 --record-video
```

* A full list of parameters (an example command combining several of them follows this list):
* --model: MCTS model to use (currently supports WU-UCT and UCT).
* --env-name: name of the environment.
* --MCTS-max-steps: number of simulation steps in the planning phase.
* --MCTS-max-depth: maximum planning depth.
* ... (additional parameters not shown here; see main.py for the full list)
* --mode: MCTS or Distill, see [Planning with prior policy](#Planning-with-prior-policy).
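
For illustration, a command combining several of these flags might look like the following sketch. The depth value and the choice of the PPO simulation policy are example settings, not repository defaults; please check main.py for the accepted values.
```
python3 main.py --model WU-UCT --env-name PongNoFrameskip-v0 \
    --MCTS-max-steps 200 --MCTS-max-depth 100 --policy PPO
```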

### Planning with prior policy
The code currently supports three default policies (the policy used to perform simulations): *Random*, *PPO*, and *DistillPPO* (to use them, change the "--policy" parameter). To use the *PPO* and *DistillPPO* policies, the corresponding policy files need to be placed in [./Policy/PPO/PolicyFiles](https://github.com/liuanji/WU-UCT/tree/master/Policy/PPO/PolicyFiles). PPO policy files can be generated with [Atari_PPO_training](https://github.com/liuanji/WU-UCT/tree/master/Utils/Atari_PPO_training). For example, by running
```
cd Utils/Atari_PPO_training
python3 main.py PongNoFrameskip-v0
```
a policy file will be generated in [./Utils/Atari_PPO_training/save](https://github.com/liuanji/WU-UCT/tree/master/Utils/Atari_PPO_training/save). To run DistillPPO, we first have to run the distillation training process:
```
python3 main.py --mode Distill --env-name PongNoFrameskip-v0
```

## Run on your own environments
We kindly provide an [environment wrapper](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py) and a [policy wrapper](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py) to make it easy to extend to other environments. All you need to do is modify [./Env/EnvWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py) and [./Policy/PolicyWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py) to fit your own environment. Please follow the instructions below.

1. Edit the class EnvWrapper in [./Env/EnvWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Env/EnvWrapper.py).

Nest your environment into the wrapper by providing the specific functionality in each of the member functions of EnvWrapper. There are currently four input arguments to EnvWrapper: *env_name*, *max_episode_length*, *enable_record*, and *record_path*. If additional information needs to be passed in, you may first consider encoding it in *env_name*, as sketched below.
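
As a concrete illustration, here is a minimal, hypothetical sketch of nesting a Gym environment into the wrapper. The member functions shown (*reset*, *step*) and their return values are assumptions made for illustration; they must be matched to the actual methods defined in ./Env/EnvWrapper.py.
```
# Hypothetical sketch: nesting a Gym environment into EnvWrapper.
# Method names and return values are illustrative assumptions; match them
# to the actual member functions in ./Env/EnvWrapper.py.
import gym

class EnvWrapper:
    def __init__(self, env_name, max_episode_length = 0,
                 enable_record = False, record_path = ""):
        # Parse any extra options encoded in env_name, then build the environment.
        self.env = gym.make(env_name)
        self.max_episode_length = max_episode_length
        self.enable_record = enable_record
        self.record_path = record_path
        self.step_count = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.step_count = 0
        return self.env.reset()

    def step(self, action):
        # Advance one step; report the observation, reward, and termination flag.
        observation, reward, done, _ = self.env.step(action)
        self.step_count += 1
        if 0 < self.max_episode_length <= self.step_count:
            done = True
        return observation, reward, done
```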

2. Edit the class PolicyWrapper in [./Policy/PolicyWrapper.py](https://github.com/liuanji/WU-UCT/tree/master/Policy/PolicyWrapper.py).

Similarly, nest your default policy in PolicyWrapper, and select it with the --policy argument. You will need to rewrite the three member functions *get_action*, *get_value*, and *get_prior_prob*.
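
For illustration, a minimal sketch of a uniform-random default policy nested into the wrapper is shown below. Only the three member function names come from the description above; the constructor arguments and attributes are hypothetical and should be adapted to the actual class in ./Policy/PolicyWrapper.py.
```
# Hypothetical sketch: a uniform-random default policy for PolicyWrapper.
# Constructor arguments are illustrative; only get_action, get_value, and
# get_prior_prob are named in the README.
import numpy as np

class PolicyWrapper:
    def __init__(self, policy_name, action_n):
        self.policy_name = policy_name  # value selected via --policy
        self.action_n = action_n        # number of discrete actions

    def get_action(self, state):
        # Action used when performing simulations (rollouts) from a state.
        return np.random.randint(self.action_n)

    def get_value(self, state):
        # Value estimate of the given state; 0.0 for an uninformed random policy.
        return 0.0

    def get_prior_prob(self, state):
        # Prior probability over actions, used to guide tree expansion.
        return np.ones(self.action_n, dtype=np.float32) / self.action_n
```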
