Improve documentation (#65)
* extend documentation to address #64

and a few additional comments regarding hyperparameter defaults in general

* Update changelog and readme

* Update README

Co-authored-by: Antonin RAFFIN <[email protected]>
cboettig and araffin authored Jan 12, 2021
1 parent 3afaef7 commit 0a853b6
Showing 2 changed files with 43 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -17,6 +17,7 @@
- Fixed a bug when using HER + DQN/TQC for hyperparam optimization

### Documentation
- Improved documentation (@cboettig)

### Other
- Refactored train script, now uses an `ExperimentManager` class
44 changes: 42 additions & 2 deletions README.md
@@ -86,9 +86,28 @@ python train.py --algo sac --env Pendulum-v0 --save-replay-buffer
It will be automatically loaded if present when continuing training.


## Hyperparameter YAML syntax

The syntax used in `hyperparameters/algo_name.yml` for setting hyperparameters (and likewise the syntax to [overwrite hyperparameters](https://github.com/DLR-RM/rl-baselines3-zoo#overwrite-hyperparameters) on the CLI) may be specialized when the argument is a function. See the examples in the `hyperparameters/` directory, for instance:

- Specify a linear schedule for the learning rate:

```yaml
learning_rate: lin_0.012486195510232303
```

- Specify a different activation function for the network:

```yaml
policy_kwargs: "dict(activation_fn=nn.ReLU)"
```
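
For context, a complete entry in `hyperparameters/algo_name.yml` groups such settings under the environment id. The sketch below is purely illustrative of that layout; the environment id and all values are hypothetical placeholders, not tuned settings:

```yaml
# Hypothetical entry in a hyperparameters/<algo>.yml file -- values are placeholders, not tuned defaults
Pendulum-v0:
  n_timesteps: !!float 2e4
  policy: 'MlpPolicy'
  learning_rate: lin_0.001   # linear schedule, decayed to 0 over training
  policy_kwargs: "dict(activation_fn=nn.ReLU, net_arch=[256, 256])"
```
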
## Hyperparameter Tuning
We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.
Not all hyperparameters are tuned, and tuning enforces certain default hyperparameter settings that may be different from the official defaults. See [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py) for the current settings for each agent.
Hyperparameters not specified in [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py) are taken from the associated YAML file and fallback to the default values of SB3 if not present.
Note: hyperparameter search is not implemented for DQN for now.
When using the SuccessiveHalvingPruner ("halving"), you must specify `--n-jobs > 1`.

@@ -105,18 +124,39 @@

Distributed optimization using a shared database is also possible (see the corresponding documentation):

```
python train.py --algo ppo --env MountainCar-v0 -optimize --study-name test --storage sqlite:///example.db
```

### Hyperparameters search space

Note that the default hyperparameters used in the zoo when tuning are not always the same as the defaults provided in [stable-baselines3](https://stable-baselines3.readthedocs.io/en/master/modules/base.html). Consult the latest source code to be sure of these settings. For example:

- PPO tuning assumes a network architecture with `ortho_init = False`, though it is `True` by [default](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#ppo-policies). You can change that by updating [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py).

- Non-episodic rollout in TD3 and DDPG assumes `gradient_steps = train_freq`, so only `train_freq` is tuned in order to reduce the search space.

When working with continuous actions, we recommend enabling [gSDE](https://arxiv.org/abs/2005.05719) by uncommenting the corresponding lines in [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py).
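
Outside of tuning, gSDE can also be switched on directly in an algorithm's YAML file for continuous-action environments. The snippet below is only a sketch, assuming SB3's `use_sde` and `sde_sample_freq` parameters; the environment id and values are placeholders:

```yaml
# Hypothetical excerpt -- enables gSDE for a continuous-action task; values are placeholders
Pendulum-v0:
  policy: 'MlpPolicy'
  use_sde: True        # generalized State-Dependent Exploration
  sde_sample_freq: 4   # resample the exploration noise matrix every 4 steps
```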

## Env normalization

In the hyperparameter file, `normalize: True` means that the training environment will be wrapped in a [VecNormalize](https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_normalize.py#L13) wrapper.

[Normalization uses](https://github.com/DLR-RM/rl-baselines3-zoo/issues/64) the default parameters of `VecNormalize`, with the exception of `gamma` which is set to match that of the agent. This can be [overridden](https://github.com/DLR-RM/rl-baselines3-zoo/blob/v0.10.0/hyperparams/sac.yml#L239) using the appropriate `hyperparameters/algo_name.yml`, e.g.

```yaml
normalize: "{'norm_obs': True, 'norm_reward': False}"
```
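
Since `gamma` is one of the `VecNormalize` parameters, it can be overridden in the same way; a sketch (the value here is a placeholder, not a recommended setting):

```yaml
normalize: "{'norm_obs': True, 'norm_reward': True, 'gamma': 0.995}"
```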


## Env Wrappers

You can specify in the hyperparameter config one or more wrappers to use around the environment:

For one wrapper:
```yaml
env_wrapper: gym_minigrid.wrappers.FlatObsWrapper
```

For multiple wrappers, specify a list:

```yaml
env_wrapper:
  - utils.wrappers.DoneOnSuccessWrapper:
      reward_offset: 1.0
```
