Improve documentation (#65)
* extend documentation to address #64

and a few additional comments regarding hyperparameter defaults in general

* Update changelog and readme

* Update README

Co-authored-by: Antonin RAFFIN <[email protected]>
cboettig and araffin authored Jan 12, 2021
1 parent 3afaef7 commit 0a853b6
Showing 2 changed files with 43 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -17,6 +17,7 @@
- Fixed a bug when using HER + DQN/TQC for hyperparam optimization

### Documentation
- Improved documentation (@cboettig)

### Other
- Refactored train script, now uses an `ExperimentManager` class
44 changes: 42 additions & 2 deletions README.md
@@ -86,9 +86,28 @@ python train.py --algo sac --env Pendulum-v0 --save-replay-buffer
It will be automatically loaded if present when continuing training.


## Hyperparameter YAML syntax

The syntax used in `hyperparameters/algo_name.yml` for setting hyperparameters (and likewise the syntax to [overwrite hyperparameters](https://github.com/DLR-RM/rl-baselines3-zoo#overwrite-hyperparameters) on the CLI) may be specialized when the argument is a function. See the examples in the `hyperparameters/` directory, for instance:

- Specify a linear schedule for the learning rate:

```yaml
learning_rate: lin_0.012486195510232303
```

- Specify a different activation function for the network:

```yaml
policy_kwargs: "dict(activation_fn=nn.ReLU)"
```
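
For context, a complete entry in `hyperparameters/algo_name.yml` groups such settings under the environment id. The sketch below is purely illustrative of that layout; the environment id and all values are hypothetical placeholders, not tuned settings:

```yaml
# Hypothetical entry in a hyperparameters/<algo>.yml file -- values are placeholders, not tuned defaults
Pendulum-v0:
  n_timesteps: !!float 2e4
  policy: 'MlpPolicy'
  learning_rate: lin_0.001   # linear schedule, decayed to 0 over training
  policy_kwargs: "dict(activation_fn=nn.ReLU, net_arch=[256, 256])"
```
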
## Hyperparameter Tuning
We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.
Not all hyperparameters are tuned, and tuning enforces certain default hyperparameter settings that may be different from the official defaults. See [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py) for the current settings for each agent.
Hyperparameters not specified in [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py) are taken from the associated YAML file and fallback to the default values of SB3 if not present.
Note: hyperparameter search is not implemented for DQN for now.
When using the SuccessiveHalvingPruner ("halving"), you must specify `--n-jobs > 1`.

@@ -105,18 +124,39 @@

Distributed optimization using a shared database is also possible (see the corresponding documentation):

```
python train.py --algo ppo --env MountainCar-v0 -optimize --study-name test --storage sqlite:///example.db
```

### Hyperparameters search space

Note that the default hyperparameters used in the zoo when tuning are not always the same as the defaults provided in [stable-baselines3](https://stable-baselines3.readthedocs.io/en/master/modules/base.html). Consult the latest source code to be sure of these settings. For example:

- PPO tuning assumes a network architecture with `ortho_init = False`, though it is `True` by [default](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#ppo-policies). You can change that by updating [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py).

- Non-episodic rollout in TD3 and DDPG assumes `gradient_steps = train_freq`, so only `train_freq` is tuned in order to reduce the search space.

When working with continuous actions, we recommend enabling [gSDE](https://arxiv.org/abs/2005.05719) by uncommenting the corresponding lines in [utils/hyperparams_opt.py](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py).
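
Outside of tuning, gSDE can also be switched on directly in an algorithm's YAML file for continuous-action environments. The snippet below is only a sketch, assuming SB3's `use_sde` and `sde_sample_freq` parameters; the environment id and values are placeholders:

```yaml
# Hypothetical excerpt -- enables gSDE for a continuous-action task; values are placeholders
Pendulum-v0:
  policy: 'MlpPolicy'
  use_sde: True        # generalized State-Dependent Exploration
  sde_sample_freq: 4   # resample the exploration noise matrix every 4 steps
```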

## Env normalization

In the hyperparameter file, `normalize: True` means that the training environment will be wrapped in a [VecNormalize](https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_normalize.py#L13) wrapper.

[Normalization uses](https://github.com/DLR-RM/rl-baselines3-zoo/issues/64) the default parameters of `VecNormalize`, with the exception of `gamma` which is set to match that of the agent. This can be [overridden](https://github.com/DLR-RM/rl-baselines3-zoo/blob/v0.10.0/hyperparams/sac.yml#L239) using the appropriate `hyperparameters/algo_name.yml`, e.g.

```yaml
normalize: "{'norm_obs': True, 'norm_reward': False}"
```
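
Since `gamma` is one of the `VecNormalize` parameters, it can be overridden in the same way; a sketch (the value here is a placeholder, not a recommended setting):

```yaml
normalize: "{'norm_obs': True, 'norm_reward': True, 'gamma': 0.995}"
```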


## Env Wrappers

You can specify in the hyperparameter config one or more wrappers to use around the environment:

For one wrapper:
```yaml
env_wrapper: gym_minigrid.wrappers.FlatObsWrapper
```

For multiple wrappers, specify a list:

```yaml
env_wrapper:
  - utils.wrappers.DoneOnSuccessWrapper:
      reward_offset: 1.0
```
