added gamma to reward normalization wrappers #209

Merged
merged 5 commits into vwxyzjn:master on Jul 7, 2022

Conversation

@Howuhh (Contributor) commented Jun 20, 2022

Description

Fixes the incorrect gamma used by the reward normalization wrapper when a non-default gamma is specified. See #203.
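
A minimal sketch of the change (a single-env example with illustrative values; the actual scripts apply the wrapper to vectorized envs and read gamma from argparse):

import gym

gamma = 0.999  # the agent's discount factor (args.gamma in the CleanRL scripts)

env = gym.make("CartPole-v1")
# Before the fix, the wrapper fell back to its default gamma=0.99 regardless of
# the discount factor the agent was trained with:
#   env = gym.wrappers.NormalizeReward(env)
# After the fix, the wrapper's discounted-return statistics use the same gamma:
env = gym.wrappers.NormalizeReward(env, gamma=gamma)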

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.

@vercel (bot) commented Jun 20, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name | Status | Preview | Updated
cleanrl | ✅ Ready (Inspect) | Visit Preview | Jul 6, 2022 at 7:59PM (UTC)

@vwxyzjn (Owner) commented Jun 20, 2022

So here is the tricky part: the original implementation actually uses 0.999 for gamma, but 0.99 for the normalization wrapper. See https://github.com/openai/train-procgen/blob/1a2ae2194a61f76a733a39339530401c024c3ad8/train_procgen/train.py#L43
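
For context, a simplified sketch of how a reward-normalization wrapper of this kind uses gamma (not the exact gym implementation): it keeps a gamma-discounted return accumulator and scales each reward by the running standard deviation of that accumulator, which is why the wrapper's gamma should match the discount factor the agent is trained with.

import numpy as np

class SimpleRewardNormalizer:
    """Simplified sketch of a gamma-aware reward normalizer (not the exact gym code)."""

    def __init__(self, gamma=0.99, epsilon=1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        self.ret = 0.0    # gamma-discounted return accumulator
        self.count = 0    # running statistics of the accumulator (Welford)
        self.mean = 0.0
        self.m2 = 0.0

    def __call__(self, reward, done):
        # accumulate the discounted return and update its running variance
        self.ret = self.ret * self.gamma + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        if done:
            self.ret = 0.0
        # scale the reward by the std of the discounted return
        return reward / np.sqrt(var + self.epsilon)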

Using args.gamma in the wrapper would unfortunately cause a performance change. There are two ways to go forward:

  1. re-run the procgen benchmark experiments with gym.wrappers.NormalizeReward(envs, gamma=args.gamma).

    cleanrl/benchmark/ppo.sh

    Lines 39 to 44 in 6387191

    poetry install -E procgen
    xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppo_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

    poetry install -E procgen
    xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppg_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1
  2. keep the procgen scripts untouched.

@Howuhh what do you think we should do?

@Howuhh (Contributor, Author) commented Jun 20, 2022

@vwxyzjn To be honest, I think this is a bug in the original code, not a feature, so it would be more accurate to rerun for correct results. However, procgen is an image-based env, and right now I don't have the resources to train on images.

@vwxyzjn (Owner) commented Jun 20, 2022

Ok, no worries. I will take it from here. @Dipamc77 I don't have the GPU memory to run the PPG experiments. Would you mind running them with this PR? I can take care of the ppo procgen experiments.

poetry install -E procgen
xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids starpilot bossfight bigfish \
--command "poetry run python cleanrl/ppg_procgen.py --track --capture-video" \
--num-seeds 3 \
--workers 1

@vwxyzjn (Owner) commented Jun 21, 2022

Running the PPO experiments now. Also tried a fun thing by adding a wandb tag like

WANDB_TAGS=$(git describe --tags)  xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids starpilot bossfight bigfish \
    --command "poetry run python cleanrl/ppo_procgen.py --track --capture-video" \
    --num-seeds 3 \
    --workers 1

which produces runs like

[screenshot of W&B runs tagged with the output of git describe --tags]

@dosssman I think this tagging system could somehow help us phase out past openrlbenchmark experiments without deleting them. I will have to think about the workflow a bit more.
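
A hypothetical sketch of the same tagging idea from Python rather than through the environment variable (the project name and tag handling here are illustrative, not the benchmark script's actual setup):

import subprocess
import wandb

# Attach the current git tag to a run so that older benchmark runs can later be
# filtered by tag instead of being deleted.
git_tag = subprocess.check_output(["git", "describe", "--tags"], text=True).strip()
run = wandb.init(project="cleanrl", tags=[git_tag])
run.finish()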

@vwxyzjn (Owner) commented Jun 30, 2022

Following up here

[learning curve screenshots]

@vwxyzjn (Owner) commented Jun 30, 2022

The bigfish performance degradation could easily be due to a random seed.

@vwxyzjn (Owner) commented Jul 6, 2022

[learning curve screenshots]

No major performance regression. Going to document this change and merge.

@vwxyzjn (Owner) commented Jul 6, 2022

I have just updated all of the experiments and documentation. @Howuhh @dosssman could you give it a pass, please? Thank you!

@Howuhh (Contributor, Author) commented Jul 7, 2022

@vwxyzjn Seems okay to me. Thanks for redoing the experiments btw.

@vwxyzjn vwxyzjn merged commit cd2011c into vwxyzjn:master Jul 7, 2022
@vwxyzjn vwxyzjn mentioned this pull request Oct 19, 2022