DDPG bug: layer norm not really applied when initializing the critic (Q) network #913

Open
xuanlinli17 opened this issue May 23, 2019 · 1 comment


xuanlinli17 commented May 23, 2019

In the DDPG implementation (models.py), the **network_kwargs passed in
self.network_builder = get_network_builder(network)(**network_kwargs)
does not contain layer_norm=True/False. As a result, when the critic uses this network builder to build the mlp, layer norm is never applied. This causes the model to fail on many environments, such as HalfCheetah.

Variable names of the critic in the original code:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/output/kernel:0
critic/output/bias:0

Variable names of the critic should be:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/LayerNorm/beta:0
critic/LayerNorm/gamma:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/LayerNorm_1/beta:0
critic/LayerNorm_1/gamma:0
critic/output/kernel:0
critic/output/bias:0
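
To illustrate how a missing keyword argument produces the two variable lists above, here is a minimal, self-contained sketch. It is a toy stand-in, not the baselines code: `mlp_builder`, `Critic`, and `variable_names` are hypothetical names, and the builder only generates name strings instead of creating TensorFlow variables. It assumes the builder defaults to `layer_norm=False` when the argument is not forwarded.

```python
# Toy stand-in for a network-builder factory (hypothetical, NOT baselines code).
def mlp_builder(num_layers=2, layer_norm=False):
    """Return a function listing the variable names an MLP would create."""
    def build(scope):
        names = []
        for i in range(num_layers):
            names += [f"{scope}/mlp_fc{i}/w:0", f"{scope}/mlp_fc{i}/b:0"]
            if layer_norm:
                # Mimic the LayerNorm, LayerNorm_1, ... naming shown in the issue.
                suffix = "" if i == 0 else f"_{i}"
                names += [f"{scope}/LayerNorm{suffix}/beta:0",
                          f"{scope}/LayerNorm{suffix}/gamma:0"]
        names += [f"{scope}/output/kernel:0", f"{scope}/output/bias:0"]
        return names
    return build


class Critic:
    """Hypothetical critic that forwards only **network_kwargs to the builder."""
    def __init__(self, name="critic", **network_kwargs):
        self.name = name
        # If the caller never puts layer_norm into network_kwargs,
        # the builder silently falls back to its default (False).
        self.network_builder = mlp_builder(**network_kwargs)

    def variable_names(self):
        return self.network_builder(self.name)


without_ln = Critic().variable_names()             # layer_norm not forwarded
with_ln = Critic(layer_norm=True).variable_names() # layer_norm forwarded

for name in with_ln:
    marker = " " if name in without_ln else "+"
    print(marker, name)
```

Lines printed with a leading `+` are the variables that only exist when `layer_norm=True` actually reaches the builder. Listing the critic's variable names in this way is a quick check of whether layer norm was really applied.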

However, even after fixing this, DDPG still performs poorly on HalfCheetah: after 2M time steps the reward is below 1000, whereas many papers report a reward of roughly 3000 or more. There may be other bugs.

@xuanlinli17 xuanlinli17 changed the title DDPG bug: layer norm not really applied when initializing the critic (Q) model DDPG bug: layer norm not really applied when initializing the critic (Q) network May 23, 2019
@DanielTakeshi

@lilililiiiii Can you try after the change proposed in #938?

banerjs pushed a commit to banerjs/baselines that referenced this issue Jul 21, 2020
* Only obs of the first env is added to the list when using vecenv without images (openai#913)

* Fixed gen of traces with non-image vecenv (openai#913)

* Added vecenv non img expert traj test (openai#913)