In the DDPG implementation, in `models.py`, note that the `**network_kwargs` in `self.network_builder = get_network_builder(network)(**network_kwargs)` does not contain `layer_norm=True/False`. As a result, when the critic uses this network builder to build the MLP, layer normalization is never applied. This causes the model to fail on many environments, such as HalfCheetah.
Variable names of the critic in the original code:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/output/kernel:0
critic/output/bias:0
Variable names of the critic should be:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/LayerNorm/beta:0
critic/LayerNorm/gamma:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/LayerNorm_1/beta:0
critic/LayerNorm_1/gamma:0
critic/output/kernel:0
critic/output/bias:0
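The kwargs-forwarding problem can be illustrated with a minimal, self-contained sketch (this is not the actual baselines TF code; the builder here just returns variable-name strings mirroring the lists above). The point is that if `layer_norm` never makes it into `network_kwargs`, the builder silently falls back to its default and the `LayerNorm` variables are never created:

```python
# Hypothetical stand-in for baselines' mlp builder: returns the names of
# the variables it would create, so the effect of layer_norm is visible.
def mlp(num_layers=2, layer_norm=False):
    names = []
    for i in range(num_layers):
        names.append(f"mlp_fc{i}")
        if layer_norm:
            # TF-style scoping: first LayerNorm is unsuffixed, later ones get _1, _2, ...
            names.append("LayerNorm" if i == 0 else f"LayerNorm_{i}")
    names.append("output")
    return names

_REGISTRY = {"mlp": mlp}

def get_network_builder(name):
    return _REGISTRY[name]

# Buggy call path: layer_norm is absent from network_kwargs, so the
# critic's MLP is built without any LayerNorm variables.
network_kwargs = {}
buggy = get_network_builder("mlp")(**network_kwargs)
print(buggy)   # ['mlp_fc0', 'mlp_fc1', 'output']

# Fixed call path: explicitly forward layer_norm=True when building the critic.
fixed = get_network_builder("mlp")(**dict(network_kwargs, layer_norm=True))
print(fixed)   # ['mlp_fc0', 'LayerNorm', 'mlp_fc1', 'LayerNorm_1', 'output']
```

The fix, then, is to make sure the DDPG setup code passes `layer_norm` through to the critic's network builder rather than relying on the builder's default.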
However, even after fixing this, DDPG still performs poorly on HalfCheetah after 2M time steps (reward below 1000), whereas many papers report it should reach ~3000+. There may be other bugs.
xuanlinli17 changed the title from "DDPG bug: layer norm not really applied when initializing the critic (Q) model" to "DDPG bug: layer norm not really applied when initializing the critic (Q) network" on May 23, 2019.
* Only obs of the first env is added to the list when using vecenv without images (openai#913)
* Fixed gen of traces with non-image vecenv (openai#913)
* Added vecenv non img expert traj test (openai#913)