In the DDPG implementation, in `models.py`, note that the `**network_kwargs` in `self.network_builder = get_network_builder(network)(**network_kwargs)` does not contain `layer_norm=True/False`. As a result, when the critic uses this network builder to build the MLP, layer normalization is never applied. This causes the model to fail on many environments, such as HalfCheetah.
Variable names of the critic in the original code:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/output/kernel:0
critic/output/bias:0
Variable names of the critic should be:
critic/mlp_fc0/w:0
critic/mlp_fc0/b:0
critic/LayerNorm/beta:0
critic/LayerNorm/gamma:0
critic/mlp_fc1/w:0
critic/mlp_fc1/b:0
critic/LayerNorm_1/beta:0
critic/LayerNorm_1/gamma:0
critic/output/kernel:0
critic/output/bias:0
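The kwargs-forwarding problem can be illustrated with a minimal, self-contained sketch (this is not the actual baselines TF code; the builder here just returns variable-name strings mirroring the lists above). The point is that if `layer_norm` never makes it into `network_kwargs`, the builder silently falls back to its default and the `LayerNorm` variables are never created:

```python
# Hypothetical stand-in for baselines' mlp builder: returns the names of
# the variables it would create, so the effect of layer_norm is visible.
def mlp(num_layers=2, layer_norm=False):
    names = []
    for i in range(num_layers):
        names.append(f"mlp_fc{i}")
        if layer_norm:
            # TF-style scoping: first LayerNorm is unsuffixed, later ones get _1, _2, ...
            names.append("LayerNorm" if i == 0 else f"LayerNorm_{i}")
    names.append("output")
    return names

_REGISTRY = {"mlp": mlp}

def get_network_builder(name):
    return _REGISTRY[name]

# Buggy call path: layer_norm is absent from network_kwargs, so the
# critic's MLP is built without any LayerNorm variables.
network_kwargs = {}
buggy = get_network_builder("mlp")(**network_kwargs)
print(buggy)   # ['mlp_fc0', 'mlp_fc1', 'output']

# Fixed call path: explicitly forward layer_norm=True when building the critic.
fixed = get_network_builder("mlp")(**dict(network_kwargs, layer_norm=True))
print(fixed)   # ['mlp_fc0', 'LayerNorm', 'mlp_fc1', 'LayerNorm_1', 'output']
```

The fix, then, is to make sure the DDPG setup code passes `layer_norm` through to the critic's network builder rather than relying on the builder's default.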
However, even after fixing this, DDPG still performs poorly on HalfCheetah after 2M time steps (reward below 1000), whereas many papers report it should reach ~3000+. There may be other bugs.
xuanlinli17 changed the title from "DDPG bug: layer norm not really applied when initializing the critic (Q) model" to "DDPG bug: layer norm not really applied when initializing the critic (Q) network" on May 23, 2019.
* Only obs of the first env is added to the list when using vecenv without images (openai#913)
* Fixed gen of traces with non-image vecenv (openai#913)
* Added vecenv non img expert traj test (openai#913)