Hi, in my project I have encountered an issue, and I'm not sure whether it is caused by invalid usage of the library or by a bug in the library code. I cannot provide a minimal reproduction because the bug only occurs during training, so I will describe it as best I can.
Pseudo-code for my training loop looks something like this:
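(Simplified sketch of the structure only; the data, optimizer settings, and the dummy `train_loader` below are placeholders standing in for my actual setup, and I show the ResNet34 variant as an example.)

```python
import torch
from torchvision.models import resnet34
from zennit.attribution import Gradient
from zennit.composites import EpsilonPlusFlat
from zennit.torchvision import ResNetCanonizer

NUM_CLASSES = 20

# randomly initialized model with a 20-output head (placeholder setup)
model = resnet34(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
composite = EpsilonPlusFlat(canonizers=[ResNetCanonizer()])

# placeholder data; in my project this is a real DataLoader
train_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, NUM_CLASSES, (8,)))]

model.train()
for inputs, targets in train_loader:
    # standard supervised training step
    optimizer.zero_grad()
    output = model(inputs)
    loss = criterion(output, targets)
    loss.backward()
    optimizer.step()

    # LRP attribution for the same batch, with the composite and canonizer applied
    with Gradient(model=model, composite=composite) as attributor:
        out, relevance = attributor(inputs, torch.eye(NUM_CLASSES)[targets])
```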
For the model I have tested VGG16, ResNet34 and VGG16_bn with the appropriate canonizers, and for the composite I have used EpsilonPlusFlat. All models have their heads changed to 20 outputs and are randomly initialized. I have noticed that the models with BatchNorm show a significant difference between their outputs in train mode and in eval mode.
I have logged the sum of the outputs during training to show this for the different models.
For VGG16, the output sums stay within roughly the same order of magnitude, which is expected:
For ResNet34, there is a drastic change in the output sums, around 4 orders of magnitude of difference:
For VGG16_bn, there is again a difference in the output sums, but "only" around 1 order of magnitude:
I realize this behaviour is very strange, but it all points to something being wrong with the BatchNorm handling.
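To make the train/eval comparison concrete, the check I mean is roughly the following (sketch; `model` and `inputs` refer to the model and a batch from the loop above):

```python
import torch

# compare the output sum of the same batch in train and eval mode
model.train()
with torch.no_grad():
    sum_train = model(inputs).sum().item()

model.eval()
with torch.no_grad():
    sum_eval = model(inputs).sum().item()

print(f'train mode sum: {sum_train:.3e}, eval mode sum: {sum_eval:.3e}')
model.train()  # restore training mode
```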
The version of Zennit I'm using is 0.5.2.dev5.
I would really appreciate your help regarding this one.
Thanks in advance.