ChainerMN's BatchNormalization code is mostly copied from Chainer, with several AllReduce calls added, which means bugs in Chainer's implementation may have been carried over as well. chainer/chainer#4191 could be one of them. Porting that fix to ChainerMN is the obvious step, but we have another major option: port Chainer's BN code more cleanly, so that we get future upstream fixes as a free lunch. Thoughts?
@kuenishi We have the same question here. In a recent multi-node test experiment, we found that googlenet_v2, googlenet_v3 and resnet50 show unexpectedly low (<10%) validation accuracy, while alexnet and googlenet can achieve SOTA accuracy. We suspect there might be a bug in the batch normalization implementation.
FYI, the above networks can achieve the same accuracy as SOTA/GPU on a single node.
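To make the role of the added AllReduce step concrete, here is a minimal sketch of how multi-node batch statistics are commonly computed. It is not ChainerMN's actual code: `simulated_allreduce_sum` and `multi_node_bn_stats` are hypothetical names, and the AllReduce is simulated in-process with NumPy rather than run over MPI. The point is that the reduced statistics should match those of the concatenated global batch, which is one property a test of the multi-node BN path could check.

```python
import numpy as np


def simulated_allreduce_sum(values):
    """Hypothetical stand-in for an MPI AllReduce(sum): every worker gets the total."""
    total = np.sum(values, axis=0)
    return [total.copy() for _ in values]


def multi_node_bn_stats(local_batches):
    """Per-channel mean and variance of the global batch, from per-worker shards.

    Each shard is assumed to have shape (n_i, channels).
    """
    # Each worker packs [sum(x), sum(x^2), count] so one allreduce is enough.
    packed = [
        np.concatenate([x.sum(axis=0), (x ** 2).sum(axis=0), [x.shape[0]]])
        for x in local_batches
    ]
    reduced = simulated_allreduce_sum(packed)[0]

    channels = local_batches[0].shape[1]
    total_sum = reduced[:channels]
    total_sq = reduced[channels:2 * channels]
    count = reduced[-1]

    mean = total_sum / count
    var = total_sq / count - mean ** 2  # biased variance, E[x^2] - E[x]^2
    return mean, var


# Four simulated workers whose shards have different means.
rng = np.random.default_rng(0)
shards = [rng.normal(loc=i, size=(8, 3)) for i in range(4)]
mean, var = multi_node_bn_stats(shards)

# Reference: statistics of the concatenated global batch.
full = np.concatenate(shards, axis=0)
print(np.allclose(mean, full.mean(axis=0)), np.allclose(var, full.var(axis=0)))
```

If the statistics were instead taken per worker, each shard would be normalized with its own mean and variance, and those can differ noticeably from the global values when shards are small or unevenly distributed.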