
Port Chainer#4191 or use Chainer's BN implementation #203

Open
kuenishi opened this issue Feb 7, 2018 · 2 comments
@kuenishi
Member

kuenishi commented Feb 7, 2018

ChainerMN's BatchNormalization code is mostly copied from Chainer's (with several AllReduce calls added), which means bugs in Chainer's implementation may have been carried over as well. chainer/chainer#4191 could be one of them. Porting that fix to ChainerMN is the obvious option, but there is another major choice: rework ChainerMN's BN so that it reuses Chainer's BN code more cleanly and gets upstream fixes for free. Thoughts?
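
For readers unfamiliar with why the multi-node version needs AllReduce at all, here is a minimal NumPy sketch of the idea. It is not ChainerMN's actual code: `allreduce_mean` and `multi_node_bn_stats` are hypothetical names, the "workers" are simulated as array shards in one process, and equal shard sizes are assumed.

```python
import numpy as np


def allreduce_mean(values):
    """Hypothetical stand-in for an AllReduce: average one array per worker."""
    return sum(values) / len(values)


def multi_node_bn_stats(shards):
    """Compute batch-norm mean/variance over all shards (equal sizes assumed)."""
    # Each worker computes statistics over its local mini-batch only.
    local_mean = [x.mean(axis=0) for x in shards]
    local_sq_mean = [(x * x).mean(axis=0) for x in shards]
    # The AllReduce step turns local statistics into global ones.
    mean = allreduce_mean(local_mean)
    sq_mean = allreduce_mean(local_sq_mean)
    # Biased variance via E[x^2] - E[x]^2.
    var = sq_mean - mean * mean
    return mean, var


rng = np.random.default_rng(0)
batch = rng.normal(size=(32, 4))   # full batch: 32 samples, 4 channels
shards = np.split(batch, 4)        # 4 simulated workers, 8 samples each

mean, var = multi_node_bn_stats(shards)

# The reduced statistics should match those of the undivided batch.
assert np.allclose(mean, batch.mean(axis=0))
assert np.allclose(var, batch.var(axis=0))
```

A comparison like the final two assertions, run against the real multi-node BN, is one way to check whether its statistics agree with the single-node ones.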

@mingxiaoh

@kuenishi We have the same question here. In a recent multi-node experiment, we found that googlenet_v2, googlenet_v3, and resnet50 show unexpectedly low (<10%) validation accuracy, while alexnet and googlenet reach SOTA accuracy. We suspect there may be a bug in the batch normalization implementation.
FYI, the networks above reach the same accuracy as SOTA/GPU on a single node.

@kuenishi
Member Author

kuenishi commented May 7, 2018

@mingxiaoh Thank you for reporting. Would you be able to try porting chainer/chainer#4191 to ChainerMN's BN code to verify whether that is the bug?

@keisukefukuda keisukefukuda self-assigned this Jun 15, 2018