Port Chainer#4191 or use Chainer's BN implementation #203

kuenishi · 2018-02-07T06:17:23Z

ChainerMN's BatchNormalization code is mostly copied from Chainer (with several AllReduce calls added), which means bugs in Chainer's code may have been carried over as well. chainer/chainer#4191 could be one of them; porting it to ChainerMN seems the obvious step, but there is another major option: rework ChainerMN so that it reuses Chainer's BN code more cleanly and gets Chainer's fixes for free. Thoughts?
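For context on the "AllReduce added" part, here is a minimal sketch of how per-worker batch statistics can be combined so that every worker normalizes with the statistics of the whole global batch. It is plain Python with a simulated AllReduce, not ChainerMN's actual code; the function names (`local_moments`, `simulated_allreduce_mean`, `multi_node_batch_stats`) are hypothetical, and it assumes one channel, equal shard sizes on every worker, and the biased (divide-by-N) variance.

```python
import random


def local_moments(shard):
    """Return (mean, mean of squares) of one channel on a single worker."""
    n = len(shard)
    mean = sum(shard) / n
    mean_sq = sum(v * v for v in shard) / n
    return mean, mean_sq


def simulated_allreduce_mean(values):
    """Stand-in for AllReduce(sum) followed by division by the worker count.

    Assumption: every worker holds the same number of samples, so the
    average of local means equals the global mean.
    """
    return sum(values) / len(values)


def multi_node_batch_stats(shards):
    """Combine per-worker moments into global batch mean and variance."""
    moments = [local_moments(s) for s in shards]
    mean = simulated_allreduce_mean([m for m, _ in moments])
    mean_sq = simulated_allreduce_mean([q for _, q in moments])
    var = mean_sq - mean * mean  # biased variance: E[x^2] - E[x]^2
    return mean, var


# Four simulated workers, eight samples each, with different local means.
random.seed(0)
shards = [[random.gauss(w, 1.0) for _ in range(8)] for w in range(4)]

# Reference: statistics computed directly over the concatenated batch.
full = [v for s in shards for v in s]
ref_mean = sum(full) / len(full)
ref_var = sum((v - ref_mean) ** 2 for v in full) / len(full)

mean, var = multi_node_batch_stats(shards)
```

Comparing `mean` and `var` against `ref_mean` and `ref_var` is the kind of check that could help confirm whether a multi-node BN implementation diverges from the single-node one.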

mingxiaoh · 2018-05-03T06:59:22Z

@kuenishi We have the same question. We recently ran multi-node experiments and found that googlenet_v2, googlenet_v3 and resnet50 show unexpectedly low (<10%) validation accuracy, while alexnet and googlenet reach SOTA accuracy. We suspect there is a bug in the batch normalization implementation.
FYI, the networks above reach the same accuracy as SOTA/GPU on a single node.

kuenishi · 2018-05-07T05:43:47Z

@mingxiaoh Thank you for reporting. Would you be able to try porting chainer/chainer#4191 to ChainerMN's BN code to verify whether it is the bug?

kuenishi added bug question labels Feb 7, 2018

keisukefukuda self-assigned this Jun 15, 2018
