Probably backward batchnorm has numerical instability. #2469

CAHEK7 · 2023-10-19T16:12:40Z

Checking out b2ce275 gives an example where a slightly different random sequence makes one set of tests fail and another set pass.
Both sequences are drawn from the same probability distribution, contain the same set of "discrete" fp values, and have very similar mean and stdev values: the sequence that passed the 24th test has mean 2.10394e-05 and stdev 0.574497; the sequence that failed it has mean -3.9892e-05 and stdev 0.574472.

The only difference is the order of the random values, yet the test ./bin/test_bn_bwd --gtest_filter=*BnBwdCKFloat/24 failed with the following error: Error beyond tolerance. Error: 0.00034430102545170334, Threshold: 0.0001
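The report does not identify the cause. One well-known way that reordering alone can change a result is that floating-point addition is not associative, and a batchnorm backward pass contains reductions over the batch. The sketch below is only an illustration of that effect, not MIOpen code: `to_f32`, `sum_f32` and `order_spread` are hypothetical helpers that emulate float32 accumulation with the Python standard library.

```python
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate left to right, rounding to binary32 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


# Same three values, two orders. 1e8 is exactly representable in binary32,
# but neighbouring binary32 values there are 8 apart, so adding 1.0 is lost.
absorbed = sum_f32([1e8, 1.0, -1e8])   # (1e8 + 1) - 1e8
preserved = sum_f32([1e8, -1e8, 1.0])  # (1e8 - 1e8) + 1
print(absorbed, preserved)  # 0.0 1.0


def order_spread(n=4096, trials=20, seed=0):
    """Largest gap between float32 sums of one value set under random reorderings."""
    rng = random.Random(seed)
    values = [to_f32(rng.uniform(-1.0, 1.0)) for _ in range(n)]
    sums = []
    for _ in range(trials):
        rng.shuffle(values)  # same multiset of values, different order
        sums.append(sum_f32(values))
    return max(sums) - min(sums)


print(order_spread())
```

The three-value case is deliberately extreme; `order_spread` shows the milder version of the same effect on values that share one distribution and one set of elements, which is the situation described above. Whether this is what pushes the error past the threshold in test_bn_bwd is an assumption that would need to be checked against the kernel.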

Steps to reproduce:

Check out b2ce275.
Run MIOPEN_DEBUG_UNSTABLE_BN=0 ./bin/test_bn_bwd --all; the 24th, 25th and 26th tests fail and the 0th test passes.
Run MIOPEN_DEBUG_UNSTABLE_BN=1 ./bin/test_bn_bwd --all; the 24th, 25th and 26th tests pass and the 0th test fails.

Setting a different random seed through the environment variable MIOPEN_DEBUG_DRIVER_PRNG_SEED= may also break some of the tests.

For example, with MIOPEN_DEBUG_DRIVER_PRNG_SEED=24604 the test ./bin/test_bn_bwd --gtest_filter=*BnBwdCKFloat/24 fails even with a threshold of 0.0005.
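To see how many seeds are affected, the single-seed run can be wrapped in a small sweep. This is a rough sketch, assuming the test binary is run from the build directory and exits with a non-zero code when a test fails (the usual GoogleTest behaviour); the binary path, filter and environment variable come from this report, while `env_for_seed` and `failing_seeds` are hypothetical helper names.

```python
import os
import subprocess

TEST_BINARY = "./bin/test_bn_bwd"                 # path as given in the report
GTEST_FILTER = "--gtest_filter=*BnBwdCKFloat/24"  # failing case from the report


def env_for_seed(seed, base_env=None):
    """Copy the environment and set the PRNG seed variable named in the report."""
    env = dict(os.environ if base_env is None else base_env)
    env["MIOPEN_DEBUG_DRIVER_PRNG_SEED"] = str(seed)
    return env


def failing_seeds(seeds, run=subprocess.run):
    """Run the filtered test once per seed and collect the seeds that fail.

    `run` is injectable so the sweep logic can be exercised without the binary.
    """
    failed = []
    for seed in seeds:
        result = run([TEST_BINARY, GTEST_FILTER], env=env_for_seed(seed))
        if result.returncode != 0:
            failed.append(seed)
    return failed
```

Calling `failing_seeds(range(24600, 24610))` from the build directory would cover a small, arbitrarily chosen range around the seed quoted above; the range itself is not from the report.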

CAHEK7 added the correctness label Oct 19, 2023

junliume added the urgency_high label Oct 20, 2023
