Probably backward batchnorm has numerical instability. #2469

Open
CAHEK7 opened this issue Oct 19, 2023 · 0 comments

CAHEK7 commented Oct 19, 2023

Commit b2ce275 provides an example where a slightly different random sequence makes one set of tests fail and another set of tests pass.
Both sequences have the same probability distribution, the same set of discrete fp values, and very similar mean and stdev values. When the 24th test passed: mean: 2.10394e-05, stdev: 0.574497. When the 24th test failed: mean: -3.9892e-05, stdev: 0.574472.

The only difference is the order of the random values, yet the test ./bin/test_bn_bwd --gtest_filter=*BnBwdCKFloat/24 failed with the following error: Error beyond tolerance. Error: 0.00034430102545170334, Threshold: 0.0001
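To illustrate why the order of values alone can change a result, here is a minimal sketch of order-dependent rounding in a float32 reduction. It is not the MIOpen kernel or the test's reference code. It only assumes that the backward pass reduces over many values with a float32 accumulator, which is an assumption on my side and not confirmed above. The helpers `f32` and `sum_f32` are hypothetical names introduced for this example.

```python
import math
import random
import struct


def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Naive left-to-right sum that rounds to float32 after every addition."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


# Same three values, different order, different float32 result:
# 1e8 + 1 rounds back to 1e8 in float32 (spacing between neighbours there is 8).
a = sum_f32([1e8, 1.0, -1e8])   # the 1.0 is absorbed, result is 0.0
b = sum_f32([1e8, -1e8, 1.0])   # large terms cancel first, result is 1.0

if __name__ == "__main__":
    print(a, b)

    # Same multiset of random values in two orders (hypothetical data, not the
    # test's generator). The two float32 sums may differ in the last digits.
    rng = random.Random(0)
    xs = [f32(rng.uniform(-1.0, 1.0)) for _ in range(100000)]
    ys = list(xs)
    rng.shuffle(ys)
    print(sum_f32(xs), sum_f32(ys), math.fsum(xs))
```

If the kernel's reductions behave like `sum_f32`, two inputs with the same distribution but a different element order can land on different sides of a fixed 0.0001 threshold.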

Steps to reproduce:

  1. Check out b2ce275.
  2. Run MIOPEN_DEBUG_UNSTABLE_BN=0 ./bin/test_bn_bwd --all: tests 24, 25 and 26 fail and test 0 passes.
  3. Run MIOPEN_DEBUG_UNSTABLE_BN=1 ./bin/test_bn_bwd --all: tests 24, 25 and 26 pass and test 0 fails.

Setting a different random seed via the environment variable MIOPEN_DEBUG_DRIVER_PRNG_SEED= may also break some of the tests.

For example, with MIOPEN_DEBUG_DRIVER_PRNG_SEED=24604 the test ./bin/test_bn_bwd --gtest_filter=*BnBwdCKFloat/24 fails even with a threshold of 0.0005.
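A small sketch for sweeping seeds to see how often this happens. The binary path, the filter and the environment variable are the ones used above. The helper names (`build_env`, `run_with_seed`, `failing_seeds`) are hypothetical, and the script assumes the test binary exits with a non-zero code on failure, as GoogleTest binaries normally do.

```python
import os
import subprocess

TEST_BINARY = "./bin/test_bn_bwd"   # path as used in this issue
TEST_FILTER = "*BnBwdCKFloat/24"    # filter as used in this issue


def build_env(seed, base_env=None):
    """Copy the environment and set the PRNG seed variable for one run."""
    env = dict(os.environ if base_env is None else base_env)
    env["MIOPEN_DEBUG_DRIVER_PRNG_SEED"] = str(seed)
    return env


def run_with_seed(seed):
    """Run the filtered test once; True means the process exited with code 0."""
    cmd = [TEST_BINARY, "--gtest_filter=" + TEST_FILTER]
    result = subprocess.run(cmd, env=build_env(seed),
                            capture_output=True, text=True)
    return result.returncode == 0


def failing_seeds(seeds, runner=run_with_seed):
    """Return the seeds for which the runner reports a failure."""
    return [s for s in seeds if not runner(s)]


if __name__ == "__main__":
    # Only attempt the sweep when the test binary is actually present.
    if os.path.exists(TEST_BINARY):
        print("failing seeds:", failing_seeds([24604, 1, 2, 3]))
```

The seed list in the `__main__` block is arbitrary apart from 24604, which is the one reported above.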
