Loss goes nan when training dual swin-base #39

Open
seanzhuh opened this issue Aug 25, 2021 · 2 comments
Comments

@seanzhuh

Hi, I've ported your code into my own codebase. As far as I can tell, your changes to the original mmdetection are confined to 3 files (please correct me if I'm wrong):

  1. CBNetV2/mmdet/models/backbones/cbnet.py
  2. CBNetV2/mmdet/models/necks/cbnet_fpn.py
  3. CBNetV2/mmdet/models/detectors/two_stage.py

I copied these files directly into my own version of mmdetection. However, the loss goes to NaN from epoch 17 onward. The cause seems to be gradient overflow: the AMP loss scaler keeps shrinking the loss scale until it reaches zero, which causes a division by zero, and the loss then becomes NaN.

I use the same default AMP settings as you do, and I can't figure out what's wrong. Could you help me?
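
For readers unfamiliar with the failure mode described above, here is a minimal pure-Python sketch of dynamic loss scaling. It is a toy model, not the apex, PyTorch, or mmcv implementation: the class `ToyDynamicLossScaler`, the helper `steps_until_zero`, and the `min_scale` floor are all hypothetical names introduced for illustration. It only shows that if every step overflows and the scale is halved each time with no lower bound, the scale eventually underflows to exactly zero, so dividing gradients by it is no longer possible.

```python
import math


class ToyDynamicLossScaler:
    """Toy model of dynamic loss scaling (illustrative only)."""

    def __init__(self, init_scale=2.0 ** 16, backoff=0.5, min_scale=None):
        self.scale = init_scale
        self.backoff = backoff
        # Hypothetical floor on the scale; None means "no floor".
        self.min_scale = min_scale

    def update(self, grads):
        """Return True if the step may be applied, False if it is skipped."""
        overflow = any(not math.isfinite(g) for g in grads)
        if overflow:
            # Shrink the scale after an overflowing step.
            self.scale *= self.backoff
            if self.min_scale is not None:
                self.scale = max(self.scale, self.min_scale)
        return not overflow


def steps_until_zero(scaler, max_steps=5000):
    """Feed an overflowing gradient every step; count steps until scale == 0."""
    for step in range(1, max_steps + 1):
        scaler.update([float("inf")])
        if scaler.scale == 0.0:
            return step
    return None  # the scale never reached zero within max_steps


if __name__ == "__main__":
    unbounded = ToyDynamicLossScaler()
    print("steps until scale hits zero:", steps_until_zero(unbounded))

    bounded = ToyDynamicLossScaler(min_scale=1.0)
    print("with a floor:", steps_until_zero(bounded), "scale =", bounded.scale)
```

In this sketch a floor on the scale keeps the division well defined, but it does not remove the underlying overflow. If the gradients overflow on every step, the steps are still skipped, so the source of the non-finite gradients would still need to be found.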

@fuweifu-vtoo

I ran into the same problem!
Have you solved it?

@seanzhuh
Author

seanzhuh commented Sep 16, 2021 via email
