Computing Accuracy on 16-bit softmaxed values causes ValueError #245

Closed
l-salewski opened this issue May 12, 2021 · 3 comments
Labels: bug / fix, help wanted

Comments

@l-salewski

🐛 Bug

Computing accuracy on on-the-fly softmaxed values raises a ValueError, "Probabilities in preds must sum up to 1 across the C dimension", even though the tensor does sum to 1 along the correct dimension.

I upgraded pytorch-lightning from 1.1.8 to 1.2.10 and then to 1.3.1, and therefore had to manually insert F.softmax(logits) to make the accuracy metrics work again (see this and this). That is no big deal in itself. However, I get the ValueError immediately on 1.3.1, and after a couple dozen steps on 1.2.10. I have checked that I am passing the correct dim, i.e. F.softmax(logits, dim=1).

I suspect the root cause may be half precision (which I always use, because my model is large), which causes torch.isclose(...) to wrongly trigger this check:

if not torch.isclose(preds.sum(dim=1), torch.ones_like(preds.sum(dim=1))).all():

After removing the .all(), it turns out that isclose returns False for only some of the values.
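The per-row inspection described above can be sketched as a small debugging helper. rows_not_summing_to_one is a hypothetical name, not part of torchmetrics or PyTorch. It applies the same torch.isclose comparison as the quoted check (with isclose's default rtol=1e-05 and atol=1e-08) but keeps the per-row result instead of reducing it with .all(), and it assumes 2-D preds of shape (N, C) with classes along dim=1.

```python
import torch


def rows_not_summing_to_one(preds: torch.Tensor, rtol: float = 1e-05, atol: float = 1e-08):
    """Hypothetical debugging helper for 2-D preds of shape (N, C).

    Returns the indices of rows whose sum over dim=1 fails torch.isclose(..., 1)
    and the absolute deviation of those sums from 1.
    """
    sums = preds.sum(dim=1)
    # Same comparison as the library check, but kept per row instead of reduced with .all().
    ok = torch.isclose(sums, torch.ones_like(sums), rtol=rtol, atol=atol)
    bad = (~ok).nonzero(as_tuple=True)[0]
    return bad, (sums[bad] - 1).abs()


preds = torch.tensor([[0.25, 0.75],
                      [0.50, 0.49]])  # second row sums to 0.99
idx, dev = rows_not_summing_to_one(preds)
print(idx.tolist())  # [1]
```

Printing the deviations of the failing rows shows whether they are on the order of rounding error or point to a genuine normalization bug.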

Workaround

I tried setting the rtol of isclose to 1e-04 instead of the default 1e-05, and that works fine. I guess it is a hacky solution; ideally, this tolerance would be derived from the precision of 16-bit floating-point numbers.
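One way to derive the tolerance from the number format, as suggested above, is torch.finfo, which reports a floating-point dtype's machine epsilon: 2**-10 (about 9.8e-4) for float16, roughly 100 times isclose's default rtol of 1e-05. The sketch below is a hypothetical helper, not the torchmetrics implementation. It assumes that a small multiple of epsilon (eps_multiple=4, chosen arbitrarily) is an acceptable tolerance, and it sums in at least float32 so the reduction itself adds little error. Rounding each probability to float16 changes it by at most about half an epsilon in relative terms, so row sums of rounded probabilities should stay well inside that tolerance.

```python
import torch


def sums_to_one(preds: torch.Tensor, dim: int = 1, eps_multiple: float = 4.0) -> bool:
    """Hypothetical check that `preds` sums to 1 along `dim`.

    The relative tolerance is the larger of torch.isclose's default rtol (1e-05)
    and `eps_multiple` times the machine epsilon of preds' dtype. The factor 4.0
    is an arbitrary safety margin, not a value taken from any library.
    """
    eps = torch.finfo(preds.dtype).eps  # float16: 2**-10, bfloat16: 2**-7, float32: 2**-23
    rtol = max(1e-05, eps_multiple * eps)
    # Sum in at least float32 so the reduction adds little rounding error of its own.
    acc_dtype = torch.promote_types(preds.dtype, torch.float32)
    sums = preds.to(acc_dtype).sum(dim=dim)
    return bool(torch.isclose(sums, torch.ones_like(sums), rtol=rtol, atol=1e-08).all())


# Usage: simulate float16 model output by rounding float32 softmax probabilities.
torch.manual_seed(0)
logits = torch.randn(64, 10)
probs16 = torch.softmax(logits, dim=1).half()
print(sums_to_one(probs16))  # True
print(sums_to_one(torch.full((2, 3), 1.0)))  # False: each row sums to 3
```

Because the tolerance scales with the input dtype, float32 inputs keep the original 1e-05 behavior, while float16 and bfloat16 inputs get proportionally more slack.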

Expected behavior

No ValueError should be raised, since the values do sum to 1.

Environment

  • PyTorch version: 1.8.1
  • OS: Ubuntu
  • How you installed PyTorch: conda
  • Python version: 3.9.4
  • CUDA/cuDNN version: 11
  • GPU models and configuration: V100
l-salewski added the bug / fix and help wanted labels on May 12, 2021
@github-actions

Hi! Thanks for your contribution, great first issue!

@SkafteNicki
Member

Hi l-salewski,
Thanks for raising this issue. Yesterday PR #200 was merged, which removes the check that the preds tensor must sum to 1, so that users can also pass unnormalized model output (logits). That should resolve this issue automatically. To get these changes, you can install from master (pip install git+https://github.com/PytorchLightning/metrics.git@master).
I'm going to close the issue; feel free to reopen it if the problem persists :]
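Accepting logits is harmless for top-1 multiclass accuracy because softmax preserves the ordering of scores within a row, so the argmax of the logits equals the argmax of the probabilities (barring ties introduced by rounding). The sketch below illustrates this with a hypothetical top1_accuracy helper, which is not the torchmetrics API. The argument does not carry over to threshold-based (binary or multilabel) accuracy, where a 0.5 threshold only makes sense on probabilities.

```python
import torch


def top1_accuracy(scores: torch.Tensor, target: torch.Tensor) -> float:
    """Hypothetical helper: fraction of rows whose highest-scoring class equals target.

    `scores` has shape (N, C) and may be raw logits or softmax probabilities.
    """
    pred_labels = scores.argmax(dim=1)
    return (pred_labels == target).float().mean().item()


logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.3, 0.2],
                       [-0.5, -2.0, 1.5]])
target = torch.tensor([0, 2, 2])

acc_logits = top1_accuracy(logits, target)
acc_probs = top1_accuracy(torch.softmax(logits, dim=1), target)
print(acc_logits, acc_probs)  # both about 0.6667 (rows 0 and 2 are correct)
```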

@l-salewski
Author

Hi SkafteNicki, thanks for the swift response and the even faster PR!
