Update loss.py #49

Open
Conversation

banjuanshua

Fix cls loss NaN bug. The sum of cls_loss should be divided by the total number of pos_neg anchors.
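Below is a minimal sketch of the normalization this PR describes (not the exact diff; the mask names and helper function are assumptions for illustration): the summed classification loss is divided by the total count of positive and negative anchors instead of only the positives, so a batch with zero positives no longer produces NaN.

import torch

def normalize_cls_loss(cls_loss_sum, pos_mask, neg_mask):
    # cls_loss_sum: scalar tensor, focal loss summed over all sampled anchors
    # pos_mask / neg_mask: boolean tensors marking positive / negative anchors
    pos_neg = (pos_mask | neg_mask).sum().clamp(min=1)  # guard against division by zero
    return cls_loss_sum / pos_neg.float()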
@SCoulY

SCoulY commented Jan 23, 2019

Good job

@pengfeidip

@banjuanshua Have you successfully run the code?

@ridhajuneja123

This is also giving NaN loss. Can you explain how the above solution solves the problem?

@noahcao

noahcao commented Sep 26, 2019

Ideally, you should also add a clamp in the focal loss, in case the sigmoid output reaches 0 or 1 and the log produces an infinite value:

def focal_loss_alt(self, x, y):
    '''Focal loss alternative.

    Args:
      x: (tensor) sized [N,D].
      y: (tensor) sized [N,].

    Return:
      (tensor) focal loss.
    '''
    alpha = 0.25

    t = one_hot_embedding(y.data.cpu(), 1+self.num_classes)
    t = t[:,1:]              # drop the background column
    t = Variable(t).cuda()

    xt = x*(2*t-1)           # xt = x if t > 0 else -x
    pt = (2*xt+1).sigmoid()

    w = alpha*t + (1-alpha)*(1-t)
    logit = pt.clamp(self.eps, 1.-self.eps)  # keep pt away from 0/1 so log() stays finite
    loss = -w*logit.log() / 2

    return loss.sum()

This should work better; self.eps can be set to 1e-7, for example.
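A quick standalone illustration of why the clamp matters (the values here are just made-up examples): without it, a probability of exactly 0 sends the log to -inf and the loss to NaN.

import torch

p = torch.tensor([0.0, 1.0, 0.5])
print(p.log())                        # tensor([-inf, 0.0000, -0.6931])
print(p.clamp(1e-7, 1 - 1e-7).log())  # all finite: roughly [-16.12, -1.2e-07, -0.6931]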

@zjysteven

Have you trained a satisfying model with your modification? In my case, dividing cls_loss by pos_neg gives a very small cls_loss (e.g. around 0.0020), since the number of background anchors is so large. However, after training, the classification outputs were all near 0, and I think it is because such a small cls_loss prevented the model from learning.

Also, both of the implementations below use num_pos as the normalizer, so I don't think this PR is actually helpful.
https://github.com/fizyr/keras-retinanet/blob/a8bbd6535655602a84f194b1557be1ed5180fa71/keras_retinanet/losses.py#L61-L65
https://github.com/yhenon/pytorch-retinanet/blob/bb11485e410dba9da3578de5b2cb36d9cd6fdd03/retinanet/losses.py#L92
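For reference, here is a minimal sketch of the normalization those implementations (and Detectron2) use, assuming a boolean pos_mask over anchors: the summed classification loss is divided by the number of positive anchors, clamped to at least 1 so images without objects do not produce NaN.

import torch

def normalize_by_num_pos(cls_loss_sum, pos_mask):
    # cls_loss_sum: scalar tensor, focal loss summed over all anchors
    # pos_mask: boolean tensor marking positive (foreground) anchors
    num_pos = pos_mask.sum().clamp(min=1)  # at least 1 to avoid division by zero
    return cls_loss_sum / num_pos.float()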

@twmht

twmht commented Mar 12, 2020

@zjysteven

What is the training cls loss when you use only num_pos as the normalizer?

@zjysteven

@twmht Sorry, it's been a long time, so I can't remember exactly what the value was. But I have confirmed with the authors of several other implementations, and it is correct to divide the cls loss by num_pos only. This is also the case in Facebook's Detectron2, so you can safely do the same. However, in my own experience, this repo may have some other subtle implementation errors, so it is not guaranteed that you will get desirable results with it.
