In my dataset, the loss of ASL is very large, while it is normal when using other loss functions #22
Comments
Our default params for ASL are for highly imbalanced multi-label datasets. I suggest you gradually try variants of ASL and make sure the results are logical and consistent: (1) (2), then try a simple focal loss: (3), then try ASL: (4). Also try the 'disable_torch_grad_focal_loss' mode, it can stabilize results (see the sketch below).
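For illustration, a rough sketch of that progression, assuming the AsymmetricLoss class from this repo. The module path and the specific gamma/clip values are my assumptions, since the original snippets (1)-(4) are not shown above:

import torch
from src.loss_functions.losses import AsymmetricLoss  # module path assumed from this repo

logits = torch.randn(8, 9)                      # hypothetical batch: 8 samples, 9 labels
targets = torch.randint(0, 2, (8, 9)).float()

# (1)/(2) no focusing, no probability shifting: behaves like plain BCE (summed)
plain = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0)
# (3) symmetric focal loss: same focusing for positives and negatives
focal = AsymmetricLoss(gamma_neg=2, gamma_pos=2, clip=0)
# (4) asymmetric loss: stronger focusing on negatives plus probability clipping
asl = AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05, disable_torch_grad_focal_loss=True)

for name, criterion in [("bce-like", plain), ("focal", focal), ("asl", asl)]:
    print(name, criterion(logits, targets).item())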
Hello, thank you for your reply. I tested with a simple example and found that the BCE loss can't be reproduced. What's the problem?

import torch
from src.loss_functions.losses import AsymmetricLoss, AsymmetricLossOptimized  # from this repo

pred = torch.from_numpy(pred).float()    # pred and label are my own example arrays
label = torch.from_numpy(label).float()
crition1 = torch.nn.BCEWithLogitsLoss()
loss1 = crition1(pred, label)
crition2 = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0, disable_torch_grad_focal_loss=True)
loss2 = crition2(pred, label)
crition3 = AsymmetricLossOptimized(gamma_neg=0, gamma_pos=0, clip=0)
loss3 = crition3(pred, label)

One of the printed values is tensor(0.7193); the BCE loss and the ASL losses do not match.
ASL performs sigmoid internally; BCEWithLogitsLoss does not perform sigmoid.
Thank you sincerely for your help. I have solved this problem. In addition, I would like to ask about my multi-label image task: there are nine kinds of tags in total, and each picture may have one, two, three, or four of them. There is no dependency between these tags. Can I still use ASL here?
Sincerely, thank you for taking time out of your busy work to answer this question. I am a deep learning beginner. In your article you write: "In typical multi label datasets, each picture contains only a few positive labels, and many negative ones." In my multi-label classification dataset there are ten kinds of tags in total, and each picture may have one, two, three, or four of them. Is this not too extreme? Does it still belong to the situation mentioned in your article, and can I use ASL?
I am not sure. My best advice would be "try and see". The datasets that we used in the article are probably larger than yours. However, the loss function is one of the critical components in deep learning, and you would do wisely to try and find the best one for your problem. This is an integral part of the way experienced deep learning practitioners reach top results - they test many things, and look for the "big money". A proper loss can be one of those things, although your specific problem might indeed not be the best candidate for ASL.
OK, thank you for your help
Thought it would be good to clarify something, as this issue is linked in the repo's README - both loss functions mentioned above perform Sigmoid internally. The difference between the results is due to different reduction - BCEWithLogitsLoss does mean reduction by default, while ASL always returns the sum. @mrT23 - Do you have any intuition about why you sum the loss instead of averaging? This should make the loss (and other hyperparameters like the learning rate) dependent on the batch size and number of classes.
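A quick way to check the reduction point (a sketch; the AsymmetricLoss import path is assumed, and the match is only up to the small eps term inside ASL):

import torch
from src.loss_functions.losses import AsymmetricLoss  # assumed module path

logits = torch.randn(4, 9)
targets = torch.randint(0, 2, (4, 9)).float()

asl = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0)        # no focusing: plain BCE, summed
bce_mean = torch.nn.BCEWithLogitsLoss()                       # default: mean reduction
bce_sum = torch.nn.BCEWithLogitsLoss(reduction='sum')         # sum reduction

print(asl(logits, targets), bce_sum(logits, targets))   # essentially equal
print(bce_mean(logits, targets))                         # smaller by batch_size * num_classes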
@davidas1 you can understand this just by looking at the Adam update rule. If you are still pondering it, I can explain further.
Since in the Adam optimizer you divide the gradient by its standard deviation, the actual update does not change if you multiply or divide the loss by a constant factor (sum vs. avg).
…On Mon, Mar 8, 2021 at 2:15 PM bendanzzc wrote:
Could you further explain why it does not matter if we do sum or average?
Thanks a lot
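A toy numerical check of this scale-invariance (a hypothetical example of mine, not from the thread): scaling the loss by a constant leaves Adam's first update essentially unchanged, apart from the tiny eps term.

import torch

def adam_step(scale):
    torch.manual_seed(0)
    w = torch.zeros(5, requires_grad=True)
    opt = torch.optim.Adam([w], lr=1e-3)
    x, y = torch.randn(16, 5), torch.randint(0, 2, (16,)).float()
    # scaling the loss (e.g. sum vs. mean) only multiplies the gradient by a constant
    loss = torch.nn.functional.binary_cross_entropy_with_logits(x @ w, y) * scale
    loss.backward()
    opt.step()
    return w.detach().clone()

# sum vs. mean differs by a constant factor (here 16), yet the updates nearly coincide
print(torch.allclose(adam_step(1.0), adam_step(16.0), atol=1e-6))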
@davidas1 Hi, I see you say "I'm trying ASL on a multi-task multi-label problem (training multiple heads, each with its own loss), and thinking about what is the best way to reduce the losses from the different heads". Is this strategy effective for a multi-task multi-label problem?
I think you were wrong. BCEWithLogitsLoss also performs sigmoid. When I set reduction='sum', the output loss of BCEWithLogitsLoss is equal to ASL.
@mrT23 I have tried gamma_neg=2, gamma_pos=1 and gamma_neg=4, gamma_pos=1. The latter is better, but it is still not as good as the cross-entropy loss function.
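Following the earlier "try and see" advice, a hypothetical sweep over ASL's parameters could look like the sketch below; the configurations are illustrative, not recommendations from the authors, and the import path is assumed.

import itertools
import torch
from src.loss_functions.losses import AsymmetricLoss  # assumed module path

# candidate settings, including the BCE-like corner and the default-ASL corner
grid = {
    "gamma_neg": [0, 2, 4],
    "gamma_pos": [0, 1],
    "clip": [0, 0.05],
}

for gn, gp, c in itertools.product(grid["gamma_neg"], grid["gamma_pos"], grid["clip"]):
    criterion = AsymmetricLoss(gamma_neg=gn, gamma_pos=gp, clip=c)
    # ... train and validate with this criterion, then record the metric that matters for your task
    print(f"gamma_neg={gn}, gamma_pos={gp}, clip={c}")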
Hello, thank you very much for your and your team's contribution in this respect. I intend to apply this loss function to my image multi-label classification model (labels only, no bounding-box labels):
loss_function=AsymmetricLoss()
logits = net(images.to(device))
loss = loss_function(logits,labels.to(device))
I haven't changed your ASL loss function at all. At first the loss was 156, and finally it dropped to 4, but ACC = 0. What's the matter? Why did the loss start above 100 and still be around 4 after training, while the accuracy rate is zero? When I use BCE loss, everything is perfectly normal.
train loss: 100%[->]4.9414
[epoch 1] train_loss: 21.409 test_accuracy: 0.000
train loss: 100%[->]5.7753
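One thing worth checking (my assumption, not something confirmed in this thread): ASL returns the sum over the batch and all classes, so its raw value is naturally much larger than BCE's default mean, and multi-label accuracy has to be computed by thresholding sigmoid outputs rather than taking an argmax over classes. A minimal evaluation sketch:

import torch

@torch.no_grad()
def multilabel_accuracy(logits, labels, threshold=0.5):
    # logits: (batch, num_classes); labels: (batch, num_classes) with 0/1 entries
    preds = (torch.sigmoid(logits) > threshold).float()
    # exact-match ratio: a sample counts as correct only if every label is predicted correctly
    return (preds == labels).all(dim=1).float().mean().item()

# hypothetical usage inside the validation loop:
# acc = multilabel_accuracy(net(images.to(device)), labels.to(device))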