In my dataset, the loss of ASL is very large, while it is normal when using other loss functions #22
Comments
Our default params for ASL are for highly imbalanced multi-label datasets. I suggest you gradually try variants of ASL and make sure the results are logical and consistent: (1) (2), then try a simple focal loss: (3), then try ASL: (4). Also try the 'disable_torch_grad_focal_loss' mode, it can stabilize results (see the sketch below).
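For illustration, a rough sketch of that progression, assuming the AsymmetricLoss class from this repo. The module path and the specific gamma/clip values are my assumptions, since the original snippets (1)-(4) are not shown above:

import torch
from src.loss_functions.losses import AsymmetricLoss  # module path assumed from this repo

logits = torch.randn(8, 9)                      # hypothetical batch: 8 samples, 9 labels
targets = torch.randint(0, 2, (8, 9)).float()

# (1)/(2) no focusing, no probability shifting: behaves like plain BCE (summed)
plain = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0)
# (3) symmetric focal loss: same focusing for positives and negatives
focal = AsymmetricLoss(gamma_neg=2, gamma_pos=2, clip=0)
# (4) asymmetric loss: stronger focusing on negatives plus probability clipping
asl = AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05, disable_torch_grad_focal_loss=True)

for name, criterion in [("bce-like", plain), ("focal", focal), ("asl", asl)]:
    print(name, criterion(logits, targets).item())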
Hello, thank you for your reply. I tested with a simple example and found that the BCE loss can't be reproduced. What's the problem?

import torch
from src.loss_functions.losses import AsymmetricLoss, AsymmetricLossOptimized  # from this repo

pred = torch.from_numpy(pred).float()    # pred and label are my own example arrays
label = torch.from_numpy(label).float()
crition1 = torch.nn.BCEWithLogitsLoss()
loss1 = crition1(pred, label)
crition2 = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0, disable_torch_grad_focal_loss=True)
loss2 = crition2(pred, label)
crition3 = AsymmetricLossOptimized(gamma_neg=0, gamma_pos=0, clip=0)
loss3 = crition3(pred, label)

One of the printed values is tensor(0.7193); the BCE loss and the ASL losses do not match.
ASL performs sigmoid internally; BCEWithLogitsLoss does not perform sigmoid.
Thank you sincerely for your help. I have solved this problem. In addition, I would like to ask about my multi-label image task: there are nine kinds of tags in total, and each picture may have one, two, three, or four of them. There is no dependency between these tags. Can I still use ASL here?
Sincerely, thank you for taking time out of your busy work to answer this question. I am a deep learning beginner. In your article you write: "In typical multi label datasets, each picture contains only a few positive labels, and many negative ones." In my multi-label classification dataset there are ten kinds of tags in total, and each picture may have one, two, three, or four of them. Is this not too extreme? Does it still belong to the situation mentioned in your article, and can I use ASL?
I am not sure. My best advice would be "try and see". The datasets that we used in the article are probably larger than yours. However, the loss function is one of the critical components in deep learning, and you would do wisely to try and find the best one for your problem. This is an integral part of the way experienced deep learning practitioners reach top results - they test many things, and look for the "big money". A proper loss can be one of those things, although your specific problem might indeed not be the best candidate for ASL.
OK, thank you for your help
Thought it would be good to clarify something, as this issue is linked in the repo's README - both loss functions mentioned above perform Sigmoid internally. The difference between the results is due to different reduction - BCEWithLogitsLoss does mean reduction by default, while ASL always returns the sum. @mrT23 - Do you have any intuition about why you sum the loss instead of averaging? This should make the loss (and other hyperparameters like the learning rate) dependent on the batch size and number of classes.
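A quick way to check the reduction point (a sketch; the AsymmetricLoss import path is assumed, and the match is only up to the small eps term inside ASL):

import torch
from src.loss_functions.losses import AsymmetricLoss  # assumed module path

logits = torch.randn(4, 9)
targets = torch.randint(0, 2, (4, 9)).float()

asl = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0)        # no focusing: plain BCE, summed
bce_mean = torch.nn.BCEWithLogitsLoss()                       # default: mean reduction
bce_sum = torch.nn.BCEWithLogitsLoss(reduction='sum')         # sum reduction

print(asl(logits, targets), bce_sum(logits, targets))   # essentially equal
print(bce_mean(logits, targets))                         # smaller by batch_size * num_classes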
@davidas1 you can understand this just by looking at the Adam update rule. If you are still pondering it, I can explain further.
Since in the Adam optimizer you divide the gradient by its standard deviation, the actual update does not change if you multiply or divide the loss by a constant factor (sum vs. avg).
…On Mon, Mar 8, 2021 at 2:15 PM bendanzzc wrote:
Could you further explain why it does not matter if we do sum or average?
Thanks a lot
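A toy numerical check of this scale-invariance (a hypothetical example of mine, not from the thread): scaling the loss by a constant leaves Adam's first update essentially unchanged, apart from the tiny eps term.

import torch

def adam_step(scale):
    torch.manual_seed(0)
    w = torch.zeros(5, requires_grad=True)
    opt = torch.optim.Adam([w], lr=1e-3)
    x, y = torch.randn(16, 5), torch.randint(0, 2, (16,)).float()
    # scaling the loss (e.g. sum vs. mean) only multiplies the gradient by a constant
    loss = torch.nn.functional.binary_cross_entropy_with_logits(x @ w, y) * scale
    loss.backward()
    opt.step()
    return w.detach().clone()

# sum vs. mean differs by a constant factor (here 16), yet the updates nearly coincide
print(torch.allclose(adam_step(1.0), adam_step(16.0), atol=1e-6))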
@davidas1 Hi, I see you say "I'm trying ASL on a multi-task multi-label problem (training multiple heads, each with its own loss), and thinking about what is the best way to reduce the losses from the different heads". Is this strategy effective for a multi-task multi-label problem?
I think you were wrong. BCEWithLogitsLoss also performs sigmoid. When I set reduction='sum', the output loss of BCEWithLogitsLoss is equal to ASL.
@mrT23 I have tried gamma_neg=2, gamma_pos=1 and gamma_neg=4, gamma_pos=1. The latter is better, but it is still not as good as the cross-entropy loss function.
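Following the earlier "try and see" advice, a hypothetical sweep over ASL's parameters could look like the sketch below; the configurations are illustrative, not recommendations from the authors, and the import path is assumed.

import itertools
import torch
from src.loss_functions.losses import AsymmetricLoss  # assumed module path

# candidate settings, including the BCE-like corner and the default-ASL corner
grid = {
    "gamma_neg": [0, 2, 4],
    "gamma_pos": [0, 1],
    "clip": [0, 0.05],
}

for gn, gp, c in itertools.product(grid["gamma_neg"], grid["gamma_pos"], grid["clip"]):
    criterion = AsymmetricLoss(gamma_neg=gn, gamma_pos=gp, clip=c)
    # ... train and validate with this criterion, then record the metric that matters for your task
    print(f"gamma_neg={gn}, gamma_pos={gp}, clip={c}")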
Hello, thank you very much for your and your team's contribution in this respect. I intend to apply this loss function to my image multi-label classification model (labels only, no bounding-box labels):
loss_function=AsymmetricLoss()
logits = net(images.to(device))
loss = loss_function(logits,labels.to(device))
I haven't changed your ASL loss function at all. At first the loss was 156, and finally it dropped to 4, but ACC = 0. What's the matter? Why did the loss start above 100 and still be around 4 after training, while the accuracy rate is zero? When I use BCE loss, everything is perfectly normal.
train loss: 100%[->]4.9414
[epoch 1] train_loss: 21.409 test_accuracy: 0.000
train loss: 100%[->]5.7753
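One thing worth checking (my assumption, not something confirmed in this thread): ASL returns the sum over the batch and all classes, so its raw value is naturally much larger than BCE's default mean, and multi-label accuracy has to be computed by thresholding sigmoid outputs rather than taking an argmax over classes. A minimal evaluation sketch:

import torch

@torch.no_grad()
def multilabel_accuracy(logits, labels, threshold=0.5):
    # logits: (batch, num_classes); labels: (batch, num_classes) with 0/1 entries
    preds = (torch.sigmoid(logits) > threshold).float()
    # exact-match ratio: a sample counts as correct only if every label is predicted correctly
    return (preds == labels).all(dim=1).float().mean().item()

# hypothetical usage inside the validation loop:
# acc = multilabel_accuracy(net(images.to(device)), labels.to(device))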