Use of logits instead of softmax activations for OS scoring #7
Hi, thanks for pointing out this bug! Fig 6 was generated using a separate script which controlled properly for the Softmax operation. However, we cannot now remember if the main results (Tab 1) were evaluated with [...]
Hi,

Right, thanks a lot for being so responsive and taking care of this. In the meantime, can I ask another question about evaluation? Namely, about the [...]

Since you get your results from reading them off the logs after training, I just wanted to double-check whether you are indeed rebalancing the test set in some sense, since I don't recall reading about this in the paper. And what exactly would be the way in which you balance the test set? Also, do you know if the other papers for which you report results in your paper (apart from ARPL, which you re-train and re-test) use this kind of rebalancing?

Thanks!!
Adrian
Hi! Thanks for your patience. We re-trained the models with the implementation from this repo and got the following results. The numbers are slightly boosted and we will update the arXiv paper when we release the next version. All models are evaluated using [...]

Regarding previous papers, they often don't specify, but I imagine they do not perform the rebalancing (it does not change the numbers too much, as AUROC is relatively robust to class imbalance).
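For concreteness, here is a minimal sketch (not the authors' evaluation script) of what rebalancing the known/unknown test pools before computing AUROC could look like. The array names and the subsampling strategy are assumptions for illustration only.

```python
# Minimal sketch (not the repo's code) of rebalancing the open-set test pools
# before computing AUROC. `known_scores` / `unknown_scores` are hypothetical
# 1-D arrays of per-image open-set scores (higher = more "known").
import numpy as np
from sklearn.metrics import roc_auc_score

def balanced_auroc(known_scores, unknown_scores, seed=0):
    rng = np.random.default_rng(seed)
    n = min(len(known_scores), len(unknown_scores))
    known = rng.choice(np.asarray(known_scores), size=n, replace=False)
    unknown = rng.choice(np.asarray(unknown_scores), size=n, replace=False)
    scores = np.concatenate([known, unknown])
    labels = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = known, 0 = unknown
    return roc_auc_score(labels, scores)
```

As the reply above notes, AUROC is a ranking metric, so this kind of subsampling usually changes the number only marginally.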
Hey Sagar! Congratulations on improving your results; your work in this paper is indeed impressive, and I appreciate it a lot. I am now trying to reproduce your results on the fine-grained datasets, so I will probably be writing to you again if I run into any difficulties. I hope that's fine with you! Thanks, Adrian
Hi again,
I read in the paper that "[...] we propose the use of the maximum logit rather than softmax probability for the open-set scoring rule. Logits are the raw outputs of the final linear layer in a deep classifier, while the softmax operation involves a normalization such that the outputs can be interpreted as a probability vector summing to one. As the softmax operation normalizes out much of the feature magnitude information present in the logits, we find logits lead to better open-set detection results". Then, Figure 6c shows the AUROC on the test set(s) as training goes on, using both max-logits and max-softmax for scoring, suggesting that max-of-logits might be the better choice.
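For readers of this issue, a small sketch of the two scoring rules being contrasted (illustrative PyTorch only, not code from the repo):

```python
# Illustrative only; `logits` stands in for the raw outputs of the final
# linear layer for a batch of test images.
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)                        # (batch, num_known_classes), dummy values

max_logit_score = logits.max(dim=-1).values        # max-of-logits open-set score
max_softmax_score = F.softmax(logits, dim=-1).max(dim=-1).values  # maximum softmax probability (MSP)
```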
However, the ARPL code for the Softmax loss (found here), which you are inheriting and using for testing, is a bit weird: it refers to the post-softmax activations as "logits", see here.
Since you are taking these (false) logits from calling the criterion (here) during testing, and a few lines below you have the option of (re-)applying softmax to them when running with `use_softmax_in_eval`, I am wondering whether what you call "logits" in the experiments from the paper is actually `softmax(logits)`, and whether what you call softmax activations is in fact `softmax(softmax(logits))`? (A small sketch of what I mean is below.)

Thanks!
Adrian
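To make the suspicion above concrete, here is a minimal sketch; the variable names and the eval flag are illustrative stand-ins, not the exact repo code.

```python
# If the criterion already returns softmax probabilities but labels them
# "logits", then re-applying softmax at eval time (e.g. under a flag like
# use_softmax_in_eval) would score with softmax(softmax(logits)) instead of
# softmax(logits), and the "logit" score would really be softmax(logits).
import torch
import torch.nn.functional as F

raw_logits = torch.randn(4, 10)                          # true pre-softmax outputs

returned_as_logits = F.softmax(raw_logits, dim=-1)       # what the Softmax criterion may hand back
double_softmax = F.softmax(returned_as_logits, dim=-1)   # what softmax-in-eval would then compute

print(raw_logits.max(-1).values)           # intended max-logit score
print(returned_as_logits.max(-1).values)   # score actually used if the "logits" are already softmaxed
print(double_softmax.max(-1).values)       # score under the re-applied softmax
```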