Thanks for releasing the code!
I have been reviewing how the Gumbel-Softmax trick [1] is used here. Both the paper and the code state that the "relevance scores are interpreted as log probabilities" [2], but a log probability is by definition a non-positive quantity, whereas the output of a convolutional layer is an unconstrained real value. Why is it valid to interpret these raw scores as log probabilities? (This is unlikely to break training, but it could silently yield suboptimal performance due to inaccurate approximate sampling from the discrete distribution.)
Please let me know; maybe there is a subtle intuition or training dynamic at play here that I am missing. Thanks!
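To make the question concrete, here is a minimal numpy sketch (the function names and example scores are mine, not from the repo) that applies Equation 1 of [1] twice with the same Gumbel noise: once treating raw scores directly as log probabilities, and once after log-softmax normalization.

```python
import numpy as np

def sample_gumbel(shape, rng, eps=1e-20):
    """Draw standard Gumbel(0, 1) noise via the inverse-CDF transform."""
    u = rng.uniform(size=shape)
    return -np.log(-np.log(u + eps) + eps)

def gumbel_softmax(log_probs, gumbel_noise, tau=1.0):
    """Soft sample from Equation 1 of [1]: softmax((log_probs + g) / tau)."""
    y = (log_probs + gumbel_noise) / tau
    y = y - y.max()  # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
scores = np.array([2.3, -0.7, 1.1, 0.0])  # hypothetical raw conv-layer outputs

# Proper log probabilities: log_softmax(scores) = scores - logsumexp(scores)
logsumexp = scores.max() + np.log(np.exp(scores - scores.max()).sum())
log_probs = scores - logsumexp

g = sample_gumbel(scores.shape, rng)
y_raw = gumbel_softmax(scores, g)      # raw scores as "log probabilities"
y_norm = gumbel_softmax(log_probs, g)  # normalized log probabilities

# The log-normalizer is a constant shift, which cancels inside the softmax,
# so both calls produce identical soft samples for any tau.
print(np.allclose(y_raw, y_norm))
```

At least in this toy check, the missing normalization constant cancels inside the softmax, so the sampled distribution matches softmax(scores) either way; I may of course be missing something specific to how the scores are used downstream in the paper's pipeline.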
[1] https://arxiv.org/pdf/1611.01144.pdf (Equation 1)
[2] https://arxiv.org/pdf/1711.11503.pdf (Section 3.3, page 5)