Tutorial -- Clarify on how we perform optimization #90

Closed
nicolay-r opened this issue Mar 18, 2021 · 1 comment
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@nicolay-r
Owner

nicolay-r commented Mar 18, 2021

logits_unscaled, logits_unscaled_dropped = self.init_logits_unscaled(context_embedding)

NOTE:
This should be moved to, and clarified in, the separate repository that covers the benchmark results for RuSentRel-1.2.

@nicolay-r nicolay-r added the documentation Improvements or additions to documentation label Mar 18, 2021
@nicolay-r nicolay-r self-assigned this Mar 18, 2021
@nicolay-r
Owner Author

nicolay-r commented Mar 24, 2021

We may refer to this work:
https://arxiv.org/abs/2006.13730
which relies on the following paper for the SGD application, the bag terminology, and the instance selection within bags:
https://www.aclweb.org/anthology/D15-1203.pdf

The latter already provides the correct description.
One slight problem with that paper is that it describes MAX with respect to the labels rather than the bags.
So, for the sample gradients within a bag, we adopt the avg function; the main assumption is that this takes the other synonymous attitudes into account.
We used this feature in earlier work (https://github.com/nicolay-r/sentiment-pcnn/tree/clls-2018).
However, since our latest research adopts BagSize = 1, we do not exploit this feature there.
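
To make the avg aggregation concrete, here is a minimal sketch in plain Python. It assumes each instance in a bag yields a probability distribution over labels and that the per-instance loss is the negative log-likelihood. The names `instance_loss` and `bag_loss_avg` are hypothetical and are not taken from the repository. Because the gradient of an average is the average of the gradients, averaging the losses within a bag has the same effect as averaging the sample gradients.

```python
import math


def instance_loss(probs, label):
    """Negative log-likelihood of the gold label for a single instance.

    `probs` is a list of label probabilities (assumed to sum to 1).
    """
    return -math.log(probs[label])


def bag_loss_avg(bag_probs, label):
    """Average the per-instance losses over every instance in the bag.

    Hypothetical helper: it only illustrates the avg aggregation.
    """
    losses = [instance_loss(probs, label) for probs in bag_probs]
    return sum(losses) / len(losses)


# A bag with two instances that share the same label (index 0).
bag = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
avg_value = bag_loss_avg(bag, label=0)

# With BagSize = 1 the average is just the loss of the only instance,
# so the bag-level aggregation has no effect.
single = [[0.7, 0.2, 0.1]]
single_value = bag_loss_avg(single, label=0)
```

With a single instance per bag, `bag_loss_avg` returns exactly the instance loss, which is why the feature goes unused when BagSize = 1.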

In the original approach (https://www.aclweb.org/anthology/D15-1203.pdf),
the authors select the best instance j within a bag, where "best" denotes the maximum value of p(y_i|m_i,j) across all instances in the bag. This yields a loss function at the bag level, and the resulting value is used to update Theta via SGD (with AdaDelta).
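
The max-based selection can be sketched as follows, again in plain Python and under the assumption that every instance in a bag comes with a probability distribution over labels. The names `select_best_instance` and `bag_loss_max` are hypothetical illustrations, not the code of the original paper or of this repository, and the optimizer step (AdaDelta) is left out.

```python
import math


def select_best_instance(bag_probs, label):
    """Return the index j of the instance with the highest p(label | instance)."""
    return max(range(len(bag_probs)), key=lambda j: bag_probs[j][label])


def bag_loss_max(bag_probs, label):
    """Bag-level loss: negative log-likelihood of the selected best instance only."""
    best_j = select_best_instance(bag_probs, label)
    return -math.log(bag_probs[best_j][label])


# A bag of three instances with two labels; the bag label is index 0.
bag = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]
best_j = select_best_instance(bag, label=0)
loss = bag_loss_max(bag, label=0)
```

Only the selected instance contributes to the bag loss here, whereas the avg variant lets every instance in the bag contribute.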
