Deprecate HFCrossEntropy and Perplexity #1857
Conversation
LanguageCrossEntropy init requires
can we instead do something like:
this will remove the need to pass
I'm also not sure if there is a fundamental diff between these and our generic CE metric.
It looks like the difference is that
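The code snippets in the thread above were not captured in this export. As a purely hypothetical sketch of the idea being suggested (assuming the required init argument is the vocabulary size, which is an assumption here), the update step could read the vocabulary size off the logits instead of requiring it up front:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch, not the snippet from the comment: infer the vocabulary
# size from the logits' last dimension at update time, so it no longer needs
# to be passed to the metric's __init__.
def cross_entropy_update(logits: torch.Tensor, target: torch.Tensor,
                         ignore_index: int = -100):
    vocab_size = logits.shape[-1]                  # inferred, not configured
    flat_logits = logits.view(-1, vocab_size)
    flat_target = target.view(-1)
    loss_sum = F.cross_entropy(flat_logits, flat_target,
                               ignore_index=ignore_index, reduction='sum')
    token_count = (flat_target != ignore_index).sum()
    return loss_sum, token_count


loss_sum, count = cross_entropy_update(torch.randn(2, 8, 100),
                                        torch.randint(100, (2, 8)))
print((loss_sum / count).item())  # per-token cross entropy
```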
Please don't merge until there's a review from an NLP person as well. LGTM for the eng side.
@abhi-mosaic @vchiley would one of you mind taking a look to approve from the NLP side?
Looks great! After this gets merged and released I will change our imports in examples/[llm, bert], likely with the 0.13 release.
@dakinggg let's hold until after 12.1
@mvpatel2000 yeah, that was my plan
What does this PR do?
This PR adds DeprecationWarnings to HFCrossEntropy and Perplexity, as the separation between these and LanguageCrossEntropy is confusing. To ease the eventual removal of these metrics, this PR also adds support for Mapping input to LanguageCrossEntropy.update and adds LanguagePerplexity(LanguageCrossEntropy).
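A minimal sketch of the shape of these additions, built on a torchmetrics-style metric. The class internals, the deprecation message, and the choice to derive Perplexity from LanguagePerplexity are assumptions for illustration, not the PR's actual code:

```python
import warnings
from collections.abc import Mapping
from typing import Union

import torch
import torch.nn.functional as F
from torchmetrics import Metric


class LanguageCrossEntropy(Metric):
    """Illustrative per-token cross entropy; the internals here are assumptions."""

    def __init__(self, ignore_index: int = -100):
        super().__init__()
        self.ignore_index = ignore_index
        self.add_state('sum_loss', default=torch.tensor(0.0), dist_reduce_fx='sum')
        self.add_state('total_items', default=torch.tensor(0), dist_reduce_fx='sum')

    def update(self, output: Union[Mapping, torch.Tensor], target: torch.Tensor) -> None:
        # Accept either raw logits or a Mapping (e.g. a Hugging Face model
        # output) that carries the logits under the 'logits' key.
        logits = output['logits'] if isinstance(output, Mapping) else output
        logits = logits.view(-1, logits.shape[-1])
        target = target.view(-1)
        self.sum_loss += F.cross_entropy(logits, target,
                                         ignore_index=self.ignore_index,
                                         reduction='sum')
        self.total_items += (target != self.ignore_index).sum()

    def compute(self) -> torch.Tensor:
        return self.sum_loss / self.total_items


class LanguagePerplexity(LanguageCrossEntropy):
    """Perplexity derived directly from LanguageCrossEntropy, so the two always agree."""

    def compute(self) -> torch.Tensor:
        return torch.exp(super().compute())


class Perplexity(LanguagePerplexity):
    """Deprecated name kept for backwards compatibility (inheritance here is an assumption)."""

    def __init__(self, *args, **kwargs):
        warnings.warn('Perplexity is deprecated and will be removed in a future release. '
                      'Please use LanguagePerplexity instead.', DeprecationWarning)
        super().__init__(*args, **kwargs)
```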
More context:
There is a slight difference between LanguageCrossEntropy and HFCrossEntropy due to how the loss is reduced. This creates confusion in the examples repo, which uses LanguageCrossEntropy and Perplexity. There is a possible small cost to this change, because HFCrossEntropy uses output['loss'] (if available) from HF rather than recomputing the loss. LanguageCrossEntropy will always recompute the loss so that the reduction is consistent and LanguagePerplexity always matches LanguageCrossEntropy. The examples repo was already returning the logits from forward, so this slight cost was already present there.
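A toy illustration (not from the PR) of why the reduction matters: averaging per-batch mean losses, as when a precomputed per-batch loss like output['loss'] is reused, generally differs from a single per-token average over all batches, as when the loss is recomputed from the logits:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 5

# Two batches with different numbers of tokens.
logits_a, target_a = torch.randn(2, vocab_size), torch.randint(vocab_size, (2,))
logits_b, target_b = torch.randn(8, vocab_size), torch.randint(vocab_size, (8,))

# Reduction 1: mean of per-batch means (each batch weighted equally),
# i.e. what averaging a precomputed per-batch loss gives you.
mean_of_means = (F.cross_entropy(logits_a, target_a) +
                 F.cross_entropy(logits_b, target_b)) / 2

# Reduction 2: total loss divided by total token count (each token weighted
# equally), i.e. what recomputing and accumulating sums and counts gives you.
per_token = (F.cross_entropy(logits_a, target_a, reduction='sum') +
             F.cross_entropy(logits_b, target_b, reduction='sum')) / (2 + 8)

print(mean_of_means.item(), per_token.item())  # generally not equal
```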
What issue(s) does this change relate to?
Closes CO-1616
Before submitting
Did you run pre-commit on your change? (see the pre-commit section of the prerequisites)