
Deprecate HFCrossEntropy and Perplexity #1857

Merged: 32 commits merged into mosaicml:dev from xent on Feb 6, 2023

Conversation

@dakinggg (Contributor) commented Jan 5, 2023

What does this PR do?

This PR adds DeprecationWarnings to HFCrossEntropy and Perplexity, since the separation between these and LanguageCrossEntropy is confusing. To smooth their eventual removal, this PR also adds support for Mapping input to LanguageCrossEntropy.update and adds LanguagePerplexity(LanguageCrossEntropy). A sketch of the changes follows below.
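For illustration only, here is a minimal sketch of the changes described above. The class and method names follow the PR description, but the state names, ignore_index handling, and deprecation message are assumptions here, not composer's actual implementation:

    import warnings
    from collections.abc import Mapping

    import torch
    from torchmetrics import Metric


    class LanguageCrossEntropy(Metric):
        """Sketch: token-level cross entropy with a consistent sum-over-tokens reduction."""

        def __init__(self):
            super().__init__()
            self.loss_fn = torch.nn.CrossEntropyLoss(reduction='sum', ignore_index=-100)
            self.add_state('sum_loss', default=torch.tensor(0.), dist_reduce_fx='sum')
            self.add_state('total_items', default=torch.tensor(0), dist_reduce_fx='sum')

        def update(self, output, target):
            # New in this PR: accept an HF-style Mapping (a model output dict
            # with a 'logits' key) as well as a raw logits Tensor.
            logits = output['logits'] if isinstance(output, Mapping) else output
            target = target.view(-1)
            logits = logits.view(target.size(0), -1)
            self.sum_loss += self.loss_fn(logits, target)
            self.total_items += (target != -100).sum()

        def compute(self):
            return self.sum_loss / self.total_items


    class LanguagePerplexity(LanguageCrossEntropy):
        def compute(self):
            # Perplexity is exp(cross entropy), so it matches
            # LanguageCrossEntropy by construction.
            return torch.exp(super().compute())


    class HFCrossEntropy(LanguageCrossEntropy):
        def __init__(self):
            # Sketch of the deprecation path: warn on construction.
            warnings.warn(
                'HFCrossEntropy is deprecated; please use LanguageCrossEntropy instead.',
                DeprecationWarning)
            super().__init__()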

More context:
There is a slight difference between LanguageCrossEntropy and HFCrossEntropy due to how the loss is reduced, which creates confusion in the examples repo (it uses LanguageCrossEntropy and Perplexity). There is a possible small cost to this change: HFCrossEntropy uses output['loss'] from HF (if available) rather than recomputing the loss, whereas LanguageCrossEntropy always recomputes the loss so that the reduction is consistent and LanguagePerplexity always matches LanguageCrossEntropy. The examples repo was already returning the logits from forward, so this slight cost was already present there.
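As a concrete illustration of the reduction difference (a standalone sketch in plain torch, not composer code): averaging HF's per-batch mean losses weights each batch equally, while a token-level reduction weights each token equally, so the two disagree whenever batches contain different numbers of tokens.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)

    # Two batches with different token counts (e.g. due to padding/filtering).
    logits_a, targets_a = torch.randn(4, 10), torch.randint(10, (4,))
    logits_b, targets_b = torch.randn(16, 10), torch.randint(10, (16,))

    # Averaging per-batch means, roughly what accumulating HF's output['loss'] does:
    per_batch_mean = (F.cross_entropy(logits_a, targets_a) +
                      F.cross_entropy(logits_b, targets_b)) / 2

    # Token-level reduction, what recomputing the loss from logits gives:
    token_mean = F.cross_entropy(torch.cat([logits_a, logits_b]),
                                 torch.cat([targets_a, targets_b]))

    print(per_batch_mean.item(), token_mean.item())  # generally not equal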

What issue(s) does this change relate to?

Closes CO-1616

Before submitting

  • Have you read the contributor guidelines?
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)


@dakinggg requested a review from a team as a code owner on January 5, 2023 22:35
@vchiley (Contributor) commented Jan 6, 2023

LanguageCrossEntropy's init requires vocab_size; it's only used here, like this:

        assert isinstance(output, Tensor)
        output = output.view(-1, self.vocab_size)
        target = target.view(-1)
        losses = self.loss_fn(output, target)

can we instead do something like:

        assert isinstance(output, Tensor)
        target = target.view(-1)
        output = output.view(target.size(0), -1)
        losses = self.loss_fn(output, target)

This removes the need to pass vocab_size.

I'm also not sure whether there is a fundamental difference between these and our generic CE metric.

@dakinggg (Contributor, Author) commented Jan 6, 2023

It looks like the difference is that CrossEntropy supports one-hot targets, does not do the reshaping for you, and (with this PR) does not support Mapping input. On your comment about eliminating the need for vocab_size: yeah, that seems good.
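As a small aside on the one-hot point (plain torch, not composer's metric): F.cross_entropy accepts class probabilities as targets since PyTorch 1.10, so class-index and one-hot targets yield the same loss value.

    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 10)
    targets = torch.randint(10, (4,))
    one_hot = F.one_hot(targets, num_classes=10).float()

    # Class-index and one-hot targets produce the same loss.
    assert torch.allclose(F.cross_entropy(logits, targets),
                          F.cross_entropy(logits, one_hot))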

@review-notebook-app commented: Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

@mvpatel2000 (Contributor) left a review comment

Please don't merge until review from an NLP person as well. LGTM on the eng side.

@dakinggg (Contributor, Author) commented

@abhi-mosaic @vchiley would one of you mind taking a look to approve from the NLP side?

@abhi-mosaic (Contributor) left a review comment

Looks great! After this gets merged and released, I will change our imports in examples/[llm, bert], likely with the 0.13 release.

@mvpatel2000 (Contributor) commented

@dakinggg let's hold until after 12.1

@dakinggg (Contributor, Author) commented Feb 1, 2023

@mvpatel2000 yeah, that was my plan

@dakinggg enabled auto-merge (squash) February 6, 2023 18:48
@dakinggg merged commit 48d40f9 into mosaicml:dev Feb 6, 2023
@dakinggg deleted the xent branch September 9, 2023 22:47