
TF: GPT2 with native embedding layers #23436

Merged (4 commits into huggingface:main) on May 18, 2023
Conversation

@gante (Member) commented May 17, 2023

What does this PR do?

This PR continues the (paused) goal of deprecating our custom TF embedding layers and related code. Previously, we converted the encoder-decoder models (e.g. here), removing TFSharedEmbeddings and making the necessary adaptations.

In this PR, I make the same adaptations for GPT2. The goal is for you, the reviewers, to raise any objections here :D All slow tests for TF GPT2 are passing.
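
For context, the change boils down to using a plain tf.keras.layers.Embedding for the token embeddings and tying the LM head to its weight matrix, rather than going through TFSharedEmbeddings. A minimal sketch of that pattern, not the PR's actual code (the class and layer names below are illustrative):

```python
import tensorflow as tf


class TinyCausalLM(tf.keras.Model):
    """Token embeddings via a native Keras layer, with the LM head tied to the same matrix."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.wte = tf.keras.layers.Embedding(vocab_size, hidden_size, name="wte")

    def call(self, input_ids):
        hidden_states = self.wte(input_ids)  # (batch, seq_len, hidden_size)
        # ... transformer blocks would run here ...
        # Tied output projection: reuse the embedding matrix as the LM head.
        logits = tf.matmul(hidden_states, self.wte.embeddings, transpose_b=True)
        return logits
```

With the embedding weights living in a standard Keras layer, there is no custom layer class left to maintain, and weight tying is just a matmul against the layer's own matrix.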

Then, the following sequence of PRs will be opened:

  1. Remove TFSharedEmbeddings from the other decoder-only models
  2. Remove other uses of TFSharedEmbeddings in the codebase (e.g. in tests)
  3. Remove resize_token_embeddings and all related functions (they are only used to resize model embeddings instantiated with TFSharedEmbeddings; see the resize sketch after this list)
  4. Remove the slow decorator from test_save_load_after_resize_token_embeddings, which will be fixed as a consequence of these changes 🙌
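
On point 3: with native Keras embedding layers, resizing the vocabulary no longer needs any TFSharedEmbeddings-specific machinery; it amounts to building a fresh embedding and copying over the overlapping rows. A rough sketch under that assumption (the helper name is hypothetical):

```python
import tensorflow as tf


def resize_token_embedding(
    old_layer: tf.keras.layers.Embedding, new_vocab_size: int
) -> tf.keras.layers.Embedding:
    """Return a new Embedding with new_vocab_size rows, copying the rows that already exist."""
    old_weights = old_layer.embeddings  # shape: (old_vocab_size, hidden_size)
    old_vocab_size, hidden_size = old_weights.shape
    new_layer = tf.keras.layers.Embedding(new_vocab_size, hidden_size, name=old_layer.name)
    new_layer.build((None,))  # create the (randomly initialised) weight variable
    num_to_copy = min(old_vocab_size, new_vocab_size)
    # Keep the existing rows, leave any newly added rows at their fresh initialisation.
    new_layer.embeddings.assign(
        tf.concat([old_weights[:num_to_copy], new_layer.embeddings[num_to_copy:]], axis=0)
    )
    return new_layer
```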

@HuggingFaceDocBuilderDev commented May 17, 2023

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts (Collaborator) left a comment:
Nice - thanks for adding!

src/transformers/modeling_tf_utils.py (review comment, outdated, resolved)
@Rocketknight1 (Member) left a comment:

Looks clean to me!

@gante merged commit db13634 into huggingface:main on May 18, 2023
@gante deleted the tf_resize branch on May 18, 2023 at 13:46
sheonhan pushed a commit to sheonhan/transformers that referenced this pull request Jun 1, 2023
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
@arivero commented Apr 4, 2024

I noticed this change one year later, while doing a demo of LoRA in the embedding layer :-D Nice, but I am always wary of doing TensorFlow operations outside of Keras layers. Masks usually get lost, as do other subtle behaviours that depend on chaining layers. It is not a problem in transformers because the -100 trick is the standard mask and the loss is aware of it.
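
To illustrate the -100 convention being referred to: the loss only counts positions whose label is not -100, so masking handled at the label level is unaffected by raw TF ops on the embeddings. A minimal sketch of that pattern, not the library's actual loss implementation:

```python
import tensorflow as tf


def masked_causal_lm_loss(labels, logits):
    """Cross-entropy that ignores positions labelled -100 (assumes labels is an integer tensor)."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    # Clamp the -100 positions to a valid class index before computing per-token losses ...
    per_token_loss = loss_fn(tf.nn.relu(labels), logits)
    # ... then zero those positions out and average over the real tokens only.
    mask = tf.cast(labels != -100, per_token_loss.dtype)
    return tf.reduce_sum(per_token_loss * mask) / tf.reduce_sum(mask)
```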
