Added resources on albert model #20697

Closed
wants to merge 42 commits into from

Conversation

JuheonChu
Contributor

What does this PR do?

Co-author: @adia Wu [email protected]
Fixes #20055

Before submitting

  • [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

Who can review?

@stevhliu @younesbelkada

JuheonChu and others added 28 commits November 24, 2022 01:34
Successfully raised errors and exceptions in the revised code in test_modeling_distilbert.py.

Co-credit: @Batese2001
…y to defined condition that asserts statements (Co-author: Batese2001)
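For context on the commit above about raising errors in test_modeling_distilbert.py, here is a minimal sketch of how such an error path is commonly unit-tested with pytest. The class and names are hypothetical stand-ins for illustration, not the actual Transformers test code:

    import pytest

    class ToyMultiHeadConfig:
        """Hypothetical stand-in for an attention config, for illustration only."""

        def __init__(self, dim: int, n_heads: int):
            if dim % n_heads != 0:
                # Mirror the PR's check: the head count must divide the hidden dim
                raise ValueError(f"n_heads: {n_heads} must divide dim: {dim} evenly")
            self.dim = dim
            self.n_heads = n_heads

    def test_invalid_head_count_raises():
        # 768 is not divisible by 13, so construction should fail loudly
        with pytest.raises(ValueError):
            ToyMultiHeadConfig(dim=768, n_heads=13)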
@JuheonChu mentioned this pull request Dec 9, 2022
@younesbelkada (Contributor) left a comment

Thanks so much @JuheonChu for adding the resources for ALBERT!
I left a couple of comments, the main ones being to revert the changes you made by mistake in modeling_distilbert.py. Also, make sure the text you added renders correctly! Let us know if you need any help.

Comment on lines +26 to +33
_Increasing model size when pretraining natural language representations often results in improved performance on
downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations,
longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction
techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows
that our proposed methods lead to models that scale much better compared to the original BERT. We also use a
self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks
with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and
SQuAD benchmarks while having fewer parameters compared to BERT-large.*
SQuAD benchmarks while having fewer parameters compared to BERT-large._
Contributor

Could you revert these changes? 🙏

Contributor Author

Does that mean deleting "_"?

Member

It means you can leave the asterisks * instead of using an underscore _
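In practice that means the quoted abstract keeps its original asterisk markers rather than switching to underscores, for example:

    *Increasing model size when pretraining natural language representations often results in improved performance on
    ...
    SQuAD benchmarks while having fewer parameters compared to BERT-large.*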

docs/source/en/model_doc/albert.mdx (outdated; resolved)
Comment on lines 145 to 144
# The number of attention heads must divide the hidden dimension evenly
if self.dim % self.n_heads != 0:
    # Raise a ValueError when n_heads does not divide dim evenly
    raise ValueError(f"self.n_heads: {self.n_heads} must divide self.dim: {self.dim} evenly")
assert self.dim % self.n_heads == 0
Contributor

Could you also revert the changes here? It seems you deleted this by mistake.
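For context on why this guard exists: the hidden dimension is split evenly across attention heads, so the division must be exact or the reshape fails. A standalone PyTorch illustration (not the Transformers code itself):

    import torch

    dim, n_heads = 768, 12               # 768 / 12 = 64 features per head
    x = torch.randn(2, 10, dim)          # (batch, seq_len, dim)
    # This view only works because n_heads divides dim evenly, hence the check
    heads = x.view(2, 10, n_heads, dim // n_heads).transpose(1, 2)
    print(heads.shape)                   # torch.Size([2, 12, 10, 64])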

@JuheonChu (Contributor Author)

Thank you @younesbelkada! Would you mind if I ask how I can pass the
ci/circleci: tests_pipelines_tf tests?

I tried
pip3 install --upgrade pip
pip3 install --upgrade tensorflow
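If it helps, a common way to reproduce this kind of CI job locally is to install the repo's test dependencies and run the matching pytest subset; CI then filters this down to the TensorFlow variants. The exact extras name and test path below are assumptions based on a typical Transformers development setup, not taken from the CI config:

    pip install -e ".[testing]"
    python -m pytest tests/pipelines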

Juheon Chu and others added 3 commits December 9, 2022 05:27
Co-authored-by: Adia Wu <[email protected]>
Co-authored-by: mollerup23 <[email protected]>
@sgugger (Collaborator) commented Dec 9, 2022

Thanks for your PR! Could you focus it solely on the new resources added? There are multiple changes that are not desired.

Juheon Chu and others added 2 commits December 9, 2022 12:09
Co-authored-by: Adia Wu <[email protected]>
Co-authored-by: mollerup23 <[email protected]>
@JuheonChu (Contributor Author)

Thank you! Will try!
I deleted those undesired changes, and now I will look for more resources to add!

@stevhliu (Member) left a comment

Thanks for your contribution!

I think it might be easier to open a new PR with changes only to the albert.mdx file, because modeling_distilbert.py has now been deleted and we don't want that!


@@ -67,104 +110,84 @@ This model was contributed by [lysandre](https://huggingface.co/lysandre). This

## AlbertModel

[[autodoc]] AlbertModel
- forward
[[autodoc]] AlbertModel - forward
Member

You can leave this alone as well and allow the forward method to be listed under the AlbertModel object. Same comment applies to all the other objects changed below :)
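Concretely, keeping the original layout means the method stays on its own indented line under the directive, as in the existing docs, so it renders beneath the class:

    [[autodoc]] AlbertModel
        - forward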

@JuheonChu (Contributor Author)

Do you mind if I open a new pull request that contains only the meaningful commits?

@stevhliu (Member) commented Dec 9, 2022

Yes please, that'd be great!

@JuheonChu closed this Dec 9, 2022
@JuheonChu deleted the added-resources-on-ALBERT-model branch December 9, 2022 19:44
Development
Successfully merging this pull request may close these issues: Model resources contribution