Added resources on albert model #20697
Conversation
Successfully raising errors and exceptions on the revised code in test_modeling_distilbert.py. Co-credit: @Batese2001
…y to defined condition that asserts statements (Co-author: Batese2001)
… having the even number of multi heads
Co-authored-by: [email protected]
Co-authored-by: Adia Wu <[email protected]>
Thanks so much @JuheonChu for adding the resources for ALBERT!
I left a couple of comments, the main ones being about reverting the changes you made by mistake in modeling_distilbert.py. Also make sure that the text you added renders correctly! Let us know if you need any help.
_Increasing model size when pretraining natural language representations often results in improved performance on
downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations,
longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction
techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows
that our proposed methods lead to models that scale much better compared to the original BERT. We also use a
self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks
with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and
SQuAD benchmarks while having fewer parameters compared to BERT-large.*
SQuAD benchmarks while having fewer parameters compared to BERT-large._
Could you revert these changes? 🙏
Does that mean deleting "_"?
It means you can leave the asterisks * instead of using an underscore _.
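For reference, a minimal illustration of the two Markdown emphasis styles being discussed (the sentences below are placeholders, not text from albert.mdx; both render as italics, but the existing docs use the asterisk form):

```md
*This abstract is wrapped in asterisks and renders in italics.*
_This abstract is wrapped in underscores and also renders in italics, but it is not the style used in this file._
```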
# Have an even number of multi heads that divide the dimensions
if self.dim % self.n_heads != 0:
    # Raise value errors for even multi-head attention nodes
    raise ValueError(f"self.n_heads: {self.n_heads} must divide self.dim: {self.dim} evenly")
assert self.dim % self.n_heads == 0
Could you also revert the changes here? It seems that you have deleted this by mistake.
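For context, here is a minimal standalone sketch of the divisibility check being discussed. The class name, attributes, and config values mirror the snippet above but are assumptions for illustration, not the actual contents of modeling_distilbert.py:

```python
class MultiHeadSelfAttention:
    """Sketch of a multi-head attention module that validates head/dimension compatibility."""

    def __init__(self, dim: int, n_heads: int):
        self.dim = dim
        self.n_heads = n_heads
        # Each head receives dim // n_heads features, so dim must divide evenly by n_heads.
        if self.dim % self.n_heads != 0:
            raise ValueError(f"self.n_heads: {self.n_heads} must divide self.dim: {self.dim} evenly")
        self.head_dim = self.dim // self.n_heads


# 768 hidden dimensions split across 12 heads -> 64 features per head.
attention = MultiHeadSelfAttention(dim=768, n_heads=12)
print(attention.head_dim)  # 64

# A mismatched configuration raises a ValueError with a descriptive message.
try:
    MultiHeadSelfAttention(dim=768, n_heads=10)
except ValueError as err:
    print(err)
```

As a general design note, raising a ValueError is usually preferred over a bare assert for user-facing configuration checks, since asserts are stripped when Python runs with the -O flag.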
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Adia Wu <[email protected]> Co-authored-by: mollerup23 <[email protected]>
…JuheonChu/transformers into added-resources-on-ALBERT-model
Thank you @younesbelkada! Would you mind if I ask you how I can pass the …? I tried …
Thanks for your PR! Could you focus it solely on the new resources added? There are multiple changes that are not desired.
Thank you! Will try!
Thanks for your contribution!
I think it might be easier to open a new PR with changes only to the albert.mdx file, because right now modeling_distilbert.py has been deleted and we don't want that!
@@ -67,104 +110,84 @@ This model was contributed by [lysandre](https://huggingface.co/lysandre). This

## AlbertModel

[[autodoc]] AlbertModel
    - forward
[[autodoc]] AlbertModel - forward
You can leave this alone as well and allow the forward method to be listed under the AlbertModel object. Same comment applies to all the other objects changed below :)
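For reference, a minimal sketch of the layout being asked for in albert.mdx, based on the diff above (surrounding file content omitted). Keeping forward on its own indented line lets it be rendered under the AlbertModel entry:

```md
## AlbertModel

[[autodoc]] AlbertModel
    - forward
```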
Do you mind if I open a new Pull Request that contains only the meaningful commits?
Yes please, that'd be great!
What does this PR do?
Co-author: @adia Wu [email protected]
Fixes #20055
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@stevhliu @younesbelkada