Distributed optimizer support for BERT #5305
Running on 4 A40s for 20 steps (in BF16 with O2 optimizations):
The memory savings are actually larger than I expected: the model has 55.4M parameters, so I'd expect ZeRO with 4-way data parallelism to save…
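As a rough cross-check on the memory savings discussed above, here is a back-of-the-envelope sketch of the per-GPU saving from sharding optimizer state across data-parallel ranks. It is not the author's estimate. It assumes ZeRO stage-1-style sharding of 12 bytes of state per parameter (FP32 main weights plus two FP32 Adam moments). That byte count is an illustrative assumption, not a measured NeMo or Apex figure, and the function name is hypothetical.

```python
from __future__ import annotations

# Illustrative assumption: bytes of optimizer state per parameter that the
# distributed optimizer shards across data-parallel ranks (ZeRO stage 1 style).
FP32_MAIN_PARAM_BYTES = 4   # FP32 copy of the weights
FP32_EXP_AVG_BYTES = 4      # Adam first moment
FP32_EXP_AVG_SQ_BYTES = 4   # Adam second moment
SHARDED_BYTES_PER_PARAM = (
    FP32_MAIN_PARAM_BYTES + FP32_EXP_AVG_BYTES + FP32_EXP_AVG_SQ_BYTES
)


def zero1_savings_bytes(
    num_params: int,
    dp_size: int,
    sharded_bytes_per_param: int = SHARDED_BYTES_PER_PARAM,
) -> float:
    """Hypothetical helper: per-GPU bytes saved by sharding optimizer state.

    Without sharding, every rank holds the full state. With sharding, each
    rank holds 1/dp_size of it, so the saving is state * (1 - 1/dp_size).
    """
    if dp_size < 1:
        raise ValueError("dp_size must be >= 1")
    full_state = num_params * sharded_bytes_per_param
    return full_state * (1 - 1 / dp_size)


if __name__ == "__main__":
    saved = zero1_savings_bytes(num_params=55_400_000, dp_size=4)
    print(f"~{saved / 2**20:.1f} MiB saved per GPU")
```

Under these assumptions the script prints about 475.5 MiB per GPU. If the optimizer also shards gradients (ZeRO stage 2 style), pass a larger `sharded_bytes_per_param` to account for them. Measured savings can also differ because of allocator behavior and activation memory.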
@timmoon10, can you please resolve the merge conflicts?
Branch updated from edd25e0 to 66c4a00.
Signed-off-by: Tim Moon <[email protected]>
Branch updated from 66c4a00 to c95b17e.
Running on 4 A100s for 20 steps (in BF16 with O2 optimizations):
I think this PR is good to go.
LGTM. Thanks!
Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Hainan Xu <[email protected]>
Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Tim Moon <[email protected]> Signed-off-by: andrusenkoau <[email protected]>
What does this PR do?
Add support for the distributed Adam optimizer to BERT.
Collection: NLP
Changelog
Usage
Set the optimizer to `distributed_fused_adam` in the config file `NeMo/examples/nlp/language_modeling/conf/megatron_bert_config.yaml` (line 118 in 79a22ea).
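The same optimizer switch can also be made without editing the YAML, because NeMo's example training scripts take Hydra command-line overrides. The sketch below only builds such a command. It assumes the optimizer name lives at `model.optim.name`. The script path and the `trainer.*` overrides are illustrative assumptions, chosen to mirror the 4-GPU BF16 runs in the conversation, so check them against your NeMo checkout.

```python
from __future__ import annotations

import shlex
import sys

# Assumed path to NeMo's BERT pretraining entry point (check your checkout).
SCRIPT = "examples/nlp/language_modeling/megatron_bert_pretraining.py"


def build_command(num_gpus: int = 4, precision: str = "bf16") -> list[str]:
    """Build a Hydra-style command line that selects the distributed optimizer."""
    overrides = [
        # The change this PR enables: use the distributed Adam optimizer.
        "model.optim.name=distributed_fused_adam",
        # Illustrative trainer settings (assumptions, not required by the PR).
        f"trainer.devices={num_gpus}",
        f"trainer.precision={precision}",
    ]
    return [sys.executable, SCRIPT, *overrides]


if __name__ == "__main__":
    cmd = build_command()
    # Print a shell-safe version; to launch from the NeMo repo root,
    # pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Overriding on the command line keeps the shipped config unchanged, which makes it easy to A/B the distributed optimizer against the default `fused_adam` in otherwise identical runs.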
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Pinging @shanmugamr1992.
Additional Information