[Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed #10464
Conversation
That's a very good proposal, @cli99.
So are you saying DeepSpeed will handle any optimizer and scheduler if these are passed directly as an object?
One more thing we need is a test; it would probably be easier for you if I added it, since I need to rework a few things to support this test. But you will need to switch the PR from draft to normal for me to be able to push into it. We usually don't use Draft but instead edit the PR title to start with [WIP] - work in progress.
src/transformers/trainer_utils.py
Outdated
-    ZERO_DP_2 = "zero2"
-    ZERO_DP_3 = "zero3"
-    OFFLOAD = "offload"
+    ZERO_DP_2 = "zero_dp_2"
Thank you for catching those discrepancies.
Ideally let's try not to cross unrelated areas in the same PR, but I suppose it's OK for now.
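For orientation, a hedged sketch of how such string constants can be grouped; the class name is hypothetical (not the actual trainer_utils.py code) and the zero_dp_3/offload values are assumed to follow the renaming shown for ZERO_DP_2 above.

```python
from enum import Enum

class DeepSpeedOption(Enum):   # hypothetical name, for illustration only
    ZERO_DP_2 = "zero_dp_2"    # renamed from "zero2" in the diff above
    ZERO_DP_3 = "zero_dp_3"    # assumed to follow the same pattern
    OFFLOAD = "offload"
```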
OK, 2 tests added, and no, this doesn't work without either the default optimizer or the default scheduler, e.g. if you comment out the … I didn't have time to investigate as it's late, so I'm just sharing the outputs at the moment - will look closer tomorrow. I think both are issues on the DeepSpeed side, but I could be wrong. Also note that the normal CI doesn't run these tests, so green doesn't say anything about those.
OK, so I had a look at the first failing test. These two can't be separated the way it was done, since the optimizer is needed to init the scheduler. But we don't have it yet if it's DeepSpeed that creates the optimizer, so we have a chicken-and-egg problem here, unless DeepSpeed provides a new API to handle that. So probably at the moment we can only support one of: 1, 2, 3 and not 4.
Note I added a new test for combo 2 and renamed all the tests to match, so now we have:
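To make the chicken-and-egg constraint above concrete, here is a minimal plain-PyTorch sketch (not Trainer code; the model and hyperparameters are placeholders): a scheduler object cannot be constructed before the optimizer object exists, so an HF-side scheduler cannot be built when the optimizer will only be created later inside deepspeed.initialize().

```python
import torch

model = torch.nn.Linear(4, 2)

# An LR scheduler's constructor requires an already-built optimizer...
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)

# ...so if DeepSpeed is the one creating the optimizer (from ds_config),
# there is nothing to pass to the scheduler constructor at this point.
```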
This DeepSpeed PR deepspeedai/DeepSpeed#827 fixes the issues, and the following tests would pass. Shall we put a check in HF to disallow the HF scheduler + DS optimizer case?
I tested with your deepspeedai/DeepSpeed#827 PR tree and indeed these now pass. Awesome!
Correct! Please let me know if you will be taking care of it or you'd rather I finish this up - either way works. We will also need to update the docs to reflect this new, more flexible reality; I can take care of that. We will need to wait for a new release from your side to commit this PR and add a requirement for that new version. I already have this check set up in another PR waiting for the new release: #9624. There is another PR by @jeffra today that also needs a new release first before we can update our tree.
Great. Can you please add the check for the HF scheduler + DS optimizer case? Since you are updating the docs, I think it makes more sense for you to do it. I will work with @jeffra to push the DeepSpeed PRs into the new release. Thanks.
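A hedged sketch of what such a guard might look like - the function name and error message are made up for illustration and this is not the actual transformers implementation:

```python
def validate_optimizer_scheduler_combo(ds_config: dict) -> None:
    """Reject the unsupported HF scheduler + DS optimizer combination."""
    ds_creates_optimizer = "optimizer" in ds_config
    ds_creates_scheduler = "scheduler" in ds_config
    if ds_creates_optimizer and not ds_creates_scheduler:
        # HF would have to create the scheduler, but the optimizer it needs
        # only comes into existence inside deepspeed.initialize().
        raise ValueError(
            "Using a DeepSpeed-configured optimizer with an HF-created scheduler "
            "is not supported; either also define a scheduler in the DeepSpeed "
            "config, or let HF create the optimizer as well."
        )
```

Running such a check before deepspeed.initialize() turns the chicken-and-egg failure into an explicit, early error.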
Thanks for your PR!
It makes sense to split the optimizer and scheduler creation as you did, and since it's done in a backward-compatible way I have no worries. Would you mind adding those two new methods to the documentation of the Trainer (top of the file main_classes/trainer.rst)?
Co-authored-by: Sylvain Gugger <[email protected]>
@cli99, I made further changes to your original code.
Please let me know if I broke anything in your original plan. I have also updated the docs extensively. They look a bit scary at the moment and will need a rework down the road. My main goal here is to prevent the user from getting subtle errors, so the command line arguments are set to override the DS config. Hope it makes sense.
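To illustrate the kind of subtle error meant above (all values made up): hyperparameters duplicated in ds_config.json can silently disagree with the TrainingArguments given on the command line, which is why the docs steer users toward letting the command-line values win.

```python
training_args_lr = 3e-5  # e.g. passed as --learning_rate 3e-5

ds_config = {
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},  # silently different
}

# Without an override (or a hard error), training would quietly run with 1e-4
# while the user believes they asked for 3e-5.
assert ds_config["optimizer"]["params"]["lr"] != training_args_lr
```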
@sgugger, I made more doc updates - if you get a chance could you please kindly skim over them? Thank you! I think we will merge this on Monday when deepspeed==0.3.13 is planned to be released.
Just one nit in the documentation. Thanks for all the work!
I'm on top of this - we are waiting for a new DeepSpeed release required by this PR. Thank you, @sgugger
[Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed (huggingface#10464)

* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* update
* make init_deepspeed support config dict
* fix docstring formatting
* clean up trainer's comments
* add new tests
* fix type
* composit argparse doesn't work
* style
* add a new test, rename others
* document new functionality
* complete tests, add docs
* style
* correct level
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <[email protected]>
* add new methods to the doc
* must tell DS we are using a non-native optimizer
* add protection against cpu_offload + HF optimizer combo
* fix the cli overrides
* sync docs + tests
* restore AdamW
* better docs
* need new version
* no longer needed
* remove outdate information
* refactor duplicated code

Co-authored-by: Stas Bekman <[email protected]>
Co-authored-by: Stas Bekman <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Use HF optimizer and/or scheduler unless specified in deepspeed config
If HF is already creating an optimizer and LR scheduler, we should not try to match that config/implementation in ds_config.json; instead we pass them to deepspeed.initialize(..., lr_scheduler=hf_lr_scheduler).
create_optimizer or create_scheduler (after splitting it) is used to create an optimizer or scheduler, and the HF optimizer and scheduler are then passed to deepspeed.initialize(). DeepSpeed can handle any optimizer and scheduler if these are passed directly to deepspeed.initialize() as an object.
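A minimal sketch of the resulting flow, assuming a toy model and a placeholder ds_config; the optimizer and scheduler built here are illustrative stand-ins for what the Trainer's create_optimizer/create_scheduler would produce, not the actual Trainer code.

```python
import deepspeed
import torch

model = torch.nn.Linear(4, 2)
ds_config = {"train_batch_size": 8}  # no "optimizer"/"scheduler" sections defined

# Only create on the HF side what the DeepSpeed config does not define.
optimizer = None
lr_scheduler = None
if "optimizer" not in ds_config:
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
if "scheduler" not in ds_config and optimizer is not None:
    lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)

# DeepSpeed wraps whatever objects it is handed; anything left as None is
# created by DeepSpeed itself from ds_config.
model_engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
)
```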
Due to the chicken-and-egg init problem, the valid combinations are:
- DS optimizer + DS scheduler
- HF optimizer + HF scheduler
- HF optimizer + DS scheduler
(HF scheduler + DS optimizer is not supported, since the HF scheduler needs the optimizer at init time.)
But if cpu_offload is used, all bets are off - we can only use the DS optimizer/scheduler.

added by @stas00 below:
Added:
- cpu_offload
- add test

Blocking event: waiting for a new release 0.3.13 from DeepSpeed.
@sgugger