Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Including support for Deepspeed 0.8.0 #14506

Merged
merged 5 commits into from
Feb 1, 2023
Merged

Conversation

ajindal1
Copy link
Contributor

Description

Including Support for Deepspeed 0.8.0.

Motivation and Context

Deepspeed 0.8.0 has a bug fix and mlfow integration.

@ytaous
Copy link
Contributor

ytaous commented Jan 31, 2023

Can u pls test it locally to make sure both 0.7.3 and 0.8.0 would run thru your code wo/ issue?
thx

@ajindal1
Copy link
Contributor Author

@ytaous Yes, I have already verified that it works for both 0.7.3 and 0.8.0.

ytaous
ytaous previously approved these changes Jan 31, 2023
pass
else:
if not get_accelerator().device_name().startswith("cuda"):
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better debugging, please also add a warning.warn here to specify the reason of failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, let's avoid warning unless it is really a warning. For debugging, we should probably add logging capabilities and log as debug or info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's kind of warning or error. When it goes there, it means user requires to modify the optimizer, if we cannot do that, we should tell/warn user the reason.

@@ -39,14 +39,22 @@ def can_be_modified(self):
# it's safe to update the version supporting list. Otherwise, or the file is moved or renamed,
# we need to check the implementation of these functions in detail.
ds_version = Version(deepspeed.__version__)
if ds_version > Version("0.7.3") or ds_version < Version("0.4.0"):
if ds_version > Version("0.8.0") or ds_version < Version("0.4.0"):
warnings.warn(
"Skip modifying optimizer because of unsupported DeepSpeed version {}, "
"supported version: 0.4.0 - 0.7.3.".format(deepspeed.__version__),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to update this to 0.8.0? Can we create a variable with version number for start and end versions and reference that instead of making changes everywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's only place needs the update, thus there's no need to create start/end variables

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning message has 0.7.3 in it. I think we missed updating that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"supported version: 0.4.0 - 0.7.3.".format(deepspeed.__version__),
"supported version: 0.4.0 - 0.8.0.".format(deepspeed.__version__),

Comment on lines 52 to 53
except ImportError as e:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances would this import fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import would fail for older deepspeed versions (I think for version less than 0.8.0). This is because get_accelerator was introduced recently by deepspeed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new DS code would use a factory to create the accelerator.
https://github.com/delock/DeepSpeedSYCLSupport/blob/e9687254eae72608ed8c76a185d5f6cffe3fd6b6/accelerator/real_accelerator.py#L49
There might be case where "XPU_Accelerator" package is unavailable. Adding you to a chat thread.

@ajindal1 ajindal1 merged commit 6fa4555 into main Feb 1, 2023
@ajindal1 ajindal1 deleted the abjindal/update_deepspeed_version branch February 1, 2023 14:19
@baijumeswani baijumeswani added the training issues related to ONNX Runtime training; typically submitted using template label Feb 1, 2023
@faxu faxu added the triage:approved Approved for cherrypicks for release label Feb 1, 2023
rui-ren pushed a commit that referenced this pull request Feb 3, 2023
### Description
Including Support for Deepspeed 0.8.0.



### Motivation and Context
Deepspeed 0.8.0 has a bug fix and mlfow integration.
rui-ren pushed a commit that referenced this pull request Feb 3, 2023
### Description
Including Support for Deepspeed 0.8.0.



### Motivation and Context
Deepspeed 0.8.0 has a bug fix and mlfow integration.
rui-ren pushed a commit that referenced this pull request Feb 3, 2023
### Description
Including Support for Deepspeed 0.8.0.



### Motivation and Context
Deepspeed 0.8.0 has a bug fix and mlfow integration.
rui-ren pushed a commit that referenced this pull request Feb 3, 2023
### Description
Including Support for Deepspeed 0.8.0.



### Motivation and Context
Deepspeed 0.8.0 has a bug fix and mlfow integration.
@faxu faxu removed the release:1.14 label Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
training issues related to ONNX Runtime training; typically submitted using template triage:approved Approved for cherrypicks for release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants