
Including support for Deepspeed 0.8.0 #14506

Merged 5 commits on Feb 1, 2023

Changes from 3 commits
15 changes: 14 additions & 1 deletion orttraining/orttraining/python/training/optim/_ds_modifier.py
@@ -39,14 +39,27 @@ def can_be_modified(self):
  # it's safe to update the version supporting list. Otherwise, or the file is moved or renamed,
  # we need to check the implementation of these functions in detail.
  ds_version = Version(deepspeed.__version__)
- if ds_version > Version("0.7.3") or ds_version < Version("0.4.0"):
+ if ds_version > Version("0.8.0") or ds_version < Version("0.4.0"):
      warnings.warn(
          "Skip modifying optimizer because of unsupported DeepSpeed version {}, "
          "supported version: 0.4.0 - 0.7.3.".format(deepspeed.__version__),
Contributor:

Do you need to update this to 0.8.0? Can we create a variable with version number for start and end versions and reference that instead of making changes everywhere?

Contributor:

there's only one place that needs the update, so there's no need to create start/end variables
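For illustration, the reviewer's idea of shared start/end version variables could look roughly like this. This is a minimal sketch, not the PR's implementation: the constant and function names are hypothetical; only the 0.4.0/0.8.0 bounds come from the diff.

```python
from packaging.version import Version

# Hypothetical constant names; the bounds are the ones in the diff.
DS_MIN_SUPPORTED = Version("0.4.0")
DS_MAX_SUPPORTED = Version("0.8.0")

def ds_version_supported(version_str):
    """Return True when the given DeepSpeed version is in the supported range."""
    v = Version(version_str)
    return DS_MIN_SUPPORTED <= v <= DS_MAX_SUPPORTED
```

With named bounds, the warning message can interpolate the same constants, so the text can never drift out of sync with the check, which is the bug discussed below.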

Contributor:

The warning message has 0.7.3 in it. I think we missed updating that.

Contributor:

Suggested change:
- "supported version: 0.4.0 - 0.7.3.".format(deepspeed.__version__),
+ "supported version: 0.4.0 - 0.8.0.".format(deepspeed.__version__),

          UserWarning,
      )
      return False

+ try:
+     from deepspeed.accelerator import get_accelerator
+ except ImportError as e:
+     warnings.warn("Unable to import get_accelerator from deepspeed.accelerator", UserWarning)
+     pass
github-advanced-security[bot] marked this conversation as resolved: Fixed.
+ else:
+     if not get_accelerator().device_name().startswith("cuda"):
+         warnings.warn(
+             "Skip modifying optimizer as device is not supported, "
+             "device name: {}".format(get_accelerator().device_name()), UserWarning,
+         )
+         return False
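The guarded-import-plus-device-gate pattern in this hunk can be sketched independently of DeepSpeed. In this hedged sketch the accelerator getter is injected as a parameter for testability (a `None` argument stands in for the `ImportError` branch); the function name is hypothetical, and in the real code `get_accelerator` comes from `deepspeed.accelerator`.

```python
import warnings

def accelerator_allows_modification(get_accelerator=None):
    """Mirror the diff's gate: allow modification unless a non-CUDA device is reported."""
    if get_accelerator is None:
        # Mirrors the except-ImportError branch above: warn but do not block,
        # since older DeepSpeed releases lack the accelerator API.
        warnings.warn("Unable to import get_accelerator from deepspeed.accelerator", UserWarning)
        return True
    device = get_accelerator().device_name()
    if not device.startswith("cuda"):
        # Mirrors the else branch: non-CUDA devices skip the optimizer modification.
        warnings.warn(
            "Skip modifying optimizer as device is not supported, "
            "device name: {}".format(device), UserWarning,
        )
        return False
    return True
```

Note that an import failure deliberately does not return False: the version check above remains the deciding gate in that case.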
Contributor:

For better debugging, please also add a warnings.warn here to specify the reason for the failure.

Contributor:

If possible, let's avoid warning unless it is really a warning. For debugging, we should probably add logging capabilities and log as debug or info.

Contributor:

I think it's a kind of warning or error. When execution reaches that point, the user has asked us to modify the optimizer; if we cannot do that, we should tell/warn the user of the reason.


return self.check_requirements(
["has_overflow_serial", "get_grad_norm_direct", "has_overflow_partitioned_grads_serial"],
require_apex=False,