Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ds nvme #34444

Merged
merged 14 commits into from
Nov 21, 2024
Merged

Fix ds nvme #34444

merged 14 commits into from
Nov 21, 2024

Conversation

eljandoubi
Copy link
Contributor

What does this PR do?

Fixes #34429

Who can review?

Models:

Integrations:

@eljandoubi
Copy link
Contributor Author

@SunMarc @muellerzr Do you have any feedback?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker
Copy link
Collaborator

pinging @muellerzr !

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @eljandoubi , thanks for the contribution. Could you explain why this is needed ? Thanks !

@eljandoubi
Copy link
Contributor Author

Hey @SunMarc. When using NVMe offloading, recursively calling deepspeed.zero.Init causes a pinning error. Because Transformers and accelerate do not support multiple models, deepspeed.zero.Init should be called at most once per process.

@SunMarc
Copy link
Member

SunMarc commented Nov 5, 2024

Thanks for the additional context ! I think that we do support multiple models with deepspeed https://huggingface.co/docs/accelerate/en/usage_guides/deepspeed_multiple_model . cc @muellerzr

@eljandoubi
Copy link
Contributor Author

Anyway, we might use a context manager solution to prevent embedded calls to DeepSpeed.zero.Init.

@eljandoubi
Copy link
Contributor Author

@ArthurZucker @SunMarc @muellerzr Is there any update on the pull request?

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ! I think you pushed some unrelated changes. Could you revert those ?

@eljandoubi
Copy link
Contributor Author

@SunMarc can you underscore them?

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let's revert notebook changes, and add a one liner about what we are adding!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and all ipynb are unrelated

@@ -223,6 +224,15 @@ def set_quantized_state():
finally:
_is_quantized = False

@contextmanager
def set_zero3_state():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a small comment documentation about that (explaning the issue we fixed with this basically) 😉

Copy link
Contributor

@muellerzr muellerzr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Same note as Arthur, otherwise LG2M!

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 🤗

@ArthurZucker
Copy link
Collaborator

Just missing make style

@SunMarc SunMarc merged commit d6a5c23 into huggingface:main Nov 21, 2024
24 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* skip nested deepspeed.zero.Init call

* make fixup

* solve conflict

* solve conflict

* put back local

* use context mangers instead of local thread

* Skip recursive calls to deepspeed.zero.Init

* Skip recursive calls to deepspeed.zero.Init

* back to old notebooks

* make style
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

deepspeed zero3 NVMe offload is not working
5 participants