document deepspeed.initialize() #644

Merged: 5 commits, Jan 8, 2021

stas00 (Collaborator) commented Jan 7, 2021

As a follow-up to the very helpful suggestion I received in #633, I propose this doc update.

Thank you!

Fixes: #633

@jeffra

jeffra (Collaborator) commented Jan 7, 2021

We currently reference this call only briefly, in the section on MPI/AzureML environments (https://www.deepspeed.ai/getting-started/#mpi-and-azureml-compatibility), but thanks for adding it higher up, since it seems others will also find it useful.

stas00 (Collaborator, Author) commented Jan 7, 2021

Honestly I'd have never looked there unless I had MPI or Azure ;)

Should I tweak the doc to add that it uses the NCCL backend?

jeffra (Collaborator) commented Jan 7, 2021

Haha, totally understandable; it was kind of hidden away, it seems :) It also needs to be added to our RTD documentation; I'll do that right now, actually. You can choose the backend as a parameter, but it defaults to NCCL, which is primarily what we've tested everything with across the project. Feel free to add some text about NCCL, though.
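
For readers following the thread, here is a minimal sketch of the backend choice described above, assuming the `deepspeed.init_distributed(dist_backend=...)` call as documented around the time of this PR. The `DIST_BACKEND` environment variable and both helper names are hypothetical and exist only for illustration; DeepSpeed does not read that variable.

```python
import os
from typing import Mapping, Optional

# NCCL is the default backend named in the comment above.
DEFAULT_BACKEND = "nccl"


def pick_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend name to request.

    DIST_BACKEND is a hypothetical variable used only in this sketch.
    """
    env = os.environ if env is None else env
    return env.get("DIST_BACKEND", DEFAULT_BACKEND)


def init_distributed_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Initialize the distributed backend through DeepSpeed and return its name."""
    # Imported lazily so pick_backend() can be used without DeepSpeed installed.
    import deepspeed

    backend = pick_backend(env)
    # Assumption: init_distributed accepts the backend via `dist_backend`.
    deepspeed.init_distributed(dist_backend=backend)
    return backend
```

In a training script launched with the DeepSpeed launcher, `init_distributed_backend()` would be called once, before any code that relies on `torch.distributed`.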

stas00 (Collaborator, Author) commented Jan 7, 2021

Thank you, @jeffra

I added the note about the default, but I couldn't find init_distributed's documentation so I couldn't link to it for further details.

jeffra (Collaborator) commented Jan 7, 2021

Just added it to RTD docs with this PR #645

jeffra (Collaborator) commented Jan 7, 2021

If you want to add a link to it, it's live now: https://deepspeed.readthedocs.io/en/latest/initialize.html#distributed-initialization

stas00 (Collaborator, Author) commented Jan 8, 2021

That's excellent. Added the link.

@jeffra jeffra merged commit 828d75b into microsoft:master Jan 8, 2021
@stas00 stas00 deleted the dist-init branch January 8, 2021 18:42
Development

Successfully merging this pull request may close these issues.

WarmupDecayLR.params.total_num_steps - total or per gpu?
2 participants