Pipeline parallel training engine. #392

Merged
45 commits merged into deepspeedai:master on Sep 10, 2020

Conversation

ShadenSmith
Contributor

No description provided.
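Since no description was provided, here is a rough sketch of how the pipeline parallel engine added by this PR is typically driven. The names used below (PipelineModule, num_stages, loss_fn, train_batch, the config= keyword, and ds_config.json) follow the later public DeepSpeed pipeline API and are assumptions here; the exact interface merged in this PR may differ.

```python
# Hedged sketch, not the code merged in this PR: API names below are assumptions
# based on the later public DeepSpeed pipeline interface.
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule
from torch.utils.data import DataLoader, TensorDataset

# Toy data; a real run would use an actual dataset.
dataset = TensorDataset(torch.randn(512, 1024), torch.randint(0, 10, (512,)))
train_loader = DataLoader(dataset, batch_size=32)

# Express the model as a flat list of layers so the engine can partition it
# into pipeline stages.
layers = [
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 10),
]
model = PipelineModule(layers=layers, num_stages=2, loss_fn=nn.CrossEntropyLoss())

# "ds_config.json" is a hypothetical config path; it would need to define the
# batch size / micro-batch settings that the pipeline schedule relies on.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

# train_batch() pulls micro-batches from the iterator, runs a full
# forward/backward pipeline schedule across stages, and steps the optimizer.
loss = engine.train_batch(data_iter=iter(train_loader))
```

In practice this would be started with the deepspeed launcher across multiple processes so that each pipeline stage maps to its own GPU.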

Shaden Smith and others added 30 commits on September 2, 2020 at 14:39
* cleaning pipe logging

* Fixes checkpointing with non-float activations.

* less verbose output

* improve pipeline installation

* Improves startup time and reduces logging.

* reduces logging

* reduces progress reporting

* removing test-pipe/

* DSE commit?

* trying out new pip dependency resolver

* specify torchvision version for compatibility

* pip upgrade-strategy

* quiet installation

* pre-install torch with pip

* wrong pip options

* more wrong pip options lol

* torch version macro

* fp16 paramdict build fail

* only fused lamb

* improving timers

Co-authored-by: Shaden Smith <[email protected]>
* Tied module indexing bugfix.

* Train and inference pipeline schedules. (A conceptual schedule sketch follows this commit list.)

* Move code quality tests to Azure-hosted agents. (deepspeedai#368)
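The "Train and inference pipeline schedules" commit above introduces the schedules that drive the engine. As a conceptual sketch only, and not DeepSpeed's actual schedule code, a GPipe-style schedule for a single stage can be expressed as a stream of commands over micro-batches; the command names here are made up for illustration:

```python
# Conceptual sketch; command names and structure are illustrative, not
# DeepSpeed's implementation.
from typing import Iterator, Tuple


def train_schedule(stage_id: int, num_stages: int,
                   num_micro_batches: int) -> Iterator[Tuple[str, int]]:
    """Yield (command, micro_batch_id) steps for one stage of a GPipe-style
    schedule: all micro-batch forwards, then all backwards, then one step."""
    first, last = stage_id == 0, stage_id == num_stages - 1
    for mb in range(num_micro_batches):
        # The first stage reads data; later stages wait on upstream activations.
        yield ("load_batch", mb) if first else ("recv_activations", mb)
        yield ("forward", mb)
        if not last:
            yield ("send_activations", mb)
    # Backward passes walk the micro-batches in reverse order.
    for mb in reversed(range(num_micro_batches)):
        if not last:
            yield ("recv_grads", mb)
        yield ("backward", mb)
        if not first:
            yield ("send_grads", mb)
    # Gradients from all micro-batches are accumulated before one optimizer step.
    yield ("optimizer_step", -1)


def inference_schedule(stage_id: int, num_stages: int,
                       num_micro_batches: int) -> Iterator[Tuple[str, int]]:
    """Inference reuses only the forward half of the schedule."""
    first, last = stage_id == 0, stage_id == num_stages - 1
    for mb in range(num_micro_batches):
        yield ("load_batch", mb) if first else ("recv_activations", mb)
        yield ("forward", mb)
        if not last:
            yield ("send_activations", mb)
```

In this picture, a call like train_batch() would walk one stage's command stream and dispatch each command to the matching compute or communication routine.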
@ShadenSmith added the enhancement label on Sep 10, 2020
@ShadenSmith added the documentation and website labels on Sep 10, 2020
@ShadenSmith merged commit 65c2f97 into deepspeedai:master on Sep 10, 2020