Update ZeRO-Offload blog post link #401

Merged
22 commits merged on Sep 10, 2020

Commits
9d6110e
Update installation instructions
tjruwase Sep 4, 2020
e525e68
Format fix
tjruwase Sep 4, 2020
d492642
Merge branch 'master' into olruwase/docs
Sep 6, 2020
326573c
Merge branch 'master' of github.com:microsoft/DeepSpeed into olruwase…
tjruwase Sep 6, 2020
15f80a2
Merge branch 'olruwase/docs' of github.com:microsoft/DeepSpeed into o…
tjruwase Sep 6, 2020
e3f93df
ZeRO tutorial
tjruwase Sep 9, 2020
7cf4898
Merge branch 'master' of github.com:microsoft/DeepSpeed into olruwase…
tjruwase Sep 9, 2020
f532570
Format fixes
tjruwase Sep 9, 2020
2707f03
Merge branch 'master' into olruwase/docs
tjruwase Sep 9, 2020
9a9eda4
Merge branch 'master' into olruwase/docs
tjruwase Sep 10, 2020
12e0312
Merge branch 'master' of github.com:microsoft/DeepSpeed into olruwase…
tjruwase Sep 10, 2020
b10a11c
ZeRO-Offload
tjruwase Sep 10, 2020
c8ae5c8
Merge branch 'olruwase/docs' of github.com:microsoft/DeepSpeed into o…
tjruwase Sep 10, 2020
6dd6276
ZeRO and ZeRO-Offload tutorials
tjruwase Sep 10, 2020
d61c679
Update navigation page
tjruwase Sep 10, 2020
934684d
Format fixes
tjruwase Sep 10, 2020
4b90869
Merge branch 'master' into olruwase/docs
jeffra Sep 10, 2020
2f745e7
Add yuxhe feedback
tjruwase Sep 10, 2020
2b81602
Merge branch 'master' of github.com:microsoft/DeepSpeed into olruwase…
tjruwase Sep 10, 2020
481f743
Merge branch 'olruwase/docs' of github.com:microsoft/DeepSpeed into o…
tjruwase Sep 10, 2020
35eeb1d
Merge branch 'master' of github.com:microsoft/DeepSpeed into olruwase…
tjruwase Sep 10, 2020
6bd6171
Fix blog post link
tjruwase Sep 10, 2020
2 changes: 1 addition & 1 deletion docs/_tutorials/zero-offload.md
@@ -6,7 +6,7 @@ We recommend that you read the tutorials on [Getting Started](/getting-started/)
ZeRO-Offload is a ZeRO optimization that offloads the optimizer memory and computation from the GPU to the host CPU. ZeRO-Offload enables large models with up to 13 billion parameters to be efficiently trained on a single GPU. In this tutorial we will use ZeRO-Offload to train a 10-billion parameter GPT-2 model in DeepSpeed. Furthermore, *using ZeRO-Offload in a DeepSpeed model is quick and easy because all you need is to change a few configurations in the DeepSpeed configuration json*. No code changes are needed.

## ZeRO-Offload Overview
-For large model training, optimizers such as [Adam](https://arxiv.org/abs/1412.6980) can consume a significant amount of GPU compute and memory. ZeRO-Offload reduces the GPU compute and memory requirements of such models by leveraging compute and memory resources on the host CPU to execute the optimizer. Furthermore, to prevent the optimizer from becoming a bottleneck, ZeRO-Offload uses DeepSpeed's highly optimized CPU implementation of Adam called [DeepSpeedCPUAdam](https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/ops/adam). DeepSpeedCPUAdam is 5X--7X faster than the standard PyTorch implementation. For a deep dive into the design and performance of ZeRO-Offload, please see our blog post [[XXXX]()].
+For large model training, optimizers such as [Adam](https://arxiv.org/abs/1412.6980) can consume a significant amount of GPU compute and memory. ZeRO-Offload reduces the GPU compute and memory requirements of such models by leveraging compute and memory resources on the host CPU to execute the optimizer. Furthermore, to prevent the optimizer from becoming a bottleneck, ZeRO-Offload uses DeepSpeed's highly optimized CPU implementation of Adam called [DeepSpeedCPUAdam](https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/ops/adam). DeepSpeedCPUAdam is 5X--7X faster than the standard PyTorch implementation. For a deep dive into the design and performance of ZeRO-Offload, please see our [blog post](https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/#toc-heading-3).

## Training Environment
For this tutorial, we will configure a 10 billion parameter GPT-2 model using the DeepSpeed [Megatron-LM](https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM) GPT-2 code. We advise stepping through the Megatron-LM [tutorial](/megatron/) if you have not previously done so. We will use a single [NVIDIA Tesla V100-SXM3 Tensor Core GPU](https://www.nvidia.com/en-us/data-center/v100/) with 32GB RAM for this exercise.
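The changed paragraph above notes that enabling ZeRO-Offload requires only DeepSpeed configuration changes, not code changes. Below is a minimal, hypothetical sketch (not part of this PR) of what such a configuration could look like, written as a small Python script that emits the JSON file. The `cpu_offload` flag under `zero_optimization` follows the DeepSpeed documentation of this period and was renamed in later releases, and all numeric values are illustrative placeholders rather than the tutorial's actual 10B GPT-2 settings.

```python
# Hypothetical sketch, not from this PR: a DeepSpeed config that enables ZeRO-Offload.
# Key names follow the ZeRO stage-2 "cpu_offload" flag documented around this release;
# later DeepSpeed versions use "offload_optimizer" instead. Values are placeholders.
import json

ds_config = {
    "train_batch_size": 8,           # placeholder; set per your model and GPU memory
    "fp16": {"enabled": True},       # mixed precision, as in the Megatron-LM examples
    "zero_optimization": {
        "stage": 2,                  # ZeRO stage 2 partitions optimizer state and gradients
        "cpu_offload": True          # move optimizer state and the update step to host CPU
    },
    "optimizer": {
        "type": "Adam",              # with CPU offload, DeepSpeed uses its CPU Adam kernel
        "params": {"lr": 1.5e-4}     # placeholder learning rate
    }
}

with open("ds_zero_offload_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Such a file would typically be passed to the training script through DeepSpeed's standard launcher arguments, for example `--deepspeed --deepspeed_config ds_zero_offload_config.json`.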