
Website edits #398

Merged · 13 commits · Sep 10, 2020
4 changes: 2 additions & 2 deletions README.md
@@ -37,7 +37,7 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)
* [Up to 5x less communication and 3.4x faster training through 1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-news.html)
* [10x bigger model training on a single GPU with ZeRO-Offload](https://www.deepspeed.ai/news/2020/09/08/ZeRO-Offload.html)
* [2020/08/07] [DeepSpeed Microsoft Research Webinar](https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-On-Demand.html) is now available on-demand
* [2020/07/24] [DeepSpeed Microsoft Research Webinar](https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-On-Demand.html) on August 6th, 2020
[![DeepSpeed webinar](docs/assets/images/webinar-aug2020.png)](https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-Live.html)
* [2020/05/19] [ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/)
* [2020/05/19] [An Order-of-Magnitude Larger and Faster Training with ZeRO-2](https://www.deepspeed.ai/news/2020/05/18/zero-stage2.html)
@@ -88,7 +88,7 @@ overview](https://www.deepspeed.ai/features/) for descriptions and usage.
* Support 10B model training on a single GPU
* [Ultra-fast dense transformer kernels](https://www.deepspeed.ai/news/2020/05/18/bert-record.html)
* [Sparse attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention.html)
* Memory- and compute-efficient sparse kernels
* Support 10x longer sequences than dense
* Flexible support for different sparse structures
* [1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-blog-post.html)
23 changes: 15 additions & 8 deletions docs/_pages/features.md
@@ -30,17 +30,22 @@ deepspeed --hostfile=<hostfile> \
```
The script `<client_entry.py>` will execute on the resources specified in `<hostfile>`.

## Pipeline Parallelism
DeepSpeed provides [pipeline parallelism](/tutorials/pipeline/) for memory-
and communication-efficient training. DeepSpeed supports a hybrid
combination of data, model, and pipeline parallelism and has scaled to over
[one trillion parameters using 3D parallelism]({{ site.press_release_v3 }}).
Pipeline parallelism can also improve communication efficiency and has
accelerated training by up to 7x on low-bandwidth clusters.
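
As a rough illustration of the pipeline API (a sketch, not the tutorial's code), the snippet below wraps a toy model expressed as a flat list of layers in DeepSpeed's `PipelineModule`; the layer sizes, the stage count, and the loss function are assumptions chosen for brevity.

```python
# Hypothetical sketch: partitioning a toy model into two pipeline stages.
# Layer sizes, num_stages, and the loss function are illustrative assumptions;
# see the pipeline parallelism tutorial for the authoritative walk-through.
import torch
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()  # expects launch via the deepspeed launcher

layers = [
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
]

# PipelineModule splits the flat layer list across `num_stages` stages;
# DeepSpeed then schedules micro-batches through the stages during training.
net = PipelineModule(layers=layers,
                     num_stages=2,
                     loss_fn=torch.nn.CrossEntropyLoss())
```

The resulting module can then be handed to `deepspeed.initialize` like any other model.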

## Model Parallelism

### Support for Custom Model Parallelism
DeepSpeed supports all forms of model parallelism, including tensor-slicing
approaches such as [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and
pipelined approaches such as [PipeDream](https://github.com/msr-fiddle/pipedream)
and [GPipe](https://github.com/kakaobrain/torchgpipe). It does so by only
requiring the model parallelism framework to provide a *model parallelism
unit* (`mpu`) that implements a few bookkeeping functionalities:

```python
mpu.get_model_parallel_rank()
@@ -57,6 +62,8 @@ DeepSpeed is fully compatible with [Megatron](https://github.com/NVIDIA/Megatron
Please see the [Megatron-LM tutorial](/tutorials/megatron/) for details.
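
To make the `mpu` contract concrete, here is a minimal, hypothetical model parallelism unit in which every rank forms its own model-parallel group (i.e., no actual tensor slicing happens); the getters beyond `get_model_parallel_rank()` follow the Megatron-style interface and are assumptions rather than lines quoted from the snippet above.

```python
# Minimal, hypothetical "mpu" sketch: every rank is its own model-parallel
# group, and all ranks together form one data-parallel group. This only
# illustrates the bookkeeping contract; it performs no tensor slicing.
import torch.distributed as dist

_MODEL_PARALLEL_GROUP = None
_DATA_PARALLEL_GROUP = None

def initialize_model_parallel():
    """Build trivial groups. Call on every rank after torch.distributed init."""
    global _MODEL_PARALLEL_GROUP, _DATA_PARALLEL_GROUP
    world_size = dist.get_world_size()
    for rank in range(world_size):
        # new_group is a collective call: every rank participates in each one.
        group = dist.new_group([rank])
        if rank == dist.get_rank():
            _MODEL_PARALLEL_GROUP = group
    _DATA_PARALLEL_GROUP = dist.new_group(list(range(world_size)))

def get_model_parallel_rank():
    return dist.get_rank(group=_MODEL_PARALLEL_GROUP)

def get_model_parallel_world_size():
    return dist.get_world_size(group=_MODEL_PARALLEL_GROUP)

def get_model_parallel_group():
    return _MODEL_PARALLEL_GROUP

def get_data_parallel_rank():
    return dist.get_rank(group=_DATA_PARALLEL_GROUP)

def get_data_parallel_world_size():
    return dist.get_world_size(group=_DATA_PARALLEL_GROUP)

def get_data_parallel_group():
    return _DATA_PARALLEL_GROUP
```

An object or module exposing these functions can be handed to DeepSpeed via the `mpu` argument of `deepspeed.initialize`.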

## The Zero Redundancy Optimizer
The Zero Redundancy Optimizer ([ZeRO](https://arxiv.org/abs/1910.02054)) is at
the heart of DeepSpeed and enables large model training at a scale that is
2 changes: 1 addition & 1 deletion docs/_posts/2020-09-09-ZeRO-Offload.md
@@ -7,7 +7,7 @@ new_post: true
date: 2020-09-09 00:00:00
---

We introduce a new technology called ZeRO-Offload to enable **10X bigger model training on a single GPU**. ZeRO-Offload extends ZeRO-2 to leverage both CPU and GPU memory for training large models. Using a machine with **a single GPU**, our users can now run **models of up to 13 billion parameters** without running out of memory, 10x bigger than existing approaches, while obtaining competitive throughput. This feature democratizes multi-billion-parameter model training and opens the door for many deep learning practitioners to explore bigger and better models.

* For more information on ZeRO-Offload, see our [press release]( {{ site.press_release_v3 }} ).
* For more information on how to use ZeRO-Offload, see our [ZeRO-Offload tutorial](https://www.deepspeed.ai/tutorials/zero-offload/).
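
For orientation, a minimal sketch of the training-script side is shown below, assuming the DeepSpeed JSON config (here named `ds_config.json`, an assumption for this sketch) carries the ZeRO and offload settings described in the tutorial; the toy model is likewise made up.

```python
# Hypothetical sketch: wrapping a model so the DeepSpeed engine manages memory
# according to a JSON config (which is where ZeRO-Offload is switched on).
# The config file name "ds_config.json" and the toy model are assumptions.
import argparse
import torch
import deepspeed

parser = argparse.ArgumentParser()
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed, --deepspeed_config, ...
args = parser.parse_args()

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 16384),
    torch.nn.ReLU(),
    torch.nn.Linear(16384, 4096),
)

# deepspeed.initialize returns an engine that applies the configured ZeRO
# optimizations (including CPU offload, when enabled) during training.
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters(),
)
```

Such a script would typically be launched with something like `deepspeed train.py --deepspeed --deepspeed_config ds_config.json`.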
4 changes: 2 additions & 2 deletions docs/index.md
@@ -30,7 +30,7 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)
# What's New?
* [2020/09/10] [DeepSpeed: Extreme-scale model training for everyone]({{ site.press_release_v3 }})
* [Powering 10x longer sequences and 6x faster execution through DeepSpeed Sparse Attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention-news.html)
* [Training a trillion parameters with pipeline parallelism](https://www.deepspeed.ai/news/2020/09/09/pipeline-parallelism.html)
* [Training a trillion parameters with pipeline parallelism](https://www.deepspeed.ai/news/2020/09/08/pipeline-parallelism.html)
* [Up to 5x less communication and 3.4x faster training through 1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-news.html)
* [10x bigger model training on a single GPU with ZeRO-Offload](https://www.deepspeed.ai/news/2020/09/08/ZeRO-Offload.html)

@@ -168,7 +168,7 @@ Below we provide a brief feature list, see our detailed [feature overview](https
* Support 10B model training on a single GPU
* [Ultra-fast dense transformer kernels](https://www.deepspeed.ai/news/2020/05/18/bert-record.html)
* [Sparse attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention.html)
* Memory- and compute-efficient sparse kernels
* Support 10x longer sequences than dense
* Flexible support for different sparse structures
* [1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-blog-post.html)