Commit

Merge branch 'main' into loadams/update-transformers
loadams authored Feb 7, 2025
2 parents bbf2fc0 + d100a85 commit a41d5fb
Showing 15 changed files with 48 additions and 47 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/nv-a6000-fastgen.yml
@@ -41,7 +41,7 @@ jobs:
python -m pip install .
- name: Install deepspeed
run: |
-git clone --depth=1 https://github.com/microsoft/DeepSpeed
+git clone --depth=1 https://github.com/deepspeedai/DeepSpeed
cd DeepSpeed
python -m pip install .
ds_report
2 changes: 1 addition & 1 deletion .github/workflows/nv-v100-legacy.yml
@@ -35,7 +35,7 @@ jobs:
- name: Install dependencies
run: |
-pip install git+https://github.com/microsoft/DeepSpeed.git@lekurile/bloom_v_check
+pip install git+https://github.com/deepspeedai/DeepSpeed.git@lekurile/bloom_v_check
pip install git+https://github.com/huggingface/transformers.git
pip install -U accelerate
ds_report
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1 +1 @@
-* @tohtana @tjruwase @awan-10 @loadams
+* @tohtana @tjruwase @loadams
37 changes: 19 additions & 18 deletions README.md
@@ -1,7 +1,7 @@
-[![Formatting](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/formatting.yml/badge.svg?branch=main)](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/formatting.yml)
-[![nv-v100-legacy](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/nv-v100-legacy.yml/badge.svg?branch=main)](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/nv-v100-legacy.yml)
-[![nv-a6000-fastgen](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/nv-a6000-fastgen.yml/badge.svg?branch=main)](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/nv-a6000-fastgen.yml)
-[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/Microsoft/DeepSpeed/blob/master/LICENSE)
+[![Formatting](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/formatting.yml/badge.svg?branch=main)](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/formatting.yml)
+[![nv-v100-legacy](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/nv-v100-legacy.yml/badge.svg?branch=main)](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/nv-v100-legacy.yml)
+[![nv-a6000-fastgen](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/nv-a6000-fastgen.yml/badge.svg?branch=main)](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/nv-a6000-fastgen.yml)
+[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/deepspeedai/DeepSpeed/blob/master/LICENSE)
[![PyPI version](https://badge.fury.io/py/deepspeed-mii.svg)](https://pypi.org/project/deepspeed-mii/)
<!-- [![Documentation Status](https://readthedocs.org/projects/deepspeed/badge/?version=latest)](https://deepspeed.readthedocs.io/en/latest/?badge=latest) -->

@@ -12,8 +12,8 @@

## Latest News

-* [2024/01] [DeepSpeed-FastGen: Introducting Mixtral, Phi-2, and Falcon support with major performance and feature enhancements.](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19)
-* [2023/11] [DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen)
+* [2024/01] [DeepSpeed-FastGen: Introducting Mixtral, Phi-2, and Falcon support with major performance and feature enhancements.](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19)
+* [2023/11] [DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen)
* [2022/11] [Stable Diffusion Image Generation under 1 second w. DeepSpeed MII](mii/legacy/examples/benchmark/txt2img)
* [2022/10] [Announcing DeepSpeed Model Implementations for Inference (MII)](https://www.deepspeed.ai/2022/10/10/mii.html)

@@ -33,7 +33,7 @@

Introducing MII, an open-source Python library designed by DeepSpeed to democratize powerful model inference with a focus on high-throughput, low latency, and cost-effectiveness.

-* MII features include blocked KV-caching, continuous batching, Dynamic SplitFuse, tensor parallelism, and high-performance CUDA kernels to support fast high throughput text-generation for LLMs such as Llama-2-70B, Mixtral (MoE) 8x7B, and Phi-2. The latest updates in v0.2 add new model families, performance optimizations, and feature enhancements. MII now delivers up to 2.5 times higher effective throughput compared to leading systems such as vLLM. For detailed performance results please see our [latest DeepSpeed-FastGen blog](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19) and [DeepSpeed-FastGen release blog](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen).
+* MII features include blocked KV-caching, continuous batching, Dynamic SplitFuse, tensor parallelism, and high-performance CUDA kernels to support fast high throughput text-generation for LLMs such as Llama-2-70B, Mixtral (MoE) 8x7B, and Phi-2. The latest updates in v0.2 add new model families, performance optimizations, and feature enhancements. MII now delivers up to 2.5 times higher effective throughput compared to leading systems such as vLLM. For detailed performance results please see our [latest DeepSpeed-FastGen blog](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19) and [DeepSpeed-FastGen release blog](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen).

<div align="center">
<img src="docs/images/fastgen-24-01-hero-light.png#gh-light-mode-only" width="850px">
@@ -58,7 +58,7 @@ MII provides accelerated text-generation inference through the use of four key t
* Dynamic SplitFuse
* High Performance CUDA Kernels

-For a deeper dive into understanding these features please [refer to our blog](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen) which also includes a detailed performance analysis.
+For a deeper dive into understanding these features please [refer to our blog](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen) which also includes a detailed performance analysis.

## MII Legacy

@@ -78,14 +78,14 @@ In the past, MII introduced several [key performance optimizations](https://www.
</div>


-Figure 1: MII architecture, showing how MII automatically optimizes OSS models using DS-Inference before deploying them. DeepSpeed-FastGen optimizations in the figure have been published in [our blog post](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen).
+Figure 1: MII architecture, showing how MII automatically optimizes OSS models using DS-Inference before deploying them. DeepSpeed-FastGen optimizations in the figure have been published in [our blog post](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen).

-Under-the-hood MII is powered by [DeepSpeed-Inference](https://github.com/microsoft/deepspeed). Based on the model architecture, model size, batch size, and available hardware resources, MII automatically applies the appropriate set of system optimizations to minimize latency and maximize throughput.
+Under-the-hood MII is powered by [DeepSpeed-Inference](https://github.com/deepspeedai/DeepSpeed). Based on the model architecture, model size, batch size, and available hardware resources, MII automatically applies the appropriate set of system optimizations to minimize latency and maximize throughput.


# Supported Models

-MII currently supports over 37,000 models across eight popular model architectures. We plan to add additional models in the near term, if there are specific model architectures you would like supported please [file an issue](https://github.com/microsoft/DeepSpeed-MII/issues) and let us know. All current models leverage Hugging Face in our backend to provide both the model weights and the model's corresponding tokenizer. For our current release we support the following model architectures:
+MII currently supports over 37,000 models across eight popular model architectures. We plan to add additional models in the near term, if there are specific model architectures you would like supported please [file an issue](https://github.com/deepspeedai/DeepSpeed-MII/issues) and let us know. All current models leverage Hugging Face in our backend to provide both the model weights and the model's corresponding tokenizer. For our current release we support the following model architectures:

model family | size range | ~model count
------ | ------ | ------
@@ -120,7 +120,7 @@ The fasest way to get started is with our [PyPI release of DeepSpeed-MII](https:
pip install deepspeed-mii
```

-For ease of use and significant reduction in lengthy compile times that many projects require in this space we distribute a pre-compiled python wheel covering the majority of our custom kernels through a new library called [DeepSpeed-Kernels](https://github.com/microsoft/DeepSpeed-Kernels). We have found this library to be very portable across environments with NVIDIA GPUs with compute capabilities 8.0+ (Ampere+), CUDA 11.6+, and Ubuntu 20+. In most cases you shouldn't even need to know this library exists as it is a dependency of DeepSpeed-MII and will be installed with it. However, if for whatever reason you need to compile our kernels manually please see our [advanced installation docs](https://github.com/microsoft/DeepSpeed-Kernels#source).
+For ease of use and significant reduction in lengthy compile times that many projects require in this space we distribute a pre-compiled python wheel covering the majority of our custom kernels through a new library called [DeepSpeed-Kernels](https://github.com/deepspeedai/DeepSpeed-Kernels). We have found this library to be very portable across environments with NVIDIA GPUs with compute capabilities 8.0+ (Ampere+), CUDA 11.6+, and Ubuntu 20+. In most cases you shouldn't even need to know this library exists as it is a dependency of DeepSpeed-MII and will be installed with it. However, if for whatever reason you need to compile our kernels manually please see our [advanced installation docs](https://github.com/deepspeedai/DeepSpeed-Kernels#source).

## Non-Persistent Pipeline

@@ -321,13 +321,14 @@ Users can also control the generation characteristics for individual prompts (i.

# Contributing

-This project welcomes contributions and suggestions. Most contributions require you to agree to a
-Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
-the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
+This project welcomes contributions and suggestions.

-When you submit a pull request, a CLA bot will automatically determine whether you need to provide
-a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
-provided by the bot. You will only need to do this once across all repos using our CLA.
+DeepSpeed-MII has adopted the [DCO](https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin). All deepspeedai repos require a DCO.
+(DeepSpeed previously used a CLA which is being replaced with DCO).
+
+DCO is provided by including a sign-off-by line in commit messages. Using the `-s` flag for `git commit` will automatically append this line.
+For example, running `git commit -s -m 'commit info.'` will produce a commit that has the message `commit info. Signed-off-by: My Name <my_email@my_company.com>.`
+The DCO bot will ensure commits are signed with an email address that matches the commit author before they are eligible to be merged.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
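The README change above replaces the CLA with a DCO requirement enforced by a bot. As an illustration only, here is a minimal Python sketch of the kind of check described there: collect every `Signed-off-by:` trailer in a commit message and test whether one of them carries the commit author's email. The helper names `signoff_emails` and `has_matching_signoff` are hypothetical, and comparing emails case-insensitively is an assumption; the actual DCO bot's rules may differ. `git commit -s` appends the trailer on its own line after a blank line, which is why the pattern matches line by line.

```python
import re
from typing import List

# Matches one DCO trailer line, e.g. "Signed-off-by: My Name <my_email@my_company.com>".
# MULTILINE lets ^ and $ anchor at each line of a multi-line commit message;
# [ \t]* (rather than \s*) keeps a match from spilling across lines.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:[ \t]*(?P<name>.+?)[ \t]*<(?P<email>[^<>\s]+)>[ \t]*$",
    re.MULTILINE,
)


def signoff_emails(message: str) -> List[str]:
    """Return the email address from every Signed-off-by trailer, in order."""
    return [m.group("email") for m in SIGNOFF_RE.finditer(message)]


def has_matching_signoff(message: str, author_email: str) -> bool:
    """Hypothetical DCO-style check: True if some sign-off email equals the author's.

    Assumption: emails are compared case-insensitively after trimming whitespace.
    """
    wanted = author_email.strip().lower()
    return any(email.lower() == wanted for email in signoff_emails(message))


if __name__ == "__main__":
    # Roughly what `git commit -s -m 'commit info.'` records as the message.
    msg = "commit info.\n\nSigned-off-by: My Name <my_email@my_company.com>"
    print(has_matching_signoff(msg, "my_email@my_company.com"))   # True
    print(has_matching_signoff(msg, "someone_else@example.com"))  # False
```

A message with no trailer, or one signed with a different address than the author's, fails the check, which mirrors the "email address that matches the commit author" condition stated in the README.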
10 changes: 5 additions & 5 deletions docs/source/index.rst
@@ -14,15 +14,15 @@ democratize powerful model inference with a focus on high-throughput, low
latency, and cost-effectiveness.

MII v0.1 introduced several features as part of our `DeepSpeed-FastGen release
-<https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen>`_
+<https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen>`_
such as blocked KV-caching, continuous batching, Dynamic SplitFuse, tensor
parallelism, and high-performance CUDA kernels to support fast high throughput
text-generation with LLMs. The latest version of MII delivers up to 2.5 times
higher effective throughput compared to leading systems such as vLLM. For
detailed performance results please see our `DeepSpeed-FastGen release blog
-<https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen>`_
+<https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen>`_
and the `latest DeepSpeed-FastGen blog
-<https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19>`_.
+<https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-fastgen/2024-01-19>`_.

MII-Legacy
----------
@@ -32,9 +32,9 @@ We first `announced MII <https://www.deepspeed.ai/2022/10/10/mii.html>`_ in
of DeepSpeed-FastGen. MII-Legacy, which covers all prior releases up to v0.0.9,
provides support for running inference for a wide variety of language model
tasks. We also support accelerating `text2image models like Stable Diffusion
-<https://github.com/Microsoft/DeepSpeed-MII/tree/main/mii/legacy/examples/benchmark/txt2img>`_.
+<https://github.com/deepspeedai/DeepSpeed-MII/tree/main/mii/legacy/examples/benchmark/txt2img>`_.
For more details on our previous releases please see our `legacy APIs
-<https://github.com/Microsoft/DeepSpeed-MII/tree/main/mii/legacy/>`_.
+<https://github.com/deepspeedai/DeepSpeed-MII/tree/main/mii/legacy/>`_.


Contents
4 changes: 2 additions & 2 deletions docs/source/install.rst
@@ -19,11 +19,11 @@ pip to install from source:

.. code-block:: console
-(.venv) $ pip install git+https://github.com/Microsoft/DeepSpeed-MII.git
+(.venv) $ pip install git+https://github.com/deepspeedai/DeepSpeed-MII.git
Or you can clone the repository and install:

.. code-block:: console
-(.venv) $ git clone https://github.com/Microsoft/DeepSpeed-MII.git
+(.venv) $ git clone https://github.com/deepspeedai/DeepSpeed-MII.git
(.venv) $ pip install ./DeepSpeed-MII
2 changes: 1 addition & 1 deletion examples/README.md
@@ -1,2 +1,2 @@
# MII Examples
-Please see [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/mii) for a few examples on using MII.
+Please see [DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples/tree/master/inference/mii) for a few examples on using MII.
4 changes: 2 additions & 2 deletions mii/aml_related/templates.py
@@ -165,8 +165,8 @@
RUN /opt/miniconda/envs/amlenv/bin/pip install torch torchvision --index-url https://download.pytorch.org/whl/cu113 && \
/opt/miniconda/envs/amlenv/bin/pip install -r "$BUILD_DIR/requirements.txt" && \
/opt/miniconda/envs/amlenv/bin/pip install azureml-inference-server-http && \
-/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/microsoft/DeepSpeed.git && \
-/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/microsoft/DeepSpeed-MII.git && \
+/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/deepspeedai/DeepSpeed.git && \
+/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/deepspeedai/DeepSpeed-MII.git && \
/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/huggingface/transformers.git
8 changes: 4 additions & 4 deletions mii/legacy/README.md
@@ -1,6 +1,6 @@
-<!-- [![Build Status](https://github.com/microsoft/deepspeed-mii/workflows/Build/badge.svg)](https://github.com/microsoft/DeepSpeed-MII/actions) -->
-[![Formatting](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/formatting.yml/badge.svg)](https://github.com/microsoft/DeepSpeed-MII/actions/workflows/formatting.yml)
-[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/Microsoft/DeepSpeed/blob/master/LICENSE)
+<!-- [![Build Status](https://github.com/deepspeedai/DeepSpeed-mii/workflows/Build/badge.svg)](https://github.com/deepspeedai/DeepSpeed-MII/actions) -->
+[![Formatting](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/formatting.yml/badge.svg)](https://github.com/deepspeedai/DeepSpeed-MII/actions/workflows/formatting.yml)
+[![License Apache 2.0](https://badgen.net/badge/license/apache2.0/blue)](https://github.com/deepspeedai/DeepSpeed/blob/master/LICENSE)
[![PyPI version](https://badge.fury.io/py/deepspeed-mii.svg)](https://pypi.org/project/deepspeed-mii/)
<!-- [![Documentation Status](https://readthedocs.org/projects/deepspeed/badge/?version=latest)](https://deepspeed.readthedocs.io/en/latest/?badge=latest) -->

@@ -195,7 +195,7 @@ result = generator.query({"query": ["DeepSpeed is", "Seattle is"]}, do_sample=Tr

```

-You can find a complete example [here]("https://github.com/microsoft/DeepSpeed-MII/tree/main/examples/non_persistent")
+You can find a complete example [here]("https://github.com/deepspeedai/DeepSpeed-MII/tree/main/examples/non_persistent")

Any HTTP client can be used to call the APIs. An example of using curl is:
```bash
4 changes: 2 additions & 2 deletions mii/legacy/aml_related/templates.py
@@ -165,8 +165,8 @@
RUN /opt/miniconda/envs/amlenv/bin/pip install torch torchvision --index-url https://download.pytorch.org/whl/cu113 && \
/opt/miniconda/envs/amlenv/bin/pip install -r "$BUILD_DIR/requirements.txt" && \
/opt/miniconda/envs/amlenv/bin/pip install azureml-inference-server-http && \
-/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/microsoft/DeepSpeed.git && \
-/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/microsoft/DeepSpeed-MII.git && \
+/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/deepspeedai/DeepSpeed.git && \
+/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/deepspeedai/DeepSpeed-MII.git && \
/opt/miniconda/envs/amlenv/bin/pip install git+https://github.com/huggingface/transformers.git
4 changes: 2 additions & 2 deletions mii/legacy/docs/GPT-NeoX.md
@@ -18,15 +18,15 @@ source ./MII-GPT-NeoX/bin/activate

## Install MII
```bash
-git clone https://github.com/microsoft/DeepSpeed-MII.git
+git clone https://github.com/deepspeedai/DeepSpeed-MII.git
cd DeepSpeed-MII
pip install .[local]
pip install .
```

## Install DeepSpeed-GPT-NeoX
```bash
-git clone -b ds-updates https://github.com/microsoft/deepspeed-gpt-neox.git
+git clone -b ds-updates https://github.com/deepspeedai/DeepSpeed-gpt-neox.git
cd deepspeed-gpt-neox
pip install -r requirements/requirements-inference.txt
pip install .
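Nearly every change in this merge moves GitHub URLs from the `microsoft` organization to `deepspeedai`, and the old lines spell the owner both as `microsoft` and as `Microsoft`. A rename like this can be scripted. The sketch below is a hypothetical helper, not the tool used to produce this commit. It assumes that only repositories whose names start with `deepspeed` (in any case) moved, and it rewrites only the owner segment, so each repository name keeps exactly the spelling it had. `OLD_OWNER`, `NEW_OWNER`, and `rewrite_owner` are illustrative names.

```python
import re

# Hypothetical names for illustration: the old and new GitHub organizations.
OLD_OWNER = "microsoft"
NEW_OWNER = "deepspeedai"

# Match "github.com/<old owner>/" only when the repository name that follows
# starts with "deepspeed". IGNORECASE covers both "microsoft" and "Microsoft"
# and any casing of the repository name. The lookahead consumes nothing, so the
# repository name itself is never rewritten.
_OWNER_RE = re.compile(
    r"(github\.com/)" + re.escape(OLD_OWNER) + r"/(?=deepspeed)",
    re.IGNORECASE,
)


def rewrite_owner(text: str) -> str:
    """Point DeepSpeed-family GitHub URLs at NEW_OWNER, keeping repo names verbatim."""
    return _OWNER_RE.sub(lambda m: m.group(1) + NEW_OWNER + "/", text)


if __name__ == "__main__":
    old = "git clone -b ds-updates https://github.com/microsoft/deepspeed-gpt-neox.git"
    print(rewrite_owner(old))
    # git clone -b ds-updates https://github.com/deepspeedai/deepspeed-gpt-neox.git
```

Leaving the repository name untouched matters when a later line depends on it: `git clone` names the checkout directory after the last path component of the URL (minus `.git`), so a clone URL and a following `cd` must agree on case on case-sensitive filesystems. Non-GitHub hosts such as `opensource.microsoft.com` and unrelated `microsoft` repositories are left alone by this pattern.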