Eagle Speculative Sampling examples (#11104)

* Eagle Speculative Sampling examples
* rm multi-gpu and ray content
* updated README to include Arc A770
1 parent fabc395 · commit ab476c7. Showing 10 changed files with 1,396 additions and 0 deletions.
49 changes: 49 additions & 0 deletions in python/llm/example/CPU/Speculative-Decoding/Eagle/README.md
# Eagle - Speculative Sampling using IPEX-LLM on Intel CPUs

In this directory, you will find examples of how IPEX-LLM accelerates inference with speculative sampling using EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative sampling method that improves text generation speed, on Intel CPUs. See [here](https://arxiv.org/abs/2401.15077) for the paper and [here](https://github.com/SafeAILab/EAGLE) for more information on the EAGLE code.
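For intuition, speculative sampling drafts several tokens with a cheap model and verifies them with one forward pass of the full model. The sketch below shows a minimal greedy variant, not the actual EAGLE algorithm (which drafts at the feature level and verifies a tree of candidates); `draft_next` and `target_logits` are hypothetical stand-ins for the draft and target models.

```python
# Minimal sketch of draft-then-verify speculative decoding (greedy variant).
import torch

def speculative_step(tokens, draft_next, target_logits, k=4):
    # 1. Draft k candidate tokens autoregressively with the cheap model.
    draft = list(tokens)
    for _ in range(k):
        draft.append(draft_next(draft))

    # 2. Verify all k candidates with ONE forward pass of the target model:
    #    logits[i] predicts the token that should follow draft[:i+1].
    logits = target_logits(draft)  # shape: (len(draft), vocab_size)

    # 3. Accept the longest prefix where draft and target agree.
    accepted = list(tokens)
    for i in range(len(tokens), len(draft)):
        target_tok = int(torch.argmax(logits[i - 1]))
        if draft[i] == target_tok:
            accepted.append(draft[i])    # draft guessed right: kept for free
        else:
            accepted.append(target_tok)  # first mismatch: take target's token
            break
    return accepted
```

Every accepted draft token saves one full-model forward pass, which is where the speedup comes from.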
## Requirements

To run these examples with IPEX-LLM, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.
## Example - EAGLE Speculative Sampling with IPEX-LLM on MT-bench

In this example, we run inference for a Llama2 model to showcase the speed of EAGLE with IPEX-LLM on MT-bench data on Intel CPUs.
### 1. Install

We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).

After installing conda, create a Python environment for IPEX-LLM:
```bash
conda create -n llm python=3.11 # recommend using Python 3.11
conda activate llm

pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install intel_extension_for_pytorch==2.1.0
pip install -r requirements.txt
pip install eagle-llm
```
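After installation, you can sanity-check that the key packages import cleanly, for example with a short script like the one below (a minimal check under the usual module names):

```python
# Quick sanity check that the freshly installed packages import cleanly.
import torch
import intel_extension_for_pytorch as ipex
import ipex_llm  # noqa: F401  # Python module name for the ipex-llm package

print("torch:", torch.__version__)
print("ipex:", ipex.__version__)
```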
### 2. Configure IPEX-LLM environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.
```bash
# set IPEX-LLM env variables
source ipex-llm-init
```
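To confirm the variables took effect in your current shell, you can print a few of them. Exactly which variables `ipex-llm-init` exports may vary by version, so the names below are illustrative assumptions based on typical CPU tuning, not a definitive list:

```python
# Print a few environment variables that CPU-tuning init scripts commonly
# export; which ones ipex-llm-init actually sets may differ by version.
import os

for name in ("OMP_NUM_THREADS", "LD_PRELOAD", "MALLOC_CONF"):
    print(f"{name}={os.environ.get(name, '<not set>')}")
```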
### 3. Running the Example

You can test the speed of EAGLE speculative sampling with IPEX-LLM on MT-bench using the following command.
```bash
python -m evaluation.gen_ea_answer_llama2chat \
    --ea-model-path [path of EAGLE weight] \
    --base-model-path [path of the original model] \
    --enable-ipex-llm
```
Please refer to [here](https://github.com/SafeAILab/EAGLE#eagle-weights) for the complete list of available EAGLE weights.
The above command will generate a .jsonl file that records the generation results and wall time. Then, you can use evaluation/speed.py to calculate the speed.
```bash
python -m evaluation.speed \
    --base-model-path [path of the original model] \
    --jsonl-file [pathname of the .jsonl file]
```
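Conceptually, the speed calculation is just total generated tokens divided by total wall time. The sketch below illustrates this; the `answers.jsonl` path and the `new_tokens`/`wall_time` fields are assumptions about the record format, so check evaluation/speed.py for the authoritative version.

```python
# Minimal sketch of the throughput calculation from the .jsonl records.
import json

total_tokens, total_time = 0, 0.0
with open("answers.jsonl") as f:  # hypothetical path to the generated file
    for line in f:
        record = json.loads(line)
        choice = record["choices"][0]       # assumed record layout
        total_tokens += sum(choice["new_tokens"])
        total_time += sum(choice["wall_time"])

print(f"average speed: {total_tokens / total_time:.2f} tokens/s")
```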
80 changes: 80 additions & 0 deletions in python/llm/example/CPU/Speculative-Decoding/Eagle/data/mt_bench/question.jsonl (large diffs are not rendered by default).