Merge branch 'master' into eot_token_bugfix
loadams authored Oct 29, 2024
2 parents f370d3c + 5a61193 · commit 44da9cd
Showing 2 changed files with 4 additions and 1 deletion.
inference/huggingface/zero_inference/README.md (2 changes: 1 addition & 1 deletion)
@@ -90,7 +90,7 @@ deepspeed --num_gpus 1 run_model.py --model bigscience/bloom-7b1 --batch-size 8
 Here is an example of running `meta-llama/Llama-2-7b-hf` with Zero-Inference using 4-bit model weights and offloading kv cache to CPU:
 
 ```sh
-deepspeed --num_gpus 1 run_model.py --model meta-llama/Llama-2-7b-hf` --batch-size 8 --prompt-len 512 --gen-len 32 --cpu-offload --quant-bits 4 --kv-offload
+deepspeed --num_gpus 1 run_model.py --model meta-llama/Llama-2-7b-hf --batch-size 8 --prompt-len 512 --gen-len 32 --cpu-offload --quant-bits 4 --kv-offload
 ```
 
 ## Performance Tuning Tips
training/cifar/cifar10_deepspeed.py (3 changes: 3 additions & 0 deletions)
@@ -1,4 +1,5 @@
 import argparse
+import os
 
 import deepspeed
 import torch
@@ -279,6 +280,8 @@ def test(model_engine, testset, local_device, target_dtype, test_batch_size=4):
 def main(args):
     # Initialize DeepSpeed distributed backend.
     deepspeed.init_distributed()
+    _local_rank = int(os.environ.get("LOCAL_RANK"))
+    get_accelerator().set_device(_local_rank)
 
     ########################################################################
     # Step1. Data Preparation.
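
For context, here is a minimal, self-contained sketch of the device-pinning pattern this hunk adds. It assumes DeepSpeed's `deepspeed.accelerator.get_accelerator` helper (which the CIFAR script imports elsewhere); the function name `pin_local_device` and the `"0"` fallback for `LOCAL_RANK` are illustrative additions, not part of the commit, since `int(os.environ.get("LOCAL_RANK"))` raises a `TypeError` when the variable is unset:

```python
import os

import deepspeed
from deepspeed.accelerator import get_accelerator


def pin_local_device() -> int:
    """Bind this process to the accelerator matching its launcher-assigned rank."""
    # Set up the distributed backend before touching any devices.
    deepspeed.init_distributed()
    # The deepspeed/torchrun launcher exports LOCAL_RANK per process; the "0"
    # default (a safeguard not in the commit) keeps single-process runs working.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    get_accelerator().set_device(local_rank)
    return local_rank
```

Pinning each rank to its own device early, right after `deepspeed.init_distributed()`, avoids every process defaulting to device 0 on multi-GPU nodes.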
