
Commit d4656da

Apply suggestions from code review
Co-authored-by: Agnieszka Ciborowska <[email protected]>
Kevin Musgrave and aciborowska authored Feb 26, 2024
1 parent 1d0c924 commit d4656da
Showing 2 changed files with 2 additions and 14 deletions.

blog/llm-finetuning-2/README.md (4 changes: 2 additions & 2 deletions)
@@ -1,6 +1,6 @@
 # Finetuning Mistral-7B using LoRA and DeepSpeed
 
-In this demo, we finetune the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) using [LoRA](https://arxiv.org/abs/2106.09685) and [DeepSpeed](https://github.com/microsoft/DeepSpeed). We ran LoRA on two 80 GB A100 GPUs, and DeepSpeed on two, four, and eight 80 GB A100 GPUs.
+In this demo, we finetune [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) using [LoRA](https://arxiv.org/abs/2106.09685) and [DeepSpeed](https://github.com/microsoft/DeepSpeed). We ran LoRA on two 80 GB A100 GPUs, and DeepSpeed on two, four, and eight 80 GB A100 GPUs.
 
 To get started, first install Determined on your local machine:
 ```bash
@@ -25,7 +25,7 @@ Change configuration options in `distributed.yaml`. Some important options are:
 - `per_device_train_batch_size`: the batch size per GPU.
 
 
-DeepSpeed configuration options are in the `ds_configs` folder.
+DeepSpeed configuration files are in the `ds_configs` folder.
 
 ## Testing
 
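
The README change above points readers at the DeepSpeed config files in the `ds_configs` folder. As a minimal sketch of how such a file is typically handed to a Hugging Face `Trainer` run (the file name `ds_config.json` and the argument values are assumptions for illustration, not taken from this repository):

```python
# Minimal sketch: pointing Hugging Face TrainingArguments at a DeepSpeed
# JSON config. The path and batch size below are illustrative assumptions,
# not values from this repository.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=1,          # batch size per GPU, as in the README
    deepspeed="ds_configs/ds_config.json",  # hand the Trainer a DeepSpeed config
)
```

Passing `deepspeed=` to `TrainingArguments` lets the `Trainer` own engine initialization, which is what the `trainer.evaluate()` ordering issue removed from `finetune.py` below hinges on.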

blog/llm-finetuning-2/finetune.py (12 changes: 0 additions & 12 deletions)
@@ -93,10 +93,6 @@ def compute_metrics(eval_preds):
     decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
     decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
 
-    for l, p in zip(decoded_labels, decoded_preds):
-        if l != p:
-            logging.error(f"decoded_label:{l}")
-            logging.error(f"decoded_pred:{p}")
 
     bleu_score = bleu.compute(predictions=decoded_preds, references=decoded_labels)
     accuracy = acc.compute(predictions=preds[~mask], references=labels[~mask])
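
For context on the two metric calls kept at the end of this hunk: `bleu` and `acc` are presumably metric objects loaded by name from the Hugging Face `evaluate` library. A minimal, self-contained sketch of that usage (the sample predictions and references are invented for illustration; this repo's actual setup may differ):

```python
# Sketch of the evaluate-library metrics used in compute_metrics above;
# the example predictions/references are made up for illustration.
import evaluate

bleu = evaluate.load("bleu")
acc = evaluate.load("accuracy")

bleu_score = bleu.compute(
    predictions=["the cat sat on the mat"],
    references=[["the cat sat on the mat"]],
)
accuracy = acc.compute(predictions=[1, 0, 1], references=[1, 1, 1])
print(bleu_score["bleu"], accuracy["accuracy"])
```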
@@ -114,7 +110,6 @@ def compute_metrics(eval_preds):
 
     model = get_peft_model(model, peft_config)
 
-    logging.error(f"dataset={dataset['train'][0]}")
 
     trainer = Trainer(
         args=training_args,
@@ -128,13 +123,6 @@ def compute_metrics(eval_preds):
     )
 
     trainer.add_callback(det_callback)
-    # we need to comment this one out, since it will lead to the following error:
-    # [parameter_offload.py:86:_apply_to_tensors_only] A module has unknown inputs or outputs type (<class 'transformers.cache_utils.DynamicCache'>)
-    # and the tensors embedded in it cannot be detected. The ZeRO-3 hooks designed to trigger before or after backward pass of the module relies on
-    # knowing the input and output tensors and therefore may not get triggered properly.
-    # The error happens due to deepspeed initialization happening in the trainer.train(), hence call on eval fails.
-
-    # trainer.evaluate()
 
     trainer.train()
 
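
The deleted comment block describes a real ordering constraint: with ZeRO-3, DeepSpeed initialization happens inside `trainer.train()`, so a standalone `trainer.evaluate()` before training trips the ZeRO-3 hooks on non-tensor module outputs such as `transformers.cache_utils.DynamicCache`. A sketch of the ordering this implies (`trainer` and `det_callback` are the objects from the diff; treating post-training evaluation as safe is an inference from the deleted comments, not something this commit states):

```python
# Continuation of the script in the diff above; `trainer` and `det_callback`
# come from the diff, everything else is assumed context rather than repo code.
trainer.add_callback(det_callback)

# Calling trainer.evaluate() here would fail: per the deleted comments, the
# DeepSpeed ZeRO-3 engine is only initialized inside trainer.train().
trainer.train()

# Presumably safe once train() has initialized the DeepSpeed engine.
metrics = trainer.evaluate()
```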
