From 36ccfb30864323535f7f5ea05a6fd2eed9b2201d Mon Sep 17 00:00:00 2001 From: "Andrei.Aksionov" Date: Fri, 21 Jul 2023 13:31:57 +0300 Subject: [PATCH 1/2] Fix missing brackets, code block language and trailing spaces --- tutorials/download_llama_2.md | 4 ++-- tutorials/finetune_adapter.md | 20 +++++++++++--------- tutorials/finetune_full.md | 15 ++++++++------- tutorials/finetune_lora.md | 15 ++++++++------- tutorials/inference.md | 4 +++- tutorials/oom.md | 4 ++-- tutorials/tpus.md | 3 ++- 7 files changed, 36 insertions(+), 29 deletions(-) diff --git a/tutorials/download_llama_2.md b/tutorials/download_llama_2.md index edcbfe713f..d835771ae7 100644 --- a/tutorials/download_llama_2.md +++ b/tutorials/download_llama_2.md @@ -29,8 +29,8 @@ meta-llama/Llama-2-70b-chat-hf In order to use a specific checkpoint, for instance [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), download the weights and convert the checkpoint to the lit-gpt format. -This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at https://huggingface.co/meta-llama/Llama-2-7b. -After access is granted, you can find your HF hub token in https://huggingface.co/settings/tokens. +This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at . +After access is granted, you can find your HF hub token in . ```bash pip install huggingface_hub diff --git a/tutorials/finetune_adapter.md b/tutorials/finetune_adapter.md index 90acb8185a..e5dc4c6871 100644 --- a/tutorials/finetune_adapter.md +++ b/tutorials/finetune_adapter.md @@ -30,7 +30,7 @@ python finetune/adapter.py --checkpoint_dir checkpoints/stabilityai/stablelm-bas or for Adapter V2 -```bash +```bash python finetune/adapter_v2.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b ``` @@ -40,6 +40,7 @@ Depending on the available GPU memory, you can also tune the `micro_batch_size` To fit Adapter V2 to 12GB memory set micro_batch_size = 2. For example, the following settings will let you finetune the model in under 1 hour: + ```python devices = 4 micro_batch_size = 4 @@ -78,27 +79,29 @@ python generate/adapter.py \ or for Adapter V2 -```bash +```bash python generate/adapter_v2.py \ --prompt "Recommend a movie to watch on the weekend." \ --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b ``` Output: -``` + +```text A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy... ``` + If your GPU supports `bfloat16`, the script will automatically use it. ## Tune on your dataset With only a few modifications, you can prepare and train on your own instruction dataset. -1. Create a json file in which each row holds one instruction-response pair. - A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be +1. Create a json file in which each row holds one instruction-response pair. + A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be the empty string if the instruction doesn't require a context. Below is an example json file: - ``` + ```json [ { "instruction": "Arrange the given numbers in ascending order.", @@ -123,7 +126,7 @@ With only a few modifications, you can prepare and train on your own instruction ``` 5. 
Run `finetune/adapter.py` by passing in the location of your data (and optionally other parameters): - + ```bash python finetune/adapter.py \ --data_dir data/mydata/ \ @@ -131,8 +134,7 @@ With only a few modifications, you can prepare and train on your own instruction --out_dir data/mydata-finetuned ``` - ## Troubleshooting If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line -`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101). +`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see ). diff --git a/tutorials/finetune_full.md b/tutorials/finetune_full.md index 400b22491c..55e3eddf3d 100644 --- a/tutorials/finetune_full.md +++ b/tutorials/finetune_full.md @@ -53,20 +53,22 @@ python generate/full.py \ ``` Output: -``` + +```text A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy... ``` + If your GPU supports `bfloat16`, the script will automatically use it. ## Tune on your dataset With only a few modifications, you can prepare and train on your own instruction dataset. -1. Create a json file in which each row holds one instruction-response pair. - A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be +1. Create a json file in which each row holds one instruction-response pair. + A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be the empty string if the instruction doesn't require a context. Below is an example json file: - ``` + ```json [ { "instruction": "Arrange the given numbers in ascending order.", @@ -91,7 +93,7 @@ With only a few modifications, you can prepare and train on your own instruction ``` 5. Run `finetune/full.py` by passing in the location of your data (and optionally other parameters): - + ```bash python finetune/full.py \ --data_dir data/mydata/ \ @@ -99,8 +101,7 @@ With only a few modifications, you can prepare and train on your own instruction --out_dir data/mydata-finetuned ``` - ## Troubleshooting If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line -`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101). +`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see ). diff --git a/tutorials/finetune_lora.md b/tutorials/finetune_lora.md index d244383d1f..c05148acde 100644 --- a/tutorials/finetune_lora.md +++ b/tutorials/finetune_lora.md @@ -45,8 +45,10 @@ You can test the finetuned model with your own instructions by running: ```bash python generate/lora.py --prompt "Recommend a movie to watch on the weekend." ``` + Output: -``` + +```text I would recommend the movie The Martian (2015). It is a sci-fi movie starring Matt Damon that follows the story of... ``` @@ -56,11 +58,11 @@ If your GPU supports `bfloat16`, you can additionally pass `--precision bf16-tru With only a few modifications, you can prepare and train on your own instruction dataset. -1. Create a json file in which each row holds one instruction-response pair. - A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be +1. Create a json file in which each row holds one instruction-response pair. 
+ A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be the empty string if the instruction doesn't require a context. Below is an example json file: - ``` + ```json [ { "instruction": "Arrange the given numbers in ascending order.", @@ -85,13 +87,12 @@ With only a few modifications, you can prepare and train on your own instruction ``` 5. Run `finetune/lora.py` by passing in the location of your data (and optionally other parameters): - + ```bash python finetune/lora.py --data_dir data/mydata/ --out_dir out/myexperiment ``` - ## Troubleshooting If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line -`torch.backends.cuda.enable_flash_sdp(False)` in the script below (see https://github.com/Lightning-AI/lit-llama/issues/101). +`torch.backends.cuda.enable_flash_sdp(False)` in the script below (see ). diff --git a/tutorials/inference.md b/tutorials/inference.md index 7f62387278..bb21cef4db 100644 --- a/tutorials/inference.md +++ b/tutorials/inference.md @@ -5,8 +5,10 @@ We demonstrate how to run inference (next token prediction) with the GPT base mo ```bash python generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b ``` + Output: -``` + +```text Hello, my name is Levi Durrer, I'm an Austrian journalist - Chairman of the Press Blair Party, with 37 years in the Press Blair International, and two years in the Spectre of Austerity for the other. I'm crossing my fingers that you will feel ``` diff --git a/tutorials/oom.md b/tutorials/oom.md index 383dc3be93..b472067a54 100644 --- a/tutorials/oom.md +++ b/tutorials/oom.md @@ -1,10 +1,10 @@ -## Dealing with out-of-memory (OOM) errors: +## Dealing with out-of-memory (OOM) errors If you got this error while running a script ```bash OutOfMemoryError: CUDA out of memory. Tried to allocate 2.22 GiB. GPU 0 has a total capacty of 79.15 GiB of which 228.38 MiB is free. Including non-PyTorch memory, this process -has 78.93 GiB memory in use. Of the allocated memory 76.28 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory +has 78.93 GiB memory in use. Of the allocated memory 76.28 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF ``` diff --git a/tutorials/tpus.md b/tutorials/tpus.md index c547cee4cf..86092fd570 100644 --- a/tutorials/tpus.md +++ b/tutorials/tpus.md @@ -59,7 +59,8 @@ You'll notice that afterwards, generation times drop to ~2s. Coming soon. > **Warning** -> When you are done, remember to delete your instance +> When you are done, remember to delete your instance +> > ```shell > gcloud compute tpus tpu-vm delete lit-gpt --zone=us-central2-b > ``` From 478fb2a87f2c415483838c8f6a3c076a83650e41 Mon Sep 17 00:00:00 2001 From: "Andrei.Aksionov" Date: Fri, 21 Jul 2023 18:43:42 +0300 Subject: [PATCH 2/2] Add a note that Windows users should use WSL2 for bnb. 
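
bitsandbytes requires a CUDA device and the Linux operating system, so on Windows the quantization features only work from inside WSL2. As a quick sanity check (a minimal sketch, illustrative only and not part of this patch), Windows users can confirm that a CUDA device is visible from WSL2 before enabling bnb quantization:

```python
# Illustrative sanity check (not part of this patch): confirm that PyTorch can
# see a CUDA device from inside WSL2, since bitsandbytes needs a working CUDA runtime.
import torch

if torch.cuda.is_available():
    print(f"CUDA device found: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device visible; bitsandbytes quantization will not work.")
```
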
--- tutorials/finetune_adapter.md | 2 +- tutorials/finetune_full.md | 2 +- tutorials/finetune_lora.md | 2 +- tutorials/quantize.md | 1 + 4 files changed, 4 insertions(+), 3 deletions(-) diff --git a/tutorials/finetune_adapter.md b/tutorials/finetune_adapter.md index e5dc4c6871..62b2b529c6 100644 --- a/tutorials/finetune_adapter.md +++ b/tutorials/finetune_adapter.md @@ -101,7 +101,7 @@ With only a few modifications, you can prepare and train on your own instruction A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be the empty string if the instruction doesn't require a context. Below is an example json file: - ```json + ```text [ { "instruction": "Arrange the given numbers in ascending order.", diff --git a/tutorials/finetune_full.md b/tutorials/finetune_full.md index 55e3eddf3d..ad44d6d658 100644 --- a/tutorials/finetune_full.md +++ b/tutorials/finetune_full.md @@ -68,7 +68,7 @@ With only a few modifications, you can prepare and train on your own instruction A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be the empty string if the instruction doesn't require a context. Below is an example json file: - ```json + ```text [ { "instruction": "Arrange the given numbers in ascending order.", diff --git a/tutorials/finetune_lora.md b/tutorials/finetune_lora.md index c05148acde..f448605f6c 100644 --- a/tutorials/finetune_lora.md +++ b/tutorials/finetune_lora.md @@ -62,7 +62,7 @@ With only a few modifications, you can prepare and train on your own instruction A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional an can be the empty string if the instruction doesn't require a context. Below is an example json file: - ```json + ```text [ { "instruction": "Arrange the given numbers in ascending order.", diff --git a/tutorials/quantize.md b/tutorials/quantize.md index a9dcf9be88..46fe9cd527 100644 --- a/tutorials/quantize.md +++ b/tutorials/quantize.md @@ -24,6 +24,7 @@ To reduce the memory requirements further, Lit-GPT supports several quantization Enabled with [bitsandbytes](https://github.com/TimDettmers/bitsandbytes). Check out the [paper](https://arxiv.org/abs/2305.14314v1) to learn more about how it works. > **Note**: `bitsandbytes` only supports `CUDA` devices and the `Linux` operating system. +Windows users should use [WSL2](https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl). Uses the normalized float 4 (nf4) data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
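
Addendum (not part of the patch series above): the instruction-dataset format referenced in the finetuning tutorials can be produced with a few lines of Python. The snippet below is a minimal sketch; the file name `mydata.json` and the `data/mydata/` directory are only examples chosen to match the `--data_dir data/mydata/` argument used in the tutorials, and the records mirror the JSON structure shown there ("instruction", "input", "output", with "input" allowed to be an empty string).

```python
# Minimal sketch: write an instruction dataset in the format shown in the
# finetuning tutorials. The output path below is only an example.
import json
from pathlib import Path

records = [
    {
        "instruction": "Arrange the given numbers in ascending order.",
        "input": "2, 4, 0, 8, 3",
        "output": "0, 2, 3, 4, 8",
    },
    {
        "instruction": "Name the capital of France.",
        "input": "",  # 'input' may be empty when the instruction needs no context
        "output": "Paris",
    },
]

out_path = Path("data/mydata/mydata.json")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(records, indent=4))
print(f"Wrote {len(records)} records to {out_path}")
```

After the file is written, the tutorials' final step (for example `python finetune/adapter.py --data_dir data/mydata/ --out_dir data/mydata-finetuned`) can point at this directory, assuming any intermediate preparation steps listed in the tutorials (not shown in this diff) have been completed.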