Tutorial markdowns small fixes #295

Merged · 3 commits · Jul 21, 2023
4 changes: 2 additions & 2 deletions tutorials/download_llama_2.md
@@ -29,8 +29,8 @@ meta-llama/Llama-2-70b-chat-hf

In order to use a specific checkpoint, for instance [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), download the weights and convert the checkpoint to the lit-gpt format.

This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at https://huggingface.co/meta-llama/Llama-2-7b.
After access is granted, you can find your HF hub token in https://huggingface.co/settings/tokens.
This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at <https://huggingface.co/meta-llama/Llama-2-7b>.
After access is granted, you can find your HF hub token in <https://huggingface.co/settings/tokens>.

```bash
pip install huggingface_hub
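# Hypothetical continuation (not part of the original snippet): with the access
# token from https://huggingface.co/settings/tokens, downloading and converting a
# checkpoint typically looks roughly like this -- the script names and flags below
# are assumptions, not taken from the tutorial. Replace your_hf_token accordingly.
python scripts/download.py --repo_id meta-llama/Llama-2-7b-chat-hf --access_token your_hf_token
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/meta-llama/Llama-2-7b-chat-hf
```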
20 changes: 11 additions & 9 deletions tutorials/finetune_adapter.md
@@ -30,7 +30,7 @@ python finetune/adapter.py --checkpoint_dir checkpoints/stabilityai/stablelm-bas

or for Adapter V2

```bash
```bash
python finetune/adapter_v2.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```

@@ -40,6 +40,7 @@ Depending on the available GPU memory, you can also tune the `micro_batch_size`
To fit Adapter V2 into 12 GB of memory, set `micro_batch_size = 2`.

For example, the following settings will let you finetune the model in under 1 hour:

```python
devices = 4
micro_batch_size = 4
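# Hypothetical lower-memory alternative (not from the original tutorial):
# per the note above, Adapter V2 fits into ~12 GB with
#   micro_batch_size = 2
```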
@@ -78,27 +79,29 @@ python generate/adapter.py \

or for Adapter V2

```bash
```bash
python generate/adapter_v2.py \
--prompt "Recommend a movie to watch on the weekend." \
--checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```

Output:
```

```text
A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy...
```

If your GPU supports `bfloat16`, the script will automatically use it.

## Tune on your dataset

With only a few modifications, you can prepare and train on your own instruction dataset.

1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
the empty string if the instruction doesn't require a context. Below is an example json file:

```
```text
[
{
"instruction": "Arrange the given numbers in ascending order.",
@@ -123,16 +126,15 @@ With only a few modifications, you can prepare and train on your own instruction
```

5. Run `finetune/adapter.py` by passing in the location of your data (and optionally other parameters):

```bash
python finetune/adapter.py \
--data_dir data/mydata/ \
--checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b \
--out_dir data/mydata-finetuned
```
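
For reference, a complete example file for step 1 above might be created like this — a minimal, illustrative sketch; the `data/mydata/mydata.json` path and the records themselves are placeholders, not taken from the tutorial:

```bash
# Illustrative only: create an example instruction dataset for step 1 above.
mkdir -p data/mydata
cat > data/mydata/mydata.json <<'EOF'
[
    {
        "instruction": "Arrange the given numbers in ascending order.",
        "input": "2, 4, 0, 8, 3",
        "output": "0, 2, 3, 4, 8"
    },
    {
        "instruction": "Name the author of 'Pride and Prejudice'.",
        "input": "",
        "output": "Jane Austen wrote 'Pride and Prejudice'."
    }
]
EOF
```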


## Troubleshooting

If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101).
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
15 changes: 8 additions & 7 deletions tutorials/finetune_full.md
@@ -53,20 +53,22 @@ python generate/full.py \
```

Output:
```

```text
A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy...
```

If your GPU supports `bfloat16`, the script will automatically use it.

## Tune on your dataset

With only a few modifications, you can prepare and train on your own instruction dataset.

1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
the empty string if the instruction doesn't require a context. Below is an example json file:

```
```text
[
{
"instruction": "Arrange the given numbers in ascending order.",
@@ -91,16 +93,15 @@ With only a few modifications, you can prepare and train on your own instruction
```

5. Run `finetune/full.py` by passing in the location of your data (and optionally other parameters):

```bash
python finetune/full.py \
--data_dir data/mydata/ \
--checkpoint_dir checkpoints/tiiuae/falcon-7b \
--out_dir data/mydata-finetuned
```


## Troubleshooting

If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101).
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
15 changes: 8 additions & 7 deletions tutorials/finetune_lora.md
@@ -45,8 +45,10 @@ You can test the finetuned model with your own instructions by running:
```bash
python generate/lora.py --prompt "Recommend a movie to watch on the weekend."
```

Output:
```

```text
I would recommend the movie The Martian (2015). It is a sci-fi movie starring Matt Damon that follows the story of...
```

Expand All @@ -56,11 +58,11 @@ If your GPU supports `bfloat16`, you can additionally pass `--precision bf16-tru

With only a few modifications, you can prepare and train on your own instruction dataset.

1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
the empty string if the instruction doesn't require a context. Below is an example json file:

```
```text
[
{
"instruction": "Arrange the given numbers in ascending order.",
@@ -85,13 +87,12 @@ With only a few modifications, you can prepare and train on your own instruction
```

5. Run `finetune/lora.py` by passing in the location of your data (and optionally other parameters):

```bash
python finetune/lora.py --data_dir data/mydata/ --out_dir out/myexperiment
```


## Troubleshooting

If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
`torch.backends.cuda.enable_flash_sdp(False)` in the script below (see https://github.com/Lightning-AI/lit-llama/issues/101).
`torch.backends.cuda.enable_flash_sdp(False)` in the script below (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
4 changes: 3 additions & 1 deletion tutorials/inference.md
@@ -5,8 +5,10 @@ We demonstrate how to run inference (next token prediction) with the GPT base mo
```bash
python generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```

Output:
```

```text
Hello, my name is Levi Durrer, I'm an Austrian journalist - Chairman of the Press Blair Party, with 37 years in the Press Blair International, and two years in the Spectre of Austerity for the other. I'm crossing my fingers that you will feel
```

4 changes: 2 additions & 2 deletions tutorials/oom.md
@@ -1,10 +1,10 @@
## Dealing with out-of-memory (OOM) errors:
## Dealing with out-of-memory (OOM) errors

If you get this error while running a script

```bash
OutOfMemoryError: CUDA out of memory. Tried to allocate 2.22 GiB. GPU 0 has a total capacty of 79.15 GiB of which 228.38 MiB is free. Including non-PyTorch memory, this process
has 78.93 GiB memory in use. Of the allocated memory 76.28 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory
has 78.93 GiB memory in use. Of the allocated memory 76.28 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory
is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
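
The message suggests setting `max_split_size_mb`; one hedged way to try that is via PyTorch's allocator environment variable — the value `128` and the script below are placeholders, not prescriptions from the tutorial:

```bash
# Illustrative mitigation: reduce allocator fragmentation, then re-run the failing script.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python finetune/lora.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```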

1 change: 1 addition & 0 deletions tutorials/quantize.md
@@ -24,6 +24,7 @@ To reduce the memory requirements further, Lit-GPT supports several quantization
Enabled with [bitsandbytes](https://github.com/TimDettmers/bitsandbytes). Check out the [paper](https://arxiv.org/abs/2305.14314v1) to learn more about how it works.

> **Note**: `bitsandbytes` only supports `CUDA` devices and the `Linux` operating system.
Windows users should use [WSL2](https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl).
Collaborator Author comment:
It looks like one of the users with a Windows machine managed to run bnb inside WSL2 (which is basically a Linux in a VM).


Uses the normalized float 4 (nf4) data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
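
As a sketch of how nf4 quantization is typically selected — assuming the generation script accepts a `--quantize bnb.nf4` flag as in other Lit-GPT examples; the checkpoint and prompt below are placeholders:

```bash
# Hedged example: run generation with nf4 quantization (flag name assumed from other Lit-GPT docs).
python generate/base.py \
  --quantize bnb.nf4 \
  --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b \
  --prompt "Hello, my name is"
```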

3 changes: 2 additions & 1 deletion tutorials/tpus.md
@@ -59,7 +59,8 @@ You'll notice that afterwards, generation times drop to ~2s.
Coming soon.

> **Warning**
> When you are done, remember to delete your instance
> When you are done, remember to delete your instance
>
> ```shell
> gcloud compute tpus tpu-vm delete lit-gpt --zone=us-central2-b
> ```