Tutorial markdowns small fixes #295

Merged · 3 commits · Jul 21, 2023
4 changes: 2 additions & 2 deletions tutorials/download_llama_2.md
@@ -29,8 +29,8 @@ meta-llama/Llama-2-70b-chat-hf

In order to use a specific checkpoint, for instance [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), download the weights and convert the checkpoint to the lit-gpt format.

This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at https://huggingface.co/meta-llama/Llama-2-7b.
After access is granted, you can find your HF hub token in https://huggingface.co/settings/tokens.
This requires that you've been granted access to the weights on the HuggingFace hub. You can do so by following the steps at <https://huggingface.co/meta-llama/Llama-2-7b>.
After access is granted, you can find your HF hub token in <https://huggingface.co/settings/tokens>.

```bash
pip install huggingface_hub
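# Hypothetical continuation (not part of the original snippet): with the access
# token from https://huggingface.co/settings/tokens, downloading and converting a
# checkpoint typically looks roughly like this -- the script names and flags below
# are assumptions, not taken from the tutorial. Replace your_hf_token accordingly.
python scripts/download.py --repo_id meta-llama/Llama-2-7b-chat-hf --access_token your_hf_token
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/meta-llama/Llama-2-7b-chat-hf
```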
20 changes: 11 additions & 9 deletions tutorials/finetune_adapter.md
@@ -30,7 +30,7 @@ python finetune/adapter.py --checkpoint_dir checkpoints/stabilityai/stablelm-bas

or for Adapter V2

```bash
```bash
python finetune/adapter_v2.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```

@@ -40,6 +40,7 @@ Depending on the available GPU memory, you can also tune the `micro_batch_size`
To fit Adapter V2 into 12 GB of memory, set `micro_batch_size = 2`.

For example, the following settings will let you finetune the model in under 1 hour:

```python
devices = 4
micro_batch_size = 4
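# Hypothetical lower-memory alternative (not from the original tutorial):
# per the note above, Adapter V2 fits into ~12 GB with
#   micro_batch_size = 2
```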
@@ -78,27 +79,29 @@ python generate/adapter.py \

or for Adapter V2

```bash
```bash
python generate/adapter_v2.py \
--prompt "Recommend a movie to watch on the weekend." \
--checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```

Output:
```

```text
A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy...
```

If your GPU supports `bfloat16`, the script will automatically use it.

## Tune on your dataset

With only a few modifications, you can prepare and train on your own instruction dataset.

1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
the empty string if the instruction doesn't require a context. Below is an example json file:

```
```text
[
{
"instruction": "Arrange the given numbers in ascending order.",
@@ -123,16 +126,15 @@ With only a few modifications, you can prepare and train on your own instruction
```

5. Run `finetune/adapter.py` by passing in the location of your data (and optionally other parameters):

```bash
python finetune/adapter.py \
--data_dir data/mydata/ \
--checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b \
--out_dir data/mydata-finetuned
```
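
For reference, a complete example file for step 1 above might be created like this — a minimal, illustrative sketch; the `data/mydata/mydata.json` path and the records themselves are placeholders, not taken from the tutorial:

```bash
# Illustrative only: create an example instruction dataset for step 1 above.
mkdir -p data/mydata
cat > data/mydata/mydata.json <<'EOF'
[
    {
        "instruction": "Arrange the given numbers in ascending order.",
        "input": "2, 4, 0, 8, 3",
        "output": "0, 2, 3, 4, 8"
    },
    {
        "instruction": "Name the author of 'Pride and Prejudice'.",
        "input": "",
        "output": "Jane Austen wrote 'Pride and Prejudice'."
    }
]
EOF
```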


## Troubleshooting

If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101).
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
15 changes: 8 additions & 7 deletions tutorials/finetune_full.md
@@ -53,20 +53,22 @@ python generate/full.py \
```

Output:
```

```text
A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy...
```

If your GPU supports `bfloat16`, the script will automatically use it.

## Tune on your dataset

With only a few modifications, you can prepare and train on your own instruction dataset.

1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
the empty string if the instruction doesn't require a context. Below is an example json file:

```
```text
[
{
"instruction": "Arrange the given numbers in ascending order.",
@@ -91,16 +93,15 @@ With only a few modifications, you can prepare and train on your own instruction
```

5. Run `finetune/full.py` by passing in the location of your data (and optionally other parameters):

```bash
python finetune/full.py \
--data_dir data/mydata/ \
--checkpoint_dir checkpoints/tiiuae/falcon-7b \
--out_dir data/mydata-finetuned
```


## Troubleshooting

If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101).
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
15 changes: 8 additions & 7 deletions tutorials/finetune_lora.md
@@ -45,8 +45,10 @@ You can test the finetuned model with your own instructions by running:
```bash
python generate/lora.py --prompt "Recommend a movie to watch on the weekend."
```

Output:
```

```text
I would recommend the movie The Martian (2015). It is a sci-fi movie starring Matt Damon that follows the story of...
```

Expand All @@ -56,11 +58,11 @@ If your GPU supports `bfloat16`, you can additionally pass `--precision bf16-tru

With only a few modifications, you can prepare and train on your own instruction dataset.

1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
1. Create a json file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
the empty string if the instruction doesn't require a context. Below is an example json file:

```
```text
[
{
"instruction": "Arrange the given numbers in ascending order.",
@@ -85,13 +87,12 @@ With only a few modifications, you can prepare and train on your own instruction
```

5. Run `finetune/lora.py` by passing in the location of your data (and optionally other parameters):

```bash
python finetune/lora.py --data_dir data/mydata/ --out_dir out/myexperiment
```


## Troubleshooting

If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
`torch.backends.cuda.enable_flash_sdp(False)` in the script below (see https://github.com/Lightning-AI/lit-llama/issues/101).
`torch.backends.cuda.enable_flash_sdp(False)` in the script below (see <https://github.com/Lightning-AI/lit-llama/issues/101>).
4 changes: 3 additions & 1 deletion tutorials/inference.md
@@ -5,8 +5,10 @@ We demonstrate how to run inference (next token prediction) with the GPT base mo
```bash
python generate/base.py --prompt "Hello, my name is" --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```

Output:
```

```text
Hello, my name is Levi Durrer, I'm an Austrian journalist - Chairman of the Press Blair Party, with 37 years in the Press Blair International, and two years in the Spectre of Austerity for the other. I'm crossing my fingers that you will feel
```

4 changes: 2 additions & 2 deletions tutorials/oom.md
@@ -1,10 +1,10 @@
## Dealing with out-of-memory (OOM) errors:
## Dealing with out-of-memory (OOM) errors

If you get this error while running a script

```bash
OutOfMemoryError: CUDA out of memory. Tried to allocate 2.22 GiB. GPU 0 has a total capacty of 79.15 GiB of which 228.38 MiB is free. Including non-PyTorch memory, this process
has 78.93 GiB memory in use. Of the allocated memory 76.28 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory
has 78.93 GiB memory in use. Of the allocated memory 76.28 GiB is allocated by PyTorch, and 2.14 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory
is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
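
The message suggests setting `max_split_size_mb`; one hedged way to try that is via PyTorch's allocator environment variable — the value `128` and the script below are placeholders, not prescriptions from the tutorial:

```bash
# Illustrative mitigation: reduce allocator fragmentation, then re-run the failing script.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python finetune/lora.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b
```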

1 change: 1 addition & 0 deletions tutorials/quantize.md
@@ -24,6 +24,7 @@ To reduce the memory requirements further, Lit-GPT supports several quantization
Enabled with [bitsandbytes](https://github.com/TimDettmers/bitsandbytes). Check out the [paper](https://arxiv.org/abs/2305.14314v1) to learn more about how it works.

> **Note**: `bitsandbytes` only supports `CUDA` devices and the `Linux` operating system.
Windows users should use [WSL2](https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl).
Collaborator Author comment:
It looks like one of the users with a Windows machine managed to run bnb inside WSL2 (which is basically a Linux in a VM).


Uses the normalized float 4 (nf4) data type. This is recommended over "fp4" based on the paper's experimental results and theoretical analysis.
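
As a sketch of how nf4 quantization is typically selected — assuming the generation script accepts a `--quantize bnb.nf4` flag as in other Lit-GPT examples; the checkpoint and prompt below are placeholders:

```bash
# Hedged example: run generation with nf4 quantization (flag name assumed from other Lit-GPT docs).
python generate/base.py \
  --quantize bnb.nf4 \
  --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b \
  --prompt "Hello, my name is"
```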

3 changes: 2 additions & 1 deletion tutorials/tpus.md
@@ -59,7 +59,8 @@ You'll notice that afterwards, generation times drop to ~2s.
Coming soon.

> **Warning**
> When you are done, remember to delete your instance
> When you are done, remember to delete your instance
>
> ```shell
> gcloud compute tpus tpu-vm delete lit-gpt --zone=us-central2-b
> ```