MD lint the /docs/* dir (NVIDIA#597)
MD lint the /docs dir

Signed-off-by: Brent Salisbury <[email protected]>
nerdalert authored Mar 27, 2024
1 parent 275b858 commit 3652034
Showing 5 changed files with 184 additions and 66 deletions.
4 changes: 3 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -1,4 +1,6 @@
# Changes

**Which issue is resolved by this Pull Request:**
Resolves #

**Description of your changes:**
15 changes: 11 additions & 4 deletions docs/README.md
@@ -1,10 +1,17 @@
# Workflow PlantUML

The workflow figure is generated using [PlantUML](https://plantuml.com/ditaa)
with [ditaa](https://ditaa.sourceforge.net).
To generate it yourself, the easiest way is to install the
[PlantUML plugin in VS Code](https://marketplace.visualstudio.com/items?itemName=jebbs.plantuml)
(with its prerequisites installed), open the file, and click preview.

If you don't want to install the dependencies locally, you can use the following
settings to make the preview work with a remote renderer:

```json
"plantuml.render": "PlantUMLServer",
"plantuml.server": "https://www.plantuml.com/plantuml",
```
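
If you do have PlantUML (and a Java runtime) installed locally, a command-line
render along these lines should also work; `docs/workflow.puml` is a placeholder
name here, not necessarily the actual source file in this repository:

```shell
# Render the diagram next to its source file (placeholder path).
plantuml -tpng docs/workflow.puml
```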

[ASCIIFlow](https://asciiflow.com/#/) is a helpful tool to edit the source code.
36 changes: 24 additions & 12 deletions docs/containerization.md
@@ -1,10 +1,12 @@
# Putting `lab` in a Container AND making it go fast

Containerization of `lab` allows for portability and ease of setup. With this,
users can now run `lab` on OpenShift to test the speed of `lab train` and `generate`
using dedicated GPUs. This guide shows you how to put the `lab` CLI, all of its
dependencies, and your GPU into a container for an isolated and easily reproducible
experience.

## Steps to build an image then run a container

**Containerfile:**

@@ -30,25 +32,35 @@ CMD ["/bin/bash"]

Or image: TBD (am I allowed to have a public image with references to lab in it?)

This containerfile is based on Nvidia's CUDA image, which, luckily for us, plugs
directly into Podman via their `nvidia-container-toolkit`! The ubi9 base image
does not have most packages installed, so the bulk of the `containerfile` is spent
configuring your system so `lab` can be installed and run properly. Unlike Ubuntu,
ubi9 cannot install the entire nvidia-12-4 toolkit; this did not impact
performance during testing.

```shell
1. podman build --ssh=default -f <Containerfile_Path>
2. curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
3. sudo yum-config-manager --enable nvidia-container-toolkit-experimental
4. sudo dnf install -y nvidia-container-toolkit
5. sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
6. nvidia-ctk cdi list
Example output:
INFO[0000] Found 2 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=all
7. podman run --device nvidia.com/gpu=0 --security-opt=label=disable -it <IMAGE_ID>
```

Voila! You now have a container with CUDA and GPUs enabled!
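
As a quick sanity check (a sketch, assuming the CDI setup above succeeded and
`<IMAGE_ID>` is the image you built in step 1), the toolkit injects `nvidia-smi`
into the container, so it should list the same GPU you see on the host:

```shell
# Should print the host's GPU table from inside the container.
podman run --rm --device nvidia.com/gpu=0 --security-opt=label=disable <IMAGE_ID> nvidia-smi
```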

### Sources

[Nvidia Container Toolkit Install Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

[Podman Support for Container Device Interface](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html)

### Notes

Thanks to Taj Salawu for figuring out how to pass the git ssh keys properly!
29 changes: 18 additions & 11 deletions docs/converting_GGUF.md
@@ -1,9 +1,12 @@
<a name="model-convert-quant"></a>

# Optional: Converting a Model to GGUF and Quantizing

The latest [llama.cpp](https://github.com/ggerganov/llama.cpp) framework
requires the model to be converted into [GGUF](https://medium.com/@sandyeep70/ggml-to-gguf-a-leap-in-language-model-file-formats-cd5d3a6058f9)
format, a binary file format for storing (often quantized) models.
[Quantization](https://www.tensorops.ai/post/what-are-quantized-llms) is a
technique used to reduce the size of large neural networks, including large
language models (LLMs), by modifying the precision of their weights. If you have a
model already in GGUF format, you can skip this step.
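
As a rough illustration (approximate numbers, not taken from this repository): a
7-billion-parameter model stored at 16-bit precision occupies about 7B × 2 bytes
≈ 14 GB, while a 4-bit quantization of the same weights comes to roughly
7B × 0.5 bytes ≈ 3.5 GB, plus a small amount of metadata.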

## Clone the llama.cpp repository

@@ -42,7 +45,8 @@ def write(self):

## Convert a model to GGUF

The following command converts a Hugging Face model (safetensors) to [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)
format and saves it in your model directory with a `.gguf` extension.

```shell
export MODEL_DIR={model_directory}
python convert-hf-to-gguf.py $MODEL_DIR --outtype f16
```

@@ -53,9 +57,10 @@

## Quantize

Optionally, for smaller and faster models (with a varying loss of quality),
use a quantized model.

### Make the llama.cpp binaries

Build binaries like `quantize` etc. for your environment.

@@ -65,15 +70,17 @@ make

#### Run quantize command

```shell
./quantize {model_directory}/{f16_gguf_model} <type>
```

For example, the following command converts the f16 GGUF model to a Q4_K_M
quantized model and saves it in your model directory with a `<type>.gguf`
suffix (e.g. ggml-model-Q4_K_M.gguf).

```shell
./quantize $MODEL_DIR/ggml-model-f16.gguf Q4_K_M
```

> Tip: Use `./quantize help` for a list of quantization types with their
> relative size and output quality along with additional usage parameters.
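
To see the size reduction on disk (assuming the file names produced by the
commands above), a quick comparison could look like this:

```shell
# Compare the f16 model against its Q4_K_M quantization.
ls -lh $MODEL_DIR/ggml-model-f16.gguf $MODEL_DIR/ggml-model-Q4_K_M.gguf
```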