Add the huggingface token parameter, and modify the file path in llam… #741
Conversation
You need to sign the commits.

Force-pushed from 5896f60 to 8cb2b68
@MichaelClifford PTAL
Do you always need an HF_TOKEN? Can this still be used without it? If so, can we make it optional in the podman run command? (I could be sadly mistaken.)
So how would we do that? Copy the HF_TOKEN into the image via the Containerfile?
No, I am questioning whether this change forces users to always have and specify a token. I don't really know how this all works, but it seems that if an image is available without a token, this change will force users to specify a token even if one does not exist.
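One way to keep the token optional is to expand the `-e` flag only when the variable is set. A minimal sketch, assuming the converter image is tagged `localhost/converter` and mounts a `models` volume (both illustrative, not confirmed by this PR):

```shell
# Pass HF_TOKEN to the container only when it is set, so the token stays optional.
# Image name and volume mount are placeholders for illustration.
podman run -it --rm \
    ${HF_TOKEN:+-e HF_TOKEN=${HF_TOKEN}} \
    -v models:/converter/converted_models \
    localhost/converter
```

With this idiom, `${HF_TOKEN:+...}` expands to nothing when the variable is unset, so users without a token run the same command unchanged.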
Force-pushed from 8cb2b68 to ebf7b87
…a.cpp repo.

Signed-off-by: Song Liu <[email protected]>
Also, the file name changed to use "_" in https://github.com/ggerganov/llama.cpp, so change the file name from llama.cpp/convert-hf-to-gguf.py to llama.cpp/convert_hf_to_gguf.py.

Signed-off-by: Song Liu <[email protected]>
The https://github.com/ggerganov/llama.cpp/blob/master/Makefile says: "The 'quantize' binary is deprecated. Please use 'llama-quantize' instead." The command works in my testing with llama-quantize.

Signed-off-by: Song Liu <[email protected]>
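For reference, a sketch of the rename; only the binary name comes from the Makefile notice, while the file names and quantization type are placeholders:

```shell
# Deprecated invocation: ./quantize <input.gguf> <output.gguf> Q4_K_M
# Current binary name per the llama.cpp Makefile deprecation notice:
./llama-quantize converted_models/model-f16.gguf converted_models/model-Q4_K_M.gguf Q4_K_M
```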
When building the `driver-toolkit` image, it is cumbersome to find a kernel version that matches the future `nvidia-bootc` and `intel-bootc` images. However, the kernel version is stored as a label on the `rhel-bootc` images, which are exposed as the `FROM` variable in the Makefile. This change collects the kernel version using `skopeo inspect` and `jq`.

The `DRIVER_TOOLKIT_BASE_IMAGE` variable is introduced in the Makefile to dissociate it from the `FROM` variable that is used as the `nvidia-bootc` and `intel-bootc` base image. The user can now specify something like:

```shell
make nvidia-bootc \
    FROM=quay.io/centos-bootc/centos-bootc:stream9 \
    DRIVER_TOOLKIT_BASE_IMAGE=quay.io/centos/centos:stream9
```

Also, the `VERSION` variable in `/etc/os-release` is the full version, so this change modifies the command to retrieve the `OS_VERSION_MAJOR` value.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
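A minimal sketch of how such a lookup could work; the `ostree.linux` label name is an assumption (the commit only says the kernel version is stored as a label on the image):

```shell
# Read the kernel version label from the bootc base image without pulling it.
# "ostree.linux" is an assumed label name; adjust to the actual label.
FROM=quay.io/centos-bootc/centos-bootc:stream9
KERNEL_VERSION=$(skopeo inspect "docker://${FROM}" | jq -r '.Labels["ostree.linux"]')
echo "${KERNEL_VERSION}"

# Major OS version from /etc/os-release, e.g. "9" from VERSION_ID="9.4".
OS_VERSION_MAJOR=$(grep '^VERSION_ID=' /etc/os-release | cut -d '"' -f 2 | cut -d '.' -f 1)
echo "${OS_VERSION_MAJOR}"
```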
…G is specified Signed-off-by: Matthieu Bernardin <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Javi Polo <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: lstocchi <[email protected]> Signed-off-by: Song Liu <[email protected]>
Intel has released the version `1.17.0-495` of their Gaudi drivers. They are available explicitly for RHEL 9.4 with a new `9.4` folder in the RPM repository. This change updates the arguments to use the new version from the new repository folder.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: axel7083 <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: axel7083 <[email protected]> Signed-off-by: Song Liu <[email protected]>
torchrun jobs create a number of child processes per GPU, which can often exceed the 2k limit.

Signed-off-by: Jason T. Greene <[email protected]>
Signed-off-by: Song Liu <[email protected]>
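A sketch of lifting such a limit at the container level, assuming the 2k figure refers to the default `pids_limit` of 2048 in containers.conf; the image name and torchrun arguments are placeholders:

```shell
# Run with an unlimited pids cgroup limit so torchrun's workers are not capped.
# "localhost/training" and the training command are illustrative only.
podman run --rm --pids-limit=-1 localhost/training \
    torchrun --nproc_per_node=8 train.py
```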
The `nvidia-driver` package provides the firmware files for the given driver version. This change removes the copy of the firmware from the builder step and installs the `nvidia-driver` package instead. This also allows better traceability of the files in the final image.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
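A sketch of what the install step could look like; the repository URL is NVIDIA's published rhel9 CUDA repo, and leaving the package unpinned is an assumption (exact pinning via module streams or versioned packages varies by driver branch):

```shell
# Install the packaged driver (which ships the firmware files) instead of
# copying firmware by hand. Requires dnf-plugins-core for config-manager.
dnf config-manager --add-repo \
    https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
dnf install -y nvidia-driver
```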
Signed-off-by: axel7083 <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: axel7083 <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Maysun J Faisal <[email protected]> Signed-off-by: Song Liu <[email protected]>
…ot exist Signed-off-by: Javi Polo <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Javi Polo <[email protected]> Signed-off-by: Song Liu <[email protected]>
fix: missing $

Signed-off-by: axel7083 <[email protected]>
Signed-off-by: Song Liu <[email protected]>
The `/dev/nvswitchctl` device is created by the NVIDIA Fabric Manager service, so it cannot be a condition for the `nvidia-fabricmanager` service. Looking at the NVIDIA driver startup script for Kubernetes, the actual check is the presence of `/proc/driver/nvidia-nvswitch/devices` and the fact that it's not empty [1]. This change modifies the condition to `ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices`, which verifies that a certain path exists and is a non-empty directory.

[1] https://gitlab.com/nvidia/container-images/driver/-/blob/main/rhel9/nvidia-driver?ref_type=heads#L262-269

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
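A sketch of applying that condition as a systemd drop-in; the drop-in directory and file name are illustrative, while the condition line is the one quoted above:

```shell
# Add the directory-not-empty condition to the fabric manager unit.
mkdir -p /etc/systemd/system/nvidia-fabricmanager.service.d
cat > /etc/systemd/system/nvidia-fabricmanager.service.d/10-nvswitch-condition.conf <<'EOF'
[Unit]
ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices
EOF
systemctl daemon-reload
```

With this condition, systemd skips the unit (rather than failing it) on hosts without NVSwitch devices.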
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Force-pushed from fa17dcc to f366eb3
I made mistakes when signing off, so I will fork a new branch to implement the changes.
I wanted to use the mistralai/Mistral-7B-Instruct-v0.2 model and found there are no GGUF files on Hugging Face, so I decided to use the ./convert_models tooling to convert the model. I found a few issues:
So I added the HF_TOKEN=<YOUR_HF_TOKEN_ID> parameter to the code.
Impacted files: README.md, download_huggingface.py, run.sh
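As a sketch of the download step this enables, shown from the shell (download_huggingface.py presumably makes the equivalent huggingface_hub call; the CLI form below is an illustration, not the script's actual code):

```shell
# Gated models such as Mistral require an authenticated download.
export HF_TOKEN=<YOUR_HF_TOKEN_ID>
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --token "${HF_TOKEN}"
```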
If we go to https://github.com/ggerganov/llama.cpp.git, we can see that convert.py has been deprecated and moved to examples/convert_legacy_llama.py. I am not sure whether I should just keep the line "python llama.cpp/convert-hf-to-gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url", so I just replaced convert.py with the correct path, and did the same for llama.cpp/quantize (see the sketch after the impacted-file note below).
Impacted file: run.sh
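For reference, the updated converter call as quoted above, next to the deprecated form it replaces:

```shell
# Before (deprecated upstream, moved to examples/convert_legacy_llama.py):
# python llama.cpp/convert.py ...
# After (script renamed upstream, hyphens replaced with underscores):
python llama.cpp/convert_hf_to_gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url
```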
So I added "localhost/converter" in the "podman run" command.
Here is my testing after the modification:
(the log is too long)