
Add the huggingface token parameter, and modify the file path in llam… #741

Closed

Conversation

melodyliu1986 (Contributor)

I wanted to use the mistralai/Mistral-7B-Instruct-v0.2 model and found that there are no GGUF files on Hugging Face, so I decided to use ./convert_models to convert the model myself. I found the following issues:

  1. 401 Client Error
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66b1862a-1bc229376e7f3f4020a3c951;60195d59-03d1-4f26-b3ce-d3b04c2fe2b4)
Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/36d7e540e651b68dac59394d9c3381651df7fb01/.gitattributes

So I added an HF_TOKEN=<YOUR_HF_TOKEN_ID> parameter to the code (a sketch of the idea follows below).
Impacted files: README.md, download_huggingface.py, run.sh
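
For context, here is a minimal sketch of the idea, not the literal run.sh: huggingface_hub honors the HF_TOKEN environment variable, so making the token visible to the download step is enough to authenticate against gated repositories.

```shell
# Minimal sketch, assuming the download step goes through huggingface_hub,
# which reads the HF_TOKEN environment variable for gated repositories.
export HF_TOKEN=<YOUR_HF_TOKEN_ID>   # placeholder for a real access token
python download_huggingface.py ...   # invoked by run.sh as before
```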

  2. No convert.py and quantize files under llama.cpp
python: can't open file '/opt/app-root/src/converter/llama.cpp/convert.py': [Errno 2] No such file or directory
run.sh: line 23: llama.cpp/quantize: No such file or directory

If we go to https://github.com/ggerganov/llama.cpp.git, we can see that convert.py has been deprecated and moved to examples/convert_legacy_llama.py. I am not sure whether I should just keep the line "python llama.cpp/convert-hf-to-gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url"; for now I simply replaced convert.py with the correct path, and did the same for llama.cpp/quantize (see the sketch below).

Impacted file: run.sh
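
For reference, a sketch of what the adjusted run.sh steps could look like against a current llama.cpp checkout; the GGUF file names below are stand-ins for whatever run.sh actually produces.

```shell
# Sketch only: convert.py is gone, convert_hf_to_gguf.py sits at the repo root,
# and the quantize binary is now called llama-quantize.
python llama.cpp/convert_hf_to_gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url
llama.cpp/llama-quantize \
    converted_models/$hf_model_url/model-F32.gguf \
    converted_models/$hf_model_url/model-$QUANTIZATION.gguf \
    $QUANTIZATION
```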

  3. No image name was specified in the README.md

So I added "localhost/converter" to the "podman run" command.

Here is my test run after the modifications:

$ podman run -it --rm -v models:/converter/converted_models -e HF_MODEL_URL=mistralai/Mistral-7B-Instruct-v0.2 -e HF_TOKEN=*** -e QUANTIZATION=Q4_K_M -e KEEP_ORIGINAL_MODEL="False" localhost/converter

README.md: 100%|███████████████████████████████████████████████████████████████████████████████████| 5.47k/5.47k [00:00<00:00, 21.9MB/s]
.gitattributes: 100%|██████████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 8.79MB/s]
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████████████████| 25.1k/25.1k [00:00<00:00, 357kB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████| 596/596 [00:00<00:00, 3.67MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 621kB/s]
pytorch_model.bin.index.json: 100%|████████████████████████████████████████████████████████████████| 23.9k/23.9k [00:00<00:00, 72.1MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 6.70MB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 861kB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████| 2.10k/2.10k [00:00<00:00, 12.7MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:02<00:00, 630kB/s]
model-00001-of-00003.safetensors: 100%|████████████████████████████████████████████████████████████| 4.94G/4.94G [52:42<00:00, 1.56MB/s]
model-00003-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████| 4.54G/4.54G [1:01:03<00:00, 1.24MB/s]
pytorch_model-00001-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 4.94G/4.94G [1:05:53<00:00, 1.25MB/s]
pytorch_model-00002-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 5.00G/5.00G [1:06:22<00:00, 1.26MB/s]
model-00002-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████| 5.00G/5.00G [1:07:19<00:00, 1.24MB/s]
pytorch_model-00003-of-00003.bin: 100%|██████████████████████████████████████████████████████████| 5.06G/5.06G [1:07:36<00:00, 1.25MB/s]
Fetching 16 files: 100%|█████████
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors
model-00002-of-00003.bin:  99%|██████████████████████████████████████████████████████████▍| 4.95G/5.00G [5:50:49<03:48, 222kB/s]
model-00002-of-00003.bin: 100%|███████████████████████████████████████████████████████████| 5.00G/5.00G [5:54:12<00:00, 229kB/s]
....
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00002-of-00003.safetensors
INFO:convert:Loading model file /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors
INFO:convert:params = Params(n_vocab=32000, n_embd=4096, n_layer=32, n_ctx=32768, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=1000000.0, f_rope_scale=None, n_ctx_orig=None, rope_finetuned=None, ftype=None, path_model=PosixPath('/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2'))
INFO:convert:Loaded vocab file PosixPath('/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/tokenizer.model'), type 'spm'
INFO:convert:model parameters count : (7241732096, 7241732096, 0) (7.2B)
INFO:convert:Vocab info: <SentencePieceVocab with 32000 base tokens and 0 added tokens>
INFO:convert:Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 2, 'unk': 0}, add special tokens {'bos': True, 'eos': False}>
INFO:convert:Writing /opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-F32.gguf, format 0
......
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/opt/app-root/src/converter/converted_models/mistralai/Mistral-7B-Instruct-v0.2/Mistral-7B-Instruct-v0.2-F32.gguf: n_tensors = 291, total_size = 29.0G
INFO:convert:[  1/291] Writing tensor token_embd.weight                      | size  32000 x   4096  | type F32  | T+   0
INFO:convert:[  2/291] Writing tensor blk.0.attn_norm.weight                 | size   4096           | type F32  | T+   1
INFO:convert:[  3/291] Writing tensor blk.0.ffn_down.weight                  | size   4096 x  14336  | type F32  | T+   1
INFO:convert:[  4/291] Writing tensor blk.0.ffn_gate.weight                  | size  14336 x   4096  | type F32  | T+   1
....

(the rest of the log is omitted; it is too long)

rhatdan (Member) commented Aug 8, 2024

You need to sign your commit:
git commit -a --amend -s
git push --force

@melodyliu1986 melodyliu1986 force-pushed the soliu-convert-model-branch branch 2 times, most recently from 5896f60 to 8cb2b68 Compare August 9, 2024 03:06
melodyliu1986 (Contributor, Author)

> You need to sign your commit:
> git commit -a --amend -s
> git push --force
Done, please check again.

rhatdan (Member) commented Aug 9, 2024

@MichaelClifford PTAL

rhatdan (Member) commented Aug 9, 2024

Do you always need an HF_TOKEN? Can this still be used without it? If so can we make it optional in the podman run command? (I could be sadly mistaken).

melodyliu1986 (Contributor, Author)

> Do you always need an HF_TOKEN? Can this still be used without it? If so can we make it optional in the podman run command? (I could be sadly mistaken).

So how would we do that? Copy the HF_TOKEN into the image via the Containerfile?

rhatdan (Member) commented Aug 12, 2024

No, I am questioning whether this change forces users to always have and specify a token. I don't really know how this all works, but it seems that if an image is available without a token, this change will force users to specify a token even if one does not exist.
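
A minimal sketch of the behaviour being discussed, assuming the download code only authenticates when a token is present: public models keep working without a token, and -e HF_TOKEN is only added for gated repositories.

```shell
# Illustration only; the "..." elides the volume mount and quantization
# options shown earlier in this PR.
podman run ... -e HF_MODEL_URL=<public-model> localhost/converter                                 # no token needed
podman run ... -e HF_MODEL_URL=<gated-model> -e HF_TOKEN=<YOUR_HF_TOKEN_ID> localhost/converter   # gated repo
```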

Song Liu and others added 12 commits August 15, 2024 14:55
Also, the file name changed to use "_" in https://github.com/ggerganov/llama.cpp, so change the file name from llama.cpp/convert-hf-to-gguf.py to llama.cpp/convert_hf_to_gguf.py.

Signed-off-by: Song Liu <[email protected]>
Per https://github.com/ggerganov/llama.cpp/blob/master/Makefile, "The 'quantize' binary is deprecated. Please use 'llama-quantize' instead."

The command works in my testing when using llama-quantize.

Signed-off-by: Song Liu <[email protected]>
When building the `driver-toolkit` image, it is cumbersome to find the kernel
version that matches the future `nvidia-bootc` and `intel-bootc` images.
However, the kernel version is stored as a label on the `rhel-bootc`
images, which are exposed as the `FROM` variable in the Makefile.

This change collects the kernel version using `skopeo inspect` and `jq`.

The `DRIVER_TOOLKIT_BASE_IMAGE` variable is introduced in the Makefile
to dissociate it from the `FROM` variable that is used as the `nvidia-bootc`
and `intel-bootc` base image.

The user can now specify something like:

```shell
make nvidia-bootc \
    FROM=quay.io/centos-bootc/centos-bootc:stream9 \
    DRIVER_TOOLKIT_BASE_IMAGE=quay.io/centos/centos:stream9
```

Also, the `VERSION` variable in `/etc/os-release` is the full version, so
this change modifies the command to retrieve the `OS_VERSION_MAJOR`
value.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
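
Illustrative only, assuming the kernel version is exposed through an image label such as `ostree.linux` on the bootc base images (the exact label name is an assumption here):

```shell
# Hedged sketch of the "skopeo inspect + jq" step described above.
skopeo inspect docker://quay.io/centos-bootc/centos-bootc:stream9 \
    | jq -r '.Labels["ostree.linux"]'
```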
…G is specified

Signed-off-by: Matthieu Bernardin <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: lstocchi <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Intel has released the version `1.17.0-495` of their Gaudi drivers. They
are available explicitly for RHEL 9.4 with a new `9.4` folder in the RPM
repository. This change updates the arguments to use the new version
from the new repository folder.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: axel7083 <[email protected]>
Signed-off-by: Song Liu <[email protected]>
torchrun jobs create a number of child processes per GPU, which can
often exceed the 2k limit.

Signed-off-by: Jason T. Greene <[email protected]>
Signed-off-by: Song Liu <[email protected]>
The `nvidia-driver` package provides the firmware files for the given
driver version. This change removes the copy of the firmware from the
builder step and installs the `nvidia-driver` package instead. This also
allows better traceability of the files in the final image.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
axel7083 and others added 9 commits August 15, 2024 14:55
Signed-off-by: axel7083 <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Maysun J Faisal <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Javi Polo <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: axel7083 <[email protected]>

fix: missing $

Signed-off-by: axel7083 <[email protected]>
Signed-off-by: Song Liu <[email protected]>
The `/dev/nvswitchctl` device is created by the NVIDIA Fabric Manager
service, so it cannot be a condition for the `nvidia-fabricmanager`
service.

Looking at the NVIDIA driver startup script for Kubernetes, the actual
check is the presence of `/proc/driver/nvidia-nvswitch/devices` and the
fact that it's not empty [1].

This change modifies the condition to
`ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices`, which
verifies that a certain path exists and is a non-empty directory.

[1] https://gitlab.com/nvidia/container-images/driver/-/blob/main/rhel9/nvidia-driver?ref_type=heads#L262-269

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
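
For illustration, a rough shell equivalent of the new unit condition (the change itself uses systemd's ConditionDirectoryNotEmpty, not a script):

```shell
# Only proceed when the nvswitch devices directory exists and is non-empty,
# mirroring ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices.
if [ -d /proc/driver/nvidia-nvswitch/devices ] && \
   [ -n "$(ls -A /proc/driver/nvidia-nvswitch/devices)" ]; then
    systemctl start nvidia-fabricmanager
fi
```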
@melodyliu1986 melodyliu1986 force-pushed the soliu-convert-model-branch branch from fa17dcc to f366eb3 Compare August 15, 2024 06:56
melodyliu1986 (Contributor, Author)

I made mistakes when signing off, so I will fork a new branch to implement the changes.
