Add the huggingface token parameter, and modify the file path in llam… #741
Conversation
You need to sign the commits.

Force-pushed from 5896f60 to 8cb2b68
@MichaelClifford PTAL
Do you always need an HF_TOKEN? Can this still be used without it? If so, can we make it optional in the podman run command? (I could be sadly mistaken.)
So how would we do that? Copy the HF_TOKEN into the image via the Containerfile?
No, I am questioning whether this change forces users to always have and specify a token. I don't really know how this all works, but it seems that if an image is available without a token, this change will force users to specify a token even if one does not exist.
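One way to keep the token optional is to expand the `-e` flag only when the variable is set. A minimal sketch, assuming the converter image is tagged `localhost/converter` and mounts a `models` volume (both illustrative, not confirmed by this PR):

```shell
# Pass HF_TOKEN to the container only when it is set, so the token stays optional.
# Image name and volume mount are placeholders for illustration.
podman run -it --rm \
    ${HF_TOKEN:+-e HF_TOKEN=${HF_TOKEN}} \
    -v models:/converter/converted_models \
    localhost/converter
```

With this idiom, `${HF_TOKEN:+...}` expands to nothing when the variable is unset, so users without a token run the same command unchanged.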
Force-pushed from 8cb2b68 to ebf7b87
…a.cpp repo.

Signed-off-by: Song Liu <[email protected]>
Also, the file name changed to use "_" in https://github.com/ggerganov/llama.cpp, so change the file name from llama.cpp/convert-hf-to-gguf.py to llama.cpp/convert_hf_to_gguf.py.

Signed-off-by: Song Liu <[email protected]>
The https://github.com/ggerganov/llama.cpp/blob/master/Makefile says: "The 'quantize' binary is deprecated. Please use 'llama-quantize' instead." The command works in my testing with llama-quantize.

Signed-off-by: Song Liu <[email protected]>
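For reference, a sketch of the rename; only the binary name comes from the Makefile notice, while the file names and quantization type are placeholders:

```shell
# Deprecated invocation: ./quantize <input.gguf> <output.gguf> Q4_K_M
# Current binary name per the llama.cpp Makefile deprecation notice:
./llama-quantize converted_models/model-f16.gguf converted_models/model-Q4_K_M.gguf Q4_K_M
```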
When building the `driver-toolkit` image, it is cumbersome to find a kernel version that matches the future `nvidia-bootc` and `intel-bootc` images. However, the kernel version is stored as a label on the `rhel-bootc` images, which are exposed as the `FROM` variable in the Makefile. This change collects the kernel version using `skopeo inspect` and `jq`.

The `DRIVER_TOOLKIT_BASE_IMAGE` variable is introduced in the Makefile to dissociate it from the `FROM` variable that is used as the `nvidia-bootc` and `intel-bootc` base image. The user can now specify something like:

```shell
make nvidia-bootc \
    FROM=quay.io/centos-bootc/centos-bootc:stream9 \
    DRIVER_TOOLKIT_BASE_IMAGE=quay.io/centos/centos:stream9
```

Also, the `VERSION` variable in `/etc/os-release` is the full version, so this change modifies the command to retrieve the `OS_VERSION_MAJOR` value.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
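A minimal sketch of how such a lookup could work; the `ostree.linux` label name is an assumption (the commit only says the kernel version is stored as a label on the image):

```shell
# Read the kernel version label from the bootc base image without pulling it.
# "ostree.linux" is an assumed label name; adjust to the actual label.
FROM=quay.io/centos-bootc/centos-bootc:stream9
KERNEL_VERSION=$(skopeo inspect "docker://${FROM}" | jq -r '.Labels["ostree.linux"]')
echo "${KERNEL_VERSION}"

# Major OS version from /etc/os-release, e.g. "9" from VERSION_ID="9.4".
OS_VERSION_MAJOR=$(grep '^VERSION_ID=' /etc/os-release | cut -d '"' -f 2 | cut -d '.' -f 1)
echo "${OS_VERSION_MAJOR}"
```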
…G is specified Signed-off-by: Matthieu Bernardin <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Javi Polo <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: lstocchi <[email protected]> Signed-off-by: Song Liu <[email protected]>
Intel has released the version `1.17.0-495` of their Gaudi drivers. They are available explicitly for RHEL 9.4 with a new `9.4` folder in the RPM repository. This change updates the arguments to use the new version from the new repository folder.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: axel7083 <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: axel7083 <[email protected]> Signed-off-by: Song Liu <[email protected]>
torchrun jobs create a number of child processes per GPU, which can often exceed the 2k limit.

Signed-off-by: Jason T. Greene <[email protected]>
Signed-off-by: Song Liu <[email protected]>
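A sketch of lifting such a limit at the container level, assuming the 2k figure refers to the default `pids_limit` of 2048 in containers.conf; the image name and torchrun arguments are placeholders:

```shell
# Run with an unlimited pids cgroup limit so torchrun's workers are not capped.
# "localhost/training" and the training command are illustrative only.
podman run --rm --pids-limit=-1 localhost/training \
    torchrun --nproc_per_node=8 train.py
```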
The `nvidia-driver` package provides the firmware files for the given driver version. This change removes the copy of the firmware from the builder step and installs the `nvidia-driver` package instead. This also allows better traceability of the files in the final image.

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
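A sketch of what the install step could look like; the repository URL is NVIDIA's published rhel9 CUDA repo, and leaving the package unpinned is an assumption (exact pinning via module streams or versioned packages varies by driver branch):

```shell
# Install the packaged driver (which ships the firmware files) instead of
# copying firmware by hand. Requires dnf-plugins-core for config-manager.
dnf config-manager --add-repo \
    https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
dnf install -y nvidia-driver
```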
Signed-off-by: axel7083 <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: axel7083 <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Maysun J Faisal <[email protected]> Signed-off-by: Song Liu <[email protected]>
…ot exist Signed-off-by: Javi Polo <[email protected]> Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Javi Polo <[email protected]> Signed-off-by: Song Liu <[email protected]>
fix: missing $

Signed-off-by: axel7083 <[email protected]>
Signed-off-by: Song Liu <[email protected]>
The `/dev/nvswitchctl` device is created by the NVIDIA Fabric Manager service, so it cannot be a condition for the `nvidia-fabricmanager` service. Looking at the NVIDIA driver startup script for Kubernetes, the actual check is the presence of `/proc/driver/nvidia-nvswitch/devices` and the fact that it's not empty [1]. This change modifies the condition to `ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices`, which verifies that a certain path exists and is a non-empty directory.

[1] https://gitlab.com/nvidia/container-images/driver/-/blob/main/rhel9/nvidia-driver?ref_type=heads#L262-269

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Song Liu <[email protected]>
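A sketch of applying that condition as a systemd drop-in; the drop-in directory and file name are illustrative, while the condition line is the one quoted above:

```shell
# Add the directory-not-empty condition to the fabric manager unit.
mkdir -p /etc/systemd/system/nvidia-fabricmanager.service.d
cat > /etc/systemd/system/nvidia-fabricmanager.service.d/10-nvswitch-condition.conf <<'EOF'
[Unit]
ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices
EOF
systemctl daemon-reload
```

With this condition, systemd skips the unit (rather than failing it) on hosts without NVSwitch devices.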
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Force-pushed from fa17dcc to f366eb3
I made mistakes when signing off, so I will fork a new branch to implement the changes.
I wanted to use the mistralai/Mistral-7B-Instruct-v0.2 model and found there are no GGUF files on Hugging Face, so I decided to use the ./convert_models tooling to convert the model. I found a few issues:
So I added the HF_TOKEN=<YOUR_HF_TOKEN_ID> parameter to the code.
Impacted files: README.md, download_huggingface.py, run.sh
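As a sketch of the download step this enables, shown from the shell (download_huggingface.py presumably makes the equivalent huggingface_hub call; the CLI form below is an illustration, not the script's actual code):

```shell
# Gated models such as Mistral require an authenticated download.
export HF_TOKEN=<YOUR_HF_TOKEN_ID>
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --token "${HF_TOKEN}"
```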
If we go to https://github.com/ggerganov/llama.cpp.git, we can see that convert.py has been deprecated and moved to examples/convert_legacy_llama.py. I am not sure whether I should just keep the line "python llama.cpp/convert-hf-to-gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url", so I just replaced convert.py with the correct path, and did the same for llama.cpp/quantize (see the sketch after the impacted-file note below).
Impacted file: run.sh
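For reference, the updated converter call as quoted above, next to the deprecated form it replaces:

```shell
# Before (deprecated upstream, moved to examples/convert_legacy_llama.py):
# python llama.cpp/convert.py ...
# After (script renamed upstream, hyphens replaced with underscores):
python llama.cpp/convert_hf_to_gguf.py /opt/app-root/src/converter/converted_models/$hf_model_url
```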
So I added "localhost/converter" in the "podman run" command.
Here is my testing after the modification:
(the log is too long)