Cannot load exllamav2 models #44

Closed
MrMojoR opened this issue Mar 11, 2024 · 8 comments
Labels: bug (Something isn't working)


MrMojoR commented Mar 11, 2024

This happens with the two most recent consecutive nightly versions, and I have also built an image from the 2024-03-10 snapshot: https://github.com/oobabooga/text-generation-webui/releases/tag/snapshot-2024-03-10. The issue occurs with both of them.
This is the base-nvidia variant.
When I try to load an exllamav2 model, I receive this error message:

File "/app/modules/ui_model_menu.py", line 245, in load_model_wrapper shared.model, shared.tokenizer = load_model(selected_model, loader) File "/app/modules/models.py", line 87, in load_model output = load_func_map[loader](model_name) File "/app/modules/models.py", line 378, in ExLlamav2_HF_loader from modules.exllamav2_hf import Exllamav2HF File "/app/modules/exllamav2_hf.py", line 7, in from exllamav2 import ( File "/venv/lib/python3.10/site-packages/exllamav2/init.py", line 3, in from exllamav2.model import ExLlamaV2 File "/venv/lib/python3.10/site-packages/exllamav2/model.py", line 23, in from exllamav2.config import ExLlamaV2Config File "/venv/lib/python3.10/site-packages/exllamav2/config.py", line 2, in from exllamav2.fasttensors import STFile File "/venv/lib/python3.10/site-packages/exllamav2/fasttensors.py", line 5, in from exllamav2.ext import exllamav2_ext as ext_c File "/venv/lib/python3.10/site-packages/exllamav2/ext.py", line 15, in import exllamav2_ext ImportError: /venv/lib/python3.10/site-packages/exllamav2_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ESt7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEERKNS_14SourceLocationESsb

I built an image from the official repo as well, and that worked flawlessly.
I think the issue could be this step from the official repository:

```
conda install -y -c "nvidia/label/cuda-12.1.1" cuda-runtime
```

I couldn't find this step in the Dockerfile here.
Thanks for the help!

Atinoda (Owner) commented Mar 11, 2024

Thanks for reporting, and I appreciate your building the official repo to verify! I had a quick look and can replicate the issue.

My guess is that it may be a problem with the wheels for exllamav2. I will look into it further and see about building it from source in the image. The following commit may be the root of the issue: oobabooga/text-generation-webui@bde7f00.
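
For reference, a small sketch to print the wheel versions currently installed in the image, for comparison against what that commit pins (the distribution names here are assumptions; flash-attn in particular may be registered under a slightly different name):

```python
# List the installed versions of the packages most likely involved in the
# mismatch, so they can be compared against the versions pinned upstream.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "exllamav2", "flash-attn"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```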

Atinoda added the bug label on Mar 11, 2024
MrMojoR (Author) commented Mar 11, 2024

I don't think that is the issue; the HQQ loader did not work for me either. It had not been working for some time, but I assumed the original repo was at fault. Now I really wanted to upgrade to try out exllamav2 0.15, which has some great memory management improvements.

Atinoda (Owner) commented Mar 11, 2024

I will have to see when I have time to debug it properly. I do not think it is 'missing' the CUDA runtime - the step you suggested refers to setting up a conda environment, and this image uses venv. Have you successfully used the HQQ loader in the official image? If so, could you please point me to the model and settings you used? I will check that out as well when I'm looking at the exllamav2 issue in more detail.

MrMojoR (Author) commented Mar 11, 2024

Yes, I have used one successfully; this was the model: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ
There is only one relevant setting: I used the PyTorch backend.

Atinoda (Owner) commented Mar 11, 2024

Thanks very much - I tried it out and got an error about flash attention:

```
/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops15sum_IntList_out4callERKNS_6TensorEN3c1016OptionalArrayRefIlEEbSt8optionalINS5_10ScalarTypeEERS2_
```
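
The symbol can be demangled to see which libtorch function signature the wheel expects - a minimal sketch, assuming c++filt (from binutils) is available inside the image:

```python
# Demangle the undefined C++ symbol reported above. The demangled name shows
# the libtorch signature the flash-attn wheel was compiled against; if the
# installed torch exports a different signature, the import fails like this.
import subprocess

sym = ("_ZN2at4_ops15sum_IntList_out4callERKNS_6TensorEN3c10"
       "16OptionalArrayRefIlEEbSt8optionalINS5_10ScalarTypeEERS2_")
print(subprocess.run(["c++filt", sym], capture_output=True, text=True).stdout)
```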

MrMojoR (Author) commented Mar 11, 2024

This is again some C++ library error; I still suspect that we are somehow missing that CUDA runtime.

Atinoda (Owner) commented Mar 11, 2024

Thank you for the heads up - it was a library version mismatch, and thankfully a simple fix! New stable images are building and will be up in about an hour.
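
For anyone verifying the rebuilt image, a quick sanity-check sketch (the module names are taken from the two tracebacks in this thread) confirming that both compiled extensions now import cleanly:

```python
# Import the two compiled extensions that previously failed with undefined
# symbols; a clean import confirms the wheels now match the torch build.
for mod in ("exllamav2_ext", "flash_attn_2_cuda"):
    try:
        __import__(mod)
        print(mod, "imports OK")
    except ImportError as e:
        print(mod, "failed:", e)
```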

MrMojoR (Author) commented Mar 11, 2024 via email
