Failed to load 4-bits weights from HuggingFace #51
Comments
hi, try to set use_safetensors=False
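(A minimal sketch of that change, using the wrapper class from the example code quoted in the reply below; the flag makes auto_gptq look for a .bin/.pt checkpoint instead of a .safetensors one:)

# Sketch of the suggested change: pass use_safetensors=False so auto_gptq
# searches for a .bin/.pt checkpoint rather than a .safetensors file.
model = InternLMXComposerQForCausalLM.from_quantized(
    "internlm/internlm-xcomposer-7b-4bit",
    trust_remote_code=True,
    device="cuda:0",
    use_safetensors=False,
)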
Thanks for the fast reply. I still have a similar error, though. It was able to download the .bin file but then failed to load it, I guess. Any idea what the problem is? Or any recommendation on which GPTQ version to use?

Environment
Python 3.9.18
accelerate==0.24.1
aiofiles==23.2.1
aiohttp==3.8.6
aiosignal==1.3.1
altair==5.1.2
annotated-types==0.6.0
anyio==3.7.1
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.1.0
auto-gptq==0.5.0
Babel==2.13.1
beautifulsoup4==4.12.2
bleach==6.1.0
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
comm==0.1.4
contourpy==1.2.0
cycler==0.12.1
datasets==2.14.6
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.7
einops==0.7.0
exceptiongroup==1.1.3
executing==2.0.1
fastapi==0.104.1
fastjsonschema==2.18.1
ffmpy==0.3.1
filelock==3.13.1
fonttools==4.44.0
fqdn==1.5.1
frozenlist==1.4.0
fsspec==2023.10.0
gekko==1.0.6
gradio==3.44.4
gradio_client==0.5.1
h11==0.14.0
httpcore==1.0.1
httpx==0.25.1
huggingface-hub==0.17.3
idna==3.4
importlib-metadata==6.8.0
importlib-resources==6.1.0
ipykernel==6.26.0
ipython==8.17.2
ipywidgets==8.1.1
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.19.2
jsonschema-specifications==2023.7.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.8.0
jupyter-lsp==2.2.0
jupyter_client==8.5.0
jupyter_core==5.5.0
jupyter_server==2.9.1
jupyter_server_terminals==0.4.4
jupyterlab==4.0.8
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.9
jupyterlab_server==2.25.0
kiwisolver==1.4.5
markdown2==2.4.10
MarkupSafe==2.1.3
matplotlib==3.8.1
matplotlib-inline==0.1.6
mistune==3.0.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
nbclient==0.8.0
nbconvert==7.10.0
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.2.1
notebook==7.0.6
notebook_shim==0.2.3
numpy==1.26.1
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.52
nvidia-nvtx-cu12==12.1.105
orjson==3.9.10
overrides==7.4.0
packaging==23.2
pandas==2.1.2
pandocfilters==1.5.0
parso==0.8.3
peft==0.6.0
pexpect==4.8.0
Pillow==10.1.0
platformdirs==3.11.0
polars==0.19.12
prometheus-client==0.18.0
prompt-toolkit==3.0.39
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==14.0.0
pycparser==2.21
pydantic==2.4.2
pydantic_core==2.10.1
pydub==0.25.1
Pygments==2.16.1
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
qtconsole==5.5.0
QtPy==2.4.1
referencing==0.30.2
regex==2023.10.3
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rouge==1.0.1
rpds-py==0.12.0
safetensors==0.4.0
semantic-version==2.10.0
Send2Trash==1.8.2
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
soupsieve==2.5
stack-data==0.6.3
starlette==0.27.0
sympy==1.12
terminado==0.17.1
timm==0.4.12
tinycss2==1.2.1
tokenizers==0.13.3
tomli==2.0.1
toolz==0.12.0
torch==2.1.0
torchvision==0.16.0
tornado==6.3.3
tqdm==4.66.1
traitlets==5.13.0
transformers==4.33.1
triton==2.1.0
types-python-dateutil==2.8.19.14
typing_extensions==4.8.0
tzdata==2023.3
uri-template==1.3.0
urllib3==2.0.7
uvicorn==0.24.0.post1
wcwidth==0.2.9
webcolors==1.13
webencodings==0.5.1
websocket-client==1.6.4
websockets==11.0.3
widgetsnbextension==4.0.9
XlsxWriter==3.1.2
xxhash==3.4.1
yarl==1.9.2
zipp==3.17.0

Code
I added use_safetensors=False:
import torch
from transformers import AutoModel, AutoTokenizer
import auto_gptq
from auto_gptq.modeling import BaseGPTQForCausalLM
# Register the custom architecture so auto_gptq accepts it
auto_gptq.modeling._base.SUPPORTED_MODELS = ["InternLMXComposer"]
torch.set_grad_enabled(False)
class InternLMXComposerQForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "internlm_model.model.layers"
    outside_layer_modules = [
        "query_tokens",
        "flag_image_start",
        "flag_image_end",
        "visual_encoder",
        "Qformer",
        "internlm_model.model.embed_tokens",
        "internlm_model.model.norm",
        "internlm_proj",
        "internlm_model.lm_head",
    ]
    inside_layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj"],
        ["mlp.up_proj"],
        ["mlp.down_proj"],
    ]

# init model and tokenizer
model = InternLMXComposerQForCausalLM.from_quantized(
    "internlm/internlm-xcomposer-7b-4bit",
    trust_remote_code=True,
    device="cuda:0",
    use_safetensors=False,
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer
# example image
image = "examples/images/aiyinsitan.jpg"
# Multi-Turn Text-Image Dialogue
# 1st turn
text = 'Describe this image in detail.'
image = "./norway.png"
response, history = model.chat(text, image)
print(f"User: {text}")
print(f"Bot: {response}")
# The image features a black and white portrait of Albert Einstein, the famous physicist and mathematician.
# Einstein is seated in the center of the frame, looking directly at the camera with a serious expression on his face.
# He is dressed in a suit, which adds a touch of professionalism to his appearance.

Error
Downloading (…)_model-4bit-128g.bin: 100%|█████████████████████████████████████████████████████| 7.25G/7.25G [03:20<00:00, 36.1MB/s]
Traceback (most recent call last):
File "/mnt/bd/dev-pierre-oreistein-st/sandbox/test_internlm_vl/test_internlm_vl_4bits", line 35, in <module>
File "/home/pierre/.pyenv/versions/dev3.9/lib/python3.9/site-packages/auto_gptq/modeling/_base.py", line 847, in from_quantized
raise FileNotFoundError(f"Could not find a model in {model_name_or_path} with a name in {', '.join(searched_files)}. Please specify the argument model_basename to use a custom file name.")
FileNotFoundError: Could not find a model in internlm/internlm-xcomposer-7b-4bit with a name in gptq_model-4bit-128g.bin, gptq_model-4bit-128g.pt, model.pt. Please specify the argument model_basename to use a custom file name.
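For what it's worth, auto_gptq's from_quantized does expose a model_basename argument for exactly this case. A minimal sketch, assuming a hypothetical file name (the "(…)" in the download log above hides the real prefix, so the actual basename would have to be read off the Hub file list):

# model_basename is the checkpoint file name without its extension.
# "some_model-4bit-128g" is a placeholder, not the real name.
model = InternLMXComposerQForCausalLM.from_quantized(
    "internlm/internlm-xcomposer-7b-4bit",
    trust_remote_code=True,
    device="cuda:0",
    use_safetensors=False,
    model_basename="some_model-4bit-128g",
)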
It seems the downloaded files are not found. Please try downloading https://huggingface.co/internlm/internlm-xcomposer-7b-4bit/tree/main to a local path, and change "internlm/internlm-xcomposer-7b-4bit" in the example code to that local path.
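A sketch of that workaround using huggingface_hub (already in the environment listed above; the local directory name is arbitrary):

from huggingface_hub import snapshot_download

# Download the full repository to a local directory.
local_path = snapshot_download(
    "internlm/internlm-xcomposer-7b-4bit",
    local_dir="./internlm-xcomposer-7b-4bit",
)

# Then load from the local copy instead of the Hub id.
model = InternLMXComposerQForCausalLM.from_quantized(
    local_path,
    trust_remote_code=True,
    device="cuda:0",
    use_safetensors=False,
)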
Description
Unable to load the quantized weights (4-bit) from HuggingFace.
Code
The code is a direct copy of the file examples/example_chat_4bit_en.py.
Error
Ideas
According to this similar issue, I need to specify the model file. However, I was unable to find it on HuggingFace. Could you help me with this?
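(One way to find the actual checkpoint file name is to list the repository contents with huggingface_hub; a minimal sketch:)

from huggingface_hub import list_repo_files

# Print every file in the Hub repo so the quantized checkpoint's
# name (and hence the model_basename to pass) can be read off directly.
for f in list_repo_files("internlm/internlm-xcomposer-7b-4bit"):
    print(f)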
Thanks in advance for your help!