[BUG] FileNotFoundError: Could not find model in TheBloke/WizardLM-*-uncensored-GPTQ #133
Comments
Ah, it's not a bug, my friend. It should be solved by now. Feel free to close this issue once it's resolved.
Yeah, you need to pass `model_basename`; this can be specified as shown below. I was going to extend the example anyway, and this code will work:

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM

MODEL = "TheBloke/WizardLM-7B-uncensored-GPTQ"
model_basename = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order"

import logging
logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
)

device = "cuda:0"

tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True)

# download quantized model from Hugging Face Hub and load to the first GPU
model = AutoGPTQForCausalLM.from_quantized(MODEL,
                                           model_basename=model_basename,
                                           device=device,
                                           use_safetensors=True,
                                           use_triton=False)

# inference with model.generate
prompt = "Tell me about AI"
prompt_template = f'''### Human: {prompt}
### Assistant:'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=256, min_new_tokens=100)
print(tokenizer.decode(output[0]))
```

Output:
However, you can't use AutoGPTQ with `device='mps'`. Only NVIDIA CUDA GPUs are supported. It may work to run on CPU only, but it will be very, very slow.
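As a rough illustration of that constraint, here is a minimal sketch (not from the thread) that falls back to CPU when no NVIDIA GPU is visible; it reuses the repo and basename from the example above:

```python
# Minimal sketch: prefer an NVIDIA GPU, otherwise fall back to (very slow) CPU.
# "mps" is deliberately not offered, since AutoGPTQ does not support Apple Silicon GPUs.
import torch
from auto_gptq import AutoGPTQForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/WizardLM-7B-uncensored-GPTQ",
    model_basename="WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order",
    use_safetensors=True,
    use_triton=False,
    device=device,
)
```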
Thank you, Tom. You're doing a huge favour to the community by providing all these quantized models. And thank you, guys; that was a stupid mistake of mine, missing the parameter. I should have looked at this comment from the get-go: #91 (comment)
Not to re-open a closed issue, as this doesn't seem to be an issue/bug, but I'm getting the same error, and here is how I have it defined in my script: I'm assuming the basename is wrong, but I can't identify what the correct basename might be.
Remove `model_basename`.
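If you want to see which weight files a repo actually contains before deciding on (or removing) a basename, here is a small sketch using `huggingface_hub`; the repo name is just the 7B one from earlier in the thread, so substitute your own:

```python
# Minimal sketch: list the .safetensors files in a GPTQ repo. If you do pass a
# basename, it is the file name without the .safetensors extension.
from huggingface_hub import list_repo_files

repo_id = "TheBloke/WizardLM-7B-uncensored-GPTQ"  # substitute your repo
for filename in list_repo_files(repo_id):
    if filename.endswith(".safetensors"):
        print(filename, "->", filename[: -len(".safetensors")])
```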
I want to fine-tune a GPTQ model with LoRA; the related code is as follows:

And I always get the error: `Could not find model in TheBloke/StableBeluga2-70B-GPTQ`. If I change the `model_name_or_path` and `model_basename` to point to other models, it works normally. And it works normally if I use the model for inference, like the following code:
Do not pass `model_basename`. For Transformers support, all models were renamed to `model.safetensors`. But in fact you don't need to pass it at all; the correct value is now stored in the repo's quantize config. So just remove `model_basename`.
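For illustration, a minimal sketch of loading one of the renamed repos without `model_basename`, so AutoGPTQ resolves the weight file from the repo's own metadata (repo name taken from the comments above):

```python
# Minimal sketch: no model_basename is passed; AutoGPTQ picks up the file name
# from the quantize config shipped with the repo.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/StableBeluga2-70B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           device="cuda:0")
```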
@TheBloke Thanks for the help!
I changed the code, but it does not make any difference:
@TheBloke Thanks!
I can install auto-gptq==0.4.2 normally, but I cannot install auto-gptq from source. Is that related to the problem?
```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/StableBeluga2-70B-GPTQ"
branch = "main"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           revision=branch,
                                           trust_remote_code=False,
                                           device="cuda:0",
                                           quantize_config=None)
```

Change
By following this 👍:
model_name_or_path = "TheBloke/StableBeluga2-70B-GPTQ" |
model_name_or_path = "TheBloke/StableBeluga2-70B-GPTQ" peft_config = GPTQLoraConfig( tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path, quantize_config = BaseQuantizeConfig.from_pretrained(model_name_or_path)
) data = load_dataset("Abirate/english_quotes")data = load_dataset('/home/ubuntu/qlora/XXXXX/output-direct-input-output-format.jsonl') tokenizer.pad_token = tokenizer.eos_token |
Can you help me? Thanks.
You'll need to add `inject_fused_attention=False`. This code works fine to load the model and run inference on it - I just tested it:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/StableBeluga2-70B-GPTQ"
branch = "main"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           revision=branch,
                                           inject_fused_attention=False,
                                           trust_remote_code=False,
                                           device="cuda:0",
                                           quantize_config=None)

prompt = "Tell me about AI"
prompt_template = f'''### System:
You are a helpful assistant
### User:
{prompt}
### Assistant:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=128)
print(tokenizer.decode(output[0]))
```

Output:
I have no experience of GPTQ training, so I can't help with that. If you want to train a model, you could also try doing it through Transformers rather than through AutoGPTQ. Here is a Colab notebook showing all the Transformers GPTQ methods, including PEFT training: https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb#scrollTo=td0bmYW_i_PB
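A rough sketch of that Transformers route, assuming a recent transformers build with GPTQ support plus optimum, auto-gptq, and peft installed; the LoRA hyperparameters and `target_modules` below are common choices for Llama-style models, not values taken from the notebook:

```python
# Minimal sketch: load a GPTQ checkpoint through Transformers and wrap it with
# a LoRA adapter via peft, instead of going through AutoGPTQ directly.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/StableBeluga2-70B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# Transformers loads the GPTQ weights itself when optimum + auto-gptq are installed.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # common choice for Llama-style models
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, training proceeds with a normal Trainer / training loop.
```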
I can infer normally with your code, but I cannot figure out why I cannot fine-tune. Thanks!
But I still fail on Auto-GPTQ 0.4.2... Would you mind looking into mzbac/AutoGPTQ-API#6? I am so confused.
@bonuschild This is fixed on main (if you build from source) and will be included in the next release.
Such good news! I will give it a try while waiting for the new release. Is there a schedule for when the next version will be released?
Tomorrow :D
👍 awesome :)
How did you solve this?
Can you please help here? I have a similar error when I use TheBloke/WizardLM-30B-Uncensored-GPTQ. I don't see the error when I use the 7B GGUF model. This is from the constants.py file where I select the model:

`MODEL_ID = "TheBloke/WizardLM-30B-Uncensored-GPTQ"`

2024-02-09 22:59:28,557 - INFO - run_localGPT.py:244 - Running on: cuda
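One way to narrow this down might be to check whether the 30B GPTQ repo loads on its own, outside localGPT. A minimal sketch (not localGPT code), following the same pattern as the working examples earlier in this thread:

```python
# Minimal sketch: try loading the 30B GPTQ repo directly with AutoGPTQ.
# quantize_config=None lets the repo's own quantize config be used.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/WizardLM-30B-Uncensored-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_id,
                                           use_safetensors=True,
                                           device="cuda:0",
                                           quantize_config=None)
print("Loaded", model_id)
```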
Describe the bug
Unable to load model directly from the repository using the example in README.md:
https://github.com/PanQiWei/AutoGPTQ/blob/810ed4de66e14035cafa938968633c23d57a0d79/README.md?plain=1#L166
Software version
Operating System: MacOS 13.3.1
CUDA Toolkit: None
Python: Python 3.10.11
AutoGPTQ: 0.2.1
PyTorch: 2.1.0.dev20230520
Transformers: 4.30.0.dev0
Accelerate: 0.20.0.dev0
To Reproduce
Running this script causes the error:
Expected behavior
I expect the model to be downloaded from Hugging Face and run as specified in the README.
Screenshots
Error:
Additional context
I've also tried providing `model_name_or_path` as noted in #91. But then I get the following:
Perhaps @TheBloke you could chime in :)