Converting alpaca-native-GPTQ models into ggml models #442
Comments
I wrote a tool to add additional tokens to the token list:
would work with the script I wrote.
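For context, here is a minimal sketch of one way such a tool could work, using the protobuf bindings shipped with the sentencepiece package; the file names and the choice of a user-defined piece type are assumptions for illustration, not details of Ronsor's script:

```python
# Sketch: append a "[PAD]" piece to an existing SentencePiece tokenizer.model.
# Assumes the sentencepiece protobuf bindings are importable; paths are illustrative.
import sentencepiece.sentencepiece_model_pb2 as sp_pb2

m = sp_pb2.ModelProto()
with open("tokenizer.model", "rb") as f:
    m.ParseFromString(f.read())

pad = sp_pb2.ModelProto.SentencePiece()
pad.piece = "[PAD]"
pad.score = 0.0
pad.type = sp_pb2.ModelProto.SentencePiece.USER_DEFINED
m.pieces.append(pad)  # the new piece takes the next free id, 32000

with open("tokenizer-extended.model", "wb") as f:
    f.write(m.SerializeToString())
```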
@Ronsor I used your script and it looks like it did actually add the token to the tokenizer.model. But now I have a new error... looks like the issue is more complex than I thought 😅
Looks like the
On the
to
That results in those dimensions.
Which gives an error because we cannot concatenate those objects anymore. Here's a comparison with the regular llama-7b-gptq model (which works well with the converter):
At this point I'm stuck, as I'm uncertain about which elements (groupings, scales, addends) to modify in order to achieve the desired concatenation.
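For what it's worth, here is a generic illustration of the failure mode being described; the shapes are made up for the example and are not the actual tensors from either model:

```python
# Made-up shapes purely to illustrate why the concatenation step can fail:
# every dimension other than the concatenation axis has to match.
import numpy as np

a = np.zeros((4096, 32000))  # hypothetical tensor shaped like the working llama-7b-gptq model
b = np.zeros((4096, 32001))  # hypothetical tensor with one extra column (the added token)

try:
    np.concatenate([a, b], axis=0)
except ValueError as err:
    print(err)  # numpy reports that the non-concatenation dimensions disagree
```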
@comex I'm not sure it was a good idea to convert your addends and scales into int32; those tensors have really small numbers and we're losing all the information like that:
They're not 'really' int32s. Each int32 is actually 8 4-bit weights packed together. And they're not converted directly from float to integer; they have to be interpreted together with the addends and scales.
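A small sketch of what that packing means in practice; the scale and addend values here are invented for the example, not read from a real model:

```python
# Unpack one int32 holding eight 4-bit quantized weights, then dequantize them.
import numpy as np

def unpack_and_dequantize(packed: int, scale: float, addend: float) -> np.ndarray:
    # Pull out the eight 4-bit fields, lowest bits first.
    q = np.array([(packed >> (4 * i)) & 0xF for i in range(8)], dtype=np.float32)
    # Each reconstructed weight is scale * q + addend.
    return scale * q + addend

print(unpack_and_dequantize(0x76543210, scale=0.1, addend=-0.8))
# [-0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1]
```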
Maybe you are lucky with this one?
Just tried, it fails with
I spent some time today working on this but didn't finish.
oobabooga merged a PR that makes alpaca-7b-4bit-GPTQ-native work now. It's funny that it worked, because it uses the exact same tokenizer model (the one with 32000 tokens) even though this model has one more.
Cool! Do you see any significant improvements from GPTQ?
PR is up; please try it and let me know if there are issues. The PR consists of a new script which is meant to replace the existing ones; run it with a command like:
I just tried it and it works like a charm!! GPTQ quantized models will be the standard, and thanks to you the CPU users can enjoy them as well. Thanks again for your really important work 😄 👍
@BadisG Did you notice an increase in model size after converting to ggml? The 7B one I converted went from 3.77 GB to 5.39 GB and inference is significantly slower, but it works.
@BadisG Thanks for the info, at least now I know that it's not just me.
Hmm, it's probably because of the addends (aka zeros). The newer GPTQ-for-LLaMa format quantizes the addends, but llama.cpp doesn't support that, so the script dequantizes them. I didn't realize it would make that big of a difference in size; sounds like it would be useful to add native support for quantized addends to llama.cpp. But I don't know what you mean by "inference is significantly slower". Compared to what? If the comparison is to a GPU implementation then yes, llama.cpp will be slower.
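A back-of-the-envelope sketch of why dequantizing the addends inflates the file; every block size and byte count below is an illustrative assumption, not read from either format:

```python
# Rough bits-per-weight estimate: 4-bit weights plus per-block scale/addend overhead.
def bits_per_weight(block_size: int, scale_bytes: float, addend_bytes: float) -> float:
    return 4 + 8 * (scale_bytes + addend_bytes) / block_size

# Hypothetical GPTQ-style layout: large groups, addend kept quantized (~4 bits each).
print(bits_per_weight(block_size=128, scale_bytes=2, addend_bytes=0.5))  # ~4.16
# Hypothetical llama.cpp-style layout: small blocks, full-precision scale and addend.
print(bits_per_weight(block_size=32, scale_bytes=4, addend_bytes=4))     # 6.0
```

Under these assumptions the overhead ratio (~1.44x) is roughly consistent with the 3.77 GB to 5.39 GB jump reported above.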
@comex Thank you for the explanation. About the slower inference, I forgot to mention that it was due to swap, because I just have 8 GB of RAM.
Yeah, it's a bit slower when using GPTQ. Regular RTN quantization:
GPTQ quantization:
Something like ~20% slower, which is probably expected because the RTN version has a size of 4.1 GB and the GPTQ version has a size of 5.2 GB (a 27% difference).
Expected Behavior
Hello,
I wanted to convert the alpaca-native 7B GPTQ file (.pt file) into a ggml file with the convert-gptq-to-ggml.py script: https://github.com/ggerganov/llama.cpp/blob/master/convert-gptq-to-ggml.py
Current Behavior
The problem is that I have this error:
32000 is the tokenizer.vocab_size() (number of tokens in the tokenizer.model)
32001 is the n_vocab (number of tokens in the model)
The model trained with Alpaca has one more token, and it's this one:
"[PAD]": 32000
It looks like, if we want to convert the alpaca-native GPTQ models, we need to create a new tokenizer.model that has this "[PAD]" token in it.
The problem is that I have no idea how to do that... if someone can help me with this, I'd appreciate it!