Fine-tuned Qwen2 model inference error #19
Yeah, here are some ideas that I wrote for somebody else working on LLMs:
Notice the augmentation of the prompt -- this is done using python code in the tokenizer configuration. We can't run that so you may need some configuration to help with this. For example in the example repo:
Given the working python version you can do a few things:
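For instance, a minimal sketch of checking the Python side (this assumes the Qwen/Qwen1.5-0.5B-Chat checkpoint discussed later in the thread; swap in your own model id):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
messages = [{"role": "user", "content": "hello"}]

# The augmented prompt, including the <|im_start|>/<|im_end|> markers
# inserted by the chat template.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)

# The token ids the Python side produces for that augmented prompt,
# for comparison against what the swift code feeds the model.
print(tokenizer.encode(prompt))
```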
Good luck and ask if you have questions! |
@davidkoski Thank you for your suggestion. I fed the same prompt to my own fused model and printed the encoded content, and indeed found differences. The original prompt content is as follows:
The comparison results are as follows; the first line is the printout from swift and the second line is the printout from mlx-lm:
From the data above it can basically be determined that the problem is in the Tokenizer's encoding. Analyzing the code shows that the current logic substitutes PreTrainedTokenizer for Qwen2Tokenizer. I suspect Qwen2Tokenizer does some special handling, and using PreTrainedTokenizer in its place leads to these abnormal results.
I am not familiar with this part and hope someone can contribute the complete Qwen2Tokenizer logic. |
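One way to localize where the two encodings diverge is to decode the ids side by side; a sketch, with placeholder values where the real lists printed by swift and mlx-lm would go:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")

# Placeholder values -- paste in the two lists printed above.
swift_ids = [151644, 872, 198]
mlx_lm_ids = [151644, 872, 198]

for i, (a, b) in enumerate(zip(swift_ids, mlx_lm_ids)):
    if a != b:
        print(f"first divergence at index {i}: "
              f"swift {a} ({tokenizer.decode([a])!r}) vs "
              f"mlx-lm {b} ({tokenizer.decode([b])!r})")
        break
else:
    if len(swift_ids) != len(mlx_lm_ids):
        print("one encoding is a prefix of the other:",
              len(swift_ids), "vs", len(mlx_lm_ids), "tokens")
    else:
        print("the encodings match")
```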
If you feed the python tokens into the swift model, does it produce the expected output? Yeah, it may be that there is more to the Qwen tokenizer. There must be more than a hundred of them: https://github.com/huggingface/tokenizers The PreTrainedTokenizer is pretty generic and it seems to handle quite a bit, but maybe not everything. |
If you look at the tokenizer.json you can see what some of those tokens are:
The tokens inside the tokenizer are split up like this:
I am not familiar with how the tokenizers work internally, but it looks like it isn't treating the markers properly, though I can see the addedTokens being passed in. |
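To check which tokens are declared, a short sketch that dumps the added_tokens section of tokenizer.json (the file path is an assumption; point it at the model directory in use):

```python
import json

# Path is a placeholder -- use the tokenizer.json of the model being loaded.
with open("tokenizer.json") as f:
    tok = json.load(f)

# Each entry has an id, the literal marker text, and whether it is "special".
for entry in tok.get("added_tokens", []):
    print(entry["id"], repr(entry["content"]), "special:", entry.get("special"))
```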
It looks like this sort of covers it: huggingface/swift-transformers#4. Not everything is built -- just enough to cover the cases they tried. Potentially one could contribute back to the project. I looked at the javascript implementation and can see how the added tokens are managed: In the
|
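Roughly, the idea those implementations follow (an illustrative sketch, not the actual swift-transformers or tokenizers code) is that the added tokens are split out of the input first, and only the pieces between them go through the normal BPE model:

```python
import re

# Typical Qwen2 chat markers; the real list comes from added_tokens in tokenizer.json.
added_tokens = ["<|im_start|>", "<|im_end|>", "<|endoftext|>"]

def split_on_added_tokens(text: str) -> list[str]:
    # Keep each marker as a standalone piece so it can map directly to its
    # single id; everything in between is tokenized with the regular BPE model.
    pattern = "(" + "|".join(re.escape(t) for t in added_tokens) + ")"
    return [piece for piece in re.split(pattern, text) if piece]

print(split_on_added_tokens("<|im_start|>user\nhello<|im_end|>\n"))
# -> ['<|im_start|>', 'user\nhello', '<|im_end|>', '\n']
```

If the markers are instead pushed through the BPE model as ordinary text, they get broken into several ids, which would match the kind of divergence seen above.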
I passed the array of tokens produced by mlx-lm's encode into swift, and the output is as expected. |
I fine-tuned a model based on Qwen/Qwen1.5-0.5B-Chat and then fused the models. The final output when running inference with mlx-lm is as expected (a specific URL link) and is roughly as follows:
When running mlx-community/Qwen1.5-0.5B-Chat-4bit from the llm-tool command line, it worked fine. When I loaded my own fine-tuned, fused model, its inference was wrong and could not correctly predict the subsequent text generation; the effect is roughly as follows:
I'm not quite sure what's wrong. Could you give directions for further troubleshooting? I'll make an attempt, many thanks.
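For reference, the mlx-lm side that produces the expected output looks roughly like this (paths are placeholders, following the usage shown in the mlx-lm README):

```python
from mlx_lm import load, generate

# Placeholder path -- the directory containing the fused, fine-tuned model.
model, tokenizer = load("path/to/fused-qwen1.5-0.5b-chat")

response = generate(model, tokenizer, prompt="hello", verbose=True)
print(response)
```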