Generated results are different between generating with padding and single batch, with QWEN #29936
Comments
The problem was in the stopping criteria, as discussed in another issue. I changed my stopping criteria so that it returns True only once all input_ids contain the stop_token_id.
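For reference, a minimal sketch of such a criterion (the class name and wiring are assumptions, not the poster's actual code):

```python
import torch
from transformers import StoppingCriteria


class StopWhenAllSequencesDone(StoppingCriteria):
    """Stop only once every sequence in the batch has produced the stop token."""

    def __init__(self, stop_token_id: int):
        self.stop_token_id = stop_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # True per row once the stop token appears anywhere in that row.
        # Note: this assumes the stop token does not already occur in the prompt.
        done_per_sequence = (input_ids == self.stop_token_id).any(dim=-1)
        # Halt generation only when *all* rows are done.
        return bool(done_per_sequence.all())
```

Waiting for the slowest sequence avoids cutting off the other batch members when the first one finishes early.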
cc @gante and @zucchini-nlp for generation!
@GennVa hey! If I understand correctly, you are trying to implement a stopping criteria for […]. The second option is if your […]. In version […] from […]. Let me know if this helps :)
@zucchini-nlp thanks for your answer. Is there a better method? (Can EosTokenCriteria be used for this?)
Hey @GennVa! I've checked the Qwen model config, and it seems that the workaround you were considering might not be needed. The token 151645 is already included as an end-of-sequence marker in the config. That means generation should stop at that token automatically when you use outputs = model.generate(**inputs, max_new_tokens=2048). As for the other option I mentioned earlier, it's more suited for cases where you need custom stopping conditions for each sequence. But for this case, simply relying on the configured end-of-sequence token should be enough.
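A quick way to check this (a sketch; the exact Qwen checkpoint isn't named in the thread, so the model id is an assumption):

```python
from transformers import AutoTokenizer, GenerationConfig

model_id = "Qwen/Qwen1.5-7B-Chat"  # assumed checkpoint; substitute the one you actually use

gen_config = GenerationConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Per the comment above, 151645 should appear among the configured eos token ids.
print(gen_config.eos_token_id)
# And that id maps to Qwen's chat end-of-turn marker.
print(tokenizer.convert_ids_to_tokens(151645))  # expected: "<|im_end|>"
```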
Hey @zucchini-nlp, thanks for all of this. And in generation: […]
It's now working after also adding […].
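The exact addition isn't shown above. For batched generation with a decoder-only model, the usual padding setup (an assumption about what was added, not the confirmed fix) looks like this:

```python
from transformers import AutoTokenizer

model_id = "Qwen/Qwen1.5-7B-Chat"  # assumed checkpoint

# Decoder-only models should be padded on the left so each generated
# continuation starts immediately after its prompt.
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse eos for padding if none is defined

inputs = tokenizer(
    ["a short prompt", "a noticeably longer prompt for the same batch"],
    return_tensors="pt",
    padding=True,
)
print(inputs["input_ids"])       # the shorter prompt is left-padded with the pad token id
print(inputs["attention_mask"])  # padded positions are masked out
```

Passing pad_token_id=tokenizer.pad_token_id to model.generate additionally avoids the warning about an unset pad token.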
@GennVa Can you show what error you are getting for "pad token" and share a runnable minimal script?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, this issue is probably a duplicate but I'm using a Qwen model.
When I do single inference, I have no problem: the output ends correctly with token ID 151645 (the eos token).
When I use batched inference, the outputs beyond the first one in the batch seem to be "truncated": their tokens match the single-inference output, but the final part is cut off.
That's my script: […]. max_new_tokens is high enough for my outputs (I also tried a larger value).
The stopping_criteria is for token 151645.
I'm using Transformers 4.38.2.
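The original script isn't visible above; a minimal sketch of the described setup (model id, prompts, and the exact stopping rule are assumptions) would look roughly like this:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model_id = "Qwen/Qwen1.5-7B-Chat"  # assumed checkpoint


class StopOnToken(StoppingCriteria):
    """Naive criterion: stop as soon as any sequence emits the stop token.

    With a batch, this ends generation for every row as soon as the fastest
    row finishes, which produces exactly the truncation described above.
    """

    def __init__(self, stop_token_id: int):
        self.stop_token_id = stop_token_id

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        return bool((input_ids[:, -1] == self.stop_token_id).any())


tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = ["First question ...", "Second, longer question ..."]  # placeholder prompts
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    stopping_criteria=StoppingCriteriaList([StopOnToken(151645)]),
)
print(tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```

Replacing the naive criterion with one that waits for all sequences (as in the first comment above), or simply dropping the custom criteria and relying on the configured eos token, removes the truncation.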
This is the second output of a batch of 2 (151643 is the pad token). In the single inference (where there is no pad token), additional tokens follow and complete the output.
[..., 497, 330, 2870, 788, 61753, 5212, 13473, ..., 13989, 151645]
How could I solve it?
Thanks