If I use the cache in the GPT-2 model from transformers, the logits are different than if I do a forward pass from scratch #27040
Comments
Hi, thanks for the issue!
@younesbelkada yes, it's the same expected behavior! @juanKersul I recommend reading the comment linked above if you'd like to understand why this difference exists :)
@gante @younesbelkada thanks for the answer
Mmm, for the cached forward pass, wouldn't the position ids also play a role here?
@ArthurZucker I don't think the position IDs are a problem in the specific example above -- for batches with a single row without padding, when position_ids is not passed, the correct ones are created internally
@juanKersul it is model- and input-dependent, but as a rule of thumb, it is imperceptible in FP32, and quite small in 16-bit (but big enough to occasionally result in slightly different generated text)
Ah right, no padding so no problem
@gante If I use multiple rows without padding, do I have to do anything else with the position ids?
No, multiple rows without padding is also okay :) With padding, you must explicitly build the position ids (e.g. from the attention mask), otherwise you will get a performance drop |
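For anyone following along, here is a minimal sketch of the kind of position-id construction described above (deriving position ids from the attention mask for a left-padded batch). The fill value for padded positions and the exact layout are assumptions for illustration, not the library's internal recipe:

```python
import torch

# Hypothetical left-padded batch: 0 marks padding, 1 marks real tokens
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

# Count only real tokens so each row starts at position 0,
# then give padded slots an arbitrary (ignored) position such as 1
position_ids = attention_mask.cumsum(dim=-1) - 1
position_ids = position_ids.masked_fill(attention_mask == 0, 1)

print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])

# position_ids is then passed to the model together with input_ids and attention_mask
```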
System Info
transformers version: 4.33.1

Who can help?
@ArthurZucker
@younesbelkada
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Just run the code.
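The snippet referenced here is not included above. A rough sketch of the kind of comparison the title describes (one full forward pass versus an incremental pass with the KV cache) might look like the following; the checkpoint name, prompt, and tolerance are assumptions:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

input_ids = tokenizer("Hello, my dog is cute", return_tensors="pt").input_ids

with torch.no_grad():
    # Logits for the last token from a single full forward pass
    full_logits = model(input_ids).logits[:, -1, :]

    # Same logits computed incrementally: cache the prefix, then feed the last token
    prefix = model(input_ids[:, :-1], use_cache=True)
    cached_logits = model(
        input_ids[:, -1:], past_key_values=prefix.past_key_values
    ).logits[:, -1, :]

print(torch.equal(full_logits, cached_logits))                 # often False (not bit-exact)
print(torch.allclose(full_logits, cached_logits, atol=1e-5))   # typically True in FP32
```

The mismatch being measured here is the small numerical difference discussed in the comments above, so a tolerance-based comparison rather than exact equality is the relevant check.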
Expected behavior
I expected it to return True.