TFOPTForCausalLM Attention mask size mismatch exception #24637
Comments
Yep, something is clearly being mangled in here. The
That's a lot of work, though - if you can wait, I'll get around to it in a few days!
Unfortunately, I didn't manage to finish this before a holiday due to some more Falcon chaos - cc @gante if you get a chance, and if not I can take it when I get back!

I identified the core problem as some confusion in the code about what the actual

However, fixing this led to other problems - the expanded/combined attention mask code also gets a bit confused when

This attention mask expansion code has been copied all around the codebase - I encountered it in PyTorch Falcon and BLOOM recently, where it also caused some problems. It might be worth doing a repo-wide refactor at some point, as I think the code is unclear and the variable names can be confusing, probably because it started as encoder-decoder code and is now being used to manage attention over past key-values.
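For readers following along, here is a minimal, dependency-free sketch of the general idea behind combining a causal mask with a padding mask when past key-values are in play. It is not the transformers implementation; `build_decoder_mask` and its parameter names are hypothetical and chosen only for illustration. The assumption it encodes is that the padding mask must cover the cached positions plus the new tokens, which is one common way a size mismatch can arise.

```python
def build_decoder_mask(attention_mask, tgt_len, past_len):
    """Return a tgt_len x (past_len + tgt_len) boolean mask (True = may attend).

    Hypothetical helper for illustration only, not a transformers API.
    attention_mask: flat list of 0/1 padding flags over ALL key positions,
                    i.e. cached (past) positions followed by the new tokens.
    tgt_len:        number of new query tokens in this forward pass.
    past_len:       number of positions already stored in the key/value cache.
    """
    total_len = past_len + tgt_len
    if len(attention_mask) != total_len:
        # A padding mask that only covers the new tokens would fail here.
        raise ValueError(
            f"attention_mask has length {len(attention_mask)}, "
            f"expected past_len + tgt_len = {total_len}"
        )
    mask = []
    for i in range(tgt_len):
        # Query i sits at absolute position past_len + i, so it may see every
        # cached key plus the new keys up to and including itself, as long as
        # the key position is not padding.
        row = [
            (j <= past_len + i) and bool(attention_mask[j])
            for j in range(total_len)
        ]
        mask.append(row)
    return mask
```

With two cached positions and two new tokens, the mask has two rows and four columns; passing a padding mask of length two instead of four raises the `ValueError`.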
Unrelated to this issue but for tflite export I end up having to do something hacky anyway to pass a custom past_key_values_length value, since the shape is dynamic and code cannot depend on it during tflite export (
Hi @abb128, good point! That might be a sign that we should be using
@abb128 I've filed a patch - please try it and let me know if it works for you!
System Info
transformers version: 4.30.2
Who can help?
No response
Reproduction
I'm trying to write my own decoding logic so I can export to TFLite (the app runs the decoding logic itself, calling into the TFLite model with past_key_values and input_ids, but the code for that is a little more involved).
I'm not sure if I'm missing something important here, but I was able to successfully export Whisper before with this sort of pattern.
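As a rough illustration of the pattern described above (this is not the reporter's code, which is not shown here), an external greedy decoding loop might look like the sketch below. `greedy_decode` and `step_fn` are hypothetical names: `step_fn` stands in for whatever call runs the exported model, and it is assumed to take the new input ids, an attention mask over all positions so far, and the previous cache, and to return the last-position logits plus the updated cache.

```python
def greedy_decode(step_fn, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy decoding driven from outside the model (illustrative sketch).

    step_fn(input_ids, attention_mask, past) -> (logits, new_past) is a
    hypothetical stand-in for one forward pass of an exported model.
    """
    ids = list(prompt_ids)
    past = None
    next_input = list(prompt_ids)  # first call feeds the whole prompt
    for _ in range(max_new_tokens):
        # The mask spans cached positions plus the tokens fed in this step.
        attention_mask = [1] * len(ids)
        logits, past = step_fn(next_input, attention_mask, past)
        # Greedy choice: index of the largest logit.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
        # Later calls feed only the newly generated token and reuse the cache.
        next_input = [next_id]
    return ids
```

Note that after the first step the input is a single token while the attention mask keeps growing, so the mask length equals the cache length plus one.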
I've reduced the problem to this example:
Colab Link
Output
Expected behavior
I expect it to work like it does with GPT2.