Deprecate legacy cache + use cache position #31491
Conversation
🔥 super nice! Thanks for porting this and adding the deprecation
I think in the future we will want to use generation_config.cache_config.max_length instead of calling get_max_length(), but anyway, super good porting!
Co-authored-by: Arthur <[email protected]>
Yes, sounds good if we can start adopting cache config for all cache related arguments
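A minimal sketch of the idea discussed above: prefer an explicit value on the generation config's cache config and fall back to the cache's own getter. The class and function names here (CacheConfig, GenerationConfig, resolve_max_cache_length) are illustrative stand-ins, not the real transformers API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfig:
    # Hypothetical stand-in for a cache config carrying a max length.
    max_length: Optional[int] = None


@dataclass
class GenerationConfig:
    # Hypothetical stand-in; real GenerationConfig has many more fields.
    cache_config: Optional[CacheConfig] = None


def resolve_max_cache_length(generation_config, cache):
    """Prefer the explicit config value; fall back to cache.get_max_length()."""
    cfg = generation_config.cache_config
    if cfg is not None and cfg.max_length is not None:
        return cfg.max_length
    return cache.get_max_length()
```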
💛
* tmp
* update models
* revert utils
* delete
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Arthur <[email protected]>
* modify warning msg
---------
Co-authored-by: Arthur <[email protected]>
What does this PR do?
This PR deprecates the legacy cache format in all models that currently support the Cache class. These models now also rely on cache positions and _update_causal_mask to build the 4D attention mask. Tests are passing on my end; I will trigger the slow tests with a commit message on this PR later.