Architecture improvements #65
Pef/flash sdpa attention
…ing arguments for attn implementation
Any ETA on this being completed (at least partially) and committed?
Let's merge it ASAP. Would you like to do a quick review first? It's been quite thoroughly "tested": I've trained new checkpoints with both the current architecture and the new one, and tested generation when evaluating the model. If you don't plan to review, feel free to merge when you read this message!
@ylacombe is there any update on this?
@sang-nguyen-ts it's just been merged!
Supersedes #50 and #55: