Code clean-ups #171
Conversation
Main difference in perf comes from the use of the context manager; for the rest, some code clean-ups.
Before this PR, EuroLLM-9B finetuning the estimator:
This PR:
Before this PR, Encoder-Decoder training:
This PR:
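The benchmark figures referenced above are not included in the text, and neither is the diff itself, so the following is only a minimal sketch of the kind of change the description seems to point at: entering an attention-backend context manager (PyTorch's `sdpa_kernel` is assumed here purely for illustration) once around the training loop instead of on every call. `model` and `batches` are hypothetical placeholders, not names from this PR.

```python
from torch.nn.attention import SDPBackend, sdpa_kernel

def train_steps_per_call(model, batches):
    # Re-entering the backend-selection context manager on every step
    # adds a small, repeated overhead to the hot path.
    for batch in batches:
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            loss = model(batch).sum()
            loss.backward()

def train_steps_hoisted(model, batches):
    # Entering it once around the whole loop pays that cost a single time.
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        for batch in batches:
            loss = model(batch).sum()
            loss.backward()
```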
…rue=yes we attend)
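The truncated commit title above appears to refer to the boolean attention-mask convention in which `True` means "yes, we attend to this position". For reference (this is not code from the PR), that is the convention PyTorch's `scaled_dot_product_attention` uses for boolean masks:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 2, 4, 8)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# Boolean mask: True = "yes, we attend to this position",
# False = the position is masked out of the softmax.
causal_mask = torch.tril(torch.ones(4, 4)).bool()

out = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
print(out.shape)  # torch.Size([1, 2, 4, 8])
```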
Nice clean-up for the new year!
Quite a few comments, not necessarily all relevant.
Also, do we know where the main diffs in the test outputs come from? Is it just the attention-backend-related changes, or maybe a few numerical differences arising from the modified operations here and there?
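Not part of the PR, but one way such diffs could be narrowed down is to compare the saved outputs from the two code paths (or two attention backends) under a small tolerance: pure floating-point noise stays tiny, while a behavioral change shows up as a much larger maximum difference. `ref`/`new` and the file names below are hypothetical stand-ins for the stored test outputs.

```python
import torch

def diff_report(ref: torch.Tensor, new: torch.Tensor,
                rtol: float = 1e-4, atol: float = 1e-5) -> None:
    # Report whether two output tensors agree within tolerance, and how far
    # apart they are at worst, to separate float noise from real changes.
    close = torch.allclose(ref, new, rtol=rtol, atol=atol)
    max_abs = (ref - new).abs().max().item()
    print(f"allclose={close}  max_abs_diff={max_abs:.3e}")

# Example with hypothetical saved outputs:
# diff_report(torch.load("before_pr.pt"), torch.load("after_pr.pt"))
```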