Remove .data usages in optimizations.py #23417
@muellerzr the usage of the |
This is a very old and deprecated implementation, since it doesn't even follow the AdamW algorithm exactly. One should use The only reason it was kept is for BC (backward compatibility) for those who rely on exact results remaining exact after new p.s. No objections, though, to making it better...
@stas00 Thanks for the reply. How about Adafactor, then?
Oh, sorry, I didn't see it was Adafactor too. It's hard to see from the diff, as it doesn't show the class names. This Adafactor is being used for sure, but its implementation is super old as well. So it would certainly be a blessing to bring it up to a more modern code standard.
@stas00 Would you mind giving this PR a review? Thanks.
I think once .data is removed, the copy_ becomes somewhat difficult to understand in Adafactor's part of the diff. An explicit downcast would be more readable at the very end. But since p has to remain the same pointer, I couldn't quite think of a better way.
Otherwise looks good. Thank you for modernizing, @alanwaketan
Let me just invite @sgugger to have a quick look before we merge.
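The copy_ concern raised in the review above can be illustrated with a minimal sketch. This is not the code from the PR diff: it assumes a pattern in which a half-precision parameter is upcast to float32 for the update and then written back. The names p, update, and p_fp32 are hypothetical.

```python
import torch

# Hypothetical half-precision parameter and float32 update (illustrative values).
p = torch.nn.Parameter(torch.ones(4, dtype=torch.float16))
update = torch.full((4,), 0.25, dtype=torch.float32)

# Remember where the parameter's storage lives so we can check it is unchanged.
storage_before = p.data_ptr()

with torch.no_grad():
    p_fp32 = p.float()      # upcast into a separate float32 working copy
    p_fp32.add_(-update)    # apply the update in float32
    p.copy_(p_fp32)         # copy_ converts back to p's dtype, in place
```

Here copy_ performs the downcast implicitly while keeping p the same tensor object with the same storage, which is why an explicit downcast followed by reassignment would not be equivalent: the optimizer and the model must keep referring to the same parameter.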
Thanks for your PR. Just to be sure though, is this all going to work with PyTorch 1.8+? 1.8 is the minimum version we officially support at the moment (for a couple more weeks at least, then 1.9 starting mid-June).
I'm almost 100% sure it is the case. the whole direct Let me quickly test it with pt-1.8
At least the Adafactor test that we have is passing.
Patched the optimizers
What does this PR do?
.data usage is deprecated in recent releases of PyTorch. See pytorch/pytorch#91093 (comment).
This change replaces all .data usages in optimizations.py with modern alternatives.
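As a rough illustration of what such a replacement can look like, the sketch below swaps a .data-based in-place update for the same update performed under torch.no_grad(). It is an assumption about the general pattern, not an excerpt from the PR; p, lr, and the values used are hypothetical.

```python
import torch

# Hypothetical parameter, gradient, and learning rate, for illustration only.
p = torch.nn.Parameter(torch.ones(3))
p.grad = torch.full((3,), 0.5)
lr = 0.1

# Older style, which bypasses autograd tracking through .data:
#     p.data.add_(p.grad.data, alpha=-lr)

# Alternative: perform the in-place update inside no_grad so that
# autograd does not record it, without touching .data at all.
with torch.no_grad():
    p.add_(p.grad, alpha=-lr)
```

After the update, p is still a leaf parameter that requires gradients, so it remains usable by the optimizer and the model as before.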
Who can review?
@connor-henderson @stas00