Whisper: move to tensor cpu before converting to np array at decode time #31954
Conversation
Thanks for fixing! Just a question about the properties of `token_ids`.
```diff
-        token_ids = token_ids.numpy()
+        if hasattr(token_ids, "numpy"):
+            if "torch" in str(type(token_ids)):
+                token_ids = token_ids.cpu().numpy()
```
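The check in the diff relies on duck typing (`hasattr` plus a type-name string check) rather than importing torch directly. A minimal sketch of the same pattern, using a hypothetical stand-in class (`FakeTorchTensor` and `to_list` are invented here for illustration; only NumPy is assumed):

```python
import numpy as np

class FakeTorchTensor:
    """Hypothetical stand-in whose type name contains 'torch', like torch.Tensor."""
    def __init__(self, data):
        self._data = list(data)
    def cpu(self):
        # A real torch tensor would return a CPU copy; here it is a no-op.
        return self
    def numpy(self):
        return np.array(self._data)

# Make str(type(...)) read "<class 'torch.FakeTorchTensor'>" for the sketch.
FakeTorchTensor.__module__ = "torch"

def to_list(token_ids):
    # Mirrors the patched logic: only torch-like objects take the .cpu() path.
    if hasattr(token_ids, "numpy"):
        if "torch" in str(type(token_ids)):
            token_ids = token_ids.cpu().numpy()
    if isinstance(token_ids, np.ndarray):
        token_ids = token_ids.tolist()
    return token_ids
```

With this shape, plain Python lists pass through untouched, while torch-like tensors are routed through `.cpu()` before the NumPy conversion.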
Following from this: will `token_ids` ever have a grad? In which case, this will also fail on the `.cpu()` call.
`token_ids`, the output of `generate`, will not have gradients :) `generate` is decorated with `@no_grad`.
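This property can be checked with a tiny sketch (assuming PyTorch is installed): operations run inside a `torch.no_grad()` block produce tensors that do not track gradients, so a later `.numpy()` call on them is safe.

```python
import torch

# Mimic generate's situation: ops run under no_grad yield grad-free outputs,
# even when the inputs themselves require grad.
weights = torch.tensor([1.0, 2.0], requires_grad=True)
with torch.no_grad():
    out = weights * 2  # inside no_grad, the result does not require grad

print(out.requires_grad)  # False
arr = out.numpy()  # succeeds: no grad is attached, no .detach() needed
```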
What does this PR do?
Follow-up to #27818.
`pytest --doctest-modules src/transformers/models/whisper/generation_whisper.py -vv` started failing on `main` due to the PR above.

In a nutshell, if Whisper was running on a GPU, the generated tensors would also be on the GPU. The new decoding code called `token_ids.numpy()`, which fails if the `token_ids` tensor is on the GPU. This PR moves the tensor to the CPU before the NumPy conversion :)

cc @sanchit-gandhi
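For completeness, a hedged reproduction of the failure mode described above (assumes PyTorch; the GPU branch only runs when CUDA is available, and `token_ids` here is just a stand-in for `generate`'s output):

```python
import torch

token_ids = torch.arange(5)  # stand-in for generate's output
if torch.cuda.is_available():
    token_ids = token_ids.cuda()
    # Calling token_ids.numpy() on the CUDA tensor here would raise a
    # TypeError about converting a cuda device tensor to numpy.
result = token_ids.cpu().numpy()  # the fix: move to CPU first
```

Note that `.cpu()` is a no-op copy for a tensor already on the CPU, so the fix is harmless in the CPU-only case.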