-
Notifications
You must be signed in to change notification settings - Fork 917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix subword_tokenize error when input contains no tokens #13320
Fix subword_tokenize error when input contains no tokens #13320
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but seems like there's some leftover code? nevermind. Looks good. ctrl-f wasn't working for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to verify this change works with Morpheus (we have a workaround that does something similar, but maybe not the same).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty result is identical to the one we use as a workaround in Morpheus.
/merge |
Description
Fixes a bug where an exception is thrown when there are no tokens in the entire input column. For example, the input column is filled with strings containing only whitespace.
This special case will return token-ids of all zeroes along with an attention mask of all zeros equivalent to
input.size() * max_sequence_length
.Closes #13300
Checklist