Prefix match in first iteration of beam search OP #10231
…t/onnxruntime into tlwu/gpt_beam_search_op
@tianleiwu Are you suggesting that we use the existing vocab_mask for prefix matching, instead of applying it on every iteration?
Right. I think we only need to add a new attribute (like is_prefix_vocab_mask, with default value 0) so that the user can choose whether to apply it on the first iteration only or on every iteration.
/azp run onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@tianleiwu The vocab_mask you added limits the search space of the suggestions. It is a generic, runtime option: changing the vocab_mask changes the search space and therefore the suggestions. The prefix mask on the first iteration, by contrast, matches the incomplete last word of the input, so it is specific to an input string and tokenizer. Both can be used on a model as needed.
It depends on the usage. When prefix matching and a bad-word list are needed at the same time, we should keep them separate. Otherwise, I think it is better to consolidate them, since a prefix mask is just a vocab_mask applied only on the first iteration.
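To make the consolidation idea concrete, here is a minimal sketch of one mask that serves both purposes, switched by a flag. The helper name ApplyVocabMask, the 0-based iteration counter, and the convention that a mask value of 0 disallows a token are assumptions for illustration; this is not the onnxruntime implementation.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not the onnxruntime API).
// Assumption: mask[i] == 0 means token i is disallowed.
// Assumption: iteration is 0-based, so 0 is the first beam search step.
inline void ApplyVocabMask(std::vector<float>& logits,
                           const std::vector<int>& mask,
                           int iteration,
                           bool is_prefix_vocab_mask) {
  // A prefix-style mask constrains only the first generated token.
  if (is_prefix_vocab_mask && iteration > 0) {
    return;
  }
  for (std::size_t i = 0; i < logits.size() && i < mask.size(); ++i) {
    if (mask[i] == 0) {
      // Push disallowed tokens to the lowest score so they are never selected.
      logits[i] = std::numeric_limits<float>::lowest();
    }
  }
}
```

With the flag off, the mask acts like a bad-word list on every iteration; with it on, the same data only filters the first step. Supporting both at once would still need two separate masks, as noted above.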
@@ -9,6 +9,8 @@ namespace onnxruntime {
namespace contrib {
namespace transformers {

static int beam_search_iteration;
It is better not to store this as a global variable. If two nodes execute at the same time, they would share the counter, so this code is not thread safe.
Could we pass it as a parameter to the Process function instead?
Created a backlog item for this:
https://msdata.visualstudio.com/Vienna/_sprints/backlog/ONNX%20Inference/Vienna/Nickel?workitem=1606691
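As a rough sketch of the review suggestion above, the iteration can be an argument to Process rather than a file-level static, so that concurrently executing nodes do not share mutable state. The class name PrefixVocabMaskProcessor and the Process signature here are hypothetical and do not reflect the actual onnxruntime interface.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical processor; names and signature are illustrative only.
class PrefixVocabMaskProcessor {
 public:
  explicit PrefixVocabMaskProcessor(std::vector<int> mask)
      : mask_(std::move(mask)) {}

  // The caller owns the iteration counter and passes it in, so the
  // processor holds no mutable state and Process can be const.
  void Process(int iteration, std::vector<float>& scores) const {
    if (iteration != 0) {
      return;  // assumption: 0-based counter, mask applies to first step only
    }
    for (std::size_t i = 0; i < scores.size() && i < mask_.size(); ++i) {
      if (mask_[i] == 0) {
        scores[i] = std::numeric_limits<float>::lowest();
      }
    }
  }

 private:
  std::vector<int> mask_;  // read-only after construction
};
```

Because each beam search run keeps its own counter, two nodes calling Process at the same time cannot interfere with each other's iteration state.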
Description: This PR makes the changes required to support prefix matching in the beam search OP.
Motivation and Context
Why is this change required? What problem does it solve?
In generative models, the incomplete last word of the input acts as a prefix that the first generated word must match.
If it fixes an open issue, please link to the issue here.
NA
Tests:
Attached test report
report.docx