Prefix match in first iteration of beam search OP #10231

viboga · 2022-01-10T08:41:41Z

Description: This PR makes the changes required to support prefix matching in the beam search op.

Motivation and Context

Why is this change required? What problem does it solve?
In generative models, the prefix formed by the incomplete last word of the input needs to be matched against the first word being generated.
If it fixes an open issue, please link to the issue here.
NA
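
To make the idea concrete, here is a minimal sketch of how a prefix mask over the vocabulary could be derived. It is an illustration only, not the code in this PR: `BuildPrefixMask`, the plain-string vocabulary, and the rule "a token is allowed if it starts with the prefix" are all assumptions. A real tokenizer would need extra handling (for example, leading-space markers or tokens shorter than the prefix).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of onnxruntime): returns a mask with one
// entry per vocabulary token, 1 if the token starts with `prefix`, else 0.
std::vector<int32_t> BuildPrefixMask(const std::vector<std::string>& vocab,
                                     const std::string& prefix) {
  std::vector<int32_t> mask(vocab.size(), 0);
  for (std::size_t i = 0; i < vocab.size(); ++i) {
    // compare() returns 0 only when the first prefix.size() characters of
    // the token equal the prefix; shorter tokens never match.
    if (vocab[i].compare(0, prefix.size(), prefix) == 0) {
      mask[i] = 1;
    }
  }
  return mask;
}
```

With a toy vocabulary of `hello`, `help`, `world`, `he` and the prefix `hel`, only the first two tokens stay allowed.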

**Tests:**
Attached test report
report.docx

viboga · 2022-01-12T13:26:30Z

> Why not add an attribute to indicate that vocab_mask is used in first step only (default is false)? In this way, the change could be small.

@tianleiwu Are you saying use the vocab_mask already present to prefix match instead of using it on every iteration?

tianleiwu · 2022-01-13T17:14:24Z

> > Why not add an attribute to indicate that vocab_mask is used in first step only (default is false)? In this way, the change could be small.
>
> @tianleiwu Are you saying use the vocab_mask already present to prefix match instead of using it on every iteration?

Right. I think we only need to add a new attribute (like is_prefix_vocab_mask, with default value 0) so that the user can choose whether to apply it on the first iteration only or on every iteration.
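
A rough sketch of what that suggestion could look like follows. Only the attribute name `is_prefix_vocab_mask` and its default come from this thread; `MaskOptions`, `ApplyVocabMask`, the 0-based `iteration` argument, and masking by writing the lowest float into the score are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the operator attributes.
struct MaskOptions {
  // Attribute name proposed in the thread; false corresponds to default 0.
  bool is_prefix_vocab_mask = false;
};

// Pushes the score of every masked-out token (mask value 0) to the lowest
// float so it cannot be picked. `iteration` is assumed to be 0-based.
void ApplyVocabMask(std::vector<float>& scores,
                    const std::vector<int32_t>& vocab_mask,
                    const MaskOptions& options, int iteration) {
  // In prefix mode the mask only constrains the first generated token.
  if (options.is_prefix_vocab_mask && iteration > 0) {
    return;
  }
  for (std::size_t i = 0; i < scores.size() && i < vocab_mask.size(); ++i) {
    if (vocab_mask[i] == 0) {
      scores[i] = std::numeric_limits<float>::lowest();
    }
  }
}
```

With the default options the mask is applied on every iteration, which keeps the existing vocab_mask behavior; setting the flag restricts it to iteration 0.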

edgchen1 · 2022-01-13T18:38:53Z

/azp run onnxruntime-binary-size-checks-ci-pipeline

azure-pipelines · 2022-01-13T18:39:03Z

Azure Pipelines successfully started running 1 pipeline(s).

viboga · 2022-01-14T07:09:18Z

@tianleiwu
Wanted to make sure I understood your suggestion. The purpose of vocab_mask and prefix_mask is mutually exclusive.

The vocab_mask you have added is to limit the search space of the suggestions. This is a generic implementation which is an option at runtime. You can change the vocab_mask to change the search space/suggestions.

Prefix_mask on first iteration however is to match the incomplete last word of the input. Indeed, it is very specific to an input string and tokenizer. Both can be used on a model as needed.

onnxruntime/contrib_ops/cpu/transformers/beam_search.cc

onnxruntime/python/tools/transformers/convert_beam_search.py

onnxruntime/contrib_ops/cpu/transformers/logits_processor.h

onnxruntime/contrib_ops/cpu/transformers/logits_processor.cc

tianleiwu · 2022-01-18T18:04:55Z

> @tianleiwu Wanted to make sure I understood your suggestion. The purpose of vocab_mask and prefix_mask is mutually exclusive.
>
> The vocab_mask you have added is to limit the search space of the suggestions. This is a generic implementation which is an option at runtime. You can change the vocab_mask to change the search space/suggestions.
>
> Prefix_mask on first iteration however is to match the incomplete last word of the input. Indeed, it is very specific to an input string and tokenizer. Both can be used on a model as needed.

It depends on the usage. When both prefix matching and a bad-word list are needed at the same time, we should keep them separate. Otherwise, I think it is better to consolidate them, since a prefix mask is just a vocab_mask applied only on the first iteration.

onnxruntime/contrib_ops/cpu/transformers/logits_processor.cc

onnxruntime/python/tools/transformers/convert_beam_search.py

tianleiwu · 2022-01-31T07:18:20Z

onnxruntime/contrib_ops/cpu/transformers/logits_processor.cc

@@ -9,6 +9,8 @@ namespace onnxruntime {
 namespace contrib {
 namespace transformers {

+static int beam_search_iteration;


It is better not to store this as a global variable. If two nodes execute at the same time, they would share the counter, so this code is not thread safe.

Could we pass it as a parameter to the Process function instead?

Created a backlog item for this:
https://msdata.visualstudio.com/Vienna/_sprints/backlog/ONNX%20Inference/Vienna/Nickel?workitem=1606691
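
The review suggestion above (passing the iteration to Process instead of keeping it in a file-level static) might look roughly like the sketch below. `PrefixVocabMaskProcessor` and its `Process` signature are hypothetical and heavily simplified; they are not the actual classes in logits_processor.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical, simplified processor; not the real onnxruntime interface.
class PrefixVocabMaskProcessor {
 public:
  explicit PrefixVocabMaskProcessor(std::vector<int32_t> prefix_mask)
      : prefix_mask_(std::move(prefix_mask)) {}

  // The caller owns the iteration counter and passes it in, so the
  // processor keeps no mutable shared state between calls or nodes.
  void Process(std::vector<float>& scores, int iteration) const {
    if (iteration != 0) {
      return;  // the prefix constraint applies to the first step only
    }
    for (std::size_t i = 0; i < scores.size() && i < prefix_mask_.size(); ++i) {
      if (prefix_mask_[i] == 0) {
        scores[i] = std::numeric_limits<float>::lowest();
      }
    }
  }

 private:
  std::vector<int32_t> prefix_mask_;  // read-only after construction
};
```

Because `Process` is `const` and the counter lives with each caller, two nodes running concurrently would each track their own iteration rather than racing on one static variable.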

tianleiwu and others added 30 commits November 4, 2021 22:00

Add BeamSearch op schema

34f396f

Add ONNX conversion for beams search

1fe45a2

remove attention_mask and change input order

e7a665c

add option to run baseline

27bf809

add check data type NULL

a6c402f

applies VerifyNodeAndOpMatch to subgraph

147763c

update input_ids shape

bd10853

Add node name for Cast node

4ba5d00

expose API for topk

e343472

parse parameters

1c2d9cd

Add beam search scorer

820a53c

Merge remote-tracking branch 'origin/master' into tlwu/gpt_beam_search_op

e1076fe

output results

e1ae848

fix typo

1f1ee1a

use c++ template and format python

7c4fd9a

Merge branch 'tlwu/gpt_beam_search_op' of https://github.com/Microsoft/onnxruntime into tlwu/gpt_beam_search_op

eedb1af

Merge remote-tracking branch 'origin/master' into tlwu/gpt_beam_search_op

e6c3c29

fix build pipeline errors

09e2458

symbolic shape infer of input onnx

afe4b12

output scores

2396c3b

add kernel def hash

b444e70

Handle vocab_mask; move CheckSubgraph

bb032c2

undo insert_cast_transformer.cc and fusion_utils.py

fb27564

fix typo

cdb62bb

Merge remote-tracking branch 'origin/master' into tlwu/gpt_beam_search_op

f32b5d1

fix merge

bca63fd

update doc

3e0cb7f

add repetition penalty

872408c

refactoring: add GptSubgraph class

06dc72b

move BeamSearchState from .h to .cc file

8bab52a

viboga marked this pull request as ready for review January 12, 2022 19:09