
[DML EP] Enable more MHA masks #17882

Merged 1 commit into main on Oct 18, 2023
Conversation

PatriceVignola (Contributor)

Those masks are used for MHA in LLaMA.

@sumitsays (Contributor)

The implementation LGTM. Thank you for adding this.

The contrib op schema for MultiheadAttention says that it only accepts tensors of rank <= 2 for the mask, as mentioned here:

"Key padding mask with shape (batch_size) or (3 * batch_size + 2) or (batch_size, kv_sequence_length)",

At the same time, I also see that there is no validation on the rank of this tensor in MultiHeadAttentionTypeAndShapeInference, but the CPU EP does throw an exception if the mask tensor rank is > 2.

So I think we should also update the contrib ops doc and maybe create a bug for the CPU EP?
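For context, a minimal illustrative sketch (not part of this PR) of how the 1-D (batch_size) mask form could be expanded into the 2-D (batch_size, kv_sequence_length) key padding mask, assuming the 1-D form holds the number of valid key tokens per batch entry:

```python
import numpy as np

def key_lengths_to_padding_mask(key_lengths: np.ndarray, kv_sequence_length: int) -> np.ndarray:
    # Hypothetical helper: expand per-batch valid key lengths into a
    # (batch_size, kv_sequence_length) mask with 1 for valid positions
    # and 0 for padded positions.
    positions = np.arange(kv_sequence_length)                     # (kv_sequence_length,)
    return (positions[None, :] < key_lengths[:, None]).astype(np.int32)

# Example: batch of 2, kv_sequence_length of 4, valid lengths 3 and 2
print(key_lengths_to_padding_mask(np.array([3, 2]), 4))
# [[1 1 1 0]
#  [1 1 0 0]]
```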

@PatriceVignola (Contributor, Author) commented on Oct 16, 2023

> The implementation LGTM. Thank you for adding this.
>
> The contrib op schema for MultiheadAttention says that it only accepts tensors of rank <= 2 for the mask, as mentioned here:
>
> "Key padding mask with shape (batch_size) or (3 * batch_size + 2) or (batch_size, kv_sequence_length)",
>
> At the same time, I also see that there is no validation on the rank of this tensor in MultiHeadAttentionTypeAndShapeInference, but the CPU EP does throw an exception if the mask tensor rank is > 2.
>
> So I think we should also update the contrib ops doc and maybe create a bug for the CPU EP?

The contrib op definitions are being updated in another branch in parallel (including tests), but it will all line up for 1.16.2.

Note: I don't own the other branch (and it's a giant feature branch with hundreds of modified files), which is why I made this separate PR.

@sumitsays (Contributor)


> The contrib op definitions are being updated in another branch in parallel (including tests), but it will all line up for 1.16.2.
>
> Note: I don't own the other branch (and it's a giant feature branch with hundreds of modified files), which is why I made this separate PR.

Understood. Thanks again!

PatriceVignola merged commit 6557538 into main on Oct 18, 2023.
PatriceVignola deleted the user/pavignol/allow-more-mha-masks branch on Oct 18, 2023 at 00:31.
jchen351 pushed a commit that referenced this pull request on Oct 18, 2023: "Those masks are used for MHA in LLaMA."
PatriceVignola added a commit that referenced this pull request on Oct 26, 2023: "Those masks are used for MHA in LLaMA."
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request on Mar 22, 2024: "Those masks are used for MHA in LLaMA."