Make MultiHeadAttention use masks from query and value tensors #7951

mattsoulanille · 2023-09-12T06:17:25Z

Add optional Keras masks to tfjs tensors. Enable them for tfjs-layers on layers that emit them. Use the masks of query and value input tensors in MultiHeadAttention to compute the correct mask automatically.
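A minimal sketch of what a "layer that emits a mask" means, modeled on Keras' `Embedding(mask_zero=True)`, which masks out every position whose token id is 0. `MaskedBatch`, `computeMaskZero`, and `embedWithMask` are hypothetical stand-ins built on plain arrays so the example runs without tfjs. They are not the tfjs-layers API, and the embedding lookup itself is omitted.

```typescript
// Simplified stand-in for a tensor carrying an optional Keras mask. The PR
// adds an optional `kerasMask` to tfjs tensors; plain arrays keep this
// sketch runnable without tfjs.
interface MaskedBatch {
  values: number[][];       // [batch, seqLen]; embedding lookup omitted
  kerasMask?: boolean[][];  // [batch, seqLen]; true = real token
}

// Like Keras' Embedding with mask_zero=True: token id 0 is padding.
function computeMaskZero(ids: number[][]): boolean[][] {
  return ids.map(row => row.map(id => id !== 0));
}

// Hypothetical layer call: produce the output, then attach the mask so
// downstream layers (e.g. MultiHeadAttention) can read it.
function embedWithMask(ids: number[][]): MaskedBatch {
  const output: MaskedBatch = {values: ids.map(row => row.slice())};
  output.kerasMask = computeMaskZero(ids);
  return output;
}

const out = embedWithMask([[5, 3, 0, 0], [7, 0, 0, 0]]);
console.log(JSON.stringify(out.kerasMask));
// [[true,true,false,false],[true,false,false,false]]
```

Attaching the mask to the output tensor, rather than passing it as a separate argument, is what lets a later layer pick it up automatically.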

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.

pyu10055 · 2023-09-12T16:38:21Z

tfjs-layers/src/engine/topology.ts

+            for (let i = 0; i < output.length; i++) {
+              output[i].kerasMask = outputMask[i];
+            }
+          } else if (outputMask instanceof Array) {


What if the array only contains one mask?

I think that should be an error. If there's only one mask for all the tensors, it should be returned as a tensor instead of a [tensor].

Actually, Keras does not seem to broadcast masks at all. Each output tensor needs its own mask:
https://github.com/keras-team/keras/blob/master/keras/engine/base_layer.py#L2893-L2898
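The no-broadcasting rule discussed in this thread can be sketched as follows, assuming simplified stand-in types (`Mask`, `OutputTensor`) in place of tfjs Tensors. `attachMasks` is a hypothetical helper, not the code in `topology.ts`: a single output takes exactly one mask, N outputs take exactly N masks, and a one-element array for a single output is rejected, as suggested in the reply above.

```typescript
// Simplified stand-ins: in tfjs-layers both the outputs and the masks
// would be Tensors.
interface Mask {
  name: string;
}
interface OutputTensor {
  name: string;
  kerasMask?: Mask;
}

// Hypothetical helper: attach masks without broadcasting. A single output
// takes exactly one mask; an array of N outputs takes exactly N masks.
function attachMasks(outputs: OutputTensor | OutputTensor[],
                     masks: Mask | Mask[] | null): void {
  if (masks === null) {
    return;  // The layer does not compute a mask.
  }
  if (Array.isArray(outputs)) {
    if (!Array.isArray(masks) || masks.length !== outputs.length) {
      throw new Error(
          `Expected ${outputs.length} masks, one per output tensor.`);
    }
    for (let i = 0; i < outputs.length; i++) {
      outputs[i].kerasMask = masks[i];
    }
  } else {
    if (Array.isArray(masks)) {
      // Even a one-element array is rejected: return the mask itself.
      throw new Error('A single output takes a single mask, not an array.');
    }
    outputs.kerasMask = masks;
  }
}

const a: OutputTensor = {name: 'a'};
const b: OutputTensor = {name: 'b'};
attachMasks([a, b], [{name: 'maskA'}, {name: 'maskB'}]);
console.log(a.kerasMask?.name, b.kerasMask?.name);  // maskA maskB

try {
  attachMasks([a, b], [{name: 'shared'}]);  // one mask for two outputs
} catch (e) {
  console.log((e as Error).message);
  // Expected 2 masks, one per output tensor.
}
```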

pyu10055 · 2023-09-12T16:43:10Z

tfjs-layers/src/layers/nlp/multihead_attention.ts

      if (useCausalMask) {
        // the shape of the causal mask is [1, T, S]
        const mask = this.computeCausalMask(query, value);
-        autoMask = mask;
+        autoMask = autoMask ? logicalAnd(autoMask, mask) : mask;


How is this associated with the Topology `computeMask` logic?

Earlier layers (e.g. Embedding) have `computeMask` called on them to compute the masks for their output tensors. This layer then uses those masks.
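The mask combination in this hunk can be sketched with plain boolean arrays. This assumes the broadcasting used by Keras' `MultiHeadAttention` as I understand it: the query mask `[B, T]` acts as `[B, T, 1]`, the value mask `[B, S]` as `[B, 1, S]`, and the causal mask `[1, T, S]` lets query position `t` attend only to value positions `s <= t`. The masks are combined with logical AND, which is what `logicalAnd(autoMask, mask)` does above on tensors. `computeAttentionMask` is a hypothetical helper; the key mask and any explicit attention-mask argument are left out.

```typescript
// Hypothetical helper: builds a [batch, T, S] attention mask from optional
// query/value padding masks and an optional causal mask, combining them with
// logical AND. Returns null when no mask applies.
function computeAttentionMask(
    batch: number, T: number, S: number,
    queryMask: boolean[][] | null,  // [batch, T], true = real token
    valueMask: boolean[][] | null,  // [batch, S], true = real token
    useCausalMask: boolean): boolean[][][] | null {
  if (queryMask === null && valueMask === null && !useCausalMask) {
    return null;
  }
  const mask: boolean[][][] = [];
  for (let b = 0; b < batch; b++) {
    const rows: boolean[][] = [];
    for (let t = 0; t < T; t++) {
      const row: boolean[] = [];
      for (let s = 0; s < S; s++) {
        let keep = true;
        // Query mask, broadcast as [batch, T, 1].
        if (queryMask !== null) keep = keep && queryMask[b][t];
        // Value mask, broadcast as [batch, 1, S].
        if (valueMask !== null) keep = keep && valueMask[b][s];
        // Causal mask, [1, T, S]: lower triangular, so s <= t.
        if (useCausalMask) keep = keep && s <= t;
        row.push(keep);
      }
      rows.push(row);
    }
    mask.push(rows);
  }
  return mask;
}

// One sequence of length 3 whose last value position is padding.
const m = computeAttentionMask(1, 3, 3, null, [[true, true, false]], true);
console.log(JSON.stringify(m));
// [[[true,false,false],[true,true,false],[true,true,false]]]
```

Before this change, enabling the causal mask replaced the automatic mask outright; AND-ing keeps the padding information from the value mask as well.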

mattsoulanille · 2023-09-12T22:27:32Z

tfjs-layers/src/engine/topology.ts

+type MaybeSymbolic = SymbolicTensor | Tensor;
+
+function checkAllSymbolic(tensors: MaybeSymbolic | MaybeSymbolic[]
+                         ): tensors is SymbolicTensor | SymbolicTensor[] {
+  let allAreSymbolic = true;
+  for (const tensor of generic_utils.toList(tensors)) {
+    if (!(tensor instanceof SymbolicTensor)) {
+      allAreSymbolic = false;
+      break;
+    }
+  }
+  return allAreSymbolic;
+}
+
+function checkNoneSymbolic(tensors: MaybeSymbolic | MaybeSymbolic[]
+                          ): tensors is Tensor | Tensor[] {
+  let noneAreSymbolic = true;
+  for (const tensor of generic_utils.toList(tensors)) {
+    if (tensor instanceof SymbolicTensor) {
+      noneAreSymbolic = false;
+      break;
+    }
+  }
+  return noneAreSymbolic;
+}


I moved these here so I could write them as type guards (`tensors is Tensor...`).
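For readers unfamiliar with the pattern, a user-defined type guard (`tensors is ...`) lets TypeScript narrow the argument's type at the call site. The sketch below writes the same two checks with `Array.prototype.every`, which, like the original loops, returns `true` for an empty list. `Tensor`, `SymbolicTensor`, and `toList` are simplified stand-ins for the tfjs classes and `generic_utils.toList`; `describeInputs` is a hypothetical caller.

```typescript
// Simplified stand-ins for the tfjs classes, with distinct shapes so that
// TypeScript's structural typing can tell them apart.
class Tensor {
  values: number[];
  constructor(values: number[]) {
    this.values = values;
  }
}
class SymbolicTensor {
  name: string;
  constructor(name: string) {
    this.name = name;
  }
}
type MaybeSymbolic = SymbolicTensor | Tensor;

// Stand-in for generic_utils.toList: wrap a single item in an array.
function toList<T>(x: T | T[]): T[] {
  return Array.isArray(x) ? x : [x];
}

function checkAllSymbolic(tensors: MaybeSymbolic | MaybeSymbolic[]):
    tensors is SymbolicTensor | SymbolicTensor[] {
  return toList(tensors).every(t => t instanceof SymbolicTensor);
}

function checkNoneSymbolic(tensors: MaybeSymbolic | MaybeSymbolic[]):
    tensors is Tensor | Tensor[] {
  return toList(tensors).every(t => !(t instanceof SymbolicTensor));
}

function describeInputs(inputs: MaybeSymbolic | MaybeSymbolic[]): string {
  if (checkAllSymbolic(inputs)) {
    // Narrowed: inputs is SymbolicTensor | SymbolicTensor[], so .name is OK.
    return 'symbolic: ' + toList(inputs).map(t => t.name).join(',');
  }
  if (checkNoneSymbolic(inputs)) {
    // Narrowed: inputs is Tensor | Tensor[].
    return 'concrete: ' + toList(inputs).length;
  }
  return 'mixed';
}

console.log(describeInputs([new SymbolicTensor('x'), new SymbolicTensor('y')]));
// symbolic: x,y
console.log(describeInputs(new Tensor([1, 2])));  // concrete: 1
console.log(describeInputs([new Tensor([1]), new SymbolicTensor('z')]));  // mixed
```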

mattsoulanille · 2023-09-12T22:27:45Z

tfjs-layers/src/layers/nlp/multihead_attention_test.ts

@@ -188,101 +188,6 @@ describe('MultiHeadAttention', () => {
    expectTensorsNotClose(queryKernel, outputKernel, 1e-6);
  });

-  describeMathCPU('High Dimensional Attention', () => {


mattsoulanille · 2023-09-12T22:28:14Z

tfjs-layers/src/layers/nlp/multihead_attention_test.ts

    /**
     * Test that the value and causal masks are taken into account.
     */
-    function testValueMask(testcaseName: string, useCausalMask: boolean) {


Refactored to explicitly declare both tests instead of using a loop to declare them.

mattsoulanille · 2023-09-12T22:28:28Z

tfjs-layers/src/layers/nlp/multihead_attention_test.ts

@@ -482,6 +395,101 @@ describe('MultiHeadAttention', () => {
  // TODO(pforderique): Test serialization.
 });

+describeMathCPU('High Dimensional Attention', () => {


Moved some tests from above to here.

mattsoulanille added 3 commits September 11, 2023 23:14

Use describeWithFlags

7cddfa6

Separate tests out of for loop

0132ef5

Make MHA use masks from query and value tensors

1b89d25

mattsoulanille requested review from pyu10055 and Linchenn September 12, 2023 06:17

Fix lint

bdf6ed1

pyu10055 reviewed Sep 12, 2023

Refactor mask computation into a separate function

6822278

mattsoulanille force-pushed the fix_mha branch from 111048c to 6822278 September 12, 2023 21:58

Merge branch 'master' into fix_mha

94766d0

mattsoulanille commented Sep 12, 2023

mattsoulanille requested review from fengwuyao and removed request for Linchenn September 12, 2023 22:29

fengwuyao approved these changes Sep 12, 2023

pyu10055 approved these changes Sep 12, 2023

pyu10055 merged commit 8879e72 into tensorflow:master Sep 12, 2023
2 checks passed

AmitMY mentioned this pull request Sep 24, 2023

Breaking change: #7951 (tfjs-v4.11.0) #7974

Closed
