Fix batch matching for batch mat mul #7062

Linchenn · 2022-11-18T21:24:06Z

Fix #7061 for CPU, WASM (native implementation) and WebGL backends.

The error is caused by a batch mismatch when A has more batch dimensions than B, or vice versa.

For example, suppose A's shape is [2,4,3,3] and B's shape is [4,3,3]. A's batch is its first two dimensions, [2, 4], while B's batch is its first dimension, [4], so B's batch should be broadcast to [2, 4]. To compute output[1][0][0][0], we need the dot product of A[1][0][0][...] with B[0][...][0], but the current algorithm computes the dot product of A[1][0][0][...] with B[3][...][0].
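The index mapping in this example can be sketched as follows. This is only an illustration of right-aligned batch broadcasting, not the code in this PR: `broadcastBatchIndex` and its parameters are hypothetical names. For the shapes above, the result for B agrees with taking the flat output batch index modulo B's batch size.

```typescript
// Hypothetical helper (not part of tfjs): maps a flat output batch index to
// the flat batch index of an operand whose batch shape broadcasts,
// right-aligned, to outBatchShape.
function broadcastBatchIndex(
    bi: number, outBatchShape: number[], operandBatchShape: number[]): number {
  const offset = outBatchShape.length - operandBatchShape.length;
  let index = 0;
  let stride = 1;
  let rest = bi;
  // Walk from the innermost batch dimension outwards.
  for (let d = outBatchShape.length - 1; d >= 0; d--) {
    const coord = rest % outBatchShape[d];
    rest = Math.floor(rest / outBatchShape[d]);
    const od = d - offset;
    if (od >= 0) {
      const size = operandBatchShape[od];
      // A dimension of size 1 is broadcast, so it always reads coordinate 0.
      index += (size === 1 ? 0 : coord) * stride;
      stride *= size;
    }
  }
  return index;
}

// output[1][0] corresponds to flat output batch index 1 * 4 + 0 = 4.
const outputBatch = 1 * 4 + 0;
// A's batch [2, 4] matches the output batch, so A[1][0] is flat index 4.
const batchIndexA = broadcastBatchIndex(outputBatch, [2, 4], [2, 4]);
// B's batch [4] is broadcast over the leading 2, so B[0] is used.
const batchIndexB = broadcastBatchIndex(outputBatch, [2, 4], [4]);
// For this leading-dimension case a plain modulus gives the same answer.
const batchIndexBMod = outputBatch % 4;
```

The general helper also covers batch shapes such as [2, 1] against [1, 4], where a plain modulus would not be enough.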

mattsoulanille

LGTM with a nit related to modulus. Thanks!

tfjs-backend-cpu/src/kernels/BatchMatMul.ts

tfjs-backend-wasm/src/cc/batch_mat_mul_impl.cc

mattsoulanille · 2022-11-19T00:37:22Z

tfjs-backend-cpu/src/kernels/BatchMatMul.ts

+                    a3dValues[batchIndexA * aBatch + i * aOuterStep + k * aInnerStep];
                const bVal =
-                    b3dValues[k * bInnerStep + j * bOuterStep + batchOffsetB];
+                    // tslint:disable-next-line: max-line-length
+                    b3dValues[k * bInnerStep + j * bOuterStep + batchIndexB * bBatch];


I was wondering why k matched with aInnerStep but did not then match with bOuterStep, but I see now that bOuterStep and bInnerStep are opposite to aOuterStep and aInnerStep (lines 82 - 87). They don't refer to 'rows' and 'columns' of the matrices being multiplied, where you would step a's row index with b's column index for the dot product.

No action necessary.
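The naming convention described in this comment can be sketched with a plain 2D multiply. This is an illustration under assumptions, not the kernel's code: `matMul2d` is a hypothetical function, both inputs are assumed to be row-major and not transposed, and "inner" is taken to mean the step along the shared dimension k while "outer" means the step along the output dimension.

```typescript
// Hypothetical sketch: multiply a row-major [m, k] matrix by a row-major
// [k, n] matrix using the inner/outer step naming from the comment above.
function matMul2d(
    a: number[], b: number[], m: number, k: number, n: number): number[] {
  // For A, the output dimension is its row, the shared dimension its column.
  const aOuterStep = k;
  const aInnerStep = 1;
  // For B, the shared dimension is its row, the output dimension its column,
  // so its inner and outer steps are the opposite way round from A's.
  const bInnerStep = n;
  const bOuterStep = 1;
  const out = new Array<number>(m * n).fill(0);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let kk = 0; kk < k; kk++) {
        // kk always pairs with the "inner" step of both operands.
        sum += a[i * aOuterStep + kk * aInnerStep] *
            b[kk * bInnerStep + j * bOuterStep];
      }
      out[i * n + j] = sum;
    }
  }
  return out;
}

// A = [[1, 2, 3], [4, 5, 6]], B = [[7, 8], [9, 10], [11, 12]].
const product = matMul2d([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], 2, 3, 2);
// product is [58, 64, 139, 154]
```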

pyu10055

Thanks. We can move the batch variable assignments up into the bi loop, given that those assignments depend only on the value of bi and not on the other loop variables. That can be a separate PR.

Reviewable status: complete! 2 of 1 approvals obtained (waiting on @Linchenn and @mattsoulanille)

Linchenn · 2022-11-21T21:23:17Z

cc @qjia7 @xhcao, in case WebGPU has a similar problem. It happens in BatchMatMul when A's batch size does not match B's batch size. For details, please see my description above.

qjia7 · 2022-11-23T02:40:52Z

Thanks, Lin. Yes, WebGPU has a similar issue. @xhcao will work on it. Thank you!

BUG * fix * lint * fix

fix

445c29c

Linchenn requested review from mattsoulanille and pyu10055 November 18, 2022 21:24

lint

a55c8e3

mattsoulanille approved these changes Nov 19, 2022

Linchenn and others added 2 commits November 21, 2022 10:43

fix

e6179b4

Merge branch 'master' into fixMam

11a3a26

pyu10055 approved these changes Nov 21, 2022

Linchenn merged commit e6ada3a into tensorflow:master Nov 21, 2022

Linchenn deleted the fixMam branch November 21, 2022 21:24

Linchenn added a commit to Linchenn/tfjs that referenced this pull request Jan 9, 2023

Fix batch matching for batch mat mul (tensorflow#7062)

cf948c1

BUG * fix * lint * fix
