
Simplify matmul op #470

Open · huningxin opened this issue Oct 18, 2023 · 10 comments
@huningxin (Contributor)

(raised by @wacky6 Jiewei in Chromium CL review https://chromium-review.googlesource.com/c/chromium/src/+/4940628/comments/88fd209e_269f2c93)

WebNN's matmul supports 1-D input tensors, in particular for

c = builder.matmul(a, b)

If a is 1-dimensional, it is converted to a 2-dimensional tensor by prepending a 1 to its dimensions.
If b is 1-dimensional, it is converted to a 2-dimensional tensor by appending a 1 to its dimensions.
If both a and b are 1-dimensional, the operation is a vector dot-product, which produces a scalar output.
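
For concreteness, this is the shape algebra those rules imply, following the PyTorch/ONNX convention that the inserted dimensions are removed from the output again (K, M, N, and B below are placeholder sizes):

```js
// a: [K],   b: [K]     -> [1,K] x [K,1] = [1,1] -> scalar output
// a: [K],   b: [K,N]   -> [1,K] x [K,N] = [1,N] -> output shape [N]
// a: [M,K], b: [K]     -> [M,K] x [K,1] = [M,1] -> output shape [M]
// a: [K],   b: [B,K,N] -> [1,K] broadcast across the batch -> output shape [B,N]
```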

This feature aligns with PyTorch (torch.matmul) and ONNX (MatMul).

However, 1-D input tensors are not widely supported by native ML APIs:

  • DirectML's DML_GEMM_OPERATOR_DESC supports input tensors of rank 2 to 4.
  • BNNS's BroadcastMatMul requires input tensors of rank >= 2.
  • TensorFlow's BatchMatMulV2, which TensorFlow Lite and NNAPI follow, requires that input tensors x and y be 2-D or higher.

The open question is whether WebNN's matmul should drop support for 1-D input tensors. This would help simplify the implementation. Frameworks can still support 1-D input tensors by reshaping them to 2-D, prepending or appending a 1 to their dimensions, as ONNXRuntime's DirectML EP MatMulShapeMapping() does.
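
For illustration, a minimal sketch of that framework-side fixup on top of a matmul that only accepts rank >= 2 inputs. The helper name is hypothetical, and an operand `.shape` accessor exposing the build-time dimensions is assumed:

```js
// Hypothetical helper: emulate 1-D matmul inputs with reshapes.
function matmulWith1D(builder, a, b) {
  const aIs1D = a.shape.length === 1;
  const bIs1D = b.shape.length === 1;
  // Prepend a 1 to a 1-D lhs, append a 1 to a 1-D rhs.
  const a2 = aIs1D ? builder.reshape(a, [1, a.shape[0]]) : a;
  const b2 = bIs1D ? builder.reshape(b, [b.shape[0], 1]) : b;
  let c = builder.matmul(a2, b2);
  // Remove the inserted dimensions so the result matches the 1-D semantics.
  const s = c.shape;
  if (aIs1D && bIs1D) {
    c = builder.reshape(c, []); // vector dot product -> scalar
  } else if (aIs1D) {
    c = builder.reshape(c, [...s.slice(0, -2), s[s.length - 1]]); // drop prepended 1
  } else if (bIs1D) {
    c = builder.reshape(c, s.slice(0, -1)); // drop appended 1
  }
  return c;
}
```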

/cc @wchao1115 @fdwr

huningxin changed the title from "1-D input tensors for matmul op are not widely supported by native ML APIs" to "Simplify matmul op" on Oct 18, 2023
@fdwr (Collaborator) commented Oct 18, 2023

> Frameworks can still support 1-D input tensors by reshaping them to 2-D

@huningxin I feel the same: the current if/else if/else reshaping logic chain is higher-level framework policy that should already be resolved by the time it reaches WebNN, as the framework can trivially reshape that 1-D tensor to 2-D first (with ~4 lines of code, and reshapes are basically free, with no memory copy). It would simplify the implementation and the WPT test cases, easing conformance testing.

@huningxin (Contributor, Author) commented Oct 18, 2023

There is another point raised by @wacky6: whether we should merge the matmul and gemm ops (https://chromium-review.googlesource.com/c/chromium/src/+/4940628/comments/88fd209e_269f2c93):

> If matmul is simply a specialized gemm, I'd prefer removing it and only exposing gemm in the API (since the backend can trivially detect matmul in the gemm implementation).

matmul supports batched matrix multiply for input tensors of rank > 2, while today's gemm only supports input tensors of rank == 2.

One possible way is to extend gemm to support higher-rank (>= 2) input tensors.

@fdwr (Collaborator) commented Oct 18, 2023

> whether we should merge the matmul and gemm ops

@huningxin: 🤔 Do all the relevant backends support a "combined GEMM"? Yes, it's a little odd that WebNN has essentially two different matmuls: one supporting >2-D tensors, and another supporting only 2-D tensors but with additional parameters (a bias input, alpha/beta scaling factors, and two transposition flags). I don't have enough information on backend capabilities to opine firmly, but unifying them would somewhat reduce implementation code paths and test cost. As a data point, DirectML has no dedicated float32 "MatMul" operator (because GEMM is a superset, and the ORT DML EP's MatMul just calls GEMM).

⚖ On the other hand, given that matmul is the more fundamental operator (GEMM can be expressed in terms of MatMul plus some elementwise ops and a transpose), it's smart to have the fundamental form in the WebNN operator set (like TOSA's MatMul; TOSA, AFAICS, lacks a GEMM). The Guidelines for new operators advocate "consider performance consequences" but also "prefer primitives", and if any operator can be decomposed into simpler primitives, then I'd like those constituent primitives to exist in WebNN. By analogy, it would be weird for an API to have only a multiply-and-add instruction, where you achieve multiply by supplying a dummy 0 for the add and achieve add by supplying a dummy 1 for the multiply, yet have no dedicated add or multiply instruction.

@huningxin (Contributor, Author)

> @huningxin: 🤔 Do all the relevant backends support a "combined GEMM"?

AFAIK, XNNPACK supports batch matrix multiply via xnn_define_batch_matrix_multiply, which requires that the input tensors be at least 3-D and that the N-2 dimension of both input tensors be equal. It only supports XNN_FLAG_TRANSPOSE_B as an additional flag, with no support for an optional bias input or alpha/beta attributes. /cc @alankelly, in case I missed anything.

> On the other hand, given matmul is a more fundamental operator (GEMM can be expressed in terms of MatMul plus some elementwise ops), it's nice to have the fundamental form too in the WebNN operator set

I'd agree. WebNN's gemm can be emulated by matmul, transpose, add, and mul, as the sample code in the spec shows.
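
For reference, a sketch of that decomposition for today's rank-2 gemm, along the lines of the spec's sample code. The function name is illustrative, and the scalar builder.constant(value) overload is an assumption about the builder API:

```js
// Illustrative: gemm(a, b, {c, alpha, beta, aTranspose, bTranspose})
//   == alpha * matmul(op(a), op(b)) + beta * c
function gemmFromMatmul(builder, a, b, options = {}) {
  const {c = null, alpha = 1.0, beta = 1.0,
         aTranspose = false, bTranspose = false} = options;
  const a2 = aTranspose ? builder.transpose(a) : a; // 2-D transpose
  const b2 = bTranspose ? builder.transpose(b) : b;
  let out = builder.matmul(a2, b2);
  if (alpha !== 1.0) out = builder.mul(out, builder.constant(alpha));
  if (c !== null) {
    const cScaled = beta !== 1.0 ? builder.mul(c, builder.constant(beta)) : c;
    out = builder.add(out, cScaled); // bias add broadcasts where applicable
  }
  return out;
}
```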

@huningxin (Contributor, Author)

@Honry also shared with me that the target transformer models use matmul with input tensors of rank > 2. For instance, within stable-diffusion-v1-5, the text_encoder model uses matmul with 3-D input tensors, and both the vae_encoder and vae_decoder models use matmul with 4-D input tensors.

@inexorabletash (Member)

See https://github.com/webmachinelearning/webnn/pull/523/files#r1464619210 and https://github.com/webmachinelearning/webnn/pull/523/files#r1466932114 - the current algorithm for calculating the output size seems sketchy. Expert advice needed!
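
For context, the conventional numpy/ONNX-style rule computes the output shape roughly as sketched below, for inputs already of rank >= 2; this is illustrative only and may differ in detail from the algorithm in the PR:

```js
// Illustrative matmul output-shape calculation (numpy/ONNX-style batch broadcasting).
function matmulOutputShape(shapeA, shapeB) {
  const [m, kA] = shapeA.slice(-2);
  const [kB, n] = shapeB.slice(-2);
  if (kA !== kB) throw new TypeError(`inner dimensions differ: ${kA} vs ${kB}`);
  // Broadcast the batch dimensions (all but the last two), aligned from the right.
  const batchA = shapeA.slice(0, -2);
  const batchB = shapeB.slice(0, -2);
  const rank = Math.max(batchA.length, batchB.length);
  const batch = [];
  for (let i = 0; i < rank; ++i) {
    const dA = batchA[batchA.length - rank + i] ?? 1;
    const dB = batchB[batchB.length - rank + i] ?? 1;
    if (dA !== dB && dA !== 1 && dB !== 1) {
      throw new TypeError(`batch dimensions not broadcastable: ${dA} vs ${dB}`);
    }
    batch.push(Math.max(dA, dB));
  }
  return [...batch, m, n];
}

matmulOutputShape([2, 3, 4], [4, 5]); // -> [2, 3, 5]
```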

@inexorabletash (Member)

Per https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#t05, @fdwr will open a new issue to capture current thoughts, link back to this issue, then close it.

@fdwr (Collaborator) commented Nov 14, 2024

> fdwr will open a new issue to capture current thoughts, link back to this issue, then close it.

🤔 If anybody recalls (I'm not seeing it in the notes, even though I was probably the person who spoke up), what was the extra work here? Was it separating out the gemm vs. matmul aspect of this issue into a separate one, since the simplification aspect (removing the 1-D special handling) is already complete? Otherwise I'll close it (or Ningxin can).

@inexorabletash (Member)

> separating out the gemm vs. matmul aspect of this issue

I believe this was the point of discussion at TPAC, but it'd be good to hear from other attendees to confirm.

@anssiko (Member) commented Nov 21, 2024

Here's what I (vaguely, I admit) recall: I initially suggested retitling the issue, given that both gemm and matmul are discussed together in this issue. In a follow-up we decided to open a new issue to discuss gemm specifics (support for higher-rank input tensors?) and close this issue. @fdwr, feel free to take action accordingly.
