
Simplify matmul op #470

Open · huningxin opened this issue Oct 18, 2023 · 10 comments
@huningxin (Contributor)

(raised by @wacky6 Jiewei in Chromium CL review https://chromium-review.googlesource.com/c/chromium/src/+/4940628/comments/88fd209e_269f2c93)

WebNN's matmul supports 1-D input tensors, in particular for

c = builder.matmul(a, b)

If a is 1-dimensional, it is converted to a 2-dimensional tensor by prepending a 1 to its dimensions.
If b is 1-dimensional, it is converted to a 2-dimensional tensor by appending a 1 to its dimensions.
If both a and b are 1-dimensional, the operation is a vector dot-product, which produces a scalar output.
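
For concreteness, this is the shape algebra those rules imply, following the PyTorch/ONNX convention that the inserted dimensions are removed from the output again (K, M, N, and B below are placeholder sizes):

```js
// a: [K],   b: [K]     -> [1,K] x [K,1] = [1,1] -> scalar output
// a: [K],   b: [K,N]   -> [1,K] x [K,N] = [1,N] -> output shape [N]
// a: [M,K], b: [K]     -> [M,K] x [K,1] = [M,1] -> output shape [M]
// a: [K],   b: [B,K,N] -> [1,K] broadcast across the batch -> output shape [B,N]
```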

This feature aligns with PyTorch (torch.matmul) and ONNX (MatMul).

However, 1-D input tensors are not widely supported by native ML APIs:

  • DirectML's DML_GEMM_OPERATOR_DESC supports input tensors of rank 2 to 4.
  • BNNS's BroadcastMatMul requires input tensors of rank >= 2.
  • TensorFlow's BatchMatMulV2, which TensorFlow Lite and NNAPI follow, requires that input tensors x and y be 2-D or higher.

The open question is whether WebNN's matmul should drop support for 1-D input tensors. This would help simplify the implementation. Frameworks can still support 1-D input tensors by reshaping them to 2-D, prepending or appending a 1 to their dimensions, as ONNXRuntime's DirectML EP MatMulShapeMapping() does.
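
For illustration, a minimal sketch of that framework-side fixup on top of a matmul that only accepts rank >= 2 inputs. The helper name is hypothetical, and an operand `.shape` accessor exposing the build-time dimensions is assumed:

```js
// Hypothetical helper: emulate 1-D matmul inputs with reshapes.
function matmulWith1D(builder, a, b) {
  const aIs1D = a.shape.length === 1;
  const bIs1D = b.shape.length === 1;
  // Prepend a 1 to a 1-D lhs, append a 1 to a 1-D rhs.
  const a2 = aIs1D ? builder.reshape(a, [1, a.shape[0]]) : a;
  const b2 = bIs1D ? builder.reshape(b, [b.shape[0], 1]) : b;
  let c = builder.matmul(a2, b2);
  // Remove the inserted dimensions so the result matches the 1-D semantics.
  const s = c.shape;
  if (aIs1D && bIs1D) {
    c = builder.reshape(c, []); // vector dot product -> scalar
  } else if (aIs1D) {
    c = builder.reshape(c, [...s.slice(0, -2), s[s.length - 1]]); // drop prepended 1
  } else if (bIs1D) {
    c = builder.reshape(c, s.slice(0, -1)); // drop appended 1
  }
  return c;
}
```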

/cc @wchao1115 @fdwr

huningxin changed the title from "1-D input tensors for matmul op are not widely supported by native ML APIs" to "Simplify matmul op" on Oct 18, 2023
@fdwr (Collaborator) commented Oct 18, 2023

> Frameworks can still support 1-D input tensors by reshaping them to 2-D

@huningxin I feel the same: the current if/else if/else reshaping logic chain is higher-level framework policy that should already be resolved by the time it reaches WebNN, as the framework can trivially reshape that 1-D tensor to 2-D first (with ~4 lines of code, and reshapes are basically free, with no memory copy). It would simplify the implementation and the WPT test cases, easing conformance testing.

@huningxin (Contributor, Author) commented Oct 18, 2023

There is another point raised by @wacky6: whether we should merge the matmul and gemm ops (https://chromium-review.googlesource.com/c/chromium/src/+/4940628/comments/88fd209e_269f2c93):

> If matmul is simply a specialized gemm, I'd prefer removing it and only exposing gemm in the API (since the backend can trivially detect matmul in the gemm implementation).

matmul supports batched matrix multiply for input tensors of rank > 2, while today's gemm only supports input tensors of rank == 2.

One possible way is to extend gemm to support higher-rank (>= 2) input tensors.

@fdwr (Collaborator) commented Oct 18, 2023

> whether we should merge the matmul and gemm ops

@huningxin: 🤔 Do all the relevant backends support a "combined GEMM"? Yes, it's a little odd that WebNN has essentially two different matmuls: one supporting >2-D tensors, and another supporting only 2-D tensors but with additional parameters (a bias input, alpha/beta scaling factors, and two transposition flags). I don't have enough information on backend capabilities to opine firmly, but unifying them would somewhat reduce implementation code paths and test cost. As a data point, DirectML has no dedicated float32 "MatMul" operator (because GEMM is a superset, and the ORT DML EP's MatMul just calls GEMM).

⚖ On the other hand, given that matmul is the more fundamental operator (GEMM can be expressed in terms of MatMul plus some elementwise ops and a transpose), it's smart to have the fundamental form in the WebNN operator set (like TOSA's MatMul; TOSA, AFAICS, lacks a GEMM). The Guidelines for new operators advocate "consider performance consequences" but also "prefer primitives", and if any operator can be decomposed into simpler primitives, then I'd like those constituent primitives to exist in WebNN. By analogy, it would be weird for an API to have only a multiply-and-add instruction, where you achieve multiply by supplying a dummy 0 for the add and achieve add by supplying a dummy 1 for the multiply, yet have no dedicated add or multiply instruction.

@huningxin (Contributor, Author)

> @huningxin: 🤔 Do all the relevant backends support a "combined GEMM"?

AFAIK, XNNPACK supports batch matrix multiply via xnn_define_batch_matrix_multiply, which requires that the input tensors be at least 3-D and that the N-2 dimension of both input tensors be equal. It only supports XNN_FLAG_TRANSPOSE_B as an additional flag, with no support for an optional bias input or alpha/beta attributes. /cc @alankelly, in case I missed anything.

> On the other hand, given matmul is a more fundamental operator (GEMM can be expressed in terms of MatMul plus some elementwise ops), it's nice to have the fundamental form too in the WebNN operator set

I'd agree. WebNN's gemm can be emulated by matmul, transpose, add, and mul, as the sample code in the spec shows.
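
For reference, a sketch of that decomposition for today's rank-2 gemm, along the lines of the spec's sample code. The function name is illustrative, and the scalar builder.constant(value) overload is an assumption about the builder API:

```js
// Illustrative: gemm(a, b, {c, alpha, beta, aTranspose, bTranspose})
//   == alpha * matmul(op(a), op(b)) + beta * c
function gemmFromMatmul(builder, a, b, options = {}) {
  const {c = null, alpha = 1.0, beta = 1.0,
         aTranspose = false, bTranspose = false} = options;
  const a2 = aTranspose ? builder.transpose(a) : a; // 2-D transpose
  const b2 = bTranspose ? builder.transpose(b) : b;
  let out = builder.matmul(a2, b2);
  if (alpha !== 1.0) out = builder.mul(out, builder.constant(alpha));
  if (c !== null) {
    const cScaled = beta !== 1.0 ? builder.mul(c, builder.constant(beta)) : c;
    out = builder.add(out, cScaled); // bias add broadcasts where applicable
  }
  return out;
}
```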

@huningxin (Contributor, Author)

@Honry also shared with me that the target transformer models use matmul with input tensors of rank > 2. For instance, within stable-diffusion-v1-5, the text_encoder model uses matmul with 3-D input tensors, and both the vae_encoder and vae_decoder models use matmul with 4-D input tensors.

@inexorabletash (Member)

See https://github.com/webmachinelearning/webnn/pull/523/files#r1464619210 and https://github.com/webmachinelearning/webnn/pull/523/files#r1466932114 - the current algorithm for calculating the output size seems sketchy. Expert advice needed!
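
For context, the conventional numpy/ONNX-style rule computes the output shape roughly as sketched below, for inputs already of rank >= 2; this is illustrative only and may differ in detail from the algorithm in the PR:

```js
// Illustrative matmul output-shape calculation (numpy/ONNX-style batch broadcasting).
function matmulOutputShape(shapeA, shapeB) {
  const [m, kA] = shapeA.slice(-2);
  const [kB, n] = shapeB.slice(-2);
  if (kA !== kB) throw new TypeError(`inner dimensions differ: ${kA} vs ${kB}`);
  // Broadcast the batch dimensions (all but the last two), aligned from the right.
  const batchA = shapeA.slice(0, -2);
  const batchB = shapeB.slice(0, -2);
  const rank = Math.max(batchA.length, batchB.length);
  const batch = [];
  for (let i = 0; i < rank; ++i) {
    const dA = batchA[batchA.length - rank + i] ?? 1;
    const dB = batchB[batchB.length - rank + i] ?? 1;
    if (dA !== dB && dA !== 1 && dB !== 1) {
      throw new TypeError(`batch dimensions not broadcastable: ${dA} vs ${dB}`);
    }
    batch.push(Math.max(dA, dB));
  }
  return [...batch, m, n];
}

matmulOutputShape([2, 3, 4], [4, 5]); // -> [2, 3, 5]
```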

@inexorabletash (Member)

Per https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#t05, @fdwr will open a new issue to capture current thoughts, link back to this issue, then close it.

@fdwr (Collaborator) commented Nov 14, 2024

> fdwr will open a new issue to capture current thoughts, link back to this issue, then close it.

🤔 If anybody recalls (I'm not seeing it in the notes, even though I was probably the person who spoke up), what was the extra work here? Was it separating out the gemm vs. matmul aspect of this issue into a separate one, since the simplification aspect (removing the 1-D special handling) is already complete? Otherwise I'll close it (or Ningxin can).

@inexorabletash (Member)

> separating out the gemm vs. matmul aspect of this issue

I believe this was the point of discussion at TPAC, but it'd be good to hear from other attendees to confirm.

@anssiko (Member) commented Nov 21, 2024

Here's what I (vaguely, I admit) recall: I initially suggested retitling the issue, given that both gemm and matmul are discussed together in this issue. In a follow-up we decided to open a new issue to discuss gemm specifics (support for higher-rank input tensors?) and close this issue. @fdwr, feel free to take action accordingly.
