Simplify matmul op #470
@huningxin I feel the same: the current if/else-if/else reshaping logic chain is higher-level framework policy that should already be resolved by the time it reaches WebNN, as the framework can trivially reshape that 1-D tensor to 2-D first (with ~4 lines of code, and reshapes are basically free, with no memory copy). It would simplify the implementation and the WPT test cases, easing conformance testing.
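For concreteness, that framework-side fixup could look something like the following (a minimal sketch in TypeScript, assuming WebNN type declarations for MLGraphBuilder/MLOperand are available; the wrapper function itself is hypothetical):

```typescript
// Sketch: normalize 1-D matmul inputs to 2-D before calling WebNN,
// so matmul itself only ever sees tensors of rank >= 2.
function matmulRank2Plus(builder: MLGraphBuilder, a: MLOperand, b: MLOperand,
                         shapeA: number[], shapeB: number[]): MLOperand {
  // A 1-D lhs [K] is viewed as [1, K]; a 1-D rhs [K] is viewed as [K, 1].
  if (shapeA.length === 1) a = builder.reshape(a, [1, shapeA[0]]);
  if (shapeB.length === 1) b = builder.reshape(b, [shapeB[0], 1]);
  return builder.matmul(a, b);
}
```

The caller would then drop the inserted 1s from the output shape again, as sketched near the end of this issue.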
There is another point raised by @wacky6: whether we should merge the gemm and matmul ops. One possible way is to extend gemm to support input tensors of rank higher than 2.
huningxin: "🤔 Do all the relevant backends support a 'combined GEMM'?"

Yes, it's a little odd that WebNN has essentially two different matmuls: one supporting >2-D tensors, and another supporting only 2-D tensors but with additional parameters (including a bias, aScale, biasScale, and two transposition flags). I don't have enough info on backend capabilities to firmly opine on that, but unifying them would somewhat reduce implementation code paths and test cost; as a data point, DirectML has no dedicated float32 "MatMul" operator (because GEMM is a superset, and the ORT DML EP implements MatMul by just calling GEMM).

⚖ On the other hand, given that matmul is the more fundamental operator (GEMM can be expressed in terms of MatMul plus some elementwise ops and a transpose), it's smart to have the fundamental form in the WebNN operator set (like TOSA's MatMul, which AFAICS lacks a GEMM). The Guidelines for new operators advocate that we "consider performance consequences" but also "prefer primitives", and if any operator can be decomposed into simpler primitives, then I'd like to see those constituent primitives exist in WebNN. By analogy, it would be weird for an API to have only a multiply-and-add instruction, where you can achieve multiply by supplying a dummy 0 for the add, and achieve add by supplying a dummy 1 for the multiply, yet have no dedicated add or multiply instruction.
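To illustrate the "GEMM from primitives" direction, here is a hedged sketch using WebNN builder methods (matmul, transpose, mul, add, constant); the gemmFromPrimitives wrapper and the scalarConstant helper are my own illustrations, not spec or library code, and assume 2-D inputs:

```typescript
// Hypothetical helper: build a scalar float32 constant operand. Note the
// MLOperandDescriptor key has varied across spec revisions (shape/dimensions).
const scalarConstant = (b: MLGraphBuilder, v: number): MLOperand =>
  b.constant({ dataType: 'float32', shape: [] }, new Float32Array([v]));

// Sketch: gemm(a, b, {c, alpha, beta, aTranspose, bTranspose}) expressed
// as matmul plus transpose and elementwise primitives:
//   alpha * op(A) x op(B) + beta * C
function gemmFromPrimitives(
    builder: MLGraphBuilder, a: MLOperand, b: MLOperand,
    opts: { c?: MLOperand; alpha?: number; beta?: number;
            aTranspose?: boolean; bTranspose?: boolean } = {}): MLOperand {
  const { c, alpha = 1.0, beta = 1.0,
          aTranspose = false, bTranspose = false } = opts;
  if (aTranspose) a = builder.transpose(a); // default permutation reverses dims
  if (bTranspose) b = builder.transpose(b);
  let y = builder.matmul(a, b);             // the fundamental primitive
  if (alpha !== 1.0) y = builder.mul(y, scalarConstant(builder, alpha));
  if (c !== undefined) {
    const scaledC = beta !== 1.0 ? builder.mul(c, scalarConstant(builder, beta)) : c;
    y = builder.add(y, scaledC);            // bias added via broadcasting
  }
  return y;
}
```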
AFAIK, XNNPACK supports batch matrix multiply.
I'd agree. WebNN's matmul …
@Honry also shared with me that the target transformer models use …
See https://github.com/webmachinelearning/webnn/pull/523/files#r1464619210 and https://github.com/webmachinelearning/webnn/pull/523/files#r1466932114 - the current algorithm for calculating the output size seems sketchy. Expert advice needed!
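For reference, the conventional NumPy-style rule that most batched-matmul implementations follow (a sketch of that convention, not the spec's exact algorithm; assumes both inputs are already rank >= 2):

```typescript
// Sketch: output shape of a batched matmul with NumPy-style broadcasting
// of the leading (batch) dimensions, matched right-to-left.
function matmulOutputShape(shapeA: number[], shapeB: number[]): number[] {
  const [m, kA] = shapeA.slice(-2);
  const [kB, n] = shapeB.slice(-2);
  if (kA !== kB) throw new Error(`Incompatible inner dims ${kA} vs ${kB}`);
  const batchA = shapeA.slice(0, -2), batchB = shapeB.slice(0, -2);
  const batch: number[] = [];
  for (let i = 1; i <= Math.max(batchA.length, batchB.length); ++i) {
    const dA = batchA[batchA.length - i] ?? 1; // missing dims act as 1
    const dB = batchB[batchB.length - i] ?? 1;
    if (dA !== dB && dA !== 1 && dB !== 1)
      throw new Error(`Cannot broadcast batch dims ${dA} vs ${dB}`);
    batch.unshift(Math.max(dA, dB));
  }
  return [...batch, m, n];
}
```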
Per https://www.w3.org/2024/09/23-webmachinelearning-minutes.html#t05, @fdwr will open a new issue to capture current thoughts, link back to this issue, then close it.
🤔 If anybody recalls (I'm not seeing it in the notes, even though I was probably the person who spoke up), what was the extra work here? Was it separating the gemm vs. matmul question out of this issue into a separate one, since the simplification aspect (removing the 1-D special handling) is already complete? Otherwise I'll close it (or Ningxin can).
I believe this was the point of discussion at TPAC, but it'd be good to hear from other attendees to confirm.
Here's what I (vaguely, I admit) recall: I initially suggested retitling the issue, given that both gemm and matmul are discussed together in this issue. In a follow-up we decided to open a new issue to discuss gemm specifics (support for higher-rank input tensors?) and close this issue. @fdwr, feel free to take an action accordingly.
(raised by @wacky6 Jiewei in Chromium CL review https://chromium-review.googlesource.com/c/chromium/src/+/4940628/comments/88fd209e_269f2c93)
WebNN's matmul supports 1-D input tensors: a 1-D first input is treated as 2-D by prepending a 1 to its dimensions, and a 1-D second input by appending a 1.
This feature aligns with PyTorch (torch.matmul) and ONNX (MatMul). However, 1-D input tensors are not widely supported by native ML APIs:

- DML_GEMM_OPERATOR_DESC supports input tensors of rank 2 to 4.
- BroadcastMatMul requires input tensors of rank >= 2.
- BatchMatMulV2, which TensorFlow Lite and NNAPI follow, requires that input tensors x and y are 2-D or higher.

The open question is whether WebNN matmul should drop support for 1-D input tensors. This would help simplify the implementation. Frameworks could still support 1-D input tensors by reshaping them to 2-D, prepending or appending a 1 to their dimensions, as ONNX Runtime's DirectML EP does in MatMulShapeMapping().

/cc @wchao1115 @fdwr