[WebNN EP] Decompose Concat with input number > 4 for CPU backend #18930
Conversation
The WebNN XNNPack backend only supports Concat with at most 4 inputs, so this change decomposes a Concat with more than 4 inputs into multiple WebNN concat ops.
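For illustration, here is a minimal C++ sketch of one way such a decomposition could work: concat the first 4 inputs, then fold the partial result in with up to 3 more inputs per step. `EmitConcat` and the operand-name strings are hypothetical stand-ins, not the EP's actual WebNN graph-builder calls, and the PR's exact grouping strategy may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for emitting one WebNN concat op (at most 4 inputs);
// the real EP goes through the WebNN graph builder, which is not shown here.
std::string EmitConcat(const std::vector<std::string>& inputs, int64_t axis) {
  std::string node = "concat(";
  for (size_t i = 0; i < inputs.size(); ++i) {
    node += inputs[i];
    if (i + 1 < inputs.size()) node += ",";
  }
  node += ")";
  std::cout << "emit " << node << " on axis " << axis << "\n";
  return node;
}

// Decompose a Concat with more than 4 inputs into a chain of concats that
// each take at most 4 operands: concat the first 4, then merge the partial
// result with up to 3 more inputs at a time until everything is consumed.
std::string DecomposeConcat(const std::vector<std::string>& operands, int64_t axis) {
  constexpr size_t kMaxInputs = 4;
  if (operands.size() <= kMaxInputs) return EmitConcat(operands, axis);

  std::vector<std::string> group(operands.begin(), operands.begin() + kMaxInputs);
  std::string partial = EmitConcat(group, axis);
  for (size_t i = kMaxInputs; i < operands.size(); i += kMaxInputs - 1) {
    size_t end = std::min(i + kMaxInputs - 1, operands.size());
    group.assign({partial});
    group.insert(group.end(), operands.begin() + i, operands.begin() + end);
    partial = EmitConcat(group, axis);
  }
  return partial;
}

int main() {
  // 9 inputs -> concat(a,b,c,d), then merge with e,f,g, then with h,i.
  DecomposeConcat({"a", "b", "c", "d", "e", "f", "g", "h", "i"}, /*axis=*/1);
}
```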
/azp run ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline
/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 7 pipeline(s).
/azp run ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
This is temporary, right? I'm surprised that XNNPack doesn't have a higher limit, like 16/256/.../65536. This approach reminds me of growing a std::vector with linear reallocation: because you also copy all the existing elements each time, calling push_back in a loop actually results in higher-than-linear time complexity (which is why most implementations use a 1.5x or 2x growth factor to avoid this). So models that have 128 concatenated inputs will experience n^2 time o_o.
We definitely don't expect WebNN callers to duplicate this code when calling the CPU backend, so either XNNPack should handle > 4 inputs directly, or the Chromium WebNN implementation should do it (because anything the ORT layer can handle, surely the WebNN front-end can handle directly).
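To make the cost argument concrete, here is a rough, hypothetical cost model comparing linear chaining with grouped (tree) decomposition, assuming every emitted concat copies all elements of its inputs. It illustrates the asymptotics only; it is not measured XNNPack or WebNN behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <initializer_list>
#include <iostream>

// Assumed cost model: each emitted concat copies all elements of its inputs,
// and every original input holds one element (unit cost).
int main() {
  for (std::size_t n : {8, 32, 128}) {
    // Chained: first concat takes 4 inputs, then each later concat takes the
    // running partial result plus up to 3 new inputs, re-copying everything
    // merged so far -> total copies grow roughly quadratically in n.
    std::size_t chained = 4, merged = 4;
    while (merged < n) {
      std::size_t take = std::min<std::size_t>(3, n - merged);
      merged += take;
      chained += merged;  // copies the partial result plus the new inputs
    }
    // Grouped (tree): concat inputs in groups of up to 4 per level; each
    // level copies all n elements once, and there are about log4(n) levels.
    std::size_t grouped = 0, nodes = n;
    while (nodes > 1) {
      grouped += n;
      nodes = (nodes + 3) / 4;
    }
    std::cout << "n=" << n << "  chained copies=" << chained
              << "  grouped copies=" << grouped << "\n";
  }
}
```

Under these assumptions the chained total grows roughly quadratically with the input count, while the grouped total grows as n·log n, which is the std::vector-growth analogy above.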
cc @huningxin, hope you can address @fdwr's comment.
@Honry, feel free to open a Chromium issue for the WebNN XNNPACK backend. We'll seek feedback from XNNPACK developers and Chromium developers to decide where to implement this feature. Thanks!
Sure. Will do that.
Issue created at https://bugs.chromium.org/p/chromium/issues/detail?id=1519119. |