Add IO binding support for custom ORTModel #447
Conversation
As in PR #539, the output logits have variable shapes depending on the model used for the segmentation task (at least there is no single general rule to infer the output shape for all these models), so I would like to ship this PR to enable IOBinding for outputs whose shape we can't infer in advance. @michaelbenayoun @fxmarty And FYI @TheoMrc
Thanks for the heads up! I wonder, how does IO binding work when you cannot bind a fixed output size? Just one last question to satisfy my curiosity: is there no way to read the output size from the model? I remember I could do so quite easily in TensorFlow during my ML debut (tf.keras.utils.plot_model). Have a good evening! Update: Found this on SO: ONNX graph - How to get output-dimensions
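For reference, here is a minimal sketch (not code from this PR) of how the declared output dimensions of an ONNX graph can be inspected with the `onnx` package; the file path is just an example. Dynamic axes show up as symbolic `dim_param` names rather than concrete `dim_value` integers, which is exactly why the shape cannot be known in advance here:

```python
import onnx

# Load the exported graph (the path is an example).
model = onnx.load("model.onnx")

# Each graph output carries a type proto with per-dimension info.
for output in model.graph.output:
    dims = []
    for dim in output.type.tensor_type.shape.dim:
        if dim.HasField("dim_value"):    # static dimension: a concrete integer
            dims.append(dim.dim_value)
        elif dim.HasField("dim_param"):  # dynamic dimension: only a symbolic name
            dims.append(dim.dim_param)
        else:
            dims.append("?")             # unknown dimension
    print(output.name, dims)
```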
Maybe we could […]
Hi @TheoMrc, Under the hood, ONNX Runtime uses a data structure named `OrtValue`.

Yes, the aim of IOBinding is to prepare data buffers on the device to avoid offloading them to the CPU and causing data-copying overhead when you want to reuse the data for the following compute on the device. That's why adding IOBinding is especially important for decoding, as you need to reuse the output of the last step.

I don't think that you can infer the shape: you can get that info easily under eager mode, but in this case you are using the graph, the output shapes are dynamic, and you will probably only get shapes for some initializers and […]
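To illustrate the point about letting ONNX Runtime own the output buffer, here is a minimal sketch using the public `onnxruntime` IOBinding API (the model path and the `pixel_values`/`logits` names are assumptions, not code from this PR): when the output shape is unknown, `bind_output` can be called with only a name and a device, so ORT allocates the `OrtValue` itself and we fetch it after the run.

```python
import numpy as np
import onnxruntime as ort

# Example session on GPU; the model path and tensor names are assumptions.
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
io_binding = session.io_binding()

# Input with a known shape: place it on the device as an OrtValue and bind it.
pixel_values = np.random.rand(1, 3, 224, 224).astype(np.float32)
input_ortvalue = ort.OrtValue.ortvalue_from_numpy(pixel_values, "cuda", 0)
io_binding.bind_ortvalue_input("pixel_values", input_ortvalue)

# Output with an unknown (dynamic) shape: bind only the name and the device,
# so ONNX Runtime allocates the output buffer itself.
io_binding.bind_output("logits", "cuda")

session.run_with_iobinding(io_binding)

# The allocated OrtValue stays on the device; copy to CPU only if needed.
logits_ortvalue = io_binding.get_outputs()[0]
print(logits_ortvalue.shape(), logits_ortvalue.device_name())
```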
LGTM, thanks @JingyaHuang !!
LGTM, thanks!
Makes me think out loud that for the `ORTModel` we could have either a fixed input/output path or a more flexible one, or something in this flavor to solve e.g. #479.
For the ORT modeling, custom tasks are already supported when using a single ONNX model for inference (though with a penalty for being dynamic), and we can implement a custom one for the seq2seq case, WDYT?
What I meant is that if you use […]
@fxmarty In this case, use `ORTModelForCustomTasks`. From my understanding, the other task-defined models are implemented in a static manner in order to avoid any penalty in the pipeline. @philschmid What I do agree with is that, to fully enable customizability, the exporter should let users customize what they want to take in as inputs and outputs.
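For context, a minimal usage sketch of `ORTModelForCustomTasks` (the checkpoint name is only an example): since the task is not one of the predefined ones, the model simply forwards whatever inputs it receives and returns every graph output by name.

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCustomTasks

# Example checkpoint exposing a non-standard output (a pooler output); any ONNX
# model with custom inputs/outputs can be loaded the same way.
model = ORTModelForCustomTasks.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")
tokenizer = AutoTokenizer.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")

inputs = tokenizer("I love burritos!", return_tensors="pt")
outputs = model(**inputs)

# Outputs are keyed by the ONNX graph's output names.
print(outputs.keys())
```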
Yes, the idea is to keep the […]
What does this PR do?
In #421, Optimum added IO binding support for task-defined models. In this PR, we will add IO binding support for custom tasks.
For custom cases, Optimum is unaware of the outputs and their size information. In this case, we will let ONNX Runtime allocate memory for the output as an `OrtValue`, so a transfer of the tensor across frameworks will be needed:

- If `onnxruntime-training` is installed, with its DLPack support, the output tensors will be transferred from ONNX Runtime to PyTorch directly on the device.
- If `onnxruntime-gpu` is installed, the ownership of the output tensor will be transferred from ORT -> CuPy -> PyTorch.

This PR is currently on stand-by, waiting for requests from the community.
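As a sketch of the second route above (not the exact code in this PR), a CUDA-resident `OrtValue` can be wrapped by CuPy without a copy and then handed to PyTorch through DLPack. The helper name, the dtype, and the device id are assumptions; the dtype must match the graph output's element type.

```python
import cupy as cp
import numpy as np
import torch

def ortvalue_to_torch(ort_value, dtype=np.float32, device_id=0):
    """Hypothetical helper: wrap a CUDA OrtValue with CuPy (zero-copy),
    then hand it to PyTorch through DLPack, staying on the GPU."""
    shape = ort_value.shape()
    size_in_bytes = int(np.prod(shape)) * np.dtype(dtype).itemsize

    # Build a CuPy view over the memory that ONNX Runtime allocated.
    # Passing ort_value as the owner keeps the buffer alive while the view exists.
    memory = cp.cuda.UnownedMemory(ort_value.data_ptr(), size_in_bytes, ort_value, device_id)
    cp_array = cp.ndarray(shape, dtype=dtype, memptr=cp.cuda.MemoryPointer(memory, 0))

    # DLPack handoff to PyTorch, still on the device.
    return torch.utils.dlpack.from_dlpack(cp_array.toDlpack())

# With onnxruntime-training installed, the intermediate CuPy step is not needed,
# since the output can be exchanged with PyTorch through DLPack directly.
```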