[TOOLS]: Using transformers.optimizer to optimize a large model causes segmentation fault (core dumped) #17212
Comments
The optimizer is for the CUDA provider, and it needs the UNet to be a float32 model. The optimizer is not for the TensorRT EP because TensorRT has its own graph optimization logic. For the TRT EP, you can try the following for the SD 1.5 or 2.1 model: Example code
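For reference, a minimal sketch of how that CUDA-EP path is usually driven from Python. The paths and fp16 settings here are placeholders, not taken from this issue:

```python
# Sketch only: optimize a float32 UNet for the CUDA EP with the transformers
# optimizer, then optionally convert to fp16. Paths are placeholders.
from onnxruntime.transformers.optimizer import optimize_model

opt = optimize_model(
    "unet_fp32.onnx",   # hypothetical exported float32 UNet
    model_type="unet",
    use_gpu=True,       # target the CUDA EP fusions
)
opt.convert_float_to_float16(keep_io_types=True)  # optional fp16 conversion
# use_external_data_format avoids the 2GB protobuf limit for large models
opt.save_model_to_file("unet_fp16.onnx", use_external_data_format=True)
```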
For SDXL, we are still working on the optimization. For more information, see https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md
Thanks for your reply. Indeed, my UNet is a customized model; it is not in the diffusers repo. I wonder whether there is a way to make attention run on the cuDNN backend while the other optimizations run on the TRT backend. The steps I use with transformers.optimizer are:
1. Export my customized fp32 UNet model.
2. Use transformers.optimizer to fuse the attention layers.
3. Run the model with onnxruntime.
If I set the …
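A hedged sketch of how EP assignment falls out of provider order at step 3: ORT gives each node to the first listed EP that supports it, so with TRT listed first, the fused contrib Attention ops (which TRT does not recognize) should fall back to CUDA/cuDNN while TRT takes the rest. The path is a placeholder:

```python
import onnxruntime as ort

# Sketch only: provider order decides node assignment. TRT takes the subgraphs
# it supports; fused Attention ops it does not recognize fall back to CUDA.
sess = ort.InferenceSession(
    "unet_fused.onnx",  # hypothetical attention-fused model from step 2
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
    ],
)
print(sess.get_providers())  # confirms which EPs were actually registered
```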
However, that might not be a good way to use TRT, since TRT needs to convert the whole graph from NCHW to NHWC layout internally. If you use the optimizer for the CUDA EP, TRT cannot reach its full potential because it then only works on subgraphs.
I think TRT can handle models larger than 2GB, since TRT can run the SDXL model, which is larger than 2GB. @chilo-ms, is there some limitation in the TRT EP?
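If it helps, here is a hedged sketch of selectively disabling fusions so most of the graph stays as plain ONNX ops for TRT; which FusionOptions flags matter for this particular model is an assumption on my part:

```python
from onnxruntime.transformers.fusion_options import FusionOptions
from onnxruntime.transformers.optimizer import optimize_model

# Sketch only: keep the graph TRT-friendly by turning off ORT-specific fusions.
options = FusionOptions("unet")
options.enable_attention = False        # leave attention as plain ONNX ops
options.enable_skip_layer_norm = False  # avoid contrib ops TRT cannot take

opt = optimize_model(
    "unet_fp32.onnx",  # placeholder path
    model_type="unet",
    optimization_options=options,
)
opt.save_model_to_file("unet_trt_friendly.onnx", use_external_data_format=True)
```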
…ta (#17427): Some initializers are added without the raw=True flag, which prevents those tensors from being saved to external data. If those tensors exceed 2GB in total, the optimized model cannot be saved due to the protobuf limit. This change saves attention weights and biases as raw data. Note: using raw data for shape tensors is optional, since they are tiny. Motivation and Context: #17212 #15349
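To illustrate what the commit describes, a small hedged sketch (tensor names are illustrative): initializers built from Python lists go into the typed data fields and cannot be offloaded to an external data file, while raw-bytes initializers can:

```python
import numpy as np
from onnx import TensorProto, helper

w = np.ones((4, 4), dtype=np.float32)

# Stored in the typed float_data field; not movable to external data.
typed_init = helper.make_tensor(
    "attn_weight_typed", TensorProto.FLOAT, w.shape, w.flatten().tolist()
)

# Stored in raw_data (raw=True); external-data conversion can offload this.
raw_init = helper.make_tensor(
    "attn_weight_raw", TensorProto.FLOAT, w.shape, w.tobytes(), raw=True
)
```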
Describe the issue
When I use transformers.optimizer to optimize a UNet model that is larger than 2GB, the remove_useless_cast_nodes pass causes a segfault. I found that the symbolic shape inference inside remove_useless_cast_nodes breaks down.
The command is:
```
python3 -m onnxruntime.transformers.optimizer --input ./unet_onnx/original_model/unet.onnx --output ./unet_onnx/fuse_fp16_model/unet.onnx --model_type unet --opt_level 99 --float16 --use_gpu
```
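Since the crash is reported inside symbolic shape inference, one hedged way to isolate it is to run that pass by itself, outside the optimizer; the path below matches the command above, and whether this reproduces the segfault is the question:

```python
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

# Sketch only: run the same symbolic shape inference that
# remove_useless_cast_nodes relies on, to see whether it crashes on its own.
model = onnx.load("./unet_onnx/original_model/unet.onnx")
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
print("shape inference finished:", len(inferred.graph.value_info), "value_infos")
```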
And when I turn off some optimizations, the optimized model cannot run on the TensorRT backend; the error message is "onnx.ModelProto exceeded maximum protobuf size of 2GB: 2357166045". The cuDNN backend runs fine.
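The protobuf error is about a single serialized ModelProto exceeding 2GB. A common hedged workaround when saving the optimized model is the external-data format, so the .onnx file keeps only references to the weights; file names below are placeholders, and whether a given EP then accepts the model is a separate question:

```python
import onnx

# Sketch only: re-save a >2GB model with weights in a side file so the
# .onnx protobuf itself stays under the 2GB serialization limit.
model = onnx.load("unet_optimized.onnx")  # placeholder path
onnx.save_model(
    model,
    "unet_optimized_ext.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="unet_optimized_ext.data",
    size_threshold=1024,  # tensors larger than this (bytes) go to the side file
)
```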
Here are the library versions I am using:
To reproduce
The model is too large for me to upload here.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 7.5.0-3ubuntu1~18.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU, CUDA, TensorRT
Execution Provider Library Version
No response