[TOOLS]: Using transformers.optimizer to optimize a large model causes a segmentation fault (core dumped) #17212

Open
han65487312 opened this issue Aug 18, 2023 · 3 comments
Labels
ep:CUDA (issues related to the CUDA execution provider), ep:TensorRT (issues related to the TensorRT execution provider)

Comments

@han65487312

han65487312 commented Aug 18, 2023

Describe the issue

When I use transformers.optimizer to optimize a UNet model that is larger than 2GB, the remove_useless_cast_nodes pass causes a segfault. I found that the symbolic shape inference in remove_useless_cast_nodes breaks down.

The command is:
python3 -m onnxruntime.transformers.optimizer --input ./unet_onnx/original_model/unet.onnx --output ./unet_onnx/fuse_fp16_model/unet.onnx --model_type unet --opt_level 99 --float16 --use_gpu

And when I turn off some optimizations, the optimized model cannot run on the TensorRT backend; the error message is "onnx.ModelProto exceeded maximum protobuf size of 2GB: 2357166045". The cuDNN backend runs fine.

Here are the library versions I am using:

  • onnx ==1.14.0
  • onnxruntime == 1.16.0
  • torch == 1.12.1
  • protobuf == 3.0.0

To reproduce

The model is too large for me to upload here.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 7.5.0-3ubuntu1~18.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU, CUDA, TensorRT

Execution Provider Library Version

No response

github-actions bot added the ep:CUDA and ep:TensorRT labels Aug 18, 2023
@tianleiwu
Contributor

tianleiwu commented Aug 18, 2023

@han65487312,

The segmentation fault (core dumped) might be caused by protobuf. You can downgrade protobuf to 3.20.3 and try again.
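For example (assuming pip manages your environment):

pip install protobuf==3.20.3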

The optimizer is for the CUDA provider; it needs the UNet to be a float32 model, and you should use --opt_level 0.
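For example, the CUDA-EP flow would look like your original command but with --opt_level 0 (the output path below is just illustrative):

python3 -m onnxruntime.transformers.optimizer --input ./unet_onnx/original_model/unet.onnx --output ./unet_onnx/fuse_fp16_cuda_model/unet.onnx --model_type unet --opt_level 0 --float16 --use_gpu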

The optimizer is not intended for the TensorRT EP because TensorRT has its own graph optimization logic.

For the TRT EP, you can try the following for SD 1.5 or 2.1 models:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/onnxruntime_tensorrt_txt2img.py
Basically, it follows the same logic as https://github.com/NVIDIA/TensorRT/tree/release/8.6/demo/Diffusion to generate the ONNX models for the TensorRT backend.

Example code

from onnxruntime.transformers.models.stable_diffusion.onnxruntime_tensorrt_txt2img import OnnxruntimeTensorRTStableDiffusionPipeline
from diffusers.schedulers import DDIMScheduler
import torch

model_name_or_path = "runwayml/stable-diffusion-v1-5"
scheduler = DDIMScheduler.from_pretrained(model_name_or_path, subfolder="scheduler")

pipe = OnnxruntimeTensorRTStableDiffusionPipeline.from_pretrained(
    model_name_or_path,
    revision="fp16",
    torch_dtype=torch.float16,
    scheduler=scheduler,
    image_height=512,
    image_width=512,
    max_batch_size=4,
)

# Re-use cached folder to save ONNX models and TensorRT engines
pipe.set_cached_folder(model_name_or_path, revision="fp16")

pipe = pipe.to("cuda")

prompt = "photorealistic new zealand hills"
image = pipe(prompt).images[0]
image.save("ort_trt_txt2img_new_zealand_hills.png")

For SDXL, we are still working on the optimization.

For more information, see https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md

@han65487312
Author

han65487312 commented Aug 21, 2023

Thanks for your reply. Indeed, my UNet is a customized model; it's not in the diffusers repo. I wonder whether there is a way to make attention run on the cuDNN backend and the other optimizations run on the TRT backend. The steps I use with transformers.optimizer are: 1. export my customized fp32 UNet model; 2. use transformers.optimizer to fuse the attention layers; 3. run the model with onnxruntime. If I set --opt_level 0, the attention fusion in step 2 does not work.
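For reference, a minimal sketch of what step 2 could look like through the Python API instead of the CLI (paths and parameters below are placeholders, not from this issue):

from onnxruntime.transformers.optimizer import optimize_model
from onnxruntime.transformers.fusion_options import FusionOptions

# Fuse attention (and other transformer patterns) while keeping ORT graph optimizations off.
fusion_options = FusionOptions("unet")
optimized = optimize_model(
    "unet_fp32.onnx",            # placeholder: exported fp32 UNet from step 1
    model_type="unet",
    opt_level=0,
    optimization_options=fusion_options,
    use_gpu=True,
)
# Save with external data so the result can exceed the 2GB protobuf limit.
optimized.save_model_to_file("unet_fused.onnx", use_external_data_format=True)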

@tianleiwu
Contributor

@han65487312,

I wonder whether there is a way to make attention run on the cuDNN backend and the other optimizations run on the TRT backend.
If you use both the TRT and CUDA providers at session creation, ORT will partition the fused nodes to the CUDA EP and the others to the TRT EP.
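A minimal sketch of such a session creation (the model path is a placeholder):

import onnxruntime as ort

# List the TRT EP first so it takes the subgraphs it supports; the fused attention
# nodes it cannot handle fall back to the CUDA EP, and anything left goes to CPU.
session = ort.InferenceSession(
    "unet_fused.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)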

However, that might not be a good way to use TRT, since TRT needs to convert the whole graph from NCHW to NHWC layout internally. If you use the optimizer for the CUDA EP, TRT cannot reach its full potential because it then only works on subgraphs.

--opt_level 0 is required for ORT releases before 1.16 because previously ORT could not save an optimized model larger than 2GB. This constraint is removed in ORT 1.16 (built from source).

I think TRT could handle a model larger than 2GB, since TRT can run the SDXL model, which is larger than 2GB. @chilo-ms, is there some limitation in the TRT EP?

tianleiwu added a commit that referenced this issue Sep 6, 2023
…ta (#17427)

Some initializers are added without the raw=True flag. As a result, those tensors cannot be saved to external data. If those tensors exceed 2GB in total, the optimized model cannot be saved due to the protobuf limit.

This change saves attention weights and biases as raw data.

Note: using raw data for shape tensors is optional since they are tiny.

### Motivation and Context
#17212
#15349
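To illustrate the raw=True point from the commit above, a hedged sketch with onnx.helper (the tensor name and values are made up):

import numpy as np
import onnx
from onnx import helper

weights = np.random.rand(4, 4).astype(np.float32)  # made-up attention weight

# Values stored as a repeated field; such tensors cannot be offloaded to external data.
inline_init = helper.make_tensor(
    "attn_qkv_weight", onnx.TensorProto.FLOAT, weights.shape, weights.flatten().tolist()
)

# Values stored as raw bytes; these can be saved to an external data file when the
# model would otherwise exceed the 2GB protobuf limit.
raw_init = helper.make_tensor(
    "attn_qkv_weight", onnx.TensorProto.FLOAT, weights.shape, weights.tobytes(), raw=True
)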
tianleiwu added a commit that referenced this issue Oct 31, 2023
kleiti pushed a commit to kleiti/onnxruntime that referenced this issue Mar 22, 2024