Reduce GPU memory for Whisper models converted to ONNX #17378
Conversation
Use a hash table to speed this up, as in https://github.com/huggingface/optimum/blob/7450ca30e295abc9e20d56d0aa741402322def0f/optimum/onnx/transformations_utils.py#L31-L54? In that implementation, they skip initializers with dimension 1 and data type int32 or int64, as well as scalars with dimension 0. That filters out the small initializers and might also speed things up a little.
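For reference, here is a minimal sketch of that filtering idea. It is an illustration, not the optimum code itself; the function names `is_small_initializer` and `build_initializer_hash_table` are hypothetical. The point is to skip 0-d scalars and 1-element int32/int64 tensors before hashing, so the hash table only covers initializers large enough to be worth deduplicating.

```python
import hashlib
from onnx import GraphProto, TensorProto, numpy_helper

def is_small_initializer(init: TensorProto) -> bool:
    """Heuristic from the comment above: 0-d scalars and 1-element
    int32/int64 tensors are too small to be worth deduplicating."""
    if len(init.dims) == 0:
        return True
    if list(init.dims) == [1] and init.data_type in (TensorProto.INT32, TensorProto.INT64):
        return True
    return False

def build_initializer_hash_table(graph: GraphProto) -> dict:
    """Map a content-based key to the names of initializers with that
    content, skipping the small initializers filtered out above."""
    table = {}
    for init in graph.initializer:
        if is_small_initializer(init):
            continue
        digest = hashlib.sha256(numpy_helper.to_array(init).tobytes()).hexdigest()
        # Include dtype and shape in the key so identical bytes with
        # different types or shapes do not collide.
        key = (init.data_type, tuple(init.dims), digest)
        table.setdefault(key, []).append(init.name)
    return table
```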
Description
This PR changes the Whisper export scripts to further optimize the removal of duplicate initializers from two subgraphs.
The current greedy approach is faster by a large factor, but it misses some duplicate initializers. This results not only in a slightly larger Whisper model, but also in a model that uses more GPU memory.
The approach in this PR uses data hashes and caches to keep the export fast while no longer relying on the greedy approach.

Co-authored-by: Peter McAughan <[email protected]>
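To make the hash-and-cache idea concrete, the sketch below matches initializers across two subgraphs by content hash in a single pass. It is a minimal illustration under assumed names (`tensor_key`, `find_shared_initializers`), not the actual export-script API.

```python
import hashlib
from onnx import ModelProto, numpy_helper

def tensor_key(init) -> tuple:
    """Content-based cache key: dtype, shape, and a hash of the raw data."""
    digest = hashlib.sha256(numpy_helper.to_array(init).tobytes()).hexdigest()
    return (init.data_type, tuple(init.dims), digest)

def find_shared_initializers(model_a: ModelProto, model_b: ModelProto) -> list:
    """Pair up initializers with identical contents in the two subgraphs.

    Building a hash table over model_a and probing it with model_b is a
    single pass over each graph, versus the pairwise comparisons of a
    greedy scan, and it cannot miss a duplicate whose bytes match."""
    cache = {tensor_key(init): init for init in model_a.graph.initializer}
    shared = []
    for init_b in model_b.graph.initializer:
        init_a = cache.get(tensor_key(init_b))
        if init_a is not None:
            shared.append((init_a.name, init_b.name))
    return shared
```

Each matched pair can then be replaced by a single shared initializer that both subgraphs reference, which is what shrinks the exported model and its GPU memory footprint.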