Implement pre-packed blobs serialization on disk and their memory mapping on load #23069

yuslepukhin · 2024-12-10T20:47:56Z

Description

Pre-packing is a feature that allows kernels to rearrange weight data
to gain performance at inference time.

Currently, pre-packed blobs are shared only when cross-session weight sharing is enabled, and only for those weights that the user marks as shared. Otherwise, the data resides on the heap and is owned by the kernels, so it may be duplicated.

This change enables pre-packed data to be stored on disk alongside the external initializers.
The pre-packed blobs are memory-mapped and loaded into either the cross-session (X-session) shared container
or a new container that shares pre-packed blobs within the session.

With the new approach, pre-packed blobs are always owned by a shared container, using the existing pre-pack mechanism for sharing. When X-session sharing is enabled, the external container owns the data.
When X-session sharing is not enabled, a separate container owned by the root SessionState owns and shares the data.
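The ownership model described above can be illustrated with a minimal sketch: a container owns each blob and hands out shared read-only handles, so a weight is packed once and reused by every kernel that asks for it. The class name `SharedPrepackStore`, the `GetOrPack` method, and the string key are hypothetical and are not taken from this PR; the sketch is also not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container that owns pre-packed blobs and shares them by handle.
class SharedPrepackStore {
 public:
  using Bytes = std::vector<std::uint8_t>;
  using Handle = std::shared_ptr<const Bytes>;

  // Returns the blob already stored under `key`; otherwise runs `pack` once,
  // stores the result, and returns a handle to it.
  Handle GetOrPack(const std::string& key, const std::function<Bytes()>& pack) {
    auto it = blobs_.find(key);
    if (it != blobs_.end()) {
      return it->second;  // reuse: no second copy of the packed data
    }
    Handle blob = std::make_shared<Bytes>(pack());
    blobs_.emplace(key, blob);
    return blob;
  }

  // Number of distinct blobs owned by the container.
  std::size_t size() const { return blobs_.size(); }

 private:
  std::unordered_map<std::string, Handle> blobs_;
};
```

Two kernels that request the same key receive handles to the same buffer, which is the duplication the heap-owned approach could not avoid.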

To facilitate this new approach, we introduce a new container that works in two modes. In the first mode, when an optimized model is being saved and saving of pre-packed weights is enabled, the container records pre-packed blobs and serializes them to disk using the existing ToGraphProtoWithExternalInitializers function.

To externalize the pre-packed weights, we introduce a new session option, kOrtSessionOptionsSavePrePackedConstantInitializers. Note that pre-packing must be enabled (the default) for this to work.

The ToGraphProtoWithExternalInitializers function is modified to recurse into subgraphs so that local initializer names are properly accounted for.
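As a rough illustration of why the recursion matters, the sketch below walks a graph and its nested subgraphs and lists every initializer at or above a size threshold, qualifying each name with the graph it belongs to so that a subgraph-local `W` is not confused with an outer `W`. The types `GraphSketch` and `InitializerSketch`, the function `CollectLargeInitializers`, and the path-style naming are all hypothetical simplifications and do not reflect how ONNX Runtime represents graphs or names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for a graph and its initializers.
struct InitializerSketch {
  std::string name;
  std::size_t byte_size;
};

struct GraphSketch {
  std::string name;
  std::vector<InitializerSketch> initializers;  // local to this graph
  std::vector<GraphSketch> subgraphs;           // nested graphs, e.g. branch bodies
};

// Depth-first walk: records "<graph path>/<initializer name>" for every
// initializer whose size is at least `threshold`, then recurses into subgraphs.
void CollectLargeInitializers(const GraphSketch& graph, std::size_t threshold,
                              const std::string& scope,
                              std::vector<std::string>& out) {
  const std::string path = scope + graph.name;
  for (const auto& init : graph.initializers) {
    if (init.byte_size >= threshold) {
      out.push_back(path + "/" + init.name);
    }
  }
  for (const auto& sub : graph.subgraphs) {
    CollectLargeInitializers(sub, threshold, path + "/", out);
  }
}
```

Without the second loop, large weights that live only inside a subgraph would never be visited and so would not be written out.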

In the second mode, the container simply holds the pre-packed weights memory-mapped from disk and shares them with the kernels.
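For readers unfamiliar with memory mapping, here is a minimal POSIX-only sketch of the general technique: a file is mapped read-only and a blob is exposed as a pointer at a given offset, with no copy onto the heap. The `MappedFile` class and its `Slice` method are hypothetical; this is not the mechanism used inside ONNX Runtime, which has its own platform abstraction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper over a read-only, private mapping of a whole file.
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return;  // leaves the object empty (size() == 0)
    struct stat st;
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
      const std::size_t len = static_cast<std::size_t>(st.st_size);
      void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        data_ = static_cast<const unsigned char*>(p);
        size_ = len;
      }
    }
    ::close(fd);  // the mapping remains valid after the descriptor is closed
  }

  ~MappedFile() {
    if (data_ != nullptr) {
      ::munmap(const_cast<unsigned char*>(data_), size_);
    }
  }

  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  // Pointer to `length` bytes starting at `offset`, or nullptr if out of range.
  const unsigned char* Slice(std::size_t offset, std::size_t length) const {
    if (data_ == nullptr || offset > size_ || length > size_ - offset) {
      return nullptr;
    }
    return data_ + offset;
  }

  std::size_t size() const { return size_; }

 private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

Because the pages are backed by the file, the operating system can load them lazily and share them between mappings of the same file, which is where the memory saving comes from.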

Motivation and Context

Reduce the memory used by pre-packed initializers and externalize them.

Enable sharing of pre-packed blobs when weight sharing is not enabled. Memory-map pre-packed blobs. Recurse into subgraphs in ToGraphProtoWithExternalInitializers to make sure all big weights are serialized along with their pre-packs that are to be shared between the subgraphs.

onnxruntime/core/framework/prepacked_weights_container.h

onnxruntime/core/framework/session_state.cc

onnxruntime/test/framework/session_state_test.cc

onnxruntime/test/framework/tensorutils_test.cc

include/onnxruntime/core/graph/model_saving_options.h

onnxruntime/core/framework/tensorprotoutils.cc

onnxruntime/core/framework/tensor_external_data_info.cc

include/onnxruntime/core/graph/graph.h

include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h

onnxruntime/core/graph/graph.cc

onnxruntime/core/framework/tensor_external_data_info.h

skottmckay

yuslepukhin requested review from snnn, skottmckay and frank-dong-ms December 10, 2024 20:47

yuslepukhin commented Dec 10, 2024

onnxruntime/core/framework/prepacked_weights_container.h Outdated

yuslepukhin commented Dec 10, 2024

onnxruntime/core/framework/prepacked_weights_container.h Outdated

yuslepukhin commented Dec 10, 2024

onnxruntime/core/framework/session_state.cc Outdated

github-advanced-security bot found potential problems Dec 10, 2024

Fix CI complaints

fab27a7

yuslepukhin force-pushed the yuslepukhin/prepack_serialize branch from 7fc9a93 to fab27a7 December 10, 2024 23:00

yuslepukhin added 3 commits December 10, 2024 15:43

Address macro invocation

110c53b

Remove structured binding as Android does not like it

0dfac49

Exclude __wasm__ from testing external prepacks

eccd758

yuslepukhin marked this pull request as ready for review December 11, 2024 18:38

yuslepukhin requested a review from tianleiwu December 11, 2024 19:26