
enable serialize prepacked weights into data file #22256

Merged: 45 commits merged into main on Oct 25, 2024

Conversation

frank-dong-ms (Contributor)

Description

part of #21448
This change is intended to save CPU memory during model load for inference.
It adds a session option, save_prepacked_constant_initializers. With save_prepacked_constant_initializers turned on:

  1. Optimize the model with an inference session; prepacked external initializers are saved into the data file.
  2. Load the optimized model and the external data file containing the prepacked initializers; no prepacking is needed.
  3. Run inference with the optimized model and data file (a minimal usage sketch follows this list).
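For illustration, a minimal Python sketch of this two-phase flow. It assumes the new option is exposed as a session config entry; the exact key name ("session.save_prepacked_constant_initializers" below) and the file paths are assumptions for illustration, not taken from this PR.

```python
import onnxruntime as ort

# --- Phase 1: optimize once and serialize prepacked initializers ----------
so = ort.SessionOptions()
so.optimized_model_filepath = "model_optimized.onnx"  # placeholder output path
# Route the optimized model's large initializers into an external data file.
so.add_session_config_entry(
    "session.optimized_model_external_initializers_file_name",
    "model_optimized.onnx.data",  # placeholder data file
)
# Assumed spelling of the option added by this PR; check the merged code for
# the actual session option / config key.
so.add_session_config_entry("session.save_prepacked_constant_initializers", "1")
ort.InferenceSession("model.onnx", so)  # writes the optimized model + data file

# --- Phase 2: later loads reuse the serialized prepacked weights ----------
# Prepacked blobs come straight from the mmap'ed data file, so no prepacking
# work (and no extra heap copy) happens at session-creation time.
sess = ort.InferenceSession("model_optimized.onnx")
# outputs = sess.run(None, {"input_ids": ...})
```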

Tested with the model Phi-3-mini-instruct-onnx.

With ORT 1.12.0:

![image](https://github.com/user-attachments/assets/3c0337be-f340-4bb7-8f9f-30f3552072ef)

With this change:

![image](https://github.com/user-attachments/assets/23282990-2e1e-4a1f-92de-afa8ed7e6a43)

Peak memory usage dropped from 5.438 GB to 2.726 GB.
This change takes advantage of the fact that ORT loads external initializers with mmap on CPU. Prepacking uses extra memory on the heap, so omitting the prepack step saves that memory (roughly the same size as the external initializers).
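For reference, one way to reproduce this kind of peak-memory comparison on Linux (a rough sketch; the model path is a placeholder, and the script would be run once against the original model and once against the optimized model produced above):

```python
import resource
import sys

import onnxruntime as ort


def peak_rss_gb() -> float:
    # On Linux, ru_maxrss reports the process's peak resident set size in KiB.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024 ** 2)


model_path = sys.argv[1] if len(sys.argv) > 1 else "model_optimized.onnx"
sess = ort.InferenceSession(model_path)
print(f"peak RSS after session creation: {peak_rss_gb():.3f} GB")
```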

Next step:
Change all CPU kernels that implement the PrePack method and test them properly. This will be done in the next PR.

Motivation and Context

```python
import os

import numpy as np
import onnx
```
Check notice: Code scanning / CodeQL

Module is imported with 'import' and 'import from' (Note, test)

Module 'onnx' is imported with both 'import' and 'import from'.
Module 'onnxruntime.test.onnx' is imported with both 'import' and 'import from'.
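For context, a small made-up example of the pattern this CodeQL note flags (not the actual test file):

```python
# Mixing import styles for the same module triggers the note:
import onnx
from onnx import TensorProto  # 'onnx' is now imported with both 'import' and 'import from'

# Using one style consistently avoids it, e.g.:
#   import onnx
#   elem_type = onnx.TensorProto.FLOAT
```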
@yuslepukhin requested a review from tianleiwu on October 23, 2024 16:59

@yuslepukhin previously approved these changes on Oct 23, 2024
@github-actions bot left a comment:

You can commit the suggested changes from lintrunner.

Files with review comments (outdated, resolved):
- onnxruntime/core/framework/session_state.cc
- onnxruntime/core/framework/tensorprotoutils.cc
- onnxruntime/core/framework/tensorprotoutils.h
- onnxruntime/core/framework/tensorprotoutils.h
@frank-dong-ms dismissed github-actions[bot]'s stale review on October 24, 2024 05:04:

can't re-request review so dismiss to unblock

@yuslepukhin (Member) left a comment:

:shipit:

@frank-dong-ms merged commit c5b6be0 into main on Oct 25, 2024
90 checks passed
@frank-dong-ms deleted the frdong/prepack_1 branch on October 25, 2024 05:24
yuslepukhin added a commit that referenced this pull request Nov 9, 2024
yuslepukhin added a commit that referenced this pull request Nov 11, 2024
…22788)

This reverts commit c5b6be0.

### Description
Revert

### Motivation and Context
This needs a simpler and more robust approach.
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
guschmue pushed a commit that referenced this pull request Dec 2, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024