fixed point based requantization on arm64 #11540

yufenglee · 2022-05-16T21:40:30Z

This PR adds fixed point based requantization for ARM64 devices.
Requantization is computed with the formula:
    v = round(clamp(S * (I - Z), min, max))
where:
    v is the target value with type TOutput, which is either int8_t or uint8_t.
    I is the input value with type int32_t.
    S is the scale with type float.
    Z is the zero point, with the same type as TOutput.
    min is the minimum value of type TOutput.
    max is the maximum value of type TOutput.
Because of power consumption concerns, and because some ARM devices do not even have FPUs, it is important to be able to run
quantization with integer instructions only. Fixed point requantization is introduced to support this. Its general
idea is to convert the scale S to fixed point. The implementation draws on the methods used in Ruy and XNNPACK.
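
As a rough illustration of what "convert the scale S to fixed point" can mean, the sketch below turns a float scale into a Q31 multiplier plus a right shift and then applies it with integer arithmetic only. It is not the code from this PR: `FixedPointScale`, `ToFixedPoint`, `ApplyScale` and `SaturateToS8` are hypothetical names, a single right shift stands in for a PreShift/PostShift pair, zero point handling is left out, and the scale is assumed to lie in (0, 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical holder: scale is approximated as multiplier * 2^(-shift).
struct FixedPointScale {
    int32_t multiplier;  // Q31 mantissa in [2^30, 2^31)
    int shift;           // right shift applied after the 64-bit multiply
};

// Convert a float scale in (0, 1) to a Q31 multiplier and a right shift.
inline FixedPointScale ToFixedPoint(float scale) {
    assert(scale > 0.0f && scale < 1.0f);
    int exponent = 0;
    // frexp: scale == mantissa * 2^exponent with mantissa in [0.5, 1).
    const double mantissa = std::frexp(static_cast<double>(scale), &exponent);
    // A float mantissa has at most 24 bits, so this product is an exact integer.
    const int64_t q31 = static_cast<int64_t>(mantissa * 2147483648.0);
    return {static_cast<int32_t>(q31), 31 - exponent};
}

// Multiply an int32 accumulator by the fixed point scale, rounding half up.
// Relies on arithmetic right shift of negative values (guaranteed since C++20,
// and the behavior of mainstream compilers before that).
inline int32_t ApplyScale(int32_t acc, FixedPointScale s) {
    if (s.shift > 62) {
        return 0;  // scale is so small that every int32 input rounds to 0
    }
    const int64_t product = static_cast<int64_t>(acc) * s.multiplier;
    const int64_t rounding = int64_t{1} << (s.shift - 1);
    return static_cast<int32_t>((product + rounding) >> s.shift);
}

// Saturate the scaled value to the int8_t range.
inline int8_t SaturateToS8(int32_t v) {
    return static_cast<int8_t>(std::clamp<int32_t>(v, -128, 127));
}
```

With a scale of 0.25, `ToFixedPoint` yields a multiplier of 2^30 and a shift of 32, so `ApplyScale(10, ...)` returns 3 (2.5 rounded half up) and `ApplyScale(-10, ...)` returns -2. No floating point instruction is needed once the multiplier and shift have been computed.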

// NOTE: fixed point requantization rounds half up, whereas the ONNX spec rounds half to even, so for an identical
// model and input the inference results may not be exactly the same with the option kOrtSessionOptionsConfigFixedPointRequantOnARM64 on and off. The impact should be
// small in practice (the NNAPI EP uses the same rounding).
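
To make the rounding difference concrete, here is a small sketch comparing the two tie-breaking rules on values that sit exactly halfway between two integers. It assumes "half up" means ties go toward positive infinity, and it uses `std::nearbyint` under the default floating point rounding mode as a stand-in for round half to even. The function names are illustrative only and do not come from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Ties go toward positive infinity: 2.5 -> 3, -2.5 -> -2.
inline int64_t RoundHalfUp(double x) {
    return static_cast<int64_t>(std::floor(x + 0.5));
}

// Ties go to the nearest even integer: 2.5 -> 2, 3.5 -> 4.
// std::nearbyint follows the current rounding mode, which is
// round-to-nearest-even unless the program has changed it.
inline int64_t RoundHalfEven(double x) {
    return static_cast<int64_t>(std::nearbyint(x));
}

// Count the exact ties k + 0.5, for k in [lo, hi), on which the rules disagree.
inline int CountDifferingTies(int lo, int hi) {
    assert(lo <= hi);
    int count = 0;
    for (int k = lo; k < hi; ++k) {
        const double tie = k + 0.5;
        if (RoundHalfUp(tie) != RoundHalfEven(tie)) {
            ++count;
        }
    }
    return count;
}
```

The two rules disagree only on exact ties whose lower neighbor is even (0.5, 2.5, -1.5, ...), and then by exactly one unit, which is consistent with the expectation that the impact is small in practice.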

chenfucn · 2022-05-25T18:09:47Z

May want to add the rounding concerns in the description?

In reply to: 1137660710

chenfucn · 2022-05-25T18:14:39Z

include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h

+//       S is the scale with type float
+//       Z is the zero point with type same as TOutput.
+//       min is the minimum value of type TOutput.
+//       max is the maximum value of type TOutput.


Great comments, very informative!

This is a performance improvement, yet the option is "off" by default. Is that due to rounding errors? Would you consider stating the reason?

The same trick might work on other CPUs too. Since it is off by default, maybe remove ARM from the name? #Resolved

Yes, it is turned off by default because of rounding. It rounds half up, whereas the ONNX spec requires rounding half to even. Will add the info.

There is no plan to support it on x86. Keeping ARM in the name avoids users mistakenly thinking that x86 supports a similar option.

chenfucn · 2022-05-26T18:15:15Z

onnxruntime/core/mlas/lib/convsym.cpp

@@ -500,8 +508,7 @@ MlasConvSym(
    }

    MLAS_CONV_SYM_POST_PROCESS_PARAMS PostProcessParams = {};
-
-    MlasConvSymSetOutputZeroPoint(PostProcessParams, Params.OutputZeroPoint, Params.InputIsSigned);
+    MlasConvSymSetOutputZeroPoint(PostProcessParams, OutputZeroPoint, Params.InputIsSigned);


Is this done repeatedly on the same set of parameters? Should we consider moving this out in the future? That would require changing the MLAS interface. Since MLAS is not public yet, maybe that is OK?

chenfucn · 2022-05-27T16:59:04Z

onnxruntime/core/mlas/lib/qdwconv_kernelsize.cpp

 void
 MLASCALL
-MlasConvSymDepthwiseKernelSize25ArmU8S8(
+MlasConvSymDepthwiseKernelSize25ArmS8S8Impl(


This looks like a huge change. The U8S8 code used to be defined before the S8S8 code, and now the order is reversed. That seems to be the cause of most of the changes. Why do you need to flip the position of these two? #Resolved

It is unintentional. Let me restore the original order.

chenfucn · 2022-05-27T17:30:43Z

onnxruntime/core/mlas/inc/mlas.h

+        Multiplier(Multiplier), PreShift(PreShift), PostShift(PostShift),
+        Size(Size), ZeroPoint(ZeroPoint){}
+
+    MLAS_ROUND_KIND RequantRoundKind;


This field might be redundant; it can be deduced from the value of Scale or Multiplier (zero vs. non-zero). #Resolved

It is more descriptive. I would like to keep it.

It can be replaced by a const method.
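
A minimal sketch of what such a const method might look like follows. Everything in it is an assumption made for illustration: `RoundKind` and its enumerators stand in for MLAS_ROUND_KIND, whose actual values are not shown in this thread, and the struct assumes that Scale and Multiplier are pointers of which at most one is set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for MLAS_ROUND_KIND.
enum class RoundKind { HalfEven, HalfUp };

// Hypothetical, trimmed-down requantization parameter block.
struct RequantParamSketch {
    const float* Scale = nullptr;         // set for float requantization
    const int32_t* Multiplier = nullptr;  // set for fixed point requantization
    size_t Size = 0;

    // Derive the rounding kind instead of storing it in a separate field.
    RoundKind RequantRoundKind() const {
        assert(!(Scale != nullptr && Multiplier != nullptr));  // at most one is set
        return Multiplier != nullptr ? RoundKind::HalfUp : RoundKind::HalfEven;
    }
};
```

The accessor keeps call sites as descriptive as a stored field would be, while removing the possibility of the field disagreeing with the parameters it describes.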

chenfucn · 2022-05-27T17:57:41Z

Since the change touches the kernels, especially by adding branches, consider adding perf tests with one of the models that has big Conv or GEMM ops?

This reverts commit 1f2c926, because it makes our packaging pipeline crash. Error message:

[ RUN ] QLinearConvTest.Conv3D_S8S8_Depthwise
Test #1: onnxruntime_test_all ...................Subprocess killed***Exception: 838.24 sec

We have not yet reproduced the bug on real ARM64 hardware. So far we have only seen it with qemu. Investigation is ongoing.

yufenglee requested a review from a team as a code owner May 16, 2022 21:40

yufenglee force-pushed the yufeng/requant branch 2 times, most recently from 121f8ff to 2927755 Compare May 23, 2022 04:02

fixed point based requantization on arm64

1468add

yufenglee force-pushed the yufeng/requant branch from 790d5b8 to 1468add Compare May 25, 2022 00:15

chenfucn reviewed May 25, 2022

chenfucn reviewed May 26, 2022

chenfucn reviewed May 27, 2022

yufenglee added 2 commits May 31, 2022 10:28

reverse MlasConvSymDepthwiseKernel u8s8 and s8s8 order

1494d90

Merge branch 'master' into yufeng/requant

9238ea3

chenfucn approved these changes Jun 1, 2022

yufenglee merged commit 1f2c926 into master Jun 2, 2022

yufenglee deleted the yufeng/requant branch June 2, 2022 19:34

snnn added a commit that referenced this pull request Jun 3, 2022

Revert "fixed point based requantization on arm64 (#11540)"

7fece7b

This reverts commit 1f2c926.

snnn added a commit that referenced this pull request Jun 3, 2022

Revert "fixed point based requantization on arm64 (#11540)"

73127e2

This reverts commit 1f2c926.
