
[ORT 1.20.1 Release] Cherry pick 1st round #22785

Merged
merged 10 commits into rel-1.20.1 from yifanl/round-1-cherry-pick-rel-1.20.1 on Nov 12, 2024

Conversation

Contributor

@yf711 yf711 commented Nov 8, 2024

@sophies927 sophies927 requested review from snnn and mszhanyi November 8, 2024 19:22
Member

snnn commented Nov 11, 2024

Please include #22345, which will fix the Windows GPU DML CI Pipeline error.

idiskyle and others added 7 commits November 11, 2024 15:22
### Description
<!-- Describe your changes. -->
**Changes applied to Maven-related signing:**
* The Windows sha256 file is encoded as UTF-8 (no BOM).
* The PowerShell script task now uses the latest PowerShell version; the previous 5.1 version only supports UTF-8 with BOM.
* The Windows sha256 file content is in the format 'sha256value *filename.extension'.
* The Linux sha256 file content is in the format 'sha256value *filename.extension'.

**More information about PowerShell encoding:**
Windows PowerShell encoding reference: [about_Character_Encoding - PowerShell | Microsoft Learn](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_character_encoding?view=powershell-7.4)
- Version 5.1 only offers 'UTF8: Uses UTF-8 (with BOM).'
- Version 7.1 and higher offer:
  - utf8: Encodes in UTF-8 format (no BOM).
  - utf8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM).
  - utf8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM).
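As an illustration only (not the pipeline's actual signing script), a small Python sketch that writes a sha256 file in the 'sha256value *filename.extension' format, UTF-8 encoded without a BOM:

```python
import hashlib
from pathlib import Path

def write_sha256_file(artifact_path: str) -> None:
    """Write '<sha256> *<filename>' next to the artifact, UTF-8 encoded without a BOM."""
    artifact = Path(artifact_path)
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    # Python's 'utf-8' codec never emits a BOM ('utf-8-sig' would); newline='\n' keeps Unix line endings.
    with open(f"{artifact}.sha256", "w", encoding="utf-8", newline="\n") as f:
        f.write(f"{digest} *{artifact.name}\n")

# write_sha256_file("onnxruntime-1.20.1.jar")  # hypothetical artifact name
```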
…e can be easily used (#22345)

### Description
The local build of the native library was being included by almost every
project, but it is only needed to run tests. Due to the multiple inclusions,
attempting to use a pre-built package clashed with any local builds that
were available.

Create a helper file that includes either a local build or a pre-built
package, and include that helper in the two test projects.

Clean up various miscellaneous things.

### Motivation and Context

Create a setup to simplify running on-device tests with the NuGet
packages.
### Description
Support QNN 2.28.
Update the default QNN version to 2.28 in the build pipeline.
…ons (#22677)

### Description
Introduces the `get_qdq_config()` function to get a quantization
configuration for a full integer QDQ model. This function provides an
easier way of specifying commonly used options and sets convenient
defaults. Specifically:

- Instead of requiring the user to pass a dictionary of `extra_options`,
the new interface adds function parameters for common settings:
  - All calibrator settings
  - Whether activations/weights are symmetric
  - Whether to keep or fuse relu/clip into Q
  - Minimum real range for quantization
  - Dictionary of tensor quantization overrides.
- Automatically scans the input floating-point model and fills out the
operator types to quantize. Otherwise, only a limited number of operator
types would be quantized by default.
- Detects if the input model uses external data. If so, ensures that the
generated QDQ model also uses external data.
- Detects if the model will use newly introduced quantization types
(int4/int16) with an older opset. If so, forces the use of the
`com.microsoft` domain for Q/DQ ops, which support all types.
- Automatically enables the "extra option" called
`ForceQuantizeNoInputCheck` to ensure data movement operators (e.g.,
Transpose) are always quantized.
- The user can pass a function to indicate which nodes to exclude from
quantization.
- The user can still pass their own `extra_options` to override any of
the above if necessary.
 
```python
from onnxruntime.quantization import CalibrationMethod, QuantType, get_qdq_config, quantize  # , ...

# Get QDQ configuration
qdq_config = get_qdq_config(
    float_model,
    data_reader,
    calibrate_method=CalibrationMethod.Percentile,
    calibrate_args={"percentile": 99.98},  # Converted to extra_options
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    per_channel=True,
    nodes_to_exclude=["Mul"], # Could also be a function. Ex: `lambda model, node: node.op_type == "Softmax"`

    # Other options converted to extra_options:
    min_real_range=0.0001,
    keep_removable_activations=True,
    activation_symmetric=True,
    weight_symmetric=True,
)

# Quantize model
quantize(float_model_path, qdq_model_path, qdq_config)
```
### Motivation and Context
Need a version of `get_qnn_qdq_config()` that is not EP-specific.
…the weight's scale (#22020)

### Description
Fixes a scenario in which a bias input quantized to int32 has a scale that
is too small. When the bias scale is below a certain threshold, the
quantized bias values overflow the range of an `int32`, which
significantly decreases accuracy.

Credit to @yihonglyu for finding out about this issue and the fix.

### Motivation and Context
Consider the following Convolution with very small weights and a
constant bias input of `[5, -4.5]`.

![image](https://github.com/user-attachments/assets/4bde2bd9-892f-4ae9-887b-61a6668779a1)

The QDQ quantizer first computes the following quantization scale for
`input_0` and `weight`:
- `input_0`: scale=0.5
- `weight`: scale=7.843e-10 **[really small]**

The QDQ quantizer then computes the bias input's scale as follows:
```
bias_scale = input_0_scale * weight_0_scale = 0.5 * 7.843e-10 = 3.9215686274509805e-11
```

This `bias_scale` is too small. Before this PR, the QDQ quantizer would
quantize the f32 bias with this `bias_scale`:
```
bias_quant = round(bias_f32 / bias_scale) =  round([5.0/bias_scale, -4.5/bias_scale]) = [127500000000, -114750000000]
```
These quantized bias values exceed the range of int32, and so are
clipped to [int32.min(), int32.max()], which is very inaccurate.

#### New approach
This PR increases the `weight_0_scale` by the necessary amount to ensure
that `bias_scale` (which equals `weight_0_scale * input_0_scale`) is
appropriate for the int32 quantization type.

The smallest valid bias scale is given by the normal scale formula: 
`bias_smallest_valid_scale = (bias_f32_max - bias_f32_min) / (int32_max
- int32_min)`

Then, we compute the candidate bias scale:
`bias_scale_candidate = input_0_scale * weight_0_scale`

If the candidate scale is smaller than the smallest valid scale, we
increase the `weight_0_scale` by the necessary ratio:
```python
if bias_scale_candidate < bias_smallest_valid_scale:
    ratio = bias_smallest_valid_scale / bias_scale_candidate
    weight_0_scale = ratio * weight_0_scale
```

Then, we recompute the final bias scale:
```python
bias_scale = input_0_scale * weight_0_scale
```
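
To make the steps above concrete, here is a minimal, self-contained sketch that plugs in the example's numbers; it mirrors the description above rather than the quantizer's actual implementation:

```python
import numpy as np

# Values taken from the example above (hypothetical, for illustration only).
input_0_scale = 0.5
weight_0_scale = 7.843e-10
bias_f32 = np.array([5.0, -4.5], dtype=np.float32)

int32_info = np.iinfo(np.int32)

# Smallest bias scale allowed by the normal scale formula.
bias_smallest_valid_scale = float(bias_f32.max() - bias_f32.min()) / (
    float(int32_info.max) - float(int32_info.min)
)

# Candidate scale from the usual bias_scale = input_scale * weight_scale rule.
bias_scale_candidate = input_0_scale * weight_0_scale

# If the candidate is too small, increase the weight's scale by the needed ratio.
if bias_scale_candidate < bias_smallest_valid_scale:
    ratio = bias_smallest_valid_scale / bias_scale_candidate
    weight_0_scale = ratio * weight_0_scale

bias_scale = input_0_scale * weight_0_scale
print(bias_scale_candidate)  # ~3.9e-11: would quantize the bias to ~1e11, far outside int32
print(bias_scale)            # ~2.2e-9: the quantized bias is now on the order of the int32 range
```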

#### Impact on accuracy
Here's the above model's quantized output compared to the f32
(ground-truth) output.
- Before PR: 
  - f32 model output[0]: **5.0f**
  - qdq model output[0]: **0.075**
  - SNR: 0.1369 (higher is better)
- After PR:
  - f32 model output[0]: **5.0f**
  - qdq model output[0]: **4.992**
  - SNR: 55.656 (higher is better)
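
For reference, a figure like the SNR above can be computed as follows (a sketch assuming the common definition in dB, 20·log10 of the ratio of signal norm to error norm; the exact metric used for the numbers above is not spelled out in this PR):

```python
import numpy as np

def snr_db(expected: np.ndarray, actual: np.ndarray) -> float:
    """Signal-to-noise ratio in dB between a reference output and a quantized output."""
    noise = expected.astype(np.float64) - actual.astype(np.float64)
    return 20.0 * np.log10(np.linalg.norm(expected) / np.linalg.norm(noise))

# Single-element illustration with the outputs quoted above.
print(snr_db(np.array([5.0]), np.array([0.075])))  # ~0.13 dB (before this PR)
print(snr_db(np.array([5.0]), np.array([4.992])))  # ~55.9 dB (after this PR)
```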
### Description
Updates python quantization tool:
- Ensures QDQ Pad has equal quantization parameters across input and
output for certain Pad configurations.
- Ensures QDQ Slice always has equal quantization parameters across
input and output.
- Fixes a bug that occurs when Softmax is _excluded_ from quantization.


### Motivation and Context
QDQ Pad and Slice have lower latency on QNN EP when their quantization
parameters are equal.
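
For illustration, here is a hypothetical QDQ Pad pattern, sketched with `onnx.helper`, showing what equal input/output quantization parameters look like: the DequantizeLinear before Pad and the QuantizeLinear after it reuse the same scale and zero-point initializers, so the EP can run Pad directly on quantized data.

```python
from onnx import TensorProto, helper

# Shared quantization parameters (hypothetical values) reused by both DQ and Q.
scale = helper.make_tensor("qparam_scale", TensorProto.FLOAT, [], [0.02])
zero_point = helper.make_tensor("qparam_zp", TensorProto.UINT8, [], [128])
pads = helper.make_tensor("pads", TensorProto.INT64, [4], [0, 1, 0, 1])

nodes = [
    helper.make_node("DequantizeLinear", ["x_q", "qparam_scale", "qparam_zp"], ["x_f"]),
    helper.make_node("Pad", ["x_f", "pads"], ["y_f"], mode="constant"),
    helper.make_node("QuantizeLinear", ["y_f", "qparam_scale", "qparam_zp"], ["y_q"]),
]
graph = helper.make_graph(
    nodes,
    "qdq_pad_same_qparams",
    inputs=[helper.make_tensor_value_info("x_q", TensorProto.UINT8, [1, 3])],
    outputs=[helper.make_tensor_value_info("y_q", TensorProto.UINT8, [1, 5])],
    initializer=[scale, zero_point, pads],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
```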
### Description
Adds a `reduce_range` option to `get_qdq_config()`.



### Motivation and Context
Makes it easier to set this option when calling `get_qdq_config()`.
Otherwise, the user has to set the option manually.
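
A minimal usage sketch (reusing the hypothetical `float_model` and `data_reader` names from the earlier example):

```python
from onnxruntime.quantization import QuantType, get_qdq_config, quantize

qdq_config = get_qdq_config(
    float_model,
    data_reader,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    reduce_range=True,  # quantize weights with a reduced (e.g., 7-bit) range
)
quantize(float_model_path, qdq_model_path, qdq_config)
```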
@yf711 yf711 requested a review from a team as a code owner November 11, 2024 23:29
adrianlizarraga and others added 2 commits November 11, 2024 19:50
### Description
Fixes a unit test that would fail intermittently due to an existing bug
with Pad (reflect mode). When the number of padded values is >= the
inner dimension size, the ORT Pad implementation accesses invalid
memory. This PR makes the number of padding values less than the inner
dimension size to avoid triggering the bug.


### Motivation and Context
See related issues:
#8265
#11828
#20801

Here's a valgrind trace obtained on a Linux machine (with
`sess_options.enable_cpu_mem_arena = False`):
```
==864228== Invalid read of size 4
==864228==    at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370)
==864228==    by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551)
==864228==    by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725)
==864228==    by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484)
==864228==    by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73)
...
```

The above is obtained with the basic Pad(reflect) example on the [ONNX
Pad operator spec
page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary):

```python
data = [
    [1.0, 1.2],
    [2.3, 3.4],
    [4.5, 5.7],
]

pads = [0, 2, 0, 0]

mode = 'reflect'

# Expected output by ONNX spec
expected_output = [
    [1.0, 1.2, 1.0, 1.2],
    [2.3, 3.4, 2.3, 3.4],
    [4.5, 5.7, 4.5, 5.7],
]

# Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension
# invalid data may be 0.0, inf, nan, etc.
ort_output = [
    [inf, 1.2, 1.0, 1.2],
    [inf, 3.4, 2.3, 3.4],
    [inf, 5.7, 4.5, 5.7],
]
```
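
For completeness, a hedged reproduction sketch (not the PR's actual unit test) that builds this Pad(reflect) example with `onnx.helper` and runs it through onnxruntime with the CPU memory arena disabled, as in the valgrind run above:

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

# Minimal Pad(reflect) model matching the spec example above.
pad_node = helper.make_node("Pad", ["data", "pads"], ["out"], mode="reflect")
graph = helper.make_graph(
    [pad_node],
    "pad_reflect_repro",
    inputs=[helper.make_tensor_value_info("data", TensorProto.FLOAT, [3, 2])],
    outputs=[helper.make_tensor_value_info("out", TensorProto.FLOAT, [3, 4])],
    initializer=[helper.make_tensor("pads", TensorProto.INT64, [4], [0, 2, 0, 0])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)

sess_options = ort.SessionOptions()
sess_options.enable_cpu_mem_arena = False
sess = ort.InferenceSession(model.SerializeToString(), sess_options, providers=["CPUExecutionProvider"])

data = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]], dtype=np.float32)
print(sess.run(None, {"data": data})[0])  # on affected builds the first column may contain invalid values
```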
Update the `SkipLayerNorm` implementation to address issues.
@yf711 yf711 merged commit c6156c1 into rel-1.20.1 Nov 12, 2024
242 of 248 checks passed
@yf711 yf711 deleted the yifanl/round-1-cherry-pick-rel-1.20.1 branch November 12, 2024 22:19
7 participants