
[ORT 1.20.1 Release] Cherry pick 1st round #22785

Merged
merged 10 commits into rel-1.20.1 from yifanl/round-1-cherry-pick-rel-1.20.1 on Nov 12, 2024

Conversation

Contributor

@yf711 yf711 commented Nov 8, 2024

@sophies927 sophies927 requested review from snnn and mszhanyi November 8, 2024 19:22
Member

snnn commented Nov 11, 2024

Please include #22345, which will fix the Windows GPU DML CI Pipeline error.

idiskyle and others added 7 commits November 11, 2024 15:22
### Description
<!-- Describe your changes. -->
**Changes applied to Maven-related signing:**
* The Windows sha256 file is encoded as UTF-8 (no BOM).
* The PowerShell script task now uses the latest PowerShell version; the previous 5.1 version only supports UTF-8 with BOM.
* The Windows sha256 file content is in the format 'sha256value *filename.extension'.
* The Linux sha256 file content is in the format 'sha256value *filename.extension'.

**More information about PowerShell encoding:**
Windows PowerShell encoding reference: [about_Character_Encoding - PowerShell | Microsoft Learn](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_character_encoding?view=powershell-7.4)
- Version 5.1 only offers 'UTF8: Uses UTF-8 (with BOM).'
- Version 7.1 and higher offer:
  - utf8: Encodes in UTF-8 format (no BOM).
  - utf8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM).
  - utf8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM).
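As an illustration only (not the pipeline's actual signing script), a small Python sketch that writes a sha256 file in the 'sha256value *filename.extension' format, UTF-8 encoded without a BOM:

```python
import hashlib
from pathlib import Path

def write_sha256_file(artifact_path: str) -> None:
    """Write '<sha256> *<filename>' next to the artifact, UTF-8 encoded without a BOM."""
    artifact = Path(artifact_path)
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    # Python's 'utf-8' codec never emits a BOM ('utf-8-sig' would); newline='\n' keeps Unix line endings.
    with open(f"{artifact}.sha256", "w", encoding="utf-8", newline="\n") as f:
        f.write(f"{digest} *{artifact.name}\n")

# write_sha256_file("onnxruntime-1.20.1.jar")  # hypothetical artifact name
```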
…e can be easily used (#22345)

### Description
The local build of the native library was being included by almost every
project, but it is only needed to run tests. Due to the multiple inclusions,
attempting to use a pre-built package clashed with any local builds that
were available.

Create a helper file that includes either a local build or a pre-built
package, and include that helper in the two test projects.

Clean up various miscellaneous things.

### Motivation and Context

Create a setup to simplify running on-device tests with the NuGet
packages.
### Description
Support QNN 2.28.
Update the default QNN version to 2.28 in the build pipeline.
…ons (#22677)

### Description
Introduces the `get_qdq_config()` function to get a quantization
configuration for a full integer QDQ model. This function provides an
easier way of specifying commonly used options and sets convenient
defaults. Specifically:

- Instead of requiring the user to pass a dictionary of `extra_options`,
the new interface adds function parameters for common settings:
  - All calibrator settings
  - Whether activations/weights are symmetric
  - Whether to keep or fuse relu/clip into Q
  - Minimum real range for quantization
  - Dictionary of tensor quantization overrides.
- Automatically scans the input floating-point model and fills out the
operator types to quantize. Otherwise, only a limited number of operator
types would be quantized by default.
- Detects if the input model uses external data. If so, ensures that the
generated QDQ model also uses external data.
- Detects if the model will use newly introduced quantization types
(int4/int16) with an older opset. If so, forces the use of the
`com.microsoft` domain for Q/DQ ops, which support all types.
- Automatically enables the "extra option" called
`ForceQuantizeNoInputCheck` to ensure data movement operators (e.g.,
Transpose) are always quantized.
- The user can pass a function to indicate which nodes to exclude from
quantization.
- The user can still pass their own `extra_options` to override any of
the above if necessary.
 
```python
from onnxruntime.quantization import CalibrationMethod, QuantType, get_qdq_config, quantize  # , ...

# Get QDQ configuration
qdq_config = get_qdq_config(
    float_model,
    data_reader,
    calibrate_method=CalibrationMethod.Percentile,
    calibrate_args={"percentile": 99.98},  # Converted to extra_options
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    per_channel=True,
    nodes_to_exclude=["Mul"], # Could also be a function. Ex: `lambda model, node: node.op_type == "Softmax"`

    # Other options converted to extra_options:
    min_real_range=0.0001,
    keep_removable_activations=True,
    activation_symmetric=True,
    weight_symmetric=True,
)

# Quantize model
quantize(float_model_path, qdq_model_path, qdq_config)
```
### Motivation and Context
Need a version of `get_qnn_qdq_config()` that is not EP-specific.
…the weight's scale (#22020)

### Description
Fixes a scenario in which a bias input quantized to int32 has a scale that
is too small. When the bias scale is below a certain threshold, the
quantized bias values overflow the range of an `int32`, which
significantly decreases accuracy.

Credit to @yihonglyu for finding out about this issue and the fix.

### Motivation and Context
Consider the following Convolution with very small weights and a
constant bias input of `[5, -4.5]`.

![image](https://github.com/user-attachments/assets/4bde2bd9-892f-4ae9-887b-61a6668779a1)

The QDQ quantizer first computes the following quantization scale for
`input_0` and `weight`:
- `input_0`: scale=0.5
- `weight`: scale=7.843e-10 **[really small]**

The QDQ quantizer then computes the bias input's scale as follows:
```
bias_scale = input_0_scale * weight_0_scale = 0.5 * 7.843e-10 = 3.9215686274509805e-11
```

This `bias_scale` is too small. Before this PR, the QDQ quantizer would
quantize the f32 bias with this `bias_scale`:
```
bias_quant = round(bias_f32 / bias_scale) =  round([5.0/bias_scale, -4.5/bias_scale]) = [127500000000, -114750000000]
```
These quantized bias values exceed the range of int32, and so are
clipped to [int32.min(), int32.max()], which is very inaccurate.

#### New approach
This PR increases the `weight_0_scale` by the necessary amount to ensure
that `bias_scale` (which equals `weight_0_scale * input_0_scale`) is
appropriate for the int32 quantization type.

The smallest valid bias scale is given by the normal scale formula: 
`bias_smallest_valid_scale = (bias_f32_max - bias_f32_min) / (int32_max
- int32_min)`

Then, we compute the candidate bias scale:
`bias_scale_candidate = input_0_scale * weight_0_scale`

If the candidate scale is smaller than the smallest valid scale, we
increase the `weight_0_scale` by the necessary ratio:
```python
if bias_scale_candidate < bias_smallest_valid_scale:
    ratio = bias_smallest_valid_scale / bias_scale_candidate
    weight_0_scale = ratio * weight_0_scale
```

Then, we recompute the final bias scale:
```python
bias_scale = input_0_scale * weight_0_scale
```
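
To make the steps above concrete, here is a minimal, self-contained sketch that plugs in the example's numbers; it mirrors the description above rather than the quantizer's actual implementation:

```python
import numpy as np

# Values taken from the example above (hypothetical, for illustration only).
input_0_scale = 0.5
weight_0_scale = 7.843e-10
bias_f32 = np.array([5.0, -4.5], dtype=np.float32)

int32_info = np.iinfo(np.int32)

# Smallest bias scale allowed by the normal scale formula.
bias_smallest_valid_scale = float(bias_f32.max() - bias_f32.min()) / (
    float(int32_info.max) - float(int32_info.min)
)

# Candidate scale from the usual bias_scale = input_scale * weight_scale rule.
bias_scale_candidate = input_0_scale * weight_0_scale

# If the candidate is too small, increase the weight's scale by the needed ratio.
if bias_scale_candidate < bias_smallest_valid_scale:
    ratio = bias_smallest_valid_scale / bias_scale_candidate
    weight_0_scale = ratio * weight_0_scale

bias_scale = input_0_scale * weight_0_scale
print(bias_scale_candidate)  # ~3.9e-11: would quantize the bias to ~1e11, far outside int32
print(bias_scale)            # ~2.2e-9: the quantized bias is now on the order of the int32 range
```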

#### Impact on accuracy
Here's the above model's quantized output compared to the f32
(ground-truth) output.
- Before PR: 
  - f32 model output[0]: **5.0f**
  - qdq model output[0]: **0.075**
  - SNR: 0.1369 (higher is better)
- After PR:
  - f32 model output[0]: **5.0f**
  - qdq model output[0]: **4.992**
  - SNR: 55.656 (higher is better)
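
For reference, a figure like the SNR above can be computed as follows (a sketch assuming the common definition in dB, 20·log10 of the ratio of signal norm to error norm; the exact metric used for the numbers above is not spelled out in this PR):

```python
import numpy as np

def snr_db(expected: np.ndarray, actual: np.ndarray) -> float:
    """Signal-to-noise ratio in dB between a reference output and a quantized output."""
    noise = expected.astype(np.float64) - actual.astype(np.float64)
    return 20.0 * np.log10(np.linalg.norm(expected) / np.linalg.norm(noise))

# Single-element illustration with the outputs quoted above.
print(snr_db(np.array([5.0]), np.array([0.075])))  # ~0.13 dB (before this PR)
print(snr_db(np.array([5.0]), np.array([4.992])))  # ~55.9 dB (after this PR)
```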
### Description
Updates python quantization tool:
- Ensures QDQ Pad has equal quantization parameters across input and
output for certain Pad configurations.
- Ensures QDQ Slice always has equal quantization parameters across
input and output.
- Fixes a bug that occurs when Softmax is _excluded_ from quantization.


### Motivation and Context
QDQ Pad and Slice have lower latency on QNN EP when their quantization
parameters are equal.
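
For illustration, here is a hypothetical QDQ Pad pattern, sketched with `onnx.helper`, showing what equal input/output quantization parameters look like: the DequantizeLinear before Pad and the QuantizeLinear after it reuse the same scale and zero-point initializers, so the EP can run Pad directly on quantized data.

```python
from onnx import TensorProto, helper

# Shared quantization parameters (hypothetical values) reused by both DQ and Q.
scale = helper.make_tensor("qparam_scale", TensorProto.FLOAT, [], [0.02])
zero_point = helper.make_tensor("qparam_zp", TensorProto.UINT8, [], [128])
pads = helper.make_tensor("pads", TensorProto.INT64, [4], [0, 1, 0, 1])

nodes = [
    helper.make_node("DequantizeLinear", ["x_q", "qparam_scale", "qparam_zp"], ["x_f"]),
    helper.make_node("Pad", ["x_f", "pads"], ["y_f"], mode="constant"),
    helper.make_node("QuantizeLinear", ["y_f", "qparam_scale", "qparam_zp"], ["y_q"]),
]
graph = helper.make_graph(
    nodes,
    "qdq_pad_same_qparams",
    inputs=[helper.make_tensor_value_info("x_q", TensorProto.UINT8, [1, 3])],
    outputs=[helper.make_tensor_value_info("y_q", TensorProto.UINT8, [1, 5])],
    initializer=[scale, zero_point, pads],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
```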
### Description
Adds a `reduce_range` option to `get_qdq_config()`.



### Motivation and Context
Makes it easier to set this option when calling `get_qdq_config()`.
Otherwise, the user has to set the option manually.
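
A minimal usage sketch (reusing the hypothetical `float_model` and `data_reader` names from the earlier example):

```python
from onnxruntime.quantization import QuantType, get_qdq_config, quantize

qdq_config = get_qdq_config(
    float_model,
    data_reader,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    reduce_range=True,  # quantize weights with a reduced (e.g., 7-bit) range
)
quantize(float_model_path, qdq_model_path, qdq_config)
```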
@yf711 yf711 requested a review from a team as a code owner November 11, 2024 23:29
adrianlizarraga and others added 2 commits November 11, 2024 19:50
### Description
Fixes a unit test that would fail intermittently due to an existing bug
with Pad (reflect mode). When the number of padded values is >= the
inner dimension size, the ORT Pad implementation accesses invalid
memory. This PR makes the number of padding values less than the inner
dimension size to avoid triggering the bug.


### Motivation and Context
See related issues:
#8265
#11828
#20801

Here's a valgrind trace obtained on a Linux machine (with
`sess_options.enable_cpu_mem_arena = False`):
```
==864228== Invalid read of size 4
==864228==    at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370)
==864228==    by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551)
==864228==    by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725)
==864228==    by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484)
==864228==    by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73)
...
```

The above is obtained with the basic Pad(reflect) example on the [ONNX
Pad operator spec
page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary):

```python
data = [
    [1.0, 1.2],
    [2.3, 3.4],
    [4.5, 5.7],
]

pads = [0, 2, 0, 0]

mode = 'reflect'

# Expected output by ONNX spec
expected_output = [
    [1.0, 1.2, 1.0, 1.2],
    [2.3, 3.4, 2.3, 3.4],
    [4.5, 5.7, 4.5, 5.7],
]

# Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension
# invalid data may be 0.0, inf, nan, etc.
ort_output = [
    [inf, 1.2, 1.0, 1.2],
    [inf, 3.4, 2.3, 3.4],
    [inf, 5.7, 4.5, 5.7],
]
```
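
For completeness, a hedged reproduction sketch (not the PR's actual unit test) that builds this Pad(reflect) example with `onnx.helper` and runs it through onnxruntime with the CPU memory arena disabled, as in the valgrind run above:

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

# Minimal Pad(reflect) model matching the spec example above.
pad_node = helper.make_node("Pad", ["data", "pads"], ["out"], mode="reflect")
graph = helper.make_graph(
    [pad_node],
    "pad_reflect_repro",
    inputs=[helper.make_tensor_value_info("data", TensorProto.FLOAT, [3, 2])],
    outputs=[helper.make_tensor_value_info("out", TensorProto.FLOAT, [3, 4])],
    initializer=[helper.make_tensor("pads", TensorProto.INT64, [4], [0, 2, 0, 0])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)

sess_options = ort.SessionOptions()
sess_options.enable_cpu_mem_arena = False
sess = ort.InferenceSession(model.SerializeToString(), sess_options, providers=["CPUExecutionProvider"])

data = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]], dtype=np.float32)
print(sess.run(None, {"data": data})[0])  # on affected builds the first column may contain invalid values
```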
Update the `SkipLayerNorm` implementation to address issues.
@yf711 yf711 merged commit c6156c1 into rel-1.20.1 Nov 12, 2024
242 of 248 checks passed
@yf711 yf711 deleted the yifanl/round-1-cherry-pick-rel-1.20.1 branch November 12, 2024 22:19
7 participants