forked from microsoft/onnxruntime
pull ORT main #2
Merged
- Upgrade from Python 3.6 to 3.8 in the packaging pipeline.
- Raise the `build.py` minimum required Python version.
#16506 causes almost every translation unit on Linux to complain:

```
[1175/1235] Building CXX object CMakeFiles/onnxruntime_test_all.dir/home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc.o
In file included from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:18,
                 from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/data_types.h:17,
                 from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/tensor.h:17,
                 from /home/guangyunhan/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.h:16,
                 from /home/guangyunhan/onnxruntime/onnxruntime/test/providers/compare_provider_test_utils.h:7,
                 from /home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc:4:
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h: In instantiation of ‘static constexpr uint16_t onnxruntime_float16::Float16Impl<Derived>::ToUint16Impl(float) [with Derived = onnxruntime::MLFloat16; uint16_t = short unsigned int]’:
/home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:42:66: required from here
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:241:7: note: ‘union onnxruntime_float16::detail::float32_bits’ has no user-provided default constructor
  241 | union float32_bits {
      |       ^~~~~~~~~~~~
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:242:16: note: and the implicitly-defined constructor does not initialize ‘unsigned int onnxruntime_float16::detail::float32_bits::u’
  242 |   unsigned int u;
      |                ^
```

This PR silences the compiler warning.
…nce (#16658)

### Description
MAUI test app with tooling to add a model and generated or provided input test data. The app loads the model and validates the output. It can also run a specified number of iterations to provide basic performance information.

### Motivation and Context
Primarily to make it easier to test an arbitrary model on iOS. A MAUI app allows testing on all platforms.

Co-authored-by: Edward Chen <[email protected]>
Allow the whole pipeline to be parameterized with a unary elementwise functor.
### Description
Replace the constructor `MLFloat16()` with the public member function `FromBits()` in `onnxruntime/core/providers/cann/cann_common.cc`.

### Motivation and Context
PR [#16506](#16506) made the public constructor `MLFloat16(uint16_t x)` private and added a public function `MLFloat16::FromBits(uint16_t x)` in `include/onnxruntime/core/framework/float16.h`, which broke the CANN CI. This PR aligns the CANN code with the modified `MLFloat16` class.
GemmSoftmaxGemmTunable occasionally breaks with large numerical error. The root cause is that CK's strided batched GEMM has larger error under a specific initialization distribution (`multinormal_distribution`). The Generic (Gemm1 + Softmax + Gemm2) implementation is one instance of GemmSoftmaxGemmTunable, and its Gemm1 and Gemm2 are TunableOps when tuning is enabled. In some cases GemmSoftmaxGemmTunable selects the Generic implementation while Gemm1 or Gemm2 selects the CK implementation, so the result of GemmSoftmaxGemmTunable is affected by CK.
- Loosen the tolerance.
- Add `GemmSoftmaxGemmPermuteGenericNestedTunable` to test the Generic implementation with tuning enabled.
…16720) There are several global configs used by DORT.

```py
DEFAULT_ONNX_EXPORTER_OPTIONS = torch.onnx._internal.exporter.ResolvedExportOptions(
    torch.onnx._internal.exporter.ExportOptions()
)

# TODO(wechi): This line must generate result identical to the call of
# _create_onnx_supports_op_overload_table(...) inside
# create_onnx_friendly_decomposition_table(...) in
# torch/onnx/_internal/fx/decomposition_table.py.
_SUPPORT_DICT = torch.onnx._internal.fx.decomposition_table._create_onnx_supports_op_overload_table(
    DEFAULT_ONNX_EXPORTER_OPTIONS.onnx_registry
)  # type: ignore

_EXTRA_SUPPORT_DICT: Dict[str, Any] = {
    "getattr": None,
    "_operator.getitem": None,
}

DORT_DECOMPOSITION_TABLE = DEFAULT_ONNX_EXPORTER_OPTIONS.decomposition_table
```

All of these except `_EXTRA_SUPPORT_DICT` are deduced from the ONNX exporter's options. As there are many ways to configure the exporter's options, we decided to move these variables into `OrtBackend`'s `__init__` so that constructing an `OrtBackend` becomes more flexible (especially for enabling dynamic shapes or not).
### Description
Replace the offending bitwise `operator |` with if() logic for ARM.
### Description
Fix some issues found in GPT-NeoX graph fusion:
(1) GPT-NeoX uses float16 weights. The step of running onnxruntime with opt_level==1 uses the CPU provider. Since most operators do not have fp16 implementations in the CPU EP, extra Cast nodes are added to upcast to fp32.
(2) When an Add is shared by two LayerNormalization children, SkipLayerNormalization fusion might produce an invalid graph.
(3) Reshape fusion might be missed, since some parts only check for initializers but not Constant nodes.

This PR adds a check for whether the model uses FP16, outputs a warning when use_gpu is not True, and uses the GPU provider for graph optimization when use_gpu=True.
### Description
- Fixes support for ArgMin/ArgMax on the QNN CPU and HTP backends.
- Adds Q/DQ node unit selection logic.
- Handles casting the int64 output to uint32 when necessary.
- Adds unit tests for ArgMax/ArgMin.

### Motivation and Context
QNN EP did not actually support ArgMin/ArgMax. Unit tests revealed that the existing translation was not sufficient to support these ops.
### Description
This change upgrades a number of dependencies. There are two motivations for this change:
- fix the security issue reported by dependabot (protobufjs Prototype Pollution vulnerability, GHSA-h755-8qp9-cq85)
- resolve the requirement of using ONNX IR_VERSION 9 (#16638)

This requires:
- upgrading protobufjs to v7.2.4
- upgrading the library 'onnx-proto' to consume the latest ONNX release (v1.14.0)

Problems:
- protobufjs v7.2.4 depends on long.js v5, which does not work well with TypeScript (CommonJS).
- onnx-proto depends on this fix landing in a new release of long.js.
- long.js is in maintenance mode, and it takes longer than expected to get new changes in.

Solutions:
- use a patch script in `preprepare` to copy type declarations so that long.js works with TypeScript (CommonJS)
- generate the ONNX protobuf JS/TS files and put them under the js/web/lib/onnxjs/ort-schema/protobuf folder
- remove 'onnx-proto' from the dependencies
- apply fixes to the generated onnx.d.ts
Set the WebNN EP minimum supported opset to 7, as ONNX Runtime currently only guarantees support for models stamped with opset 7 or above.
### Description
This PR includes changes to the documentation in the _readmeOV.rst_ file and to the dockerfile, enabling ORT to be built with the latest OpenVINO 2023.0.0.

### Motivation and Context
Modified the dockerfile to incorporate the latest version of OpenVINO (2023.0.0) for building ONNX Runtime. The changes in this PR aim to improve the overall user experience by providing accurate and up-to-date documentation while leveraging the latest OpenVINO 2023.0.0.
It gives up to a 7.5% improvement in the LLaMA 7B case.