Cherry-pick for 1.17.1 patch release #19477

Merged
YUNQIUGUO merged 23 commits into rel-1.17.1 from yguo/cherry-pick-1.17.1 on Feb 21, 2024

Conversation

YUNQIUGUO
Contributor

Description

As title.

Motivation and Context

petermcaughan and others added 6 commits February 8, 2024 15:28
### Description
There is a current bug in the BeamSearch implementation of T5, GPT, and
Whisper due to an interaction between two PRs merged in the past 7
months.

The first PR/code change added a GPU implementation of BeamSearchScorer. It
accelerates some operations by executing them on the GPU instead of the CPU.
That change did not use a cudaStream when copying one particular variable from
GPU to CPU (see the nullptr value here:
[[link](https://github.com/microsoft/onnxruntime/blob/b65d3d0a5374daa3bc9272c2c02763a8428660db/onnxruntime/contrib_ops/cpu/transformers/beam_search_impl_t5.h#L213)]).

The second PR/code change was the alteration to utilize a cudaStream to
initialize various memory buffers in BeamSearch (see `stream` included
as the last argument in these allocations
[[link](https://github.com/microsoft/onnxruntime/blob/d1431e1b78fb81bf90fdc58c9118cb011171f387/onnxruntime/contrib_ops/cpu/transformers/beam_search_impl_base.h#L25)]).

Between these two PRs, I believe neither operation used a stream, so they
remained synchronized. Once the latter PR was merged, the copy became
desynchronized from the initialization because the two ran on different streams.

The fix for this is to reintroduce the same stream into the copy
operation added in the first PR.



### Motivation and Context
Because this is a race condition, it does not reproduce reliably on every piece
of hardware or with every script, but when it hits, the bug completely breaks
ORT execution with a BeamSearch model.

---------

Co-authored-by: Peter McAughan <[email protected]>
### Description
- When converting ONNX split sizes to QNN split indices, do not include
the split at index 0. QNN 2.19 assumes index 0 is implicit and throws a
validation error if it is provided (see the sketch after this list).
- Fix bug when using an ONNX Split operator with a `num_outputs`
attribute that does not evenly divide into `shape[axis]`. The ONNX spec
states that the last chunk should be smaller, but QNN EP made the last
chunk larger.
- Fix bug when using an ONNX Split operator with a `split` input. QNN EP
was incorrectly passing the split sizes as split indices without
conversion.
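
To make the two conversions concrete, here is a hypothetical Python sketch of the logic described above (the actual change lives in the C++ QNN EP; the helper names are made up):

```python
def split_sizes_to_qnn_indices(split_sizes):
    """Convert ONNX split sizes to QNN split indices, omitting the implicit index 0."""
    indices = []
    offset = 0
    for size in split_sizes[:-1]:  # the final size is implied by the axis length
        offset += size
        indices.append(offset)
    return indices  # e.g. [3, 3, 4] -> [3, 6]


def num_outputs_to_split_sizes(axis_len, num_outputs):
    """ONNX num_outputs semantics: equal chunks, with the last chunk smaller if uneven."""
    chunk = -(-axis_len // num_outputs)  # ceiling division
    return [chunk] * (num_outputs - 1) + [axis_len - chunk * (num_outputs - 1)]  # e.g. (7, 3) -> [3, 3, 1]
```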

### Motivation and Context
QNN SDK 2.19 updated the validation criteria for Split operators. QNN EP was
previously passing a split index that should have been implicit.

While fixing that, bugs were also discovered in the handling of the
`num_outputs` attribute and the `split` input.
### Description
This change
55a6694
didn't take external data into account when unpacking initializers, and
therefore crashes when trying to unpack them.
### Description
Adds type/shape inferencing support for MSFT domain QuantizeLinear and
DequantizeLinear operators to symbolic_shape_infer.py


### Motivation and Context
Need a way to infer the types and shapes of Q/DQ ops in models that use
the MSFT domain versions (e.g., int16 quantization).
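
For reference, a minimal usage sketch (assuming the script's Python API as exposed in current ORT releases; the model paths are placeholders):

```python
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

# Run symbolic shape inference on a model containing com.microsoft
# QuantizeLinear/DequantizeLinear nodes (e.g. int16 Q/DQ).
model = onnx.load("model_int16_qdq.onnx")
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
onnx.save(inferred, "model_int16_qdq.inferred.onnx")
```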
### Description
Updates qdq quantization to ensure the final model has the
`com.microsoft` opset import if the model uses Q/DQ ops with the
`com.microsoft` domain (e.g., for int16 quantization)


### Motivation and Context
Need to ensure the MSFT domain is correctly set for all relevant cases.
Otherwise, shape inferencing tools will raise an exception.
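
A rough sketch of the idea (not the quantizer's actual code; the helper name is made up):

```python
import onnx

def ensure_msft_opset_import(model: onnx.ModelProto, version: int = 1) -> None:
    """Add the com.microsoft opset import if any node uses that domain."""
    uses_msft = any(node.domain == "com.microsoft" for node in model.graph.node)
    declared = any(imp.domain == "com.microsoft" for imp in model.opset_import)
    if uses_msft and not declared:
        model.opset_import.append(onnx.helper.make_opsetid("com.microsoft", version))
```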
### Description
Only set thread affinity automatically on Server SKUs. Auto affinity applies
when the API user does not specify thread settings or affinity themselves.

### Motivation and Context
On client systems it is best to let the OS scheduler handle thread placement. On
big (P-Core) / little (E-Core) CPU designs, affinity overrides Win32 Quality of
Service (QoS) and increases power usage. Specifically, for background workloads
whose process is tagged QoS Utility (Background), this affinity setting
overrides the OS scheduler, which wants to schedule only on the E-Cores. The
P-Cores therefore wake up and use more energy than intended on client devices,
and users get less battery life.

Foreground AI workloads would be tagged QoS High and would run the ORT
threads on all cores.
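
For context, a minimal sketch of what explicitly specifying thread settings looks like from the Python API, which opts out of the automatic affinity path described above. The affinity config key and its format are assumptions based on ORT's documented session options and may differ across versions:

```python
import onnxruntime as ort

so = ort.SessionOptions()
so.intra_op_num_threads = 4  # explicit thread count => no longer "auto"
# Optionally pin the worker threads yourself (key/format assumed from ORT docs).
so.add_session_config_entry("session.intra_op_thread_affinities", "1;2;3")
sess = ort.InferenceSession("model.onnx", sess_options=so)  # model path is a placeholder
```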
PatriceVignola
PatriceVignola previously approved these changes Feb 8, 2024
@fs-eire
Contributor

fs-eire commented Feb 8, 2024

Could you please help to cherry-pick this one: #19458

@YUNQIUGUO
Contributor Author

Could you please help to cherry-pick this one: #19458

Yes, that one had not been merged yet, so it was not included. Will cherry-pick it now.

### Description

Since TypeScript v4.7, types need to be specified inside the "exports" field
when it is available. This PR appends "types" just before each "default" entry
(which the spec requires to be the last item).

Fixes #19403.
@ivberg
Contributor

ivberg commented Feb 9, 2024

Mind grabbing #19397 ?

Add capturestate / rundown ETW support logging for session and provider options (#19397)

### Description
Add capturestate / rundown ETW support logging for session and provider
options.

### Motivation and Context
Follow-up to #16259 and #18882

This is very useful when you have longer-running ONNX sessions, which will be
the case for a lot of AI workloads. That means ETW tracing may start minutes or
hours after a process and session have been established. When a trace is
captured, you want to know the state of ONNX Runtime at that time. For ORT, that
state consists of the session and config options, so these are now logged and
show up in the trace.

Tested with xperf and ORT:

```
xperf -start ort -on 3a26b1ff-7484-7484-7484-15261f42614d
xperf -capturestate ort 3a26b1ff-7484-7484-7484-15261f42614d   <-- run this after the session has been up for some time
xperf -stop ort -d .\ort.etl                                   <-- the trace will now also have rundown events
```

These events also show up if you use WPR with [CaptureStateOnSave
](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/capturestateonsave).
ivberg
ivberg previously approved these changes Feb 9, 2024
@PatriceVignola
Contributor

@YUNQIUGUO When is the deadline for 1.17.1 changes? I have a blocking regression that I may have a fix for by tomorrow.

adrianlizarraga
adrianlizarraga previously approved these changes Feb 9, 2024
petermcaughan
petermcaughan previously approved these changes Feb 9, 2024
@YUNQIUGUO
Contributor Author

YUNQIUGUO commented Feb 9, 2024

@YUNQIUGUO When is the deadline for 1.17.1 changes? I have a blocking regression that I may have a fix for by tomorrow.

Tomorrow by EOD should be fine.

snnn
snnn previously approved these changes Feb 9, 2024
@YUNQIUGUO
Contributor Author

YUNQIUGUO commented Feb 10, 2024

@PatriceVignola it's approaching EOD. Just checking in - is the PR ready and merged yet?

If not, I will go ahead and merge this cherry-pick PR for now, and maybe open a second round of cherry-picks if necessary. I think we are waiting on another two PRs too.

BTW, please share the PR link and tag it with release 1.17.1 once it's ready. Thanks!

@PatriceVignola
Contributor

@PatriceVignola it's approaching EOD. Just checking in - is the PR ready and merged yet?

If not, I will go ahead and merge this cherry-pick PR for now, and maybe open a second round of cherry-picks if necessary. I think we are waiting on another two PRs too.

BTW, please share the PR link and tag it with release 1.17.1 once it's ready. Thanks!

The PR is here: #19481

It was sporadically failing pipelines unrelated to the change and I had to restart them.

@PatriceVignola
Contributor

@YUNQIUGUO The PR has been merged into main and can be cherry-picked.

There's currently a bug in the allocation planner when buffers are reused and
more than one stream is in use, which makes it possible (although rare) to reach
a reference count of 0 for a buffer that is still being used. Since DML doesn't
benefit from multiple streams, disabling them is the safest option for now.

This is a high priority issue that we need to fix for 1.17.1 since it
breaks stable diffusion. Identifying the perfect fix and fixing the
underlying issue would be too risky for a patch release, especially
given the limited time that we have.

#19480
As per title, fixes
#19418

ONNX Runtime 1.17 broke the quantization of ONNX models with subgraphs in which
initializers are placed on the top-level graph while different subgraphs use the
same initializer.
@smk2007
Member

smk2007 commented Feb 14, 2024

Please add #19483

smk2007 and others added 3 commits February 14, 2024 10:47
### Description
Limit SoC core detection via the 2-level cache core logic to Intel and hybrid
processors.

### Motivation and Context
This code was added to support a new class of CPU cores present in Intel's
next-generation Intel Core Ultra mobile processors. It is essential for avoiding
thread placement on low-performing SoC cores that don't have L3 cache. SoC cores
are meant to specialize in system bring-up and to help improve responsiveness
and power usage; in other words, they are not meant to run compute-heavy AI
workloads. To avoid broad exposure of this logic, it is currently restricted to
Intel platforms that have hybrid enabled.

---------

Co-authored-by: Sheil Kumar <[email protected]>
### Description

Add ATen fallback support for bicubic interpolation algorithm.

### Motivation and Context

Required for facebook/dinov2 model architecture as part of ONNX Runtime
integration with AML Vision models.
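
For reference, a small PyTorch snippet showing the interpolation mode this change covers; it is illustrative only and not taken from the PR:

```python
import torch
import torch.nn.functional as F

# Bicubic interpolation, the op that ORTModule training can now export
# through the ATen fallback. Shapes here are arbitrary.
x = torch.randn(1, 3, 16, 16)
y = F.interpolate(x, scale_factor=2.0, mode="bicubic", align_corners=False)
print(y.shape)  # torch.Size([1, 3, 32, 32])
```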
@smk2007
Member

smk2007 commented Feb 16, 2024

Please add #19475. TY

smk2007 and others added 3 commits February 16, 2024 10:06
**Description**
1) During session initialization, KahnsTopologicalSort is a major cause of perf
degradation. The main cause of the slowdown is that the topological sort needs
to keep track of nodes to visit in order and reorder them by priority (as
informed by a comparator). The existing implementation uses a
std::priority_queue backed by a std::vector container, but vectors are poor at
insertion and reordering. The appropriate data structure for this operation is a
linked list; however, linked lists such as std::list cannot back a
std::priority_queue, because std::priority_queue requires random access, which
linked lists do not provide. For this simple case we can instead use a std::list
directly and perform insertions manually with std::upper_bound (see the sketch
after this list). This drastically reduces the time taken by the method, which
previously caused numerous re-copies and a lot of element movement inside the
"nodes to visit" list.

2) In the comparator, I hide forward and backward attribute checking
behind the #ifdef ENABLE_TRAINING macro, as I believe it should only be
valid in the training scenario.

3) In the NoopElimination transformer, I avoid creating an Initializer (which
unpacks TensorProto data) for every node and only create initializers when
Add/Sub/Mul/Div op nodes are detected.
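
As a rough illustration of point 1 (the actual change is in ORT's C++ graph code, and all names below are made up), here is a Python sketch of Kahn's algorithm that keeps the ready list ordered by priority through sorted insertion instead of re-heapifying a vector-backed priority queue:

```python
import bisect

def kahns_sort(nodes, edges, priority):
    """Topological sort; ready nodes are kept ordered by `priority` via sorted insertion."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = sorted((n for n in nodes if indegree[n] == 0), key=priority)
    order = []
    while ready:
        node = ready.pop(0)  # node with the smallest priority key
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                keys = [priority(r) for r in ready]
                ready.insert(bisect.bisect_right(keys, priority(nxt)), nxt)
    return order
```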

**Motivation and Context**
Session creation time of many models is quite slow.

---------

Co-authored-by: Sheil Kumar <[email protected]>
Build from source and run the command below.

Example, converting whisper-base:

```
python -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-base --model_impl openai -e -o -w --chain_model --output ./demo
```
This PR updates exporting and running the Whisper model with beam search
by adding the following.

- Adds temperature as a graph input to the exported model
- Fixes the token ids by adding them as attributes to
`WhisperBeamSearch`
- Fixes the timestamps test cases so they pass now
- Fixes a bug with invoking `torch.onnx.export`
- Cleans up the Whisper scripts and groups the arguments in
`convert_to_onnx.py`
- Adds a `requirements.txt` file to specify package dependencies
- Adds `whisper-large-v3` to list of pretrained models
- Fixes a bug with missing cross-attention KV cache inputs in the
decoder subgraph

- This is a follow-up to [this
PR](#19188).
- The incorrect token ids in the timestamps processor were first noticed
during [this PR
review](#17500 (comment)).
When they were originally added in [this
PR](#15853), the offsets
were previously constant across the Whisper model sizes. When comparing
the new `whisper-large-v3` variant, the English-only variants (e.g.
`whisper-tiny.en`), and the original variants (e.g. `whisper-tiny`),
both the values and the offsets differ. Therefore, it is easier to set
the token ids as attributes to `WhisperBeamSearch` when exporting to
ensure the right values are used in the timestamps processor.
- The Hugging Face API for returning timestamps and the expected outputs
from the PyTorch model have both changed.
- The fix for `torch.onnx.export` is a follow-up to [this PR
review](#17179 (comment)).
- The argument grouping is a follow-up to [this PR
review](#17500 (comment)).
- Specific package versions are needed to run the Whisper scripts and
the `requirements.txt` file ensures that these versions are installed.
- The `whisper-large-v3` variant is released and should be in the list
of official pretrained models.
- After the changes from [this
PR](#17316), the exported
model is not loading in an ORT inference session because the
cross-attention KV cache inputs are missing in the decoder subgraph.
@YUNQIUGUO
Contributor Author

So far, the cherry-picks for the patch should all be in here. I had to resolve some merge conflicts when merging #19509, so please review that it is still correct with what's included in this PR (specifically the two files whisper_chain.py and whisper_helper.py). Thanks! cc @kunal-vaishnavi

@fs-eire
Contributor

fs-eire commented Feb 20, 2024

Seems #19274 is marked for 1.17.1 but still waiting for approval.

This pull request includes modifications to the `c-api-cpu.yml` Azure
Pipelines configuration file. The changes mainly revolve around the
Node.js packaging stage and the handling of Node.js artifacts. The most
significant changes include renaming the Node.js packaging stage, adding
a new dependency to the stage, changing artifact names, adding a new
script to list Node.js artifacts, and updating the source folder for
copying NuGet binaries.

Changes in Node.js packaging:

*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L503-R508):
Renamed the Node.js packaging stage from `Nodejs_Packaging_CPU` to
`Nodejs_Packaging` and added `Windows_CI_GPU_DML_Dev` as a new
dependency to the stage.

Changes in handling of Node.js artifacts:

*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L568-R569):
Changed the artifact name from `drop-onnxruntime-nodejs-win-x64` to
`drop-onnxruntime-nodejs-win-x64-dml` in the task to download pipeline
artifacts for Windows x64.
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59R595-R598):
Added a new script to list Node.js artifacts from the directory
`$(Build.BinariesDirectory)/nodejs-artifacts/win32/x64/`.
*
[`tools/ci_build/github/azure-pipelines/templates/c-api-cpu.yml`](diffhunk://#diff-00815920cc190d10fdebceac0c3a4b8a59e408684ae38177dfe7f96cae276c59L635-R640):
Updated the source folder from
`$(Build.BinariesDirectory)\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib`
to `$(Build.BinariesDirectory)\nodejs-artifacts\win32\x64` in the task
to copy NuGet binaries to the directory
`$(Build.SourcesDirectory)\js\node\bin\napi-v3\win32\x64`.

---------

Co-authored-by: Yulong Wang <[email protected]>
@YUNQIUGUO
Contributor Author

Seems #19274 is marked for 1.17.1 but still waiting for approval.

Thanks for mentioning this. Added the commit.

…antize (#19455)

### Description
The current quantization tool relies on shape inference to provide the type of
every intermediate tensor, so that the tool knows which type it must dequantize
into (float32 or float16). However, this information is not available if shape
inference fails, which happens every time the model includes an operator from a
custom domain such as com.microsoft.

This PR introduces an extra option `DefaultTensorType` as a fallback when the
quantizer cannot find the type it needs.

### Motivation and Context
This fixes issue #19409.
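
A hedged usage sketch, assuming the new option is passed through `extra_options` like other quantizer settings; the model paths, input name, and calibration reader below are placeholders:

```python
import numpy as np
import onnx
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, QuantType, quantize_static

class RandomReader(CalibrationDataReader):
    """Placeholder calibration reader; feed real calibration data in practice."""
    def __init__(self, n=8):
        self._data = iter([{"input": np.random.rand(1, 3, 224, 224).astype(np.float32)} for _ in range(n)])

    def get_next(self):
        return next(self._data, None)

quantize_static(
    "model_fp32.onnx",
    "model_int8.onnx",
    calibration_data_reader=RandomReader(),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    # Fall back to float32 wherever shape inference cannot determine the tensor type.
    extra_options={"DefaultTensorType": onnx.TensorProto.FLOAT},
)
```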
@smk2007
Member

smk2007 commented Feb 20, 2024

Please add #19574, TY!

…19574)

Disable the __cpuid check on arm64 builds, as the intrinsic is not available there.

Motivation
The __cpuid check was breaking the arm64 build.

Co-authored-by: Sheil Kumar <[email protected]>
@YUNQIUGUO YUNQIUGUO requested review from fs-eire and removed request for fs-eire February 21, 2024 03:17
…#19577)

### Description
Accept the command-line option --symmetric and its optional value correctly. If
the optional value matches 'True' case-insensitively, set symmetric to True;
otherwise set symmetric to False. Asymmetric quantization will generate a
zero_point input.
```
usage: matmul_4bits_quantizer.py [-h] --input_model INPUT_MODEL --output_model OUTPUT_MODEL [--block_size BLOCK_SIZE] [--symmetric [{True,False}]] [--accuracy_level ACCURACY_LEVEL] [-v]
                                 [--nodes_to_exclude NODES_TO_EXCLUDE [NODES_TO_EXCLUDE ...]]
```
@YUNQIUGUO YUNQIUGUO merged commit 75968b9 into rel-1.17.1 Feb 21, 2024
107 of 172 checks passed
@YUNQIUGUO YUNQIUGUO deleted the yguo/cherry-pick-1.17.1 branch February 21, 2024 20:51