
Enable ClipQuantFusion exclusively on CPU EP #20627

Merged (1 commit) May 10, 2024

Conversation

@yihonglyu (Contributor) commented May 9, 2024

Motivation and Context

The Intel NPU does not support 16-bit integer quantized operators. Consequently, the execution provider removes the QuantizeLinear/DeQuantizeLinear (Q/DQ) operators from node units and executes the operation as FP16 in the backend. However, if a Clip operator was fused into a Q operator in a node unit, removing the Q/DQ operators introduces inaccuracies because the effect of the original Clip operator is lost.

Consider the following example:

  • FP32 model: -> Op_FP32 -> Clip ->
  • QDQ model: -> (DQ -> Op_FP32 -> Q) -> (DQ' -> Clip -> Q') ->
  • After ClipQuantFusion: -> (DQ -> Op_FP32 -> Q) -> (DQ' -> Q') ->
  • Intel Execution Provider strips Q/DQ: -> Op_FP16 ->

To solve this issue, we have enabled ClipQuantFusion exclusively on the CPU execution provider.
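The reason the fusion is valid in the first place (as long as the Q/DQ pair survives) can be sketched with standard affine quantization. This is a minimal illustration, not ONNX Runtime's actual implementation: when the Clip bounds are implied by the quantization range, the Q node's saturation reproduces the Clip exactly, so the Clip can be folded away. If the Q/DQ pair is later stripped, that saturation disappears too, which is precisely the accuracy bug described above.

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Standard affine quantization: scale, shift, then saturate."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (np.float32(q) - zero_point) * scale

# Clip -> Q can be folded when the representable real range
# [dequantize(qmin), dequantize(qmax)] lies within [clip_min, clip_max].
# Example: Clip to [0, 6] (Relu6) followed by Q with scale = 6/255 and
# zero_point = 0, whose representable range is exactly [0, 6].
scale, zp = 6.0 / 255.0, 0
assert dequantize(0, scale, zp) >= 0.0 and dequantize(255, scale, zp) <= 6.0

x = np.linspace(-2.0, 8.0, 101, dtype=np.float32)
fused   = quantize(x, scale, zp)                      # Clip folded away
unfused = quantize(np.clip(x, 0.0, 6.0), scale, zp)   # original graph
assert np.array_equal(fused, unfused)
```

The equivalence holds only because `quantize` saturates; executing the fused graph as plain FP16 with the Q/DQ pair removed would leave inputs outside [0, 6] unclipped.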

@yihonglyu yihonglyu changed the title Enable ClipQuantFusion on cpu only Enable ClipQuantFusion exclusively on CPU execution provider May 9, 2024
@yihonglyu yihonglyu changed the title Enable ClipQuantFusion exclusively on CPU execution provider Enable ClipQuantFusion exclusively on CPU EP May 9, 2024
@yihonglyu yihonglyu marked this pull request as ready for review May 9, 2024 22:56
@yihonglyu yihonglyu merged commit 49d197a into main May 10, 2024
95 checks passed
@yihonglyu yihonglyu deleted the yilyu/clip-quant-fusion-on-cpu-only branch May 10, 2024 23:07
poweiw pushed a commit to poweiw/onnxruntime that referenced this pull request Jun 25, 2024
adrianlizarraga added a commit that referenced this pull request Jul 19, 2024
### Description
Moves the `Relu -> QuantizeLinear` fusion to Level2 optimizations for
CPU EP only.

### Motivation and Context
See the related PR for motivation and context:
#20627
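The `Relu -> QuantizeLinear` fusion moved by the follow-up PR relies on an analogous argument. As a hedged sketch (again using textbook affine quantization, not the actual ONNX Runtime code): folding Relu into Q is exact when the zero point equals `qmin`, because every negative input then saturates to `qmin`, which is the same value Relu followed by Q would produce.

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Standard affine quantization with saturation."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

# With zero_point == qmin, negative inputs saturate to qmin either way,
# so Relu -> Q collapses to a single Q node.
scale, zp = 0.1, 0
x = np.linspace(-3.0, 3.0, 61, dtype=np.float32)
assert np.array_equal(quantize(x, scale, zp),
                      quantize(np.maximum(x, 0.0), scale, zp))
```

As with ClipQuantFusion, the fold is only safe while the Q node (and its saturation) remains in the graph, which is why the fusion was restricted to the CPU EP.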
cloudhan added a commit that referenced this pull request Oct 24, 2024
cloudhan added a commit that referenced this pull request Oct 28, 2024
cloudhan added a commit that referenced this pull request Oct 29, 2024