
[FQ2I] Support converting dense -> add to qnn.dense -> add -> requantize #13578

Merged. 12 commits were merged into apache:main from the fq2i-dense-add-fix branch on Dec 9, 2022.

Conversation

@masahi (Member) commented on Dec 7, 2022

Closes #13545

The pattern dense -> add, where the add is really a bias addition, often appears as the result of converting the ONNX Gemm op:

out = out + _expr.const(beta, dtype=dtype) * inputs[2]
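For illustration, here is a minimal standalone sketch of the Relay pattern this conversion produces: nn.dense followed by a plain add of the bias. This is not the converter code itself; the shapes and variable names are made up.

```python
import numpy as np
import tvm
from tvm import relay

# Hypothetical shapes: (1, 64) input, (16, 64) weight, length-16 bias.
data = relay.var("data", shape=(1, 64), dtype="float32")
weight = relay.const(np.random.rand(16, 64).astype("float32"))
bias = relay.const(np.random.rand(16).astype("float32"))

out = relay.nn.dense(data, weight)  # Gemm's A * B^T
out = relay.add(out, bias)          # bias addition expressed as a plain add, not nn.bias_add
mod = tvm.IRModule.from_expr(relay.Function([data], out))
print(mod)
```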

Currently, FQ2I tries to convert this add to qnn.add. But when this add is used for bias addition, the out_t.scale and out_t.zero_point variables in fake_quantization_to_integer.py, which initialize the output scale and zero point of the QNN binary operators, can be tensors rather than scalars. QNN binary operators do not support such output qparams, which led to the error reported in #13545.

For this reason, FQ2I has apparently never supported converting dense -> add to qnn.dense -> add -> requantize when the add is a bias add. The pattern dense -> nn.bias_add can be converted to qnn.dense -> nn.bias_add -> requantize, but we never use nn.bias_add after dense.

So I added a code path in the FQ2I QNN binary op converter to identify such patterns and use regular binary ops rather than QNN ones, roughly as sketched below.
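The sketch below is a simplified illustration of the idea, not the actual code in fake_quantization_to_integer.py; the helper name and the affine-type arguments are assumptions.

```python
from tvm import relay


def lower_add(lhs, rhs, lhs_t, rhs_t, out_t):
    """Illustrative only: lhs_t/rhs_t/out_t stand in for the affine types
    (carrying .scale and .zero_point) that FQ2I tracks for each tensor.
    Assumes type inference has already run so checked_type is populated."""
    if lhs.checked_type.dtype == "int32" and rhs.checked_type.dtype == "int32":
        # Bias addition after a quantized dense: both operands already live in
        # the dense output's affine space, so a regular add is valid, and we
        # avoid handing qnn.add tensor-valued (per-channel) output qparams.
        # A later requantize maps the int32 result back to the output qparams.
        return relay.add(lhs, rhs)
    # General case: use the QNN binary op, which expects scalar output qparams.
    return relay.qnn.op.add(
        lhs, rhs,
        lhs_t.scale, lhs_t.zero_point,
        rhs_t.scale, rhs_t.zero_point,
        out_t.scale, out_t.zero_point,
    )
```

As the commit list below indicates, the actual change also checks the output zero point and restricts this fast path to 32-bit adds (i.e. bias addition).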

cc @AndrewZhaoLuo @Icemist @elvin-n

@tvm-bot (Collaborator) commented on Dec 7, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: fq2i. See #10317 for details.

Generated by tvm-bot

@Icemist (Contributor) left a comment:

LGTM, a little code remark.

python/tvm/relay/frontend/onnx.py (review thread, resolved)
@@ -1391,7 +1391,7 @@ def _impl_v1(cls, inputs, attr, params):
     dtype = input0_state.checked_type.dtype
     # Y = alpha * A * B + beta * C
     alpha = float(attr.get("alpha", 1.0))
-    beta = float(attr.get("beta", 1.0))
+    beta = float(attr.get("beta"))
Contributor commented:

I would keep the original line .get('beta', 1.0), since you cannot call float() on None, which attr.get can return.

Then below, on L1409, you can just replace if beta is None with if 'beta' not in attr.keys(), or something along those lines.

Contributor commented:

Though the change on L1409 might not be needed, since if beta == 1 the multiply can be removed by constant folding.

@masahi (Member, Author) commented:

Constant folding doesn't work when beta is multiplying an output of QNN ops, since we cannot fold over them. The model in #13545 has multiply(1f, dequantize(bias)) after dense, which was also causing some issues.
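To illustrate the point, here is a minimal sketch (shapes and qparams are made up): FoldConstant only evaluates fully constant subexpressions, so a multiply whose other operand flows out of a QNN op is left in place even when the constant factor is 1.0.

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(16,), dtype="int8")
deq = relay.qnn.op.dequantize(x, relay.const(0.1, "float32"), relay.const(0, "int32"))
out = relay.multiply(relay.const(1.0, "float32"), deq)

mod = tvm.IRModule.from_expr(relay.Function([x], out))
mod = relay.transform.InferType()(mod)
mod = relay.transform.FoldConstant()(mod)
print(mod)  # the multiply by 1.0 is still there: its other operand is not a constant
```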

@masahi (Member, Author) commented:

Moved float(beta) to the else block of if beta is None.
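For context, a rough and hypothetical sketch of what that intermediate handling looks like (the real Gemm converter in onnx.py has more surrounding logic, and a later commit restores the original handling, as noted below):

```python
from tvm import relay


def apply_beta(out, c, attr, dtype):
    """Hypothetical, simplified excerpt of the Gemm converter's beta handling."""
    beta = attr.get("beta")  # None when the attribute is absent, i.e. beta defaults to 1.0
    if beta is None:
        # Skip the multiply entirely so no multiply-by-1 reaches FQ2I.
        return relay.add(out, c)
    # float(beta) only in the else branch, so float() is never called on None.
    return relay.add(out, relay.multiply(relay.const(float(beta), dtype=dtype), c))
```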

@masahi (Member, Author) commented on Dec 8, 2022:

Actually, the whole purpose of this change was to avoid multiplying by 1.0, since multiply(1f, dequantize(bias)) would be converted to qnn.mul(quantize(1), bias) by FQ2I. So I restored the original code. cc @Icemist

An alternative would be to add algebraic simplification to the SimplifyExpr pass.

python/tvm/relay/transform/fake_quantization_to_integer.py (review thread, outdated, resolved)
@masahi force-pushed the fq2i-dense-add-fix branch from d411b86 to da99fa5 on December 8, 2022 at 21:55
@AndrewZhaoLuo (Contributor) left a comment:

LGTM! Thanks, though I will wait on @Icemist.

@masahi force-pushed the fq2i-dense-add-fix branch from bd6e2d2 to 9c8d26e on December 8, 2022 at 22:49
@masahi merged commit 02820ad into apache:main on Dec 9, 2022
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request on Mar 27, 2023: … `requantize` (apache#13578)

* wip

* hack to convert size-1 scale and zp tensors to scalar

* fix to binary op fast path

* check output zp

* add assert

* add comment

* lint

* clean up beta handling

* use regular binary op only for 32 bit add (bias addition)

* do float(beta) when we know that beta is not None

* restore original beta handling code to avoid mul by 1

* add comment on overflow
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request on Sep 27, 2023: … `requantize` (apache#13578)
Successfully merging this pull request may close these issues.

[Bug][FQ2I] Failed to run FakeQuantizationToInteger on QDQ ONNX model