[FQ2I] Support converting dense -> add to qnn.dense -> add -> requantize #13578
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
LGTM, one small code remark.
python/tvm/relay/frontend/onnx.py (Outdated)
@@ -1391,7 +1391,7 @@ def _impl_v1(cls, inputs, attr, params):
         dtype = input0_state.checked_type.dtype
         # Y = alpha * A * B + beta * C
         alpha = float(attr.get("alpha", 1.0))
-        beta = float(attr.get("beta", 1.0))
+        beta = float(attr.get("beta"))
I would keep the original line of .get('beta', 1.0), since you cannot call float() on None, which attr.get can return. Then below on L1409, you can just do if beta is None --> if 'beta' not in attr.keys() or something.
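For illustration, a minimal self-contained sketch of what this suggestion amounts to; the helper name gemm_coeffs and the bare attr dict are stand-ins for the converter's state, not the actual TVM code:

```python
# Sketch of the suggested handling: keep the 1.0 default so float() never
# sees None, and detect an absent attribute explicitly instead.
def gemm_coeffs(attr):
    alpha = float(attr.get("alpha", 1.0))
    beta = float(attr.get("beta", 1.0))
    has_beta = "beta" in attr  # stands in for the `'beta' not in attr.keys()` check
    return alpha, beta, has_beta

print(gemm_coeffs({"alpha": 0.5}))               # (0.5, 1.0, False)
print(gemm_coeffs({"alpha": 0.5, "beta": 2.0}))  # (0.5, 2.0, True)
```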
Though the change on L1409 might not be needed since if beta == 1, it can be removed with constant folding.
Constant folding doesn't work when beta is multiplying an output of qnn ops, since we cannot fold over them. The model in #13545 has multiply(1f, dequantize(bias)) after dense, which was also causing some issues.
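For context, a hedged sketch of the situation being described (shapes and quantization parameters are invented, not taken from the model in #13545); it assumes the default FoldConstant configuration which, as the comment above notes, does not fold across qnn ops:

```python
import numpy as np
import tvm
from tvm import relay

# multiply(1f, dequantize(bias)): the dequantize is a qnn op, so default
# constant folding leaves it (and therefore the multiply) in place.
bias = relay.const(np.arange(32, dtype="int32"))
deq = relay.qnn.op.dequantize(bias, relay.const(0.01), relay.const(0, "int32"))
expr = relay.multiply(relay.const(1.0), deq)

mod = tvm.IRModule.from_expr(relay.Function([], expr))
mod = relay.transform.InferType()(mod)
mod = relay.transform.FoldConstant()(mod)
print(mod)  # expect the multiply and dequantize to survive folding
```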
Moved float(beta) to the else block of if beta is None.
Actually the whole purpose of this change was to avoid multiplying by 1.0, since multiply(1f, dequantize(bias)) would be converted to qnn.mul(quantize(1), bias) by FQ2I. So I restored the original code, cc @Icemist. An alternative would be to add algebraic simplification to the SimplifyExpr pass.
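A hedged reconstruction of what that restored handling looks like in spirit (a sketch with stand-in names, not the merged onnx.py diff): beta keeps its ONNX default of 1.0, and the scaling of C is only emitted when it actually changes the result, so no multiply(1f, ...) node is left for FQ2I to convert.

```python
# `attr` mimics the ONNX attribute dict and `multiply` stands in for
# relay.multiply; both are illustrative assumptions.
def apply_beta(attr, c, multiply):
    beta = float(attr.get("beta", 1.0))  # safe: the default means float() never sees None
    if beta != 1.0:
        c = multiply(beta, c)  # only materialize the multiply when it matters
    return c

# Toy usage with plain numbers in place of Relay expressions.
print(apply_beta({}, 5.0, lambda s, e: s * e))             # 5.0, no multiply emitted
print(apply_beta({"beta": 0.5}, 5.0, lambda s, e: s * e))  # 2.5
```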
LGTM! Thanks, though I will wait on @Icemist.
[FQ2I] Support converting `dense -> add` to `qnn.dense -> add -> requantize` (apache#13578)
* wip
* hack to convert size-1 scale and zp tensors to scalar
* fix to binary op fast path
* check output zp
* add assert
* add comment
* lint
* clean up beta handling
* use regular binary op only for 32 bit add (bias addition)
* do float(beta) when we know that beta is not None
* restore original beta handling code to avoid mul by 1
* add comment on overflow
Closes #13545

The pattern of dense -> add, where the add is really bias addition, can appear often as the result of converting the ONNX Gemm op (see tvm/python/tvm/relay/frontend/onnx.py, line 1409 in edfeba5).

Currently, FQ2I tries to convert this add to qnn.add. But if this add is being used for bias addition, the out_t.scale and out_t.zero_point variables in fake_quantization_to_integer.py, which are used to initialize the output scale and zp of the QNN binary operators, can be tensors rather than scalars. QNN binary operators do not support such output qparams, which led to the error reported in #13545.

For this reason, apparently we haven't supported converting dense -> add to qnn.dense -> add -> requantize, when add is a bias add, in FQ2I. The pattern dense -> nn.bias_add can be converted to qnn.dense -> nn.bias_add -> requantize, but we never use nn.bias_add after dense.

So I added a code path in the FQ2I QNN binary op converter to identify such patterns and use regular binary ops rather than QNN ones.
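For illustration, a small hypothetical reproduction of the pattern this PR targets (shapes, scales, and zero points are invented, and this is not the model from #13545): a fake-quantized dense whose bias carries a per-channel dequantize scale and is added with a plain add, followed by the FQ2I pass.

```python
import numpy as np
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 64), dtype="int8")
w = relay.const(np.random.randint(-127, 127, size=(32, 64)).astype("int8"))
bias = relay.const(np.random.randint(-1000, 1000, size=(32,)).astype("int32"))

xf = relay.qnn.op.dequantize(x, relay.const(0.05), relay.const(0, "int32"))
wf = relay.qnn.op.dequantize(w, relay.const(0.02), relay.const(0, "int32"))
# Per-channel bias scale: the kind of tensor qparam that qnn.add cannot
# take as its output scale/zero point.
bf = relay.qnn.op.dequantize(
    bias,
    relay.const(np.full((32,), 0.001, dtype="float32")),
    relay.const(0, "int32"),
    axis=0,
)

y = relay.nn.dense(xf, wf)
y = relay.add(y, bf)  # bias addition written as a plain add after dense
y = relay.qnn.op.quantize(y, relay.const(0.1), relay.const(0, "int32"), out_dtype="int8")

mod = tvm.IRModule.from_expr(relay.Function([x], y))
mod = relay.transform.InferType()(mod)
# With this change, the bias add is kept as a regular 32-bit add between
# qnn.dense and the final requantize instead of becoming a qnn.add.
mod = relay.transform.FakeQuantizationToInteger()(mod)
print(mod)
```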
cc @AndrewZhaoLuo @Icemist @elvin-n