[FEATURE] Fuse dequantize with convolution #20816
Conversation
Hey @DominikaJedynak, thanks for submitting the PR.
CI supported jobs: [centos-cpu, unix-cpu, clang, miscellaneous, website, windows-cpu, windows-gpu, sanity, unix-gpu, centos-gpu, edge] Note:
@mxnet-bot run ci [centos-gpu, windows-gpu]
Jenkins CI successfully triggered : [windows-gpu, centos-gpu]
@mxnet-bot run ci [all]
Jenkins CI successfully triggered : [unix-cpu, windows-cpu, windows-gpu, centos-cpu, website, centos-gpu, unix-gpu, sanity, clang, edge, miscellaneous]
@@ -209,7 +209,7 @@ class SgDNNLPostQuantizeProperty : public SubgraphProperty {
// When only fused quantized operator and requantize, set min/max_cablib_range,
// When fused quantized operator + requantize + dequantize, set dequantize flag to true.
if (dequantize_node != nullptr) {
if ((dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0))) {
Doubled brackets.
if ((dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0))) {
if (dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0)) {
@mx.util.use_np
@pytest.mark.parametrize('data_shape', DATA_SHAPE)
@pytest.mark.parametrize('no_bias', [True, False])
@pytest.mark.parametrize('out_type', ['int8', 'auto'])
Why not uint8?
Following the settings of the other tests in this file, I do not test it, as it is a scenario which is not used.
@mx.util.use_np
@pytest.mark.parametrize('data_shape', DATA_SHAPE)
@pytest.mark.parametrize('no_bias', [True, False])
@pytest.mark.parametrize('out_type', ['int8', 'auto'])
Same as above.
2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
    (dnnl_param.with_sum ? 1 : 0) +
    (dnnl_param.quantized ? 2 + (full_conv_param.dnnl_param.with_sum ? 2 : 0) : 0);
size_t input_size = 2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
I think we can skip this calculation and only use the value calculated from idx below. If we wish to double check it, it will be enough to do so in an assert, so we can replace the CHECK_EQ in line 167 with an assert using the calculation from here.
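A minimal sketch of what this suggestion might look like, assuming `idx` is the running count of inputs appended below (the names and flags are taken from the snippet above and may differ slightly in the actual file):

// Sketch only: take input_size from the index accumulated while appending inputs,
// and keep the hand-written formula purely as a debug-time consistency check.
size_t input_size = idx;
assert(input_size ==
       2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
           (dnnl_param.with_sum ? 1 : 0) +
           (dnnl_param.quantized ? 2 + (full_conv_param.dnnl_param.with_sum ? 2 : 0) : 0));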
if (param.full_conv_param.dnnl_param.quantized) {
  if (param.full_conv_param.dnnl_param.enable_float_output)
    return std::vector<std::string>{"output"};
  else
    return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
It could be simplified:
if (param.full_conv_param.dnnl_param.quantized) {
  if (param.full_conv_param.dnnl_param.enable_float_output)
    return std::vector<std::string>{"output"};
  else
    return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
if (param.full_conv_param.dnnl_param.quantized &&
    !param.full_conv_param.dnnl_param.enable_float_output) {
  return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
net = ConvAdd(use_bias=True)
check_quantize(net, data_shape, out_type)
What about a test with convolution, activation, and sum?
They are already there: ConvActAdd, ConvBNSumAct.
@DominikaJedynak could you resolve the conflict?
Force-pushed from 6cb7574 to 07c80f4
Force-pushed from 07c80f4 to f0c1a0f
@mxnet-bot run ci [unix-cpu, centos-gpu]
Jenkins CI successfully triggered : [unix-cpu, centos-gpu]
@mxnet-bot run ci [centos-gpu]
Jenkins CI successfully triggered : [centos-gpu]
@mxnet-bot run ci [centos-gpu]
Jenkins CI successfully triggered : [centos-gpu]
Description
This PR adds the possibility to fuse a dequantize node with a convolution node, which in practice allows us to avoid unnecessarily multiplying and then dividing all entries of the convolution output by the same scaling factor.
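The idea can be illustrated with a small, hypothetical C++ sketch (not the actual MXNet/oneDNN code; `acc` stands for the int32 convolution accumulator and the scale names are made up for illustration):

#include <cstdint>
#include <vector>

// Unfused path: the quantized convolution first scales the accumulator down to
// an int8 tensor, and a separate dequantize node then rescales it back to float,
// i.e. two extra passes over the whole tensor with mutually cancelling scales.
std::vector<float> dequantize_unfused(const std::vector<int32_t>& acc,
                                      float out_scale, float deq_scale) {
  std::vector<int8_t> q(acc.size());
  for (size_t i = 0; i < acc.size(); ++i)
    q[i] = static_cast<int8_t>(acc[i] * out_scale);
  std::vector<float> out(acc.size());
  for (size_t i = 0; i < acc.size(); ++i)
    out[i] = q[i] * deq_scale;
  return out;
}

// Fused path: the float output is produced directly from the accumulator,
// with the dequantize scale folded into the convolution's output scale.
std::vector<float> dequantize_fused(const std::vector<int32_t>& acc,
                                    float fused_scale) {
  std::vector<float> out(acc.size());
  for (size_t i = 0; i < acc.size(); ++i)
    out[i] = acc[i] * fused_scale;
  return out;
}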
Speedup on various data sizes:
Measured on a c6i.12xlarge instance (Intel Xeon Platinum 8375C), AMI ami-04505e74c0741db8d (Canonical, Ubuntu 20.04 LTS)
Script: