[FEATURE] Fuse dequantize with convolution #20816
Conversation
Hey @DominikaJedynak, thanks for submitting the PR.
CI supported jobs: [centos-cpu, unix-cpu, clang, miscellaneous, website, windows-cpu, windows-gpu, sanity, unix-gpu, centos-gpu, edge] Note:
@mxnet-bot run ci [centos-gpu, windows-gpu]
Jenkins CI successfully triggered : [windows-gpu, centos-gpu]
@mxnet-bot run ci [all]
Jenkins CI successfully triggered : [unix-cpu, windows-cpu, windows-gpu, centos-cpu, website, centos-gpu, unix-gpu, sanity, clang, edge, miscellaneous]
@@ -209,7 +209,7 @@ class SgDNNLPostQuantizeProperty : public SubgraphProperty {
// When only fused quantized operator and requantize, set min/max_cablib_range,
// When fused quantized operator + requantize + dequantize, set dequantize flag to true.
if (dequantize_node != nullptr) {
if ((dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0))) {
Doubled brackets.
if ((dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0))) {
if (dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0)) {
@mx.util.use_np
@pytest.mark.parametrize('data_shape', DATA_SHAPE)
@pytest.mark.parametrize('no_bias', [True, False])
@pytest.mark.parametrize('out_type', ['int8', 'auto'])
Why not uint8?
Following the settings of the other tests in this file, I do not test it, as it is a scenario which is not used.
@mx.util.use_np
@pytest.mark.parametrize('data_shape', DATA_SHAPE)
@pytest.mark.parametrize('no_bias', [True, False])
@pytest.mark.parametrize('out_type', ['int8', 'auto'])
Same as above.
2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
    (dnnl_param.with_sum ? 1 : 0) +
    (dnnl_param.quantized ? 2 + (full_conv_param.dnnl_param.with_sum ? 2 : 0) : 0);
size_t input_size = 2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
I think we can skip this calculation and only use the value calculated from idx below. If we wish to double check it, it will be enough to do so in an assert, so we can replace the CHECK_EQ in line 167 with an assert using the calculation from here.
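A minimal sketch of what this suggestion might look like, assuming `idx` is the running count of inputs appended below (the names and flags are taken from the snippet above and may differ slightly in the actual file):

// Sketch only: take input_size from the index accumulated while appending inputs,
// and keep the hand-written formula purely as a debug-time consistency check.
size_t input_size = idx;
assert(input_size ==
       2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
           (dnnl_param.with_sum ? 1 : 0) +
           (dnnl_param.quantized ? 2 + (full_conv_param.dnnl_param.with_sum ? 2 : 0) : 0));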
if (param.full_conv_param.dnnl_param.quantized) {
  if (param.full_conv_param.dnnl_param.enable_float_output)
    return std::vector<std::string>{"output"};
  else
    return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
It could be simplified:
if (param.full_conv_param.dnnl_param.quantized) {
  if (param.full_conv_param.dnnl_param.enable_float_output)
    return std::vector<std::string>{"output"};
  else
    return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
if (param.full_conv_param.dnnl_param.quantized &&
    !param.full_conv_param.dnnl_param.enable_float_output) {
  return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
net = ConvAdd(use_bias=True)
check_quantize(net, data_shape, out_type)
What about a test with convolution, activation, and sum?
They are already there: ConvActAdd, ConvBNSumAct.
@DominikaJedynak could you resolve the conflict?
Force-pushed from 6cb7574 to 07c80f4
Force-pushed from 07c80f4 to f0c1a0f
@mxnet-bot run ci [unix-cpu, centos-gpu]
Jenkins CI successfully triggered : [unix-cpu, centos-gpu]
@mxnet-bot run ci [centos-gpu]
Jenkins CI successfully triggered : [centos-gpu]
@mxnet-bot run ci [centos-gpu]
Jenkins CI successfully triggered : [centos-gpu]
Description
This PR adds the possibility to fuse a dequantize node with a convolution node, which in practice allows us to avoid unnecessarily multiplying and then dividing all entries of the convolution output by the same scaling factor.
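The idea can be illustrated with a small, hypothetical C++ sketch (not the actual MXNet/oneDNN code; `acc` stands for the int32 convolution accumulator and the scale names are made up for illustration):

#include <cstdint>
#include <vector>

// Unfused path: the quantized convolution first scales the accumulator down to
// an int8 tensor, and a separate dequantize node then rescales it back to float,
// i.e. two extra passes over the whole tensor with mutually cancelling scales.
std::vector<float> dequantize_unfused(const std::vector<int32_t>& acc,
                                      float out_scale, float deq_scale) {
  std::vector<int8_t> q(acc.size());
  for (size_t i = 0; i < acc.size(); ++i)
    q[i] = static_cast<int8_t>(acc[i] * out_scale);
  std::vector<float> out(acc.size());
  for (size_t i = 0; i < acc.size(); ++i)
    out[i] = q[i] * deq_scale;
  return out;
}

// Fused path: the float output is produced directly from the accumulator,
// with the dequantize scale folded into the convolution's output scale.
std::vector<float> dequantize_fused(const std::vector<int32_t>& acc,
                                    float fused_scale) {
  std::vector<float> out(acc.size());
  for (size_t i = 0; i < acc.size(); ++i)
    out[i] = acc[i] * fused_scale;
  return out;
}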
Speedup on various data sizes:
Measured on a c6i.12xlarge instance (Intel Xeon Platinum 8375C), AMI ami-04505e74c0741db8d (Canonical, Ubuntu 20.04 LTS)
Script: