
[Quantization] Load quantized resnet50 model #2016

Merged: 1 commit merged into pytorch:master from beicy:expr1 on Nov 28, 2018
Conversation

@beicy (Contributor) commented Nov 13, 2018

Description:
This PR finishes the remaining work of loading the Caffe2 quantized ResNet50 model discussed in #1727.
The model can be found here: https://github.com/caffe2/models/tree/master/resnet50_quantized
For testing on the Glow side, the model can be downloaded by running utils/download_caffe2_models.sh.

Testing:
Added quantized ResNet50 to Glow's test suite (run.sh); tested on the interpreter and CPU backends.

 File: tests/images/imagenet/cat_285.png	Label-K1: 281 (probability: 0.7541)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9583)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9952)

Documentation:
Fixes #1762 #1727


@rdzhabarov (Contributor) left a comment:

Nice!!

The MaxPool implementation relies on the fact that the input and output scales are the same. You'd need to fix the CPU and interpreter implementations. Also, check the special cases for Max/Avg Pool in the quantization procedure; we wouldn't need a post-processing phase once you add proper handling.
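To illustrate the invariant mentioned above, here is a minimal standalone sketch (not Glow's actual kernel; the helper name is hypothetical) of why a quantized MaxPool needs no requantization when input and output share scale and offset: max() is monotonic, so comparing the raw int8 codes preserves the ordering of the underlying float values.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: max-pool over one window of already-quantized values.
// Because input and output use the same (scale, offset), q(max(x, y)) equals
// max(q(x), q(y)), so no dequantize/requantize step is needed.
int8_t maxPoolWindow(const std::vector<int8_t> &window) {
  return *std::max_element(window.begin(), window.end());
}
```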



@nadavrot (Contributor) commented:

Nice!


@rdzhabarov (Contributor) commented:

cc: @jspark1105

@artemrakhov-glow (Contributor) left a comment:

Thank you for doing this! This is a lot of tricky code for one PR.


@@ -243,6 +243,11 @@ class Function final : public Named {
llvm::ArrayRef<unsigned_t> strides,
llvm::ArrayRef<unsigned_t> pads);

MaxPoolNode *createMaxPool(llvm::StringRef name, NodeValue input,
Contributor:

Since we enforce the output type to be the same as the input type for MaxPool, I think we can just use the createMaxPool method above (no need to overload it).

@beicy (Contributor Author):

Updated. Thanks!
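A minimal sketch of the resolution above, under the assumption that Glow's Graph.h types (Function, NodeValue, MaxPoolNode) are in scope; the wrapper name is hypothetical:

```cpp
// Since a quantized MaxPool's output type equals its input type (same scale
// and offset), the loader can call the existing overload, which infers the
// output type from the input, instead of adding a TypeRef overload.
MaxPoolNode *loadQuantizedMaxPool(Function &F, llvm::StringRef name,
                                  NodeValue input,
                                  llvm::ArrayRef<unsigned_t> kernels,
                                  llvm::ArrayRef<unsigned_t> strides,
                                  llvm::ArrayRef<unsigned_t> pads) {
  return F.createMaxPool(name, input, kernels, strides, pads);
}
```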

(void)kdim;
assert((idim.w + pdim.left + pdim.right) >= kdim.width &&
(idim.h + pdim.top + pdim.bottom) >= kdim.height &&
"buffer too small for selected stride");
Contributor:

"buffer too small for selected stride" is a bit misleading, as stride is not used in these computations at all.

@@ -200,7 +200,11 @@ static bool verifyPool(NodeValue src, NodeValue dest,
ShapeNHWC exp(idim.n, outSz.first, outSz.second, idim.c);
isValid &=
expectCompareTrue("Unexpected output dimensions", exp, odim, parent);
isValid &= checkTypeIgnoreShape(src, dest, dest.getNode());
if (src.getType()->isQuantizedType()) {
isValid &= checkSameIsQuantized(src.getType(), dest.getType(), parent);
Contributor:

This check is a bit relaxed. For MaxPool we'd like to check that the types match with shape ignored as well (not only that the input/output is quantized).

Contributor:

Slightly simplified comparison: both need to be either quantized or not quantized, and in the case of MaxPool we'd also need to check for a matching type except for shape:

isValid &= checkSameIsQuantized(src.getType(), dest.getType(), parent);

if (!isAvgPool) {
  isValid &= checkTypeIgnoreShape(src, dest, parent);
}

bias = G_.getParent()->createConstant("conv.bias", biasTensor);
} else {
assert(dict.count("Y_zero_point") &&
"missing zero point for quantzied type");
@rdzhabarov (Contributor) commented Nov 26, 2018:

Can you be more explicit that the problem is with the zero point for the output?

@beicy (Contributor Author):

It has been explained before the definition of OFFSETSHIFT, if that is what you mean :)

/// For the quantized Caffe2 ops, the activations are quantized to uint_8.
/// In Glow, the activations are quantized to int_8. Therefore, for the offset
/// read from the quantized Caffe2 model, we need to subtract 128 (i.e. INT8_MIN)
/// to make the activations become int8_t.
/// For Glow: -127 <= orig_fp32/scale_1 + offset_1 < 128
/// For Caffe2: 0 <= orig_fp32/scale_2 + offset_2 < 255
/// Therefore, we can make scale_1 == scale_2, and offset_1 = offset_2 - 128
const int32_t OFFSETSHIFT = 128;

@rdzhabarov (Contributor) commented Nov 26, 2018:

Yeah, I was referring to "missing zero point for quantzied type", just to clarify that it's for the output type, etc.

Also, there is a typo in "quantized" :)

@beicy (Contributor Author):

Thanks! :)
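To make the offset shift concrete, here is a hedged, self-contained sketch of the remapping the quoted comment describes (struct and function names are illustrative, not Glow's API):

```cpp
#include <cstdint>

constexpr int32_t OFFSETSHIFT = 128;

struct QuantParams {
  float scale;
  int32_t offset;
};

// Map Caffe2's uint8 params (Y_scale, Y_zero_point) into Glow's int8 scheme:
// the scale is kept and 128 is subtracted from the zero point.
QuantParams caffe2ToGlow(float yScale, int32_t yZeroPoint) {
  return {yScale, yZeroPoint - OFFSETSHIFT};
}

// Each stored uint8 payload byte is remapped the same way, e.g. 0 -> -128.
int8_t remapByte(uint8_t b) {
  return static_cast<int8_t>(static_cast<int32_t>(b) - OFFSETSHIFT);
}
```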

@rdzhabarov (Contributor) left a comment:

Looking good! A few comments.


Node *node = G_.createConv(opName, tr, filter, bias, outTy, kernels,
strides, pads, group);
if (typeName == "Int8ConvRelu") {
Contributor:

Similarly to how you handle Int8SumRelu, we should be able to handle Int8ConvRelu without an additional Relu node.
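One way to read this suggestion (my assumption, not the merged code): Caffe2 places Int8ConvRelu's output zero point at the bottom of the uint8 range, so every representable output encodes a non-negative real value; after the loader subtracts OFFSETSHIFT, saturating the conv result into [-128, 127] already realizes the ReLU. A sketch reusing identifiers from the surrounding excerpts:

```cpp
// outDims is the conv output shape computed earlier (assumed in scope).
// One node then covers both Int8Conv and Int8ConvRelu: the output type's
// saturation performs the clamp when the shifted zero point encodes real 0.
auto outTy = G_.getParent()->uniqueType(
    ElemKind::Int8QTy, outDims, loadFloat(dict["Y_scale"]),
    loadInt(dict["Y_zero_point"]) - OFFSETSHIFT);
Node *node = G_.createConv(opName, tr, filter, bias, outTy, kernels,
                           strides, pads, group);
// No extra G_.createRELU(...) is appended for "Int8ConvRelu".
```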

auto outTy = G_.getParent()->uniqueType(
ElemKind::Int8QTy, outDims, loadFloat(dict["Y_scale"]),
loadInt(dict["Y_zero_point"]) - OFFSETSHIFT);
auto *node = G_.createAdd("tempadd", outTy, in0, in1);
Contributor:

"tempadd"? Why not the op name?

assert(dict.count("Y_zero_point") &&
"missing zero point for quantzied type");
assert(dict.count("Y_scale") && "missing Y_scale for quantzied type");
ShapeNHWC idim = ShapeNHWC(tr->getType(0)->dims());
Contributor:

It would be less error-prone to get the type from the result and avoid explicit indexing.

@beicy (Contributor Author):

tr could either be the input node directly or a node with a transposed type, so I think it is easier to get the type directly without casting.

Contributor:

I meant tr->getResult().getType(), which avoids explicit indexing.
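For reference, the two spellings under discussion side by side (illustration only; the context is the loader excerpt above, not new API):

```cpp
ShapeNHWC idim = ShapeNHWC(tr->getType(0)->dims());             // explicit index
ShapeNHWC idim2 = ShapeNHWC(tr->getResult().getType()->dims()); // suggested form
```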

@@ -238,7 +334,28 @@ void Caffe2ModelLoader::loadOperator(const caffe2::OperatorDef &op) {
}

Node *node = nullptr;
if (typeName == "MaxPool") {

if (typeName == "Int8MaxPool" || typeName == "Int8AveragePool") {
Contributor:

Do you need special casing for Int8MaxPool? I think typeName == "MaxPool" should handle both cases nicely. Anything I'm missing?

@beicy (Contributor Author):

Yes, they are the same. However, "Int8MaxPool" has the "Y_zero_point" and "Y_scale" fields. I would like to combine it with Int8AveragePool to check whether the fields are there (although for Int8MaxPool those fields can be ignored).

TH.raw(i) = ((uint8_t)(str.c_str()[i]) - OFFSETSHIFT);
}
} else {
T->reset(ElemKind::Int32QTy, dim, scale, offset);
Contributor:

What is the typeName for which we set the tensor type to be Int32?

@beicy (Contributor Author):

Int8GivenIntTensorFill.

Contributor:

Int8GivenIntTensorFill feels a bit strange, as the Int range is bigger than the Int8 range. But since it's what's in C2, we probably cannot do anything about it.
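Pulling the two branches together, here is a hedged sketch of the weight-loading path discussed in this thread. The variables T, dim, scale, offset, and str mirror the excerpt above; the exact control flow and the offset handling on reset are assumptions, not quoted code:

```cpp
// Int8GivenTensorFill carries a uint8 byte payload: shift each stored byte,
// and (assumed here) the stored zero point, by OFFSETSHIFT into Glow's int8
// representation.
if (typeName == "Int8GivenTensorFill") {
  T->reset(ElemKind::Int8QTy, dim, scale, offset - OFFSETSHIFT);
  auto TH = T->getHandle<int8_t>();
  for (size_t i = 0, e = str.size(); i < e; i++) {
    TH.raw(i) = static_cast<uint8_t>(str.c_str()[i]) - OFFSETSHIFT;
  }
} else {
  // Int8GivenIntTensorFill carries int32 values (used for biases); these
  // are loaded as-is, with no offset shift.
  T->reset(ElemKind::Int32QTy, dim, scale, offset);
}
```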

@@ -41,3 +41,5 @@ done
for png_filename in tests/images/imagenet_299/*.png; do
./bin/image-classifier "$png_filename" -image_mode=0to1 -m=googlenet_v4_slim/googlenet_v4_slim.onnx -model_input_name=input:0 -image_layout=NHWC -label_offset=1 "$@"
done
#Quantized Resnet50 Caffe2 model test
./bin/image-classifier tests/images/imagenet/*.png -image_mode=0to1 -m=quant_resnet50 -model_input_name=gpu_0/data_0 -use-imagenet-normalization "$@"
Contributor:

Where do we get the model from? We likely need to update the utils script to download the model.

@beicy (Contributor Author):

Yes, the download_caffe2_models.sh update is included in this PR.

Contributor:

Missed that :(

@beicy force-pushed the expr1 branch 2 times, most recently from 491159e to e999ebb, on November 26, 2018.
"missing Y_scale for quantized output type");
// Construct the Bias field.
Tensor biasTensor(ElemKind::Int32QTy, {depth}, 1.0, 0);
biasTensor.zero();
Contributor:

This is fine, as we have an offset equal to 0, but isZero and zero do not currently work properly for quantized types. Tracking this here: #2082

@beicy (Contributor Author):

Got it. Will improve it in a follow-up PR.

Contributor:

Nick was going to look at the issue above.
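A hedged sketch of the subtlety tracked in #2082 (names are illustrative, not Glow's implementation): for a quantized tensor, real 0.0 is represented by the offset, so "zeroing" must fill the buffer with the offset value rather than the integer 0. With offset == 0, as for the conv bias above, the two coincide, which is why the current code is fine.

```cpp
#include <cstdint>
#include <vector>

// Fill a quantized buffer with the encoding of real 0.0:
// q = round(0.0 / scale) + offset == offset.
void zeroQuantized(std::vector<int32_t> &data, int32_t offset) {
  for (auto &v : data)
    v = offset;
}
```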

@rdzhabarov (Contributor) left a comment:

Can you create a small net and test the quantized loader? I'm a bit worried that we do not test it at all on CI.

@@ -200,7 +200,14 @@ static bool verifyPool(NodeValue src, NodeValue dest,
ShapeNHWC exp(idim.n, outSz.first, outSz.second, idim.c);
isValid &=
expectCompareTrue("Unexpected output dimensions", exp, odim, parent);
isValid &= checkTypeIgnoreShape(src, dest, dest.getNode());
if (src.getType()->isQuantizedType() && isAvgPool) {
// For quantized AvgPool, the scale and offset of its input and output could
Contributor:

Is this documented anywhere, e.g. in the doc string of the node?

@beicy (Contributor Author):

No, it hasn't been documented yet. After this PR is merged, we need to update our quantization docs.
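As a starting point for that doc update, here is a hedged numeric sketch (illustrative names, not Glow's kernel) of why a quantized AvgPool's output may carry its own scale/offset: averaging narrows the value range, so the model can requantize the int32 accumulator into better-fitting output parameters, whereas MaxPool preserves the range and keeps the input type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Average one pooling window: accumulate in int32, dequantize with the input
// params, then requantize with the (possibly different) output params.
int8_t avgPoolWindow(const std::vector<int8_t> &window, float inScale,
                     int32_t inOffset, float outScale, int32_t outOffset) {
  int32_t sum = 0;
  for (int8_t v : window)
    sum += static_cast<int32_t>(v) - inOffset;
  float real = inScale * static_cast<float>(sum) /
               static_cast<float>(window.size()); // real-valued average
  int32_t q = static_cast<int32_t>(std::lround(real / outScale)) + outOffset;
  return static_cast<int8_t>(std::min(127, std::max(-128, q))); // saturate
}
```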

@beicy (Contributor Author) commented Nov 27, 2018:

@rdzhabarov Is it OK to create a separate PR for the net testing? I think unit tests for each quantized op should be added as well.

@rdzhabarov (Contributor) commented:

> I think unit tests for each quantized op should be added as well.

We already have tests for each quantized op (see operatorTest).

The only thing missing is loader tests, and we do not run quantized ResNet50 on CI. This PR is getting bigger, so I'm OK scheduling a follow-up to address these concerns.

@beicy (Contributor Author) commented Nov 27, 2018:

@rdzhabarov I agree. Will test the Caffe2 loader for the new quantized Caffe2 ops.

@rdzhabarov (Contributor) left a comment:

Great work! Really happy to see this merged.

Pending: doc update + importer test


// Check if we have a serialized bias vector.
if (op.input_size() > 2) {
auto &biasTensorName = op.input(2);
Contributor:

nit: const auto &

biasTensor.zero();
// Check if we have a serialized bias vector.
if (op.input_size() > 2) {
auto &biasTensorName = op.input(2);
Contributor:

ditto

@@ -602,6 +736,73 @@ void Caffe2ModelLoader::loadWeight(const caffe2::OperatorDef &op) {
return;
}

// Load tensors and quantize the values:
Contributor:

We do not really quantize values here, right? We're just loading already-quantized tensors.

@@ -26,6 +26,7 @@ vgg19
zfnet512
bvlc_alexnet
en2gr
quant_resnet50
Contributor:

A somewhat unrelated question, but why don't we download directly from the C2 zoo?

@beicy (Contributor Author):

Their model has a problem: the init_net.pb/init_net.pbtxt file has duplicated tensors, which is unexpected. I removed the duplicated tensors in our version. According to the model owner, they will update the model on their side in the next release.

@beicy merged commit 9f706f3 into pytorch:master on Nov 28, 2018.
@beicy deleted the expr1 branch on December 3, 2018.