
[Quantization] Load quantized resnet50 model #2016

Merged: 1 commit merged into pytorch:master from beicy:expr1 on Nov 28, 2018
Conversation

@beicy (Contributor) commented Nov 13, 2018

Description:
This PR finishes the remaining work of loading the Caffe2 quantized ResNet50 model discussed in #1727.
The model can be found here: https://github.com/caffe2/models/tree/master/resnet50_quantized
For testing on the Glow side, the model can be downloaded by running utils/download_caffe2_models.sh.

Testing:
Added quantized ResNet50 to Glow's test suite (run.sh); tested on the interpreter and CPU backends.

 File: tests/images/imagenet/cat_285.png	Label-K1: 281 (probability: 0.7541)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9583)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9952)

Documentation:
Fixes #1762 #1727


@rdzhabarov (Contributor) left a comment:

Nice!!

The MaxPool implementation relies on the fact that the input and output scales are the same. You'd need to fix the CPU and interpreter implementations. Also, check the special cases for Max/Avg Pool in the quantization procedure; we wouldn't need a post-processing phase once you add proper handling.
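To illustrate the invariant mentioned above, here is a minimal standalone sketch (not Glow's actual kernel; the helper name is hypothetical) of why a quantized MaxPool needs no requantization when input and output share scale and offset: max() is monotonic, so comparing the raw int8 codes preserves the ordering of the underlying float values.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: max-pool over one window of already-quantized values.
// Because input and output use the same (scale, offset), q(max(x, y)) equals
// max(q(x), q(y)), so no dequantize/requantize step is needed.
int8_t maxPoolWindow(const std::vector<int8_t> &window) {
  return *std::max_element(window.begin(), window.end());
}
```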



@nadavrot (Contributor) commented:

Nice!


@rdzhabarov (Contributor) commented:

cc: @jspark1105

@artemrakhov-glow (Contributor) left a comment:

Thank you for doing this! This is a lot of tricky code for one PR.


@@ -243,6 +243,11 @@ class Function final : public Named {
llvm::ArrayRef<unsigned_t> strides,
llvm::ArrayRef<unsigned_t> pads);

MaxPoolNode *createMaxPool(llvm::StringRef name, NodeValue input,
Contributor:

Since we enforce the output type to be the same as the input type for MaxPool, I think we can just use the createMaxPool method above (no need to overload it).

@beicy (Contributor Author):

Updated. Thanks!
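A minimal sketch of the resolution above, under the assumption that Glow's Graph.h types (Function, NodeValue, MaxPoolNode) are in scope; the wrapper name is hypothetical:

```cpp
// Since a quantized MaxPool's output type equals its input type (same scale
// and offset), the loader can call the existing overload, which infers the
// output type from the input, instead of adding a TypeRef overload.
MaxPoolNode *loadQuantizedMaxPool(Function &F, llvm::StringRef name,
                                  NodeValue input,
                                  llvm::ArrayRef<unsigned_t> kernels,
                                  llvm::ArrayRef<unsigned_t> strides,
                                  llvm::ArrayRef<unsigned_t> pads) {
  return F.createMaxPool(name, input, kernels, strides, pads);
}
```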

(void)kdim;
assert((idim.w + pdim.left + pdim.right) >= kdim.width &&
(idim.h + pdim.top + pdim.bottom) >= kdim.height &&
"buffer too small for selected stride");
Contributor:

"buffer too small for selected stride" is a bit misleading, as stride is not used in these computations at all.

@@ -200,7 +200,11 @@ static bool verifyPool(NodeValue src, NodeValue dest,
ShapeNHWC exp(idim.n, outSz.first, outSz.second, idim.c);
isValid &=
expectCompareTrue("Unexpected output dimensions", exp, odim, parent);
isValid &= checkTypeIgnoreShape(src, dest, dest.getNode());
if (src.getType()->isQuantizedType()) {
isValid &= checkSameIsQuantized(src.getType(), dest.getType(), parent);
Contributor:

This check is a bit relaxed. For MaxPool we'd like to check that the types match with shape ignored as well (not only that the input/output is quantized).

Contributor:

Slightly simplified comparison: both need to be either quantized or not quantized, and in the case of MaxPool we'd also need to check for a matching type except for shape:

isValid &= checkSameIsQuantized(src.getType(), dest.getType(), parent);

if (!isAvgPool) {
  isValid &= checkTypeIgnoreShape(src, dest, parent);
}

bias = G_.getParent()->createConstant("conv.bias", biasTensor);
} else {
assert(dict.count("Y_zero_point") &&
"missing zero point for quantzied type");
@rdzhabarov (Contributor) commented Nov 26, 2018:

Can you be more explicit that the problem is with the zero point for the output?

@beicy (Contributor Author):

It has been explained before the definition of OFFSETSHIFT, if that is what you mean :)

/// For the quantized Caffe2 ops, the activations are quantized to uint_8.
/// In Glow, the activations are quantized to int_8. Therefore, for the offset
/// read from the quantized Caffe2 model, we need to subtract 128 (i.e. INT8_MIN)
/// to make the activations become int8_t.
/// For Glow: -127 <= orig_fp32/scale_1 + offset_1 < 128
/// For Caffe2: 0 <= orig_fp32/scale_2 + offset_2 < 255
/// Therefore, we can make scale_1 == scale_2, and offset_1 = offset_2 - 128
const int32_t OFFSETSHIFT = 128;

@rdzhabarov (Contributor) commented Nov 26, 2018:

Yeah, I was referring to "missing zero point for quantzied type", just to clarify that it's for the output type, etc.

Also, there is a typo in "quantized" :)

@beicy (Contributor Author):

Thanks! :)
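To make the offset shift concrete, here is a hedged, self-contained sketch of the remapping the quoted comment describes (struct and function names are illustrative, not Glow's API):

```cpp
#include <cstdint>

constexpr int32_t OFFSETSHIFT = 128;

struct QuantParams {
  float scale;
  int32_t offset;
};

// Map Caffe2's uint8 params (Y_scale, Y_zero_point) into Glow's int8 scheme:
// the scale is kept and 128 is subtracted from the zero point.
QuantParams caffe2ToGlow(float yScale, int32_t yZeroPoint) {
  return {yScale, yZeroPoint - OFFSETSHIFT};
}

// Each stored uint8 payload byte is remapped the same way, e.g. 0 -> -128.
int8_t remapByte(uint8_t b) {
  return static_cast<int8_t>(static_cast<int32_t>(b) - OFFSETSHIFT);
}
```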

@rdzhabarov (Contributor) left a comment:

Looking good! A few comments.


Node *node = G_.createConv(opName, tr, filter, bias, outTy, kernels,
strides, pads, group);
if (typeName == "Int8ConvRelu") {
Contributor:

Similarly to how you handle Int8SumRelu, we should be able to handle Int8ConvRelu without an additional Relu node.
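One way to read this suggestion (my assumption, not the merged code): Caffe2 places Int8ConvRelu's output zero point at the bottom of the uint8 range, so every representable output encodes a non-negative real value; after the loader subtracts OFFSETSHIFT, saturating the conv result into [-128, 127] already realizes the ReLU. A sketch reusing identifiers from the surrounding excerpts:

```cpp
// outDims is the conv output shape computed earlier (assumed in scope).
// One node then covers both Int8Conv and Int8ConvRelu: the output type's
// saturation performs the clamp when the shifted zero point encodes real 0.
auto outTy = G_.getParent()->uniqueType(
    ElemKind::Int8QTy, outDims, loadFloat(dict["Y_scale"]),
    loadInt(dict["Y_zero_point"]) - OFFSETSHIFT);
Node *node = G_.createConv(opName, tr, filter, bias, outTy, kernels,
                           strides, pads, group);
// No extra G_.createRELU(...) is appended for "Int8ConvRelu".
```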

auto outTy = G_.getParent()->uniqueType(
ElemKind::Int8QTy, outDims, loadFloat(dict["Y_scale"]),
loadInt(dict["Y_zero_point"]) - OFFSETSHIFT);
auto *node = G_.createAdd("tempadd", outTy, in0, in1);
Contributor:

"tempadd"? Why not the op name?

assert(dict.count("Y_zero_point") &&
"missing zero point for quantzied type");
assert(dict.count("Y_scale") && "missing Y_scale for quantzied type");
ShapeNHWC idim = ShapeNHWC(tr->getType(0)->dims());
Contributor:

It would be less error-prone to get the type from the result and avoid explicit indexing.

@beicy (Contributor Author):

tr could either be the input node directly or a node with a transposed type, so I think it is easier to get the type directly without casting.

Contributor:

I meant tr->getResult().getType(), which avoids explicit indexing.
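For reference, the two spellings under discussion side by side (illustration only; the context is the loader excerpt above, not new API):

```cpp
ShapeNHWC idim = ShapeNHWC(tr->getType(0)->dims());             // explicit index
ShapeNHWC idim2 = ShapeNHWC(tr->getResult().getType()->dims()); // suggested form
```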

@@ -238,7 +334,28 @@ void Caffe2ModelLoader::loadOperator(const caffe2::OperatorDef &op) {
}

Node *node = nullptr;
if (typeName == "MaxPool") {

if (typeName == "Int8MaxPool" || typeName == "Int8AveragePool") {
Contributor:

Do you need special casing for Int8MaxPool? I think typeName == "MaxPool" should handle both cases nicely. Anything I'm missing?

@beicy (Contributor Author):

Yes, they are the same. However, "Int8MaxPool" has the "Y_zero_point" and "Y_scale" fields. I would like to combine it with Int8AveragePool to check whether the fields are there (although for Int8MaxPool those fields can be ignored).

TH.raw(i) = ((uint8_t)(str.c_str()[i]) - OFFSETSHIFT);
}
} else {
T->reset(ElemKind::Int32QTy, dim, scale, offset);
Contributor:

What is the typeName for which we set the tensor type to be Int32?

@beicy (Contributor Author):

Int8GivenIntTensorFill.

Contributor:

Int8GivenIntTensorFill feels a bit strange, as the Int range is bigger than the Int8 range. But since it's what's in C2, we probably cannot do anything about it.
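Pulling the two branches together, here is a hedged sketch of the weight-loading path discussed in this thread. The variables T, dim, scale, offset, and str mirror the excerpt above; the exact control flow and the offset handling on reset are assumptions, not quoted code:

```cpp
// Int8GivenTensorFill carries a uint8 byte payload: shift each stored byte,
// and (assumed here) the stored zero point, by OFFSETSHIFT into Glow's int8
// representation.
if (typeName == "Int8GivenTensorFill") {
  T->reset(ElemKind::Int8QTy, dim, scale, offset - OFFSETSHIFT);
  auto TH = T->getHandle<int8_t>();
  for (size_t i = 0, e = str.size(); i < e; i++) {
    TH.raw(i) = static_cast<uint8_t>(str.c_str()[i]) - OFFSETSHIFT;
  }
} else {
  // Int8GivenIntTensorFill carries int32 values (used for biases); these
  // are loaded as-is, with no offset shift.
  T->reset(ElemKind::Int32QTy, dim, scale, offset);
}
```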

@@ -41,3 +41,5 @@ done
for png_filename in tests/images/imagenet_299/*.png; do
./bin/image-classifier "$png_filename" -image_mode=0to1 -m=googlenet_v4_slim/googlenet_v4_slim.onnx -model_input_name=input:0 -image_layout=NHWC -label_offset=1 "$@"
done
#Quantized Resnet50 Caffe2 model test
./bin/image-classifier tests/images/imagenet/*.png -image_mode=0to1 -m=quant_resnet50 -model_input_name=gpu_0/data_0 -use-imagenet-normalization "$@"
Contributor:

Where do we get the model from? We likely need to update the utils script to download the model.

@beicy (Contributor Author):

Yes, the download_caffe2_models.sh update is included in this PR.

Contributor:

Missed that :(

@beicy force-pushed the expr1 branch 2 times, most recently from 491159e to e999ebb, on November 26, 2018.
"missing Y_scale for quantized output type");
// Construct the Bias field.
Tensor biasTensor(ElemKind::Int32QTy, {depth}, 1.0, 0);
biasTensor.zero();
Contributor:

This is fine, as we have an offset equal to 0, but isZero and zero do not currently work properly for quantized types. Tracking this here: #2082

@beicy (Contributor Author):

Got it. Will improve it in a follow-up PR.

Contributor:

Nick was going to look at the issue above.
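A hedged sketch of the subtlety tracked in #2082 (names are illustrative, not Glow's implementation): for a quantized tensor, real 0.0 is represented by the offset, so "zeroing" must fill the buffer with the offset value rather than the integer 0. With offset == 0, as for the conv bias above, the two coincide, which is why the current code is fine.

```cpp
#include <cstdint>
#include <vector>

// Fill a quantized buffer with the encoding of real 0.0:
// q = round(0.0 / scale) + offset == offset.
void zeroQuantized(std::vector<int32_t> &data, int32_t offset) {
  for (auto &v : data)
    v = offset;
}
```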

@rdzhabarov (Contributor) left a comment:

Can you create a small net and test the quantized loader? I'm a bit worried that we do not test it at all on CI.

@@ -200,7 +200,14 @@ static bool verifyPool(NodeValue src, NodeValue dest,
ShapeNHWC exp(idim.n, outSz.first, outSz.second, idim.c);
isValid &=
expectCompareTrue("Unexpected output dimensions", exp, odim, parent);
isValid &= checkTypeIgnoreShape(src, dest, dest.getNode());
if (src.getType()->isQuantizedType() && isAvgPool) {
// For quantized AvgPool, the scale and offset of its input and output could
Contributor:

Is this documented anywhere, e.g. in the doc string of the node?

@beicy (Contributor Author):

No, it hasn't been documented yet. After this PR is merged, we need to update our quantization docs.
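As a starting point for that doc update, here is a hedged numeric sketch (illustrative names, not Glow's kernel) of why a quantized AvgPool's output may carry its own scale/offset: averaging narrows the value range, so the model can requantize the int32 accumulator into better-fitting output parameters, whereas MaxPool preserves the range and keeps the input type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Average one pooling window: accumulate in int32, dequantize with the input
// params, then requantize with the (possibly different) output params.
int8_t avgPoolWindow(const std::vector<int8_t> &window, float inScale,
                     int32_t inOffset, float outScale, int32_t outOffset) {
  int32_t sum = 0;
  for (int8_t v : window)
    sum += static_cast<int32_t>(v) - inOffset;
  float real = inScale * static_cast<float>(sum) /
               static_cast<float>(window.size()); // real-valued average
  int32_t q = static_cast<int32_t>(std::lround(real / outScale)) + outOffset;
  return static_cast<int8_t>(std::min(127, std::max(-128, q))); // saturate
}
```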

@beicy (Contributor Author) commented Nov 27, 2018:

@rdzhabarov Is it OK to create a separate PR for the net testing? I think unit tests for each quantized op should be added as well.

@rdzhabarov (Contributor) commented:

> I think unit tests for each quantized op should be added as well.

We already have tests for each quantized op (see operatorTest).

The only thing missing is loader tests, and we do not run quantized ResNet50 on CI. This PR is getting bigger, so I'm OK scheduling a follow-up to address these concerns.

@beicy (Contributor Author) commented Nov 27, 2018:

@rdzhabarov I agree. Will test the Caffe2 loader for the new quantized Caffe2 ops.

@rdzhabarov (Contributor) left a comment:

Great work! Really happy to see this merged.

Pending: doc update + importer test


// Check if we have a serialized bias vector.
if (op.input_size() > 2) {
auto &biasTensorName = op.input(2);
Contributor:

nit: const auto &

biasTensor.zero();
// Check if we have a serialized bias vector.
if (op.input_size() > 2) {
auto &biasTensorName = op.input(2);
Contributor:

ditto

@@ -602,6 +736,73 @@ void Caffe2ModelLoader::loadWeight(const caffe2::OperatorDef &op) {
return;
}

// Load tensors and quantize the values:
Contributor:

We do not really quantize values here, right? We're just loading already-quantized tensors.

@@ -26,6 +26,7 @@ vgg19
zfnet512
bvlc_alexnet
en2gr
quant_resnet50
Contributor:

A somewhat unrelated question, but why don't we download directly from the C2 zoo?

@beicy (Contributor Author):

Their model has a problem: the init_net.pb/init_net.pbtxt file has duplicated tensors, which is unexpected. I removed the duplicated tensors in our version. According to the model owner, they will update the model on their side in the next release.

@beicy merged commit 9f706f3 into pytorch:master on Nov 28, 2018.
@beicy deleted the expr1 branch on December 3, 2018.