
Low accuracy of TF-Lite model for Mobilenet (Quantization aware training) #368

Closed
NobuoTsukamoto opened this issue Apr 20, 2020 · 21 comments

@NobuoTsukamoto

Describe the bug
The accuracy of the TF-Lite model becomes extremely low after quantization-aware training of tf.keras.applications.mobilenet (v1/v2).

System information

TensorFlow installed from (source or binary): binary

TensorFlow version: tf-nightly-gpu (2.2.0.dev20200420)

TensorFlow Model Optimization version: 0.3.0

Python version: 3.6.9

Describe the expected behavior
The accuracy of the Keras model (with quantization-aware training) and the TF-Lite model should be almost the same.
Image classification with tools

Describe the current behavior

  • Train using the tf_flowers dataset.
  • Train a MobileNet V2 model without quantization-aware training.
  • After training, create a quantized model with the quantize_model API and train it with quantization-aware training.
  • Check accuracy on the test set with the evaluate API:
    • Keras model without quantization-aware training: 0.99
    • Keras model with quantization-aware training: 0.97
  • Convert to a TF-Lite model and check the accuracy on the test set (a minimal sketch of this flow follows the list).
    Accuracy is extremely low: 0.20%
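
For clarity, the overall flow is roughly the following (a minimal sketch, not the exact notebook code; IMG_SIZE, NUM_CLASSES, train_ds, val_ds and the epoch counts are placeholders for the values used in the Colab):

  import tensorflow as tf
  import tensorflow_model_optimization as tfmot

  # Float baseline model (IMG_SIZE, NUM_CLASSES, train_ds, val_ds are placeholders).
  model = tf.keras.applications.MobileNetV2(
      input_shape=(IMG_SIZE, IMG_SIZE, 3), weights=None, classes=NUM_CLASSES)
  model.compile(optimizer="adam",
                loss="sparse_categorical_crossentropy", metrics=["accuracy"])
  model.fit(train_ds, validation_data=val_ds, epochs=10)

  # Quantization-aware training on top of the trained float model.
  q_aware_model = tfmot.quantization.keras.quantize_model(model)
  q_aware_model.compile(optimizer="adam",
                        loss="sparse_categorical_crossentropy", metrics=["accuracy"])
  q_aware_model.fit(train_ds, validation_data=val_ds, epochs=10)

  # Convert the QAT model to an integer-quantized TF-Lite model.
  converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
  converter.optimizations = [tf.lite.Optimize.DEFAULT]
  tflite_model = converter.convert()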

If the model is defined as follows instead, the accuracy of the Keras model and the TF-Lite model is almost the same.

  # extract image features by convolution and max pooling layers
  inputs = tf.keras.Input(shape = (IMG_SIZE, IMG_SIZE, 3))
  x = tf.keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu")(inputs)
  x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
  x = tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu")(x)
  x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
  # classify the class by fully-connected layers
  x = tf.keras.layers.Flatten()(x)
  x = tf.keras.layers.Dense(512, activation="relu")(x)
  x = tf.keras.layers.Dense(info.features['label'].num_classes)(x)
  x = tf.keras.layers.Activation("softmax")(x)
  model_functional = tf.keras.Model(inputs=inputs, outputs=x)

Code to reproduce the issue
(Google Colab notebook)
https://gist.github.com/NobuoTsukamoto/b42128104531a7612e5c85e246cb2dac


@NobuoTsukamoto added the bug (Something isn't working) label on Apr 20, 2020
@alanchiao

I skimmed through your colab. Could you try one thing I didn't see?

If you try taking your "Keras model without quantization aware training" (0.99), converting it to TFLite, and then evaluating it in a manner similar to how you got the 0.20% accuracy number, could you see what you get?

@NobuoTsukamoto
Author

I updated colab notebook.
https://gist.github.com/NobuoTsukamoto/b42128104531a7612e5c85e246cb2dac

If you try taking your "Keras model without quantization aware training" (0.99), converting it to TFLite, and then evaluating it in a manner similar to how you got the 0.20% accuracy number, could you see what you get?

  • Keras model without quantization aware training: 0.9837
    • TF-Lite model: 0.8965
    • TF-Lite weight quantization model (Post training quantization): 0.8556
    • TF-Lite float16 quantization (Post training quantization): 0.9837
    • TF-Lite integer quantization (Post training quantization): 0.9782
  • Keras model with quantization aware training: 0.9946
    • TF-Lite integer quantization (Quantization aware training): 0.2343

@kmkolasinski

Can you try training your q_aware model for much longer, e.g.:

q_aware_history = q_aware_model.fit(train.repeat(),
                                    initial_epoch=10,
                                    epochs=200,
                                    steps_per_epoch=500,
                                    validation_data=validation.repeat(),
                                    validation_steps=validation_steps)

There are running exponential averages in the quantized layers which may need time to converge.

@kmkolasinski

You can take a look at this issue: #309
TL;DR: I had a similar problem, but when I trained the quantization-aware model for longer, the gap between the Keras and TF-Lite models decreased.

@NobuoTsukamoto
Author

@kmkolasinski
Thanks for your information.

I tried two patterns (training with QAT):

  1. epochs=50
    Keras model (QAT): 0.99 , TF-Lite integer quant model (QAT): 0.55
  2. epochs=100
    Keras model (QAT): 1.00 , TF-Lite integer quant model (QAT): 1.00

This needs quite a long time and a large number of epochs. Also, it is not possible to see the gap between the Keras model and the TF-Lite model from the accuracy and loss metrics.

How can I tell that the gap has disappeared during training? Also, can I estimate how many epochs to set?
(According to #309, it didn't seem possible ...)

@alanchiao

@NobuoTsukamoto, @krzys-ostrowski: this is good feedback.

Just from the analysis, there are some things we could possibly do:

  1. Have TensorBoard log the exponential averages so you can see them converge, via a new callback for QAT.

and then, with regard to how long it takes:

  1. More intelligently initialize the exponential averages which track the min/max values of weights/activations, to reflect things that are fixed for the activations (e.g. 0 as the minimum for ReLU).
  2. Have the exponential averages move away from their initialized values more quickly at the start of training ("zero_debias"), or modify ema_decay - I'm not sure how well this would work across models.

@kmkolasinski

Indeed, having a native callback for EMA monitoring would be a nice feature.

Additionally, since the EMA decay in the moving average quantizer is set to beta=0.999, we need approximately 1000 steps to 'forget' the initial state. Here is a table showing how many steps you need to 'forget' the initial state of the quantizer min/max values:
[image: table of EMA decay values vs. steps needed to forget the initial state]

Probably, setting the default EMA decay to 0.995 would be a better choice for users with simpler problems.
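
I can't paste the full table here, but the rule of thumb behind it can be sketched with a quick back-of-the-envelope calculation:

  # With the EMA update  range_ema <- beta * range_ema + (1 - beta) * range_batch,
  # the initial state keeps a weight of beta**n after n steps, which is roughly
  # exp(-n * (1 - beta)), so the forgetting time constant is ~1 / (1 - beta) steps.
  for beta in (0.9, 0.99, 0.995, 0.999):
      print(f"beta={beta}: initial range mostly forgotten after ~{1 / (1 - beta):.0f} steps")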

One can also monitor the gap between the Keras model and TFLite during training via a custom callback. For example, I use model output statistics as a proxy for measuring the gap. Here is how it looks in my case (source):

INFO:tensorflow:Measured deviation between keras and tflite model:
INFO:tensorflow:
 - export/objectness/output  
	MAE     =  0.000759 
	RMSE    =  0.003540 
	Keras   = N(μ=  0.030039, σ=  0.143009)
	tflite  = N(μ=  0.030201, σ=  0.143635)
 - export/box_shape/output   
	MAE     =  0.001983 
	RMSE    =  0.003444 
	Keras   = N(μ=  0.413126, σ=  0.267102)
	tflite  = N(μ=  0.413075, σ=  0.266314)
 - export/classes/output     
	MAE     =  0.000494 
	RMSE    =  0.011030 
	Keras   = N(μ=  0.000562, σ=  0.012718)
	tflite  = N(μ=  0.000565, σ=  0.013538)

The problem with this approach is that predictions through the TFLite model can be very slow on non-ARM architectures, and this type of test should run in the background so as not to block the training loop.
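
For reference, a stripped-down sketch of this kind of check (not the actual code behind the logs above; it also assumes the converted model keeps float32 inputs and outputs):

  import numpy as np
  import tensorflow as tf

  def keras_tflite_deviation(keras_model, sample_batch):
      # Convert the current model and run it through the TFLite interpreter.
      converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
      converter.optimizations = [tf.lite.Optimize.DEFAULT]
      interpreter = tf.lite.Interpreter(model_content=converter.convert())
      interpreter.allocate_tensors()
      inp = interpreter.get_input_details()[0]
      out = interpreter.get_output_details()[0]

      tflite_preds = []
      for x in sample_batch:
          interpreter.set_tensor(inp["index"], x[np.newaxis].astype(inp["dtype"]))
          interpreter.invoke()
          tflite_preds.append(interpreter.get_tensor(out["index"])[0])
      tflite_preds = np.stack(tflite_preds)
      keras_preds = keras_model.predict(sample_batch)

      mae = np.mean(np.abs(keras_preds - tflite_preds))
      rmse = np.sqrt(np.mean((keras_preds - tflite_preds) ** 2))
      return mae, rmse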

@NobuoTsukamoto
Author

Have Tensorboard log the exponential averages so you can see them converge, through a new callback for QAT.

It would be nice if the convergence could be seen in the TensorBoard log.
If "a new callback for QAT" is a keras.callbacks callback, like "pruning_callbacks", I think it will be very easy to use.

@nutsiepully
Contributor

I think there is likely some confusion here. An exponential moving average is used during QAT to calculate the ranges of dynamic tensors. Since the initial cold-start range is [-6, 6], it can lead to a huge accuracy drop at the beginning of QAT. Say a tensor only has values in [-0.1, 0.1]; then most of the range is wasted, which can lead to huge losses.
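
To put rough numbers on the wasted range (a back-of-the-envelope illustration for the 8-bit case, not measurements from the actual kernels):

  # Cold-start range [-6, 6] with 8-bit quantization.
  step = (6.0 - (-6.0)) / 255    # ~0.047 per quantization level
  used = (0.1 - (-0.1)) / step   # a [-0.1, 0.1] tensor uses only ~4 of the 256 levels
  print(step, used)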

As training goes on, this range slowly converges to the actual range (as @kmkolasinski mentioned, ~1000 steps), and the QAT accuracy goes up.

However, when converting to TFLite, the same ranges that are used in QAT are used again. So the TF and TFLite accuracy and values should be very close. QAT tries to emulate TFLite as closely as possible, and there shouldn't be such divergences.

We don't see it in our local tests either. For example, if you run quantize_functional_test, you'll see that the results for TF QAT and TFLite are the same.

There can be some subtle differences. We don't place FakeQuants after Softmax, for instance, since it hinders convergence. There's a possibility that's what is happening, but I can't be sure of it. I'm trying to recreate the issue.

@kmkolasinski

There is a chance that I'm doing something wrong; however, it seems that I'm not the only one with this issue. You could check a much bigger model than the one used in quantize_functional_test. I have encountered this issue with MobileNetV2. When models get bigger, the errors between emulated quantization and the real one accumulate.

@nutsiepully
Contributor

We've found the issue. One of the quantized kernels' activation ranges had a problem, but it was getting hidden once the range had converged.

We'll have a fix out soon. tf-nightly should have it.

@nutsiepully
Contributor

Thanks a lot for reporting and helping reproduce this issue. It would've been really hard to narrow down without the reproduction code.

@sayakpaul

@nutsiepully could you mention if there's a specific version of TensorFlow that will have the fix? Or should pip install tf-nightly do it?

@kmkolasinski

Cool, thanks for the feedback @nutsiepully! I will check it today. Out of curiosity, was it some general issue, or something related to MobileNet models or a specific layer, etc.?

@sayakpaul Yes, you can also use pip install tf-nightly --upgrade, but you need to uninstall regular TF first.

@nutsiepully
Contributor

@sayakpaul - tf-nightly should do it. The next version release will have it.

@kmkolasinski - I'll point out the commit here once it's in so you can see it. It was a general issue with the DepthConv kernel implementation, which got triggered when ranges hadn't converged.

@kmkolasinski

Thanks, that makes sense to me; a few weeks ago I switched to a custom ResNet model, which does not have DepthConvs, and I got better results.

@sayakpaul

Thanks for letting me know. I will check and report back.

@sayakpaul

sayakpaul commented May 7, 2020

@nutsiepully I can definitely see the improvement, and this Colab Gist reproduces it.

Additionally, I worked on this report for folks to make the onboarding process for quantization a bit easier. It incorporates many of your suggestions as well. Happy to address any feedback.

Thank you so much for all your help :)

@nutsiepully
Contributor

Thanks a lot @sayakpaul. Really appreciate the feedback and the effort.

Thanks @kmkolasinski and @NobuoTsukamoto for the detailed bug reports and feedback. I'm closing the bug. Please reopen if you face any further issues.

@sayakpaul, the report is awesome! Great work, this explains the value of the tooling really well.

@tarushbansal

tarushbansal commented Jan 14, 2024

Hi. I’m facing the same issue with MobileNetV3, where I see a large drop in accuracy in the TFLite model compared to the QAT Keras model. I’m using TensorFlow version 2.15.0 and TensorFlow Model Optimization version 0.7.5. I had to refactor MobileNetV3 a little to make it compatible with QAT, by using OnlyOutputQuantizeConfig for the Multiply layers (with a Moving Average Quantizer) and replacing the Add operations in Hard Sigmoid with Rescaling, but I don’t think that should be the cause of this issue. Would appreciate any help. Thanks!
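
For context, the output-only config I mean is along these lines (a simplified sketch following the tfmot QuantizeConfig pattern; the exact quantizer parameters here are illustrative):

  import tensorflow_model_optimization as tfmot

  class OnlyOutputQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
      """Quantize only the layer output, using a moving-average quantizer."""

      def get_weights_and_quantizers(self, layer):
          return []  # no weight quantization

      def get_activations_and_quantizers(self, layer):
          return []  # no activation quantization

      def set_quantize_weights(self, layer, quantize_weights):
          pass

      def set_quantize_activations(self, layer, quantize_activations):
          pass

      def get_output_quantizers(self, layer):
          return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
              num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

      def get_config(self):
          return {}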

@KBOUSTM

KBOUSTM commented Aug 5, 2024

Hello @tarushbansal,
I have the same issue. I'm using the latest version of tfmot (0.8.0); I had to make MobileNetV3 QAT-friendly, and then I got a huge gap between the QAT model accuracy and the TFLite model accuracy. Did you find any solution to this issue? Thanks!
