
Low accuracy of TF-Lite model for Mobilenet (Quantization aware training) #368

Closed
NobuoTsukamoto opened this issue Apr 20, 2020 · 21 comments

@NobuoTsukamoto

Describe the bug
The accuracy of the TF-Lite model becomes extremely low after quantization-aware training of tf.keras.applications.mobilenet (v1/v2).

System information

TensorFlow installed from (source or binary): binary

TensorFlow version: tf-nightly-gpu (2.2.0.dev20200420)

TensorFlow Model Optimization version: 0.3.0

Python version: 3.6.9

Describe the expected behavior
The accuracy of the Keras model (with quantization-aware training) and the TF-Lite model should be almost the same.
Image classification with tools

Describe the current behavior

  • Train using the tf_flowers dataset.
  • Train a MobileNet V2 model without quantization-aware training.
  • After training, create a quantized model with the quantize_model API and train it with quantization-aware training.
  • Check accuracy on the test set with the evaluate API:
    • Keras model without quantization-aware training: 0.99
    • Keras model with quantization-aware training: 0.97
  • Convert to a TF-Lite model and check the accuracy on the test set (a minimal sketch of this flow follows the list).
    Accuracy is extremely low: 0.20%
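
For clarity, the overall flow is roughly the following (a minimal sketch, not the exact notebook code; IMG_SIZE, NUM_CLASSES, train_ds, val_ds and the epoch counts are placeholders for the values used in the Colab):

  import tensorflow as tf
  import tensorflow_model_optimization as tfmot

  # Float baseline model (IMG_SIZE, NUM_CLASSES, train_ds, val_ds are placeholders).
  model = tf.keras.applications.MobileNetV2(
      input_shape=(IMG_SIZE, IMG_SIZE, 3), weights=None, classes=NUM_CLASSES)
  model.compile(optimizer="adam",
                loss="sparse_categorical_crossentropy", metrics=["accuracy"])
  model.fit(train_ds, validation_data=val_ds, epochs=10)

  # Quantization-aware training on top of the trained float model.
  q_aware_model = tfmot.quantization.keras.quantize_model(model)
  q_aware_model.compile(optimizer="adam",
                        loss="sparse_categorical_crossentropy", metrics=["accuracy"])
  q_aware_model.fit(train_ds, validation_data=val_ds, epochs=10)

  # Convert the QAT model to an integer-quantized TF-Lite model.
  converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
  converter.optimizations = [tf.lite.Optimize.DEFAULT]
  tflite_model = converter.convert()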

If the model is defined as follows instead, the accuracy of the Keras model and the TF-Lite model is almost the same.

  # extract image features by convolution and max pooling layers
  inputs = tf.keras.Input(shape = (IMG_SIZE, IMG_SIZE, 3))
  x = tf.keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu")(inputs)
  x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
  x = tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu")(x)
  x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
  # classify the class by fully-connected layers
  x = tf.keras.layers.Flatten()(x)
  x = tf.keras.layers.Dense(512, activation="relu")(x)
  x = tf.keras.layers.Dense(info.features['label'].num_classes)(x)
  x = tf.keras.layers.Activation("softmax")(x)
  model_functional = tf.keras.Model(inputs=inputs, outputs=x)

Code to reproduce the issue
(Google Colab notebook)
https://gist.github.com/NobuoTsukamoto/b42128104531a7612e5c85e246cb2dac


@NobuoTsukamoto added the bug (Something isn't working) label on Apr 20, 2020
@alanchiao

I skimmed through your colab. Could you try one thing I didn't see?

If you try taking your "Keras model without quantization aware training" (0.99), converting it to TFLite, and then evaluating it in a manner similar to how you got the 0.20% accuracy number, could you see what you get?

@NobuoTsukamoto
Author

I updated colab notebook.
https://gist.github.com/NobuoTsukamoto/b42128104531a7612e5c85e246cb2dac

If you try taking your "Keras model without quantization aware training" (0.99), converting it to TFLite, and then evaluating it in a manner similar to how you got the 0.20% accuracy number, could you see what you get?

  • Keras model without quantization aware training: 0.9837
    • TF-Lite model: 0.8965
    • TF-Lite weight quantization model (Post training quantization): 0.8556
    • TF-Lite float16 quantization (Post training quantization): 0.9837
    • TF-Lite integer quantization (Post training quantization): 0.9782
  • Keras model with quantization aware training: 0.9946
    • TF-Lite integer quantization (Quantization aware training): 0.2343

@kmkolasinski

Can you try training your q_aware model for much longer, e.g.:

q_aware_history = q_aware_model.fit(train.repeat(),
                                    initial_epoch=10,
                                    epochs=200,
                                    steps_per_epoch=500,
                                    validation_data=validation.repeat(),
                                    validation_steps=validation_steps)

There are running exponential averages in the quantized layers which may need time to converge.

@kmkolasinski

You can take a look at this issue: #309
TL;DR: I had a similar problem, but when I trained the quantization-aware model for longer, the gap between the Keras and TF-Lite models decreased.

@NobuoTsukamoto
Author

@kmkolasinski
Thanks for your information.

I tried two patterns (training with QAT):

  1. epochs=50
    Keras model (QAT): 0.99 , TF-Lite integer quant model (QAT): 0.55
  2. epochs=100
    Keras model (QAT): 1.00 , TF-Lite integer quant model (QAT): 1.00

This needs quite a long time and a large number of epochs. Also, it is not possible to see the gap between the Keras model and the TF-Lite model from the accuracy and loss metrics.

How can I tell that the gap has disappeared during training? Also, can I estimate how many epochs to set?
(According to #309, it didn't seem possible ...)

@alanchiao

@NobuoTsukamoto, @krzys-ostrowski: this is good feedback.

Just from the analysis, there are some things we could possibly do:

  1. Have TensorBoard log the exponential averages so you can see them converge, via a new callback for QAT.

and then, with regard to how long it takes:

  1. More intelligently initialize the exponential averages which track the min/max values of weights/activations, to reflect things that are fixed for the activations (e.g. 0 as the minimum for ReLU).
  2. Have the exponential averages move away from their initialized values more quickly at the start of training ("zero_debias"), or modify ema_decay - I'm not sure how well this would work across models.

@kmkolasinski

Indeed, having a native callback for EMA monitoring would be a nice feature.

Additionally, since the EMA decay in the moving average quantizer is set to beta=0.999, we need approximately 1000 steps to 'forget' the initial state. Here is a table showing how many steps you need to 'forget' the initial state of the quantizer min/max values:
[image: table of EMA decay values vs. steps needed to forget the initial state]

Probably, setting the default EMA decay to 0.995 would be a better choice for users with simpler problems.
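
I can't paste the full table here, but the rule of thumb behind it can be sketched with a quick back-of-the-envelope calculation:

  # With the EMA update  range_ema <- beta * range_ema + (1 - beta) * range_batch,
  # the initial state keeps a weight of beta**n after n steps, which is roughly
  # exp(-n * (1 - beta)), so the forgetting time constant is ~1 / (1 - beta) steps.
  for beta in (0.9, 0.99, 0.995, 0.999):
      print(f"beta={beta}: initial range mostly forgotten after ~{1 / (1 - beta):.0f} steps")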

One can also monitor the gap between the Keras model and TFLite during training via a custom callback. For example, I use model output statistics as a proxy for measuring the gap. Here is how it looks in my case (source):

INFO:tensorflow:Measured deviation between keras and tflite model:
INFO:tensorflow:
 - export/objectness/output  
	MAE     =  0.000759 
	RMSE    =  0.003540 
	Keras   = N(μ=  0.030039, σ=  0.143009)
	tflite  = N(μ=  0.030201, σ=  0.143635)
 - export/box_shape/output   
	MAE     =  0.001983 
	RMSE    =  0.003444 
	Keras   = N(μ=  0.413126, σ=  0.267102)
	tflite  = N(μ=  0.413075, σ=  0.266314)
 - export/classes/output     
	MAE     =  0.000494 
	RMSE    =  0.011030 
	Keras   = N(μ=  0.000562, σ=  0.012718)
	tflite  = N(μ=  0.000565, σ=  0.013538)

The problem with this approach is that predictions through the TFLite model can be very slow on non-ARM architectures, and this type of test should run in the background so as not to block the training loop.
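
For reference, a stripped-down sketch of this kind of check (not the actual code behind the logs above; it also assumes the converted model keeps float32 inputs and outputs):

  import numpy as np
  import tensorflow as tf

  def keras_tflite_deviation(keras_model, sample_batch):
      # Convert the current model and run it through the TFLite interpreter.
      converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
      converter.optimizations = [tf.lite.Optimize.DEFAULT]
      interpreter = tf.lite.Interpreter(model_content=converter.convert())
      interpreter.allocate_tensors()
      inp = interpreter.get_input_details()[0]
      out = interpreter.get_output_details()[0]

      tflite_preds = []
      for x in sample_batch:
          interpreter.set_tensor(inp["index"], x[np.newaxis].astype(inp["dtype"]))
          interpreter.invoke()
          tflite_preds.append(interpreter.get_tensor(out["index"])[0])
      tflite_preds = np.stack(tflite_preds)
      keras_preds = keras_model.predict(sample_batch)

      mae = np.mean(np.abs(keras_preds - tflite_preds))
      rmse = np.sqrt(np.mean((keras_preds - tflite_preds) ** 2))
      return mae, rmse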

@NobuoTsukamoto
Author

Have Tensorboard log the exponential averages so you can see them converge, through a new callback for QAT.

It would be nice if the convergence could be seen in the TensorBoard log.
If "a new callback for QAT" is a keras.callbacks callback, like "pruning_callbacks", I think it will be very easy to use.

@nutsiepully
Contributor

I think there is likely some confusion here. An exponential moving average is used during QAT to calculate the ranges of dynamic tensors. Since the initial cold-start range is [-6, 6], it can lead to a huge accuracy drop at the beginning of QAT. Say a tensor only has values in [-0.1, 0.1]; then most of the range is wasted, which can lead to huge losses.
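
To put rough numbers on the wasted range (a back-of-the-envelope illustration for the 8-bit case, not measurements from the actual kernels):

  # Cold-start range [-6, 6] with 8-bit quantization.
  step = (6.0 - (-6.0)) / 255    # ~0.047 per quantization level
  used = (0.1 - (-0.1)) / step   # a [-0.1, 0.1] tensor uses only ~4 of the 256 levels
  print(step, used)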

As training goes on, this range slowly converges to the actual range (as @kmkolasinski mentioned, ~1000 steps), and the QAT accuracy goes up.

However, when converting to TFLite, the same ranges that are used in QAT are used again. So the TF and TFLite accuracy and values should be very close. QAT tries to emulate TFLite as closely as possible, and there shouldn't be such divergences.

We don't see it in our local tests either. For example, if you run quantize_functional_test, you'll see that the results for TF QAT and TFLite are the same.

There can be some subtle differences. We don't place FakeQuants after Softmax, for instance, since it hinders convergence. There's a possibility that's what is happening, but I can't be sure of it. I'm trying to recreate the issue.

@kmkolasinski

There is a chance that I'm doing something wrong; however, it seems that I'm not the only one with this issue. You could check a much bigger model than the one used in quantize_functional_test. I have encountered this issue with MobileNetV2. When models get bigger, the errors between emulated quantization and the real one accumulate.

@nutsiepully
Contributor

We've found the issue. One of the quantized kernels' activation ranges had a problem, but it was getting hidden once the range had converged.

We'll have a fix out soon. tf-nightly should have it.

@nutsiepully
Contributor

Thanks a lot for reporting and helping reproduce this issue. It would've been really hard to narrow down without the reproduction code.

@sayakpaul

@nutsiepully could you mention if there's a specific version of TensorFlow that will have the fix? Or should pip install tf-nightly do it?

@kmkolasinski

Cool, thanks for the feedback @nutsiepully! I will check it today. Out of curiosity, was it some general issue, or something related to MobileNet models or a specific layer, etc.?

@sayakpaul Yes, you can also use pip install tf-nightly --upgrade, but you need to uninstall regular TF first.

@nutsiepully
Contributor

@sayakpaul - tf-nightly should do it. The next version release will have it.

@kmkolasinski - I'll point out the commit here once it's in so you can see it. It was a general issue with the DepthConv kernel implementation, which got triggered when ranges hadn't converged.

@kmkolasinski

Thanks, that makes sense to me; a few weeks ago I switched to a custom ResNet model, which does not have DepthConvs, and I got better results.

@sayakpaul

Thanks for letting me know. I will check and report back.

@sayakpaul

sayakpaul commented May 7, 2020

@nutsiepully I can definitely see the improvement, and this Colab Gist reproduces it.

Additionally, I worked on this report for folks to make the onboarding process for quantization a bit easier. It incorporates many of your suggestions as well. Happy to address any feedback.

Thank you so much for all your help :)

@nutsiepully
Contributor

Thanks a lot @sayakpaul. Really appreciate the feedback and the effort.

Thanks @kmkolasinski and @NobuoTsukamoto for the detailed bug reports and feedback. I'm closing the bug. Please reopen if you face any further issues.

@sayakpaul, the report is awesome! Great work, this explains the value of the tooling really well.

@tarushbansal

tarushbansal commented Jan 14, 2024

Hi. I’m facing the same issue with MobileNetV3, where I see a large drop in accuracy in the TFLite model compared to the QAT Keras model. I’m using TensorFlow version 2.15.0 and TensorFlow Model Optimization version 0.7.5. I had to refactor MobileNetV3 a little to make it compatible with QAT, by using OnlyOutputQuantizeConfig for the Multiply layers (with a Moving Average Quantizer) and replacing the Add operations in Hard Sigmoid with Rescaling, but I don’t think that should be the cause of this issue. Would appreciate any help. Thanks!
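
For context, the output-only config I mean is along these lines (a simplified sketch following the tfmot QuantizeConfig pattern; the exact quantizer parameters here are illustrative):

  import tensorflow_model_optimization as tfmot

  class OnlyOutputQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
      """Quantize only the layer output, using a moving-average quantizer."""

      def get_weights_and_quantizers(self, layer):
          return []  # no weight quantization

      def get_activations_and_quantizers(self, layer):
          return []  # no activation quantization

      def set_quantize_weights(self, layer, quantize_weights):
          pass

      def set_quantize_activations(self, layer, quantize_activations):
          pass

      def get_output_quantizers(self, layer):
          return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
              num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

      def get_config(self):
          return {}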

@KBOUSTM

KBOUSTM commented Aug 5, 2024

Hello @tarushbansal,
I have the same issue. I'm using the latest version of tfmot (0.8.0); I had to make MobileNetV3 QAT-friendly, and then I got a huge gap between the QAT model accuracy and the TFLite model accuracy. Did you find any solution to this issue? Thanks!
