[DRAFT]: Add FracBits experimental feature #1286
Conversation
Commits in this PR (each signed off by Kim, Vinnam <[email protected]>):
- …#1234) Implement FracBitsQuantizationBuilder and Controller
  - Implement Builder and Controller
  - Add and test ModelSizeCompressionLoss
- Add FracBits runnable script and configs
- Fix FracBitsAsymmetricQuantizer bug
- Update config
- Update configs for mobilenetv2-imagenet
- Fix desynchronization bug in distributed training
- Fix test errors
- Add find_unused_parameters=True
- Fix code format
- Add resnet50 configs
- Fix PyTorch example dependency
  - Move efficientnet_pytorch from tests to examples
  - Add setuptools==59.5.0 because of a tensorboard issue
- Add inception_v3 configs
- Fix configs
- Gather integer model size
  - We had been gathering the fractional model size to compute compression_rate for the report; fix it to report the integer model size.
- Refactor FracBits params
- Add find_unused_parameters to configs
- Update batch size of resnet50 config
- Add pylint disables
- Log fractional model size as well
- Fix parameter construction from config dictionary
- Update FracBits README.md
- Update README.md
- Update accuracy by re-running experiments after the latest code change
- Fix typos
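The commit list above mentions a ModelSizeCompressionLoss. As a purely illustrative sketch (not the implementation added in this PR; the function and argument names below are made up), a model-size regularization loss of this kind typically multiplies each layer's weight count by its learnable fractional bit-width and penalizes the total against a target size:

```python
import torch


def model_size_loss(num_params_per_layer, frac_bits_per_layer, target_size_bits):
    """Illustrative size regularizer: penalize total (fractional) weight bits vs. a target.

    num_params_per_layer: weight count of each quantized layer (ints)
    frac_bits_per_layer:  learnable scalar tensors holding fractional bit-widths
    target_size_bits:     desired total model size in bits
    """
    total_bits = sum(n * b for n, b in zip(num_params_per_layer, frac_bits_per_layer))
    # Quadratic penalty: the gradient grows with the distance from the target size.
    return ((total_bits - target_size_bits) / target_size_bits) ** 2


# Two layers, both starting at 8 bits, with a 6-bit-average size target.
bits = [torch.tensor(8.0, requires_grad=True), torch.tensor(8.0, requires_grad=True)]
loss = model_size_loss([1_000_000, 500_000], bits, target_size_bits=6 * 1_500_000)
loss.backward()  # gradients push the fractional bit-widths down toward the target
```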
@vinnamkim, thanks for your contribution!
Some general questions:
- Did you compare the implemented algorithm with existing NNCF algorithms?
- What user scenario do you cover?
Hi @alexsu52,
I just compared FracBits with NNCF 8-bit QAT; you can see the results in the README.md included in this PR. They show that FracBits can compress the total bits of the model weights (model size) by 1.5x compared to NNCF 8-bit QAT for 3 models (MobileNet-V2, Inception-V3, and ResNet-50) and 2 datasets (ImageNet and CIFAR-100) with competitive accuracy degradation (<1%).
I think it can be used by users who want to compress their model size further with mixed-precision QAT. Unlike HAWQ and AutoQ, it doesn't require any time-consuming initialization phase or external exploration phase. However, it requires twice as many quantization forward-backward propagation steps as vanilla QAT.
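For context, a minimal sketch of the fractional bit-width interpolation described in the FracBits paper, which is also where the doubled quantization passes come from: each weight tensor is fake-quantized at the two neighbouring integer bit-widths and the results are blended by the learnable fractional part. The function names and the simple symmetric quantizer below are illustrative, not the FracBitsQuantizer added in this PR.

```python
import torch


def uniform_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    # Plain symmetric uniform fake-quantization at an integer bit-width.
    levels = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / levels
    return torch.round(x / scale).clamp(-levels, levels) * scale


def fracbits_quantize(x: torch.Tensor, frac_bits: torch.Tensor) -> torch.Tensor:
    # Linear interpolation between the two neighbouring integer bit-widths;
    # this is why every step runs the quantizer twice per weight tensor.
    lo = torch.floor(frac_bits)
    w_hi = frac_bits - lo  # interpolation weight of the higher bit-width
    return (1.0 - w_hi) * uniform_quantize(x, int(lo)) + w_hi * uniform_quantize(x, int(lo) + 1)


x = torch.randn(16)
b = torch.tensor(5.3, requires_grad=True)  # learnable fractional bit-width
y = fracbits_quantize(x, b)
y.sum().backward()  # b.grad accumulates through the interpolation weight w_hi
print(b.grad)
```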
It doesn't seem fair to compare only with 8-bit QAT. Do you have any comparison results (time / accuracy / compression rate / ease of use) with HAWQ and AutoQ?
If I understand correctly, the user expects to get a smaller model compared with the INT8 model in the OpenVINO format. Does OpenVINO support your model? You reported the theoretical compression rate in README.md. What is the actual compression rate?
Changes
Add a new mixed-precision QAT algorithm, FracBits ([paper], [code]), as an experimental feature.
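As a rough usage illustration only: an experimental algorithm like this would normally be switched on through the NNCF config passed to create_compressed_model. The algorithm name "fracbits_quantization" and the "params" keys below are placeholders for this sketch, not necessarily the names introduced by this PR.

```python
import torch
from torchvision.models import resnet50

from nncf import NNCFConfig
from nncf.torch import create_compressed_model

# Placeholder config: the algorithm name and params are illustrative, not this PR's schema.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {
        "algorithm": "fracbits_quantization",  # hypothetical experimental algorithm name
        "params": {"compression_rate": 1.5},   # e.g. target size reduction vs. 8-bit
    },
})

model = resnet50(pretrained=True)
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Usual QAT loop follows; compression_ctrl.loss() would contribute the size-regularization term.
```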
Reason for changes
To expand the choice of mixed-precision QAT algorithms for users.
Related tickets
87363
Tests
Implemented in tests/torch/experimental/fracbits.