
Generalize the weight compression algorithm to PyTorch models #2333

Merged: 10 commits into develop, Jan 17, 2024

Conversation

alexsu52
Contributor

@alexsu52 alexsu52 commented Dec 22, 2023

Changes

Main changes:

  • Reworked the weight compression algorithm to support different backends.
  • Implemented a PyTorch backend for the weight compression algorithm.
  • Introduced a WeightsDecompressor module in the Torch backend.
  • Introduced a decompress compression operator in the Torch backend.
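
As a rough illustration of what the new decompression path computes, here is a minimal NumPy sketch of asymmetric 8-bit weight compression and the decompression step that a WeightsDecompressor-style module performs at inference time. The function names and the per-tensor scheme are illustrative assumptions, not the actual NNCF API.

```python
import numpy as np

def compress_weight_int8_asym(weight: np.ndarray):
    """Quantize a float weight tensor to uint8 with a per-tensor scale and
    zero point (illustrative helper, not NNCF code)."""
    w_min, w_max = weight.min(), weight.max()
    levels = 255  # 2**8 - 1
    scale = (w_max - w_min) / levels
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(weight / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def decompress(q: np.ndarray, scale, zero_point) -> np.ndarray:
    """What a WeightsDecompressor-style module does at inference time:
    restore an approximate float weight from the stored integer tensor."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s, zp = compress_weight_int8_asym(w)
w_hat = decompress(q, s, zp)
# The round-trip error is bounded by roughly one quantization step.
max_err = np.abs(w - w_hat).max()
```

The compressed model stores only `q`, `s`, and `zp`, and the decompression module reconstructs `w_hat` on the fly.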

Related changes:

  • Added linalg.norm(), clip(), as_tensor_like(), item(), sum(), multiply(), var(), and size() to the tensor.functions library.
  • Changed the logic for caching processed Torch parameters so that only parameters that will be reused are cached.
  • Added weight_port_ids to the Torch metatypes.
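
To show the kind of backend dispatch a tensor-functions library like this relies on, here is a minimal sketch using functools.singledispatch: one public function with per-backend implementations registered by input type. This only illustrates the pattern; the real nncf.experimental.tensor dispatcher is more involved.

```python
from functools import singledispatch

import numpy as np

@singledispatch
def clip(a, a_min, a_max):
    # Fallback when no backend implementation is registered for type(a).
    raise NotImplementedError(f"clip() is not implemented for {type(a)}")

@clip.register
def _(a: np.ndarray, a_min, a_max):
    # NumPy backend implementation.
    return np.clip(a, a_min, a_max)

out = clip(np.array([-2.0, 0.5, 3.0]), -1.0, 1.0)
```

A Torch backend would register a second implementation for torch.Tensor the same way, which is how one public API can serve several frameworks.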

Reason for changes

| Model | NNCF Version | Config | Backend | PPI |
|---|---|---|---|---|
| opt-125m | develop | int8 | OV | 25.672986761305495 |
| opt-125m | develop | int4_asym_g128_r80_max_activation_variance | OV | 32.624618051345685 |
| opt-125m | develop | int4_asym_g128_r80 | OV | 33.00145528966855 |
| opt-125m | PR | int8 | OV | 25.672986761305495 |
| opt-125m | PR | int4_asym_g128_r80_max_activation_variance | OV | 32.624618051345685 |
| opt-125m | PR | int4_asym_g128_r80 | OV | 33.00145528966855 |
| opt-125m | PR | int8 | Torch | 25.672986761305495 |

Related tickets

ref: 119585

Tests

test_tensors

@github-actions github-actions bot added NNCF TF Pull requests that updates NNCF TensorFlow NNCF PT Pull requests that updates NNCF PyTorch NNCF Common Pull request that updates NNCF Common experimental NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF ONNX Pull requests that updates NNCF ONNX NNCF PTQ Pull requests that updates NNCF PTQ labels Dec 22, 2023
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch 4 times, most recently from d6e8c73 to 85f3afb on December 24, 2023 14:48
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch 3 times, most recently from 1b1e4fa to 6302489 on January 8, 2024 14:35
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 8, 2024
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch from 6302489 to 6532240 on January 8, 2024 16:58

codecov bot commented Jan 8, 2024

Codecov Report

Attention: 194 lines in your changes are missing coverage. Please review.

Comparison is base (01e3de7) 90.09% compared to head (3b5413c) 81.94%.
Report is 2 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2333      +/-   ##
===========================================
- Coverage    90.09%   81.94%   -8.16%     
===========================================
  Files          492      497       +5     
  Lines        45331    44974     -357     
===========================================
- Hits         40843    36852    -3991     
- Misses        4488     8122    +3634     
Files Coverage Δ
nncf/common/graph/layer_attributes.py 96.26% <100.00%> (+0.03%) ⬆️
nncf/common/graph/transformations/commands.py 97.01% <ø> (+1.12%) ⬆️
nncf/common/scopes.py 98.07% <100.00%> (ø)
.../common/tensor_statistics/statistical_functions.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/__init__.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/definitions.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/functions/__init__.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/functions/dispatcher.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/functions/linalg.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/functions/torch_linalg.py 100.00% <100.00%> (ø)
... and 37 more

... and 57 files with indirect coverage changes

Flag Coverage Δ
COMMON 44.83% <16.83%> (-0.01%) ⬇️
ONNX 34.20% <45.92%> (+0.40%) ⬆️
OPENVINO ∅ <ø> (∅)
TENSORFLOW 29.93% <32.62%> (+0.31%) ⬆️
TORCH 63.21% <74.97%> (+0.75%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
common 92.81% <100.00%> (-0.23%) ⬇️
torch 93.16% <87.07%> (+1.17%) ⬆️
tensorflow 94.00% <94.44%> (+0.63%) ⬆️
onnx 97.22% <100.00%> (+1.21%) ⬆️
openvino 0.00% <0.00%> (-90.79%) ⬇️
ptq 63.87% <62.94%> (-24.80%) ⬇️

@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch 3 times, most recently from 91da031 to 95aea75 on January 8, 2024 17:44
@alexsu52 alexsu52 marked this pull request as ready for review January 9, 2024 05:00
@alexsu52 alexsu52 requested a review from a team as a code owner January 9, 2024 05:00
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch from 95aea75 to 9f0654d on January 9, 2024 06:43
Comment on lines 114 to 115
if edge.input_port_id == weight_port_id:
weight_node = prev_node
Collaborator

Is it possible to get specific edge by index instead of a search through all ancestors?

Contributor Author

I think this is possible, but I did not find such an API in NNCFGraph. It looks like we need to think about adding one.
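
The lookup pattern under discussion can be sketched as a linear scan of a node's input edges for the one attached to a given port. The class and function names below are stand-ins, not the real NNCFGraph API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Edge:
    """Stand-in for a graph edge: source node name plus the input port
    it feeds on the consumer node."""
    from_node: str
    input_port_id: int

def find_weight_node(input_edges: List[Edge], weight_port_id: int) -> Optional[str]:
    # Linear search over ancestors, as in the snippet being reviewed.
    # A mapping keyed by port id would give O(1) lookup, which is what
    # the reviewer suggests the graph API could expose.
    for edge in input_edges:
        if edge.input_port_id == weight_port_id:
            return edge.from_node
    return None

edges = [Edge("activation", 0), Edge("weight_const", 1)]
found = find_weight_node(edges, 1)
```

With a convolution or linear node, port 1 would typically carry the weight constant, so the scan returns its producer.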

@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch 2 times, most recently from 333439d to d7cc206 on January 10, 2024 06:11
Comment on lines +26 to +28
from nncf.quantization.algorithms.weight_compression.config import WeightCompressionConfig
from nncf.quantization.algorithms.weight_compression.mixed_precision import MIXED_PRECISION_CRITERIA
from nncf.quantization.algorithms.weight_compression.weight_lowering import get_integer_quantization_error
Collaborator

FYI: I filed ticket 129446 to create template tests for the weight compression algorithm so that references and tests can be reused.

Contributor Author

Thanks, good suggestion!

@ljaljushkin
Contributor

ljaljushkin commented Jan 12, 2024

I calculated the perplexity on wikitext with and without the proposed changes. There is a small difference starting from the 3rd digit. @alexsu52, have you noticed any difference in your tests?

| model | mode | w/o PR | w/ PR |
|---|---|---|---|
| stabilityai_stablelm-3b-4e1t | int4_sym_g64_r80_data | 10.6759 | 10.6770 |
| stabilityai_stablelm-3b-4e1t | int4_sym_g64_r80 | 10.8315 | 10.8321 |

I used the same version of OpenVINO. The .xml files are identical except for the names of constants, and the per-layer compression config is the same, but the .bin files differ. I assume there is some difference in quantization.

"group_size": self._group_size,
"ratio": self._ratio,
"all_layers": self._all_layers,
"ignored_scope": self._ignored_scope,
Contributor

What about checking for an empty ignored scope and dumping it only when it has values? IMHO, an empty ignored scope adds noise.

Contributor Author

Thanks for your suggestion. This behavior matches other methods with an ignored scope, such as nncf.quantize(). I think this should be changed for all such methods at once; please open a ticket so we can act on your suggestion.

Contributor

Opened ticket 129593.
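
The suggested behavior can be sketched as follows: include the ignored scope in the dumped parameters only when it is non-empty. The helper name and signature are hypothetical, not NNCF code.

```python
def dump_parameters(group_size, ratio, all_layers, ignored_scope):
    """Build the parameter dict to dump, omitting an empty ignored scope
    (illustrative sketch of the reviewer's suggestion)."""
    params = {
        "group_size": group_size,
        "ratio": ratio,
        "all_layers": all_layers,
    }
    if ignored_scope:  # skip the key entirely when the scope is empty
        params["ignored_scope"] = ignored_scope
    return params

empty = dump_parameters(128, 0.8, False, [])
nonempty = dump_parameters(128, 0.8, False, ["conv1"])
```

With this change, models compressed without an ignored scope would not carry a noisy empty entry in their metadata.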

@alexsu52
Contributor Author

> Calculated the perplexity on wikitext with the proposed changes and before them. There is a small difference starting from the 3rd digit. @alexsu52 have you noticed any difference in your tests? […]

I'm guessing the following code made the change:

compressed_weights = compressed_weights.astype(dtype=weight.dtype)

I would note that I did not change references in the tests.

@alexsu52 alexsu52 requested a review from ljaljushkin January 14, 2024 14:11
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch from cf71969 to 1e0cd8a on January 15, 2024 08:53
@ljaljushkin
Contributor

> I'm guessing the following code made the change: compressed_weights = compressed_weights.astype(dtype=weight.dtype) […]

Yes, after commenting out the lines that cast the scale and zero point, the perplexity becomes the same.
I agree the casting is needed; otherwise decompression might be incorrect.
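
The effect being discussed can be reproduced in isolation: casting a scale to a narrower dtype rounds it, so decompressed weights differ from the uncast computation starting around the low-order digits. The values below are illustrative, not taken from the PR, and float16 stands in for whatever narrower weight dtype the cast targets.

```python
import numpy as np

# A quantized weight tensor and a per-tensor scale (zero point omitted
# for brevity; it does not change the point being made).
q = np.array([17, 200, 91], dtype=np.uint8)
scale_f32 = np.float32(0.0123456789)
scale_f16 = np.float16(scale_f32)  # the cast rounds the scale

w_f32 = q.astype(np.float32) * scale_f32
w_cast = q.astype(np.float32) * np.float32(scale_f16)

# The cast measurably perturbs the decompressed weights, but only in
# the low-order digits, matching the 3rd-digit perplexity shift seen above.
diff = np.abs(w_f32 - w_cast).max()
```

This is why the perplexity matches exactly once the casting lines are commented out, while the cast itself is still the correct behavior for decompression.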

@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch from 34ce756 to 7dbf702 on January 16, 2024 10:23
@alexsu52 alexsu52 merged commit 203a2cf into openvinotoolkit:develop Jan 17, 2024
9 checks passed
nikita-malininn pushed a commit that referenced this pull request Jan 18, 2024