
Generalize the weight compression algorithm to PyTorch models #2333

Merged: 10 commits into develop, Jan 17, 2024

Conversation

alexsu52
Contributor

@alexsu52 alexsu52 commented Dec 22, 2023

Changes

Main changes:

  • Reworked the weight compression algorithm to support different backends.
  • Implemented a PyTorch backend for the weight compression algorithm.
  • Introduced a WeightsDecompressor module in the Torch backend.
  • Introduced a decompress compression operator in the Torch backend.
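
As a rough illustration of what the new decompression path computes, here is a minimal NumPy sketch of asymmetric 8-bit weight compression and the decompression step that a WeightsDecompressor-style module performs at inference time. The function names and the per-tensor scheme are illustrative assumptions, not the actual NNCF API.

```python
import numpy as np

def compress_weight_int8_asym(weight: np.ndarray):
    """Quantize a float weight tensor to uint8 with a per-tensor scale and
    zero point (illustrative helper, not NNCF code)."""
    w_min, w_max = weight.min(), weight.max()
    levels = 255  # 2**8 - 1
    scale = (w_max - w_min) / levels
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(weight / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def decompress(q: np.ndarray, scale, zero_point) -> np.ndarray:
    """What a WeightsDecompressor-style module does at inference time:
    restore an approximate float weight from the stored integer tensor."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s, zp = compress_weight_int8_asym(w)
w_hat = decompress(q, s, zp)
# The round-trip error is bounded by roughly one quantization step.
max_err = np.abs(w - w_hat).max()
```

The compressed model stores only `q`, `s`, and `zp`, and the decompression module reconstructs `w_hat` on the fly.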

Related changes:

  • Added linalg.norm(), clip(), as_tensor_like(), item(), sum(), multiply(), var(), and size() to the tensor.functions library.
  • Changed the logic for caching processed Torch parameters so that only parameters that will be reused are cached.
  • Added weight_port_ids to the Torch metatypes.
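
To show the kind of backend dispatch a tensor-functions library like this relies on, here is a minimal sketch using functools.singledispatch: one public function with per-backend implementations registered by input type. This only illustrates the pattern; the real nncf.experimental.tensor dispatcher is more involved.

```python
from functools import singledispatch

import numpy as np

@singledispatch
def clip(a, a_min, a_max):
    # Fallback when no backend implementation is registered for type(a).
    raise NotImplementedError(f"clip() is not implemented for {type(a)}")

@clip.register
def _(a: np.ndarray, a_min, a_max):
    # NumPy backend implementation.
    return np.clip(a, a_min, a_max)

out = clip(np.array([-2.0, 0.5, 3.0]), -1.0, 1.0)
```

A Torch backend would register a second implementation for torch.Tensor the same way, which is how one public API can serve several frameworks.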

Reason for changes

| Model | NNCF Version | Config | Backend | PPI |
|---|---|---|---|---|
| opt-125m | develop | int8 | OV | 25.672986761305495 |
| opt-125m | develop | int4_asym_g128_r80_max_activation_variance | OV | 32.624618051345685 |
| opt-125m | develop | int4_asym_g128_r80 | OV | 33.00145528966855 |
| opt-125m | PR | int8 | OV | 25.672986761305495 |
| opt-125m | PR | int4_asym_g128_r80_max_activation_variance | OV | 32.624618051345685 |
| opt-125m | PR | int4_asym_g128_r80 | OV | 33.00145528966855 |
| opt-125m | PR | int8 | Torch | 25.672986761305495 |

Related tickets

ref: 119585

Tests

test_tensors

@github-actions github-actions bot added NNCF TF Pull requests that updates NNCF TensorFlow NNCF PT Pull requests that updates NNCF PyTorch NNCF Common Pull request that updates NNCF Common experimental NNCF OpenVINO Pull requests that updates NNCF OpenVINO NNCF ONNX Pull requests that updates NNCF ONNX NNCF PTQ Pull requests that updates NNCF PTQ labels Dec 22, 2023
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch 4 times, most recently from d6e8c73 to 85f3afb on December 24, 2023 14:48
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch 3 times, most recently from 1b1e4fa to 6302489 on January 8, 2024 14:35
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 8, 2024
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch from 6302489 to 6532240 on January 8, 2024 16:58

codecov bot commented Jan 8, 2024

Codecov Report

Attention: 194 lines in your changes are missing coverage. Please review.

Comparison is base (01e3de7) 90.09% compared to head (3b5413c) 81.94%.
Report is 2 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2333      +/-   ##
===========================================
- Coverage    90.09%   81.94%   -8.16%     
===========================================
  Files          492      497       +5     
  Lines        45331    44974     -357     
===========================================
- Hits         40843    36852    -3991     
- Misses        4488     8122    +3634     
Files Coverage Δ
nncf/common/graph/layer_attributes.py 96.26% <100.00%> (+0.03%) ⬆️
nncf/common/graph/transformations/commands.py 97.01% <ø> (+1.12%) ⬆️
nncf/common/scopes.py 98.07% <100.00%> (ø)
.../common/tensor_statistics/statistical_functions.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/__init__.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/definitions.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/functions/__init__.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/functions/dispatcher.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/functions/linalg.py 100.00% <100.00%> (ø)
nncf/experimental/tensor/functions/torch_linalg.py 100.00% <100.00%> (ø)
... and 37 more

... and 57 files with indirect coverage changes

Flag Coverage Δ
COMMON 44.83% <16.83%> (-0.01%) ⬇️
ONNX 34.20% <45.92%> (+0.40%) ⬆️
OPENVINO ∅ <ø> (∅)
TENSORFLOW 29.93% <32.62%> (+0.31%) ⬆️
TORCH 63.21% <74.97%> (+0.75%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
common 92.81% <100.00%> (-0.23%) ⬇️
torch 93.16% <87.07%> (+1.17%) ⬆️
tensorflow 94.00% <94.44%> (+0.63%) ⬆️
onnx 97.22% <100.00%> (+1.21%) ⬆️
openvino 0.00% <0.00%> (-90.79%) ⬇️
ptq 63.87% <62.94%> (-24.80%) ⬇️

@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch 3 times, most recently from 91da031 to 95aea75 on January 8, 2024 17:44
@alexsu52 alexsu52 marked this pull request as ready for review January 9, 2024 05:00
@alexsu52 alexsu52 requested a review from a team as a code owner January 9, 2024 05:00
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch from 95aea75 to 9f0654d on January 9, 2024 06:43
Comment on lines 114 to 115
if edge.input_port_id == weight_port_id:
weight_node = prev_node
Collaborator

Is it possible to get specific edge by index instead of a search through all ancestors?

Contributor Author

I think this is possible, but I did not find such an API in NNCFGraph. It looks like we need to think about adding one.
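
The lookup pattern under discussion can be sketched as a linear scan of a node's input edges for the one attached to a given port. The class and function names below are stand-ins, not the real NNCFGraph API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Edge:
    """Stand-in for a graph edge: source node name plus the input port
    it feeds on the consumer node."""
    from_node: str
    input_port_id: int

def find_weight_node(input_edges: List[Edge], weight_port_id: int) -> Optional[str]:
    # Linear search over ancestors, as in the snippet being reviewed.
    # A mapping keyed by port id would give O(1) lookup, which is what
    # the reviewer suggests the graph API could expose.
    for edge in input_edges:
        if edge.input_port_id == weight_port_id:
            return edge.from_node
    return None

edges = [Edge("activation", 0), Edge("weight_const", 1)]
found = find_weight_node(edges, 1)
```

With a convolution or linear node, port 1 would typically carry the weight constant, so the scan returns its producer.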

@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch 2 times, most recently from 333439d to d7cc206 on January 10, 2024 06:11
Comment on lines +26 to +28
from nncf.quantization.algorithms.weight_compression.config import WeightCompressionConfig
from nncf.quantization.algorithms.weight_compression.mixed_precision import MIXED_PRECISION_CRITERIA
from nncf.quantization.algorithms.weight_compression.weight_lowering import get_integer_quantization_error
Collaborator

FYI: I filed ticket 129446 to create template tests for the weight compression algorithm so that references and tests can be reused.

Contributor Author

Thanks, good suggestion!

@ljaljushkin
Contributor

ljaljushkin commented Jan 12, 2024

I calculated the perplexity on wikitext with and without the proposed changes. There is a small difference starting from the 3rd digit. @alexsu52, have you noticed any difference in your tests?

| model | mode | w/o PR | w/ PR |
|---|---|---|---|
| stabilityai_stablelm-3b-4e1t | int4_sym_g64_r80_data | 10.6759 | 10.6770 |
| stabilityai_stablelm-3b-4e1t | int4_sym_g64_r80 | 10.8315 | 10.8321 |

I used the same version of OpenVINO. The .xml files are identical except for the names of constants, and the per-layer compression config is the same, but the .bin files differ. I assume there is some difference in quantization.

"group_size": self._group_size,
"ratio": self._ratio,
"all_layers": self._all_layers,
"ignored_scope": self._ignored_scope,
Contributor

What about checking for an empty ignored scope and dumping it only when it has values? IMHO, an empty ignored scope adds noise.

Contributor Author

Thanks for your suggestion. This behavior matches other methods with an ignored scope, such as nncf.quantize(). I think this should be changed for all such methods at once; please open a ticket so we can act on your suggestion.

Contributor

Opened ticket 129593.
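
The suggested behavior can be sketched as follows: include the ignored scope in the dumped parameters only when it is non-empty. The helper name and signature are hypothetical, not NNCF code.

```python
def dump_parameters(group_size, ratio, all_layers, ignored_scope):
    """Build the parameter dict to dump, omitting an empty ignored scope
    (illustrative sketch of the reviewer's suggestion)."""
    params = {
        "group_size": group_size,
        "ratio": ratio,
        "all_layers": all_layers,
    }
    if ignored_scope:  # skip the key entirely when the scope is empty
        params["ignored_scope"] = ignored_scope
    return params

empty = dump_parameters(128, 0.8, False, [])
nonempty = dump_parameters(128, 0.8, False, ["conv1"])
```

With this change, models compressed without an ignored scope would not carry a noisy empty entry in their metadata.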

@alexsu52
Contributor Author

> Calculated the perplexity on wikitext with the proposed changes and before them. There is a small difference starting from the 3rd digit. @alexsu52 have you noticed any difference in your tests? […]

I'm guessing the following code made the change:

compressed_weights = compressed_weights.astype(dtype=weight.dtype)

I would note that I did not change references in the tests.

@alexsu52 alexsu52 requested a review from ljaljushkin January 14, 2024 14:11
@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch from cf71969 to 1e0cd8a on January 15, 2024 08:53
@ljaljushkin
Contributor

> I'm guessing the following code made the change: compressed_weights = compressed_weights.astype(dtype=weight.dtype) […]

Yes, after commenting out the lines that cast the scale and zero point, the perplexity becomes the same.
I agree the casting is needed; otherwise decompression might be incorrect.
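
The effect being discussed can be reproduced in isolation: casting a scale to a narrower dtype rounds it, so decompressed weights differ from the uncast computation starting around the low-order digits. The values below are illustrative, not taken from the PR, and float16 stands in for whatever narrower weight dtype the cast targets.

```python
import numpy as np

# A quantized weight tensor and a per-tensor scale (zero point omitted
# for brevity; it does not change the point being made).
q = np.array([17, 200, 91], dtype=np.uint8)
scale_f32 = np.float32(0.0123456789)
scale_f16 = np.float16(scale_f32)  # the cast rounds the scale

w_f32 = q.astype(np.float32) * scale_f32
w_cast = q.astype(np.float32) * np.float32(scale_f16)

# The cast measurably perturbs the decompressed weights, but only in
# the low-order digits, matching the 3rd-digit perplexity shift seen above.
diff = np.abs(w_f32 - w_cast).max()
```

This is why the perplexity matches exactly once the casting lines are commented out, while the cast itself is still the correct behavior for decompression.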

@alexsu52 alexsu52 force-pushed the as/8bit_weights_compression branch from 34ce756 to 7dbf702 on January 16, 2024 10:23
@alexsu52 alexsu52 merged commit 203a2cf into openvinotoolkit:develop Jan 17, 2024
9 checks passed
nikita-malininn pushed a commit that referenced this pull request Jan 18, 2024