NNCF supports the OpenVINO, Torch, and TorchFX backends for the weight compression algorithm, nncf.compress_weights(). The goal of this issue is to extend that support to the ONNX backend. The NNCF code is structured so that this can be done fairly straightforwardly, but it requires attention to detail.
What needs to be done?
The task is to implement support for the data-free int8 and uint8 weight compression algorithm, which includes:
Implement WeightCompressionAlgoBackend for ONNX:
We already have this implemented for OpenVINO, Torch, and TorchFX, so you can use those as references.
Some methods, such as insert_adapters, _get_statistics_for_weights_compression, and dump_parameters, can be skipped.
The goal is to make sure we can run nncf.compress_weights(onnx_model) and get an ONNX model with weights compressed to the int8 or uint8 format.
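For a sense of the shape of the work, here is a bare-bones sketch of what the ONNX backend class could look like. Only insert_adapters, _get_statistics_for_weights_compression, and dump_parameters come from this issue; the class name mirrors the other backends, and the remaining method names are illustrative guesses, not the exact NNCF interface:

```python
class ONNXWeightCompressionAlgoBackend:
    """Sketch only; the real class must implement NNCF's WeightCompressionAlgoBackend."""

    def get_weight(self, node, model):
        # Illustrative: fetch a layer's weight tensor from the ONNX graph initializers.
        raise NotImplementedError

    def transform_model(self, model, weight_compression_parameters):
        # Illustrative: replace float initializers with int8/uint8 ones plus
        # the logic needed to dequantize them at inference time.
        raise NotImplementedError

    def insert_adapters(self, *args, **kwargs):
        # Per the issue, this method can be skipped for the data-free int8/uint8 path.
        raise NotImplementedError("Not required for data-free int8/uint8 compression")

    def _get_statistics_for_weights_compression(self, *args, **kwargs):
        # Also skippable: the data-free path collects no activation statistics.
        raise NotImplementedError("Not required for data-free compression")

    def dump_parameters(self, *args, **kwargs):
        # Skippable as well; a no-op is enough initially.
        pass
```

The OpenVINO, Torch, and TorchFX backend classes show which of these stubs actually need real implementations.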
Test the Compression:
Ensure that running nncf.compress_weights(onnx_model) actually produces a compressed ONNX model.
Add Initial Tests:
This is super important to prove that the algorithm works correctly.
There are two types of tests we need:
Conformance tests: add a tinyllama_data_free case for ONNX, similar to the one we have for OpenVINO. A good starting point is to read the conformance-test README.
Unit tests: we'll need to add some unit tests, similar to the existing unit tests for OpenVINO, Torch, and TorchFX.
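Concretely, a unit test for the data-free path can assert a quantize–dequantize roundtrip error bound. Below is a self-contained numpy reference of per-channel uint8 (asymmetric) and int8 (symmetric) quantization; the math is the standard affine scheme, and the function names are illustrative, not NNCF's actual code:

```python
import numpy as np

def quantize_asym_uint8(w: np.ndarray):
    """Per-output-channel asymmetric uint8 quantization of a 2D weight (illustrative)."""
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 255.0
    scale = np.where(scale == 0.0, 1.0, scale)  # guard all-constant channels
    zero_point = np.clip(np.round(-w_min / scale), 0, 255).astype(np.uint8)
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def quantize_sym_int8(w: np.ndarray):
    """Per-output-channel symmetric int8 quantization of a 2D weight (illustrative)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_asym(q, scale, zero_point):
    """Reconstruct float weights from uint8 values, scale, and zero point."""
    return (q.astype(np.float32) - zero_point.astype(np.float32)) * scale
```

A unit test can then check that the reconstruction error of each element stays within the quantization step (scale) of its channel, which is the basic correctness property of both formats.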
This can be split into subtasks to speed up development and review. How to split it is up to you and can be discussed.
If you have any questions or need guidance, feel free to ask in the comments or reach out to the maintainers.
Example Pull Requests
Adding support of data free for Torch - #2333
Adding support of data free for TorchFX - #2891
Hi @kshpv I'd like to work on this issue. I have experience with ONNX and can help implement the weight compression algorithm support for the ONNX backend.
Resources
Contact points
@kshpv
The description is not complete and will be updated.