
Support SmoothQuant for ORT static quantization #16288

Merged: 26 commits, Jul 27, 2023
Commits:
ab3d43f: Support SmoothQuant (mengniwang95, Jun 6, 2023)
8eb3520: add ut and dependence (mengniwang95, Jun 7, 2023)
c1ccdd5: fix python format (mengniwang95, Jun 8, 2023)
7b5e7f9: fix python format (mengniwang95, Jun 8, 2023)
e385a30: Fix dependency and model (mengniwang95, Jun 14, 2023)
5094bb4: fix python format (mengniwang95, Jun 14, 2023)
636ffd5: fix python format (mengniwang95, Jun 15, 2023)
13adeab: enhance ut (mengniwang95, Jun 20, 2023)
d7bc884: update requirements (mengniwang95, Jul 10, 2023)
ebced60: Update ThirdPartyNotices.txt (mengniwang95, Jul 10, 2023)
4a3da03: Update requirements.txt (mengniwang95, Jul 15, 2023)
0c5e242: Update requirements.txt (mengniwang95, Jul 15, 2023)
9af6db2: Update test_quantize_static.py (mengniwang95, Jul 15, 2023)
4aa01d1: Update test_quantize_static.py (mengniwang95, Jul 15, 2023)
8aa2886: Merge pull request #1 from microsoft/main (mengniwang95, Jul 17, 2023)
a4e1d92: Update quantize.py (mengniwang95, Jul 17, 2023)
25fcc9a: Update quantize.py (mengniwang95, Jul 17, 2023)
93fc0f6: Update quantize.py (mengniwang95, Jul 17, 2023)
d5a30c7: Update quantize.py (mengniwang95, Jul 17, 2023)
b76cc62: Merge pull request #2 from microsoft/main (mengniwang95, Jul 20, 2023)
b2d7a07: Update Dockerfile.arm64 (mengniwang95, Jul 20, 2023)
1677511: Update Dockerfile.arm64 (mengniwang95, Jul 20, 2023)
cec5086: Update Dockerfile.arm64 (mengniwang95, Jul 22, 2023)
ec1ab87: Update requirements.txt (mengniwang95, Jul 22, 2023)
56bb3e3: Merge pull request #3 from microsoft/main (mengniwang95, Jul 25, 2023)
20307c2: Update test_quantize_static.py (mengniwang95, Jul 25, 2023)
50 changes: 50 additions & 0 deletions onnxruntime/python/tools/quantization/quantize.py
@@ -144,6 +144,16 @@ def __init__(
a DeQuantizeLinear node. If False, it remains floating-point bias and does not insert
any quantization nodes associated with biases.
This extra option is only effective when quant_format is QuantFormat.QDQ.
SmoothQuant = True/False :
Default is False. If enabled, the SmoothQuant algorithm will be applied before quantization to do
fake input-channel quantization.
SmoothQuantAlpha = float :
Default is 0.5. It only works if SmoothQuant is True. It controls the difficulty of weight
and activation quantization. A larger alpha value could be used on models with more significant
activation outliers to migrate more quantization difficulty to weights.
SmoothQuantFolding = True/False :
Default is True. It only works if SmoothQuant is True. If enabled, inserted Mul ops during
SmoothQuant will be folded into the previous op if the previous op is foldable.
execution_provider : An enum indicating the Execution Provider, such as CPU, TRT, NNAPI, SNE, etc.
Raises:
ValueError: Raise ValueError if execution provider is unknown
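The SmoothQuantAlpha option described above balances quantization difficulty between activations and weights. A minimal sketch of the underlying identity in plain Python (the numbers and channel values are illustrative, not from this PR):

```python
# SmoothQuant rests on the identity Y = X @ W = (X / s) @ (s * W), applied
# per input channel j with s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).
def smooth_scales(x_absmax, w_absmax, alpha=0.5):
    # alpha plays the role of the SmoothQuantAlpha option.
    return [xa**alpha / wa**(1 - alpha) for xa, wa in zip(x_absmax, w_absmax)]

# A channel with a large activation outlier (100.0) gets scale 10.0 at
# alpha=0.5, shrinking the activation range and growing the weight instead.
scales = smooth_scales([100.0, 1.0], [1.0, 1.0], alpha=0.5)

# The layer output is unchanged by the rescaling (one output column shown):
x = [100.0, 1.0]
w = [0.5, 2.0]
y = sum(a * b for a, b in zip(x, w))
x_s = [a / s for a, s in zip(x, scales)]  # smoothed activations
w_s = [b * s for b, s in zip(w, scales)]  # scale folded into weights
y_s = sum(a * b for a, b in zip(x_s, w_s))  # same output as y
```

A larger alpha pushes more of the outlier's range into the weights, which is why the docstring suggests raising it for models with more significant activation outliers.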
@@ -330,6 +340,16 @@ def quantize_static(
Default is 0.01. Constant smoothing factor to use when computing the moving average of the
minimum and maximum values. Effective only when the calibration method selected is MinMax and
when CalibMovingAverage is set to True.
SmoothQuant = True/False :
Default is False. If enabled, the SmoothQuant algorithm will be applied before quantization to do
fake input-channel quantization.
SmoothQuantAlpha = float :
Default is 0.5. It only works if SmoothQuant is True. It controls the difficulty of weight
and activation quantization. A larger alpha value could be used on models with more significant
activation outliers to migrate more quantization difficulty to weights.
SmoothQuantFolding = True/False :
Default is True. It only works if SmoothQuant is True. If enabled, inserted Mul ops during
SmoothQuant will be folded into the previous op if the previous op is foldable.
"""

extra_options = extra_options or {}
@@ -362,6 +382,36 @@ def quantize_static(
        key: extra_options.get(name) for (name, key) in calib_extra_options_keys if name in extra_options
    }

    if extra_options.get("SmoothQuant", False):
        import importlib

        try:
            importlib.import_module("neural_compressor.adaptor.ox_utils.smooth_quant")
        except Exception as e:
            logging.error(f"{e}.")
            raise RuntimeError("neural-compressor is not correctly installed. Please check your environment.") from e

        import copy

        from neural_compressor.adaptor.ox_utils.smooth_quant import ORTSmoothQuant

        from .quant_utils import save_and_reload_model

        def inc_dataloader():
            data_reader = copy.deepcopy(calibration_data_reader)
            for data in data_reader:
                yield data, None

        orig_nodes = [i.name for i in model.graph.node]
        dataloader = inc_dataloader()
        sq = ORTSmoothQuant(model_input, dataloader, reduce_range)
        del dataloader
        model = sq.transform(
            extra_options.get("SmoothQuantAlpha", 0.5), extra_options.get("SmoothQuantFolding", True)
        ).model
        nodes_to_exclude.extend([i.name for i in model.graph.node if i.name not in orig_nodes])
        model = save_and_reload_model(model)

    with tempfile.TemporaryDirectory(prefix="ort.quant.") as quant_tmp_dir:
        calibrator = create_calibrator(
            model,
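The name-diff bookkeeping in the hunk above keeps the Mul ops that SmoothQuant inserts in floating point. A toy illustration of that logic, with hypothetical node names:

```python
# Nodes present only after the SmoothQuant transform (the inserted Mul ops)
# are appended to nodes_to_exclude so the quantizer leaves them in float.
orig_nodes = ["conv_0", "matmul_0"]  # names before the transform (hypothetical)
post_sq_nodes = ["conv_0", "sq_mul_0", "matmul_0"]  # "sq_mul_0" inserted by SmoothQuant
nodes_to_exclude = []
nodes_to_exclude.extend(n for n in post_sq_nodes if n not in orig_nodes)
```

This matters when SmoothQuantFolding cannot fold a scale into the previous op: the leftover Mul would otherwise be quantized like any other node.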
13 changes: 13 additions & 0 deletions onnxruntime/test/python/quantization/test_quantize_static.py
@@ -98,6 +98,19 @@ def test_static_quant_config(self):
        check_model_correctness(self, self._model_fp32_path, quant_model_path, data_reader.get_next())
        data_reader.rewind()

    def test_smooth_quant(self):
        data_reader = InputFeedsNegOneZeroOne(10, {"input": [1, self._channel_size, 1, 3]})
        quant_config = StaticQuantConfig(data_reader, extra_options={"SmoothQuant": True})
        quant_model_path = str(Path(self._tmp_model_dir.name) / "quant.config.onnx")
        quantize(self._model_fp32_path, quant_model_path, quant_config)

        data_reader.rewind()
        check_model_correctness(self, self._model_fp32_path, quant_model_path, data_reader.get_next())
        data_reader.rewind()

        model = onnx.load(quant_model_path)
        self.assertIn("Mul", [i.op_type for i in model.graph.node])


if __name__ == "__main__":
    unittest.main()
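The test above drives everything through extra_options; the SmoothQuant knobs resolve with dict.get against the defaults documented in quantize.py, which can be sketched as (option values here are illustrative):

```python
# Resolution of the SmoothQuant knobs from extra_options, mirroring the
# documented defaults: SmoothQuant=False, SmoothQuantAlpha=0.5,
# SmoothQuantFolding=True.
extra_options = {"SmoothQuant": True, "SmoothQuantAlpha": 0.8}

enabled = extra_options.get("SmoothQuant", False)  # True: explicitly enabled
alpha = extra_options.get("SmoothQuantAlpha", 0.5)  # 0.8: user override
folding = extra_options.get("SmoothQuantFolding", True)  # True: default applies
```

Unset keys silently fall back to their defaults, so a config that only sets "SmoothQuant": True gets alpha 0.5 and folding enabled.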
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -18,3 +18,4 @@ scipy
sympy
wheel
setuptools>=41.4.0
neural-compressor