3.x SQ supports calib_func for auto-tune #1812
Conversation
Signed-off-by: Cheng, Zixuan <[email protected]>
🌩️ Required checks status: Pending 🟡

Groups summary
- 🟡 Code Scan Tests workflow (required after the changes)
- 🟡 Model Tests 3x workflow (required after the changes)
- 🟡 Unit Tests 3x-PyTorch workflow (required after the changes)

Thank you for your contribution! 💜
Please refer to #1810
Capture input data from run_fn and rebuild CapturedDataloader for SQ calibration
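For readers following along, here is a minimal sketch of the capture-and-replay idea described in that comment. The class name CapturedDataloader comes from the comment itself; everything else (the helper name, the forward monkey-patching approach) is illustrative, not the PR's actual implementation.

```python
import torch


class CapturedDataloader:
    """Replays (args, kwargs) pairs recorded while run_fn executed the model."""

    def __init__(self, args_list, kwargs_list):
        self.args_list = args_list
        self.kwargs_list = kwargs_list

    def __iter__(self):
        # Yield the recorded positional/keyword inputs pairwise, which is why
        # downstream code may receive a zip of (args, kwargs).
        yield from zip(self.args_list, self.kwargs_list)


def capture_inputs(model, run_fn):
    """Run the user's run_fn once while recording every input the model receives."""
    args_list, kwargs_list = [], []
    original_forward = model.forward

    def recording_forward(*args, **kwargs):
        args_list.append(args)
        kwargs_list.append(kwargs)
        return original_forward(*args, **kwargs)

    model.forward = recording_forward
    try:
        run_fn(model)  # user calibration function drives the model
    finally:
        model.forward = original_forward
    return CapturedDataloader(args_list, kwargs_list)
```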
@@ -465,6 +473,9 @@ def forward_wrapper(model, input, device=torch.device("cpu")):  # pragma: no cover
             output = model(*input)
         except:
             output = model(input)
+    elif isinstance(input, zip):
+        for args, kwargs in input:
+            output = model(*args, **kwargs)
good point
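To make the new branch above concrete: the captured calibration data arrives as a zip of (args, kwargs) pairs, roughly as in this toy example (the model and tensors here are hypothetical stand-ins, not taken from the PR).

```python
import torch
from torch import nn

# Hypothetical stand-in model; the real caller passes the LLM being calibrated.
model = nn.Linear(16, 16)

# Captured positional args and keyword args for two calibration samples.
args_list = [(torch.randn(1, 16),), (torch.randn(1, 16),)]
kwargs_list = [{}, {}]

# Replaying the captured inputs as a zip of (args, kwargs) pairs is exactly
# the shape the new isinstance(input, zip) branch consumes.
for args, kwargs in zip(args_list, kwargs_list):
    output = model(*args, **kwargs)
```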
    block_modules = {}
    for key in self.block_names:
        block_modules[key] = get_module(self.model, key)
    self._add_blockwise_observer(block_modules)
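A rough sketch of what the two helpers used here typically do. Both bodies are illustrative assumptions; the repo's own get_module and _add_blockwise_observer may differ in details.

```python
import torch
from torch import nn


def get_module(model: nn.Module, key: str) -> nn.Module:
    """Resolve a dotted module name such as 'model.layers.0' to the submodule."""
    module = model
    for name in key.split("."):
        module = getattr(module, name)
    return module


def add_blockwise_observer(block_modules: dict, storage: dict):
    """Illustrative stand-in for self._add_blockwise_observer: hook each block so
    its fp32 outputs can be cached and later compared against quantized outputs."""
    handles = []
    for key, module in block_modules.items():
        def hook(mod, inputs, output, key=key):
            storage.setdefault(key, []).append(output)
        handles.append(module.register_forward_hook(hook))
    return handles
```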
    forward_wrapper(self.model, input, self.device)  # disable quant and get fp32 output
    # get input args and kwargs for the first block, then do forward
    total_block_args, total_block_kwargs = get_hidden_states(self.model, calib_sample_num, calib_func)
It seems we fuse all outputs at once instead of handling them per input? Will it impact the final result? @yintong-lu
See the new design in #1821
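For reference on the block-wise path discussed above: get_hidden_states has to record the (args, kwargs) that reach the first block while the user's calib_func drives the model. A hedged sketch of that idea follows; the function name and hook mechanics are illustrative, only calib_func and the two returned lists mirror the snippet in the thread.

```python
import torch
from torch import nn


def capture_first_block_inputs(model: nn.Module, first_block: nn.Module, calib_func):
    """Record the (args, kwargs) that reach the first decoder block while
    calib_func runs the model; an illustrative version of the idea only."""
    total_block_args, total_block_kwargs = [], []

    def pre_hook(module, args, kwargs):
        total_block_args.append(args)
        total_block_kwargs.append(kwargs)
        # A real implementation may raise here to skip the remaining blocks and
        # save calibration time; letting the forward finish keeps the sketch simple.

    handle = first_block.register_forward_pre_hook(pre_hook, with_kwargs=True)
    try:
        calib_func(model)  # user-provided calibration loop, no dataloader needed
    finally:
        handle.remove()
    return total_block_args, total_block_kwargs
```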
Type of Change
SQ supports calib_func for auto-tune; no dataloader is needed.
Description
- Enable layer-wise & block-wise calibration
- Add a unit test to check auto-tune
- Check LLM examples
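A hedged usage sketch of what the description above enables. The import path and the autotune/TuningConfig/SmoothQuantConfig names follow the 3.x torch API, but the exact signatures and the calibration-callback keyword (calib_func vs. run_fn) should be checked against the current docs; model, tokenizer, calibration_prompts, and evaluate_accuracy are user-provided placeholders.

```python
from neural_compressor.torch.quantization import SmoothQuantConfig, TuningConfig, autotune


def run_fn(model):
    # Drive the model on a few samples; the SQ path captures these inputs
    # internally, so no calibration dataloader has to be built.
    for prompt in calibration_prompts:
        model(**tokenizer(prompt, return_tensors="pt"))


def eval_fn(model):
    return evaluate_accuracy(model)


tune_config = TuningConfig(
    config_set=[SmoothQuantConfig(alpha=0.5), SmoothQuantConfig(alpha=0.8)]
)
best_model = autotune(model, tune_config=tune_config, eval_fn=eval_fn, run_fn=run_fn)
```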
Expected Behavior & Potential Risk
How has this PR been tested?
Dependency Change?