Fix WOQ Linear pack slow issue #1828

Kaihui-intel · 2024-05-30T08:24:52Z

Type of Change

bug fix

Description

Solution: use NumPy for pack_tensor/unpack_tensor.
With this change, RTN quantization of microsoft/Phi-3-mini-4k-instruct runs approximately four times faster than before.
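For illustration, here is a minimal NumPy sketch of vectorized packing. It is not the PR's code. The names `pack_tensor_numpy` and `unpack_tensor_numpy` are hypothetical. The sketch assumes unsigned low-bit values (4-bit by default) packed eight per int32 along the last dimension, with element `e` of each group stored at bit offset `bits * e`. The idea is to replace per-column Python loops with whole-array shifts and a single `bitwise_or` reduction.

```python
import numpy as np


def pack_tensor_numpy(raw: np.ndarray, bits: int = 4) -> np.ndarray:
    """Pack low-bit integers (rows x cols) into int32 (rows x cols // n_pack).

    Hypothetical helper: assumes each value fits in `bits` unsigned bits and
    that element e of every group of n_pack values sits at bit offset bits * e.
    """
    n_pack = 32 // bits
    rows, cols = raw.shape
    if cols % n_pack:
        raise ValueError(f"cols ({cols}) must be a multiple of {n_pack}")
    mask = (1 << bits) - 1
    # Work in uint32 so the top field can occupy bit 31 without signed overflow.
    vals = (raw.astype(np.uint32) & mask).reshape(rows, cols // n_pack, n_pack)
    shifts = np.arange(n_pack, dtype=np.uint32) * bits
    # Shift every field into place at once, then OR each group into one word.
    packed = np.bitwise_or.reduce(vals << shifts, axis=-1)
    # Reinterpret the same 32-bit patterns as int32 storage.
    return packed.view(np.int32)


def unpack_tensor_numpy(packed: np.ndarray, bits: int = 4) -> np.ndarray:
    """Inverse of pack_tensor_numpy (assumes bits <= 8): int32 -> uint8."""
    n_pack = 32 // bits
    mask = (1 << bits) - 1
    words = np.ascontiguousarray(packed, dtype=np.int32).view(np.uint32)
    shifts = np.arange(n_pack, dtype=np.uint32) * bits
    # Broadcast (rows, groups, 1) >> (n_pack,) to extract all fields at once.
    vals = (words[..., None] >> shifts) & mask
    return vals.reshape(words.shape[0], -1).astype(np.uint8)
```

Under these assumptions, a round trip such as `unpack_tensor_numpy(pack_tensor_numpy(x))` should reproduce `x` for inputs in `[0, 16)`.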

Because packing on CUDA was found to be faster than packing on the CPU in some cases, the following behavior is used by default:

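def pack_tensor(self):
    # Default dispatch: pack with torch on CUDA (XPU may need this path too),
    # otherwise pack with NumPy on the CPU.
    if "cuda" in self.device:
        self.pack_tensor_w_torch()
    else:
        self.pack_tensor_w_numpy()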

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: Kaihui-intel <[email protected]>

github-actions · 2024-05-30T08:25:14Z

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the Probot, please contact XuehaoSun for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

🟢 Code Scan Tests workflow

Check ID	Status
Code-Scan	success	✅
Code-Scan (Bandit Code Scan Bandit)	success	✅
Code-Scan (DocStyle Code Scan DocStyle)	success	✅
Code-Scan (Pylint Code Scan Pylint)	success	✅

These checks are required after the changes to neural_compressor/torch/algorithms/weight_only/modules.py.

🟢 Model Tests 3x workflow

Check ID	Status
Model-Test-3x	success	✅
Model-Test-3x (Generate Report GenerateReport)	success	✅
Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4)	success	✅
Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4_dq_bnb)	success	✅
Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4_dq_ggml)	success	✅

These checks are required after the changes to neural_compressor/torch/algorithms/weight_only/modules.py.

🔴 Unit Tests 3x-PyTorch workflow

Check ID	Status	Error details
UT-3x-Torch	failure		❌
UT-3x-Torch (Coverage Compare CollectDatafiles)	failure	download	❌
UT-3x-Torch (Unit Test 3x Torch Unit Test 3x Torch)	success		✅
UT-3x-Torch (Unit Test 3x Torch baseline Unit Test 3x Torch baseline)	success		✅

These checks are required after the changes to neural_compressor/torch/algorithms/weight_only/modules.py.

Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact chensuyue or XuehaoSun for help.

for more information, see https://pre-commit.ci

Signed-off-by: Kaihui-intel <[email protected]>

neural_compressor/torch/algorithms/weight_only/modules.py

Signed-off-by: Kaihui-intel <[email protected]>

for more information, see https://pre-commit.ci

Kaihui-intel added 11 commits May 22, 2024 13:33

adapt v0.2

32a6612

Signed-off-by: Kaihui-intel <[email protected]>

Merge branch 'master' of https://github.com/intel/neural-compressor

194026e

Merge branch 'master' of https://github.com/intel/neural-compressor

61da0cb

Merge branch 'master' of https://github.com/intel/neural-compressor

e12e3a9

add timestep

afa010f

Signed-off-by: Kaihui-intel <[email protected]>

add threadpool

8a6f851

Signed-off-by: Kaihui-intel <[email protected]>

pack timestep

68776e7

Signed-off-by: Kaihui-intel <[email protected]>

use numpy to pack/unpack

78cf705

Signed-off-by: Kaihui-intel <[email protected]>

add acclerator sync

3487ef5

Signed-off-by: Kaihui-intel <[email protected]>

clean timesteps

bd03c80

Signed-off-by: Kaihui-intel <[email protected]>

rebase master

ced6e60

Signed-off-by: Kaihui-intel <[email protected]>

Kaihui-intel requested a review from xin3he May 30, 2024 08:24

[pre-commit.ci] auto fixes from pre-commit.com hooks

7102312

for more information, see https://pre-commit.ci

xin3he approved these changes May 31, 2024

xin3he requested a review from changwangss May 31, 2024 05:38

changwangss approved these changes May 31, 2024

add cuda pack

acf5635

Signed-off-by: Kaihui-intel <[email protected]>

xin3he approved these changes Jun 1, 2024

Review thread on neural_compressor/torch/algorithms/weight_only/modules.py: outdated, resolved

Kaihui-intel and others added 2 commits June 3, 2024 09:47

update logic

ff17970

Signed-off-by: Kaihui-intel <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

dc242a9

for more information, see https://pre-commit.ci

chensuyue merged commit da1ada2 into master Jun 3, 2024
28 of 30 checks passed

chensuyue deleted the kaihui/pack branch June 3, 2024 05:14

Kaihui-intel commented May 30, 2024 (edited by xin3he)

github-actions bot commented May 30, 2024 (edited)
