
Fix WOQ Linear pack slow issue #1828

Merged
merged 15 commits into master from kaihui/pack
Jun 3, 2024
Conversation

Kaihui-intel
Contributor

@Kaihui-intel Kaihui-intel commented May 30, 2024

Type of Change

bug fix

Description

Solution: use numpy for pack_tensor/unpack_tensor.
With this change, RTN quantization of microsoft/Phi-3-mini-4k-instruct runs roughly 4x faster than before.

Since we found that CUDA is faster than the CPU path in some cases, the following dispatch is used by default:

def pack_tensor(self):
    if "cuda" in self.device:  # XPU may also need the torch path
        self.pack_tensor_w_torch()
    else:
        self.pack_tensor_w_numpy()
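For illustration, here is a minimal sketch of the vectorized numpy packing this PR describes: 4-bit values are packed into int32 words by looping over the 8 bit positions within a word instead of over individual weights. The function names, signatures, and bit layout below are hypothetical, not the actual neural_compressor implementation:

```python
import numpy as np

def pack_with_numpy(int_vals: np.ndarray, bits: int = 4) -> np.ndarray:
    """Pack low-bit values into int32 words (hypothetical sketch, not the INC API).

    The loop runs only 32 // bits times; each iteration is a vectorized
    mask-and-shift over all groups at once, which is why this is much
    faster than a per-element Python/torch loop.
    """
    n_pack = 32 // bits  # e.g. 8 four-bit values per int32
    assert int_vals.size % n_pack == 0, "pad the tensor to a multiple of n_pack first"
    grouped = int_vals.reshape(-1, n_pack).astype(np.uint32)
    mask = np.uint32((1 << bits) - 1)
    packed = np.zeros(grouped.shape[0], dtype=np.uint32)
    for i in range(n_pack):
        packed |= (grouped[:, i] & mask) << np.uint32(bits * i)
    return packed.view(np.int32)

def unpack_with_numpy(packed: np.ndarray, n: int, bits: int = 4) -> np.ndarray:
    """Inverse of pack_with_numpy: recover the first n low-bit values."""
    n_pack = 32 // bits
    u = packed.view(np.uint32)
    mask = np.uint32((1 << bits) - 1)
    out = np.empty((u.size, n_pack), dtype=np.uint32)
    for i in range(n_pack):
        out[:, i] = (u >> np.uint32(bits * i)) & mask
    return out.reshape(-1)[:n].astype(np.int64)
```

Because the inner loop count depends only on the bit width, the per-weight cost collapses to a handful of vectorized operations, which is consistent with the roughly 4x end-to-end speedup reported above.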

Expected Behavior & Potential Risk

The expected behavior triggered by this PR

How has this PR been tested?

How to reproduce the test (including hardware information)

Dependency Change?

Any library dependency introduced or removed

@Kaihui-intel Kaihui-intel requested a review from xin3he May 30, 2024 08:24

github-actions bot commented May 30, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the Probot, please contact XuehaoSun for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

🟢 Code Scan Tests workflow
Check ID Status Error details
Code-Scan success
Code-Scan (Bandit Code Scan Bandit) success
Code-Scan (DocStyle Code Scan DocStyle) success
Code-Scan (Pylint Code Scan Pylint) success

These checks are required after the changes to neural_compressor/torch/algorithms/weight_only/modules.py.

🟢 Model Tests 3x workflow
Check ID Status Error details
Model-Test-3x success
Model-Test-3x (Generate Report GenerateReport) success
Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4) success
Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4_dq_bnb) success
Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4_dq_ggml) success

These checks are required after the changes to neural_compressor/torch/algorithms/weight_only/modules.py.

🔴 Unit Tests 3x-PyTorch workflow
Check ID Status Error details
UT-3x-Torch failure
UT-3x-Torch (Coverage Compare CollectDatafiles) failure download
UT-3x-Torch (Unit Test 3x Torch Unit Test 3x Torch) success
UT-3x-Torch (Unit Test 3x Torch baseline Unit Test 3x Torch baseline) success

These checks are required after the changes to neural_compressor/torch/algorithms/weight_only/modules.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds for the next 6 hours. If you have any other questions, contact chensuyue or XuehaoSun for help.

@xin3he xin3he requested a review from changwangss May 31, 2024 05:38
Signed-off-by: Kaihui-intel <[email protected]>
@chensuyue chensuyue merged commit da1ada2 into master Jun 3, 2024
28 of 30 checks passed
@chensuyue chensuyue deleted the kaihui/pack branch June 3, 2024 05:14