Fix WOQ Linear pack slow issue #1828
Signed-off-by: Kaihui-intel <[email protected]>
⛈️ Required checks status: Has failure 🔴
Groups summary
🟢 Code Scan Tests workflow
🟢 Model Tests 3x workflow
🔴 Unit Tests 3x-PyTorch workflow
Thank you for your contribution! 💜
for more information, see https://pre-commit.ci
Type of Change
bug fix
Description
solution: use numpy for pack_tensor/unpack_tensor
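As a rough illustration of why a NumPy-based pack/unpack can be fast, here is a minimal sketch of vectorized low-bit packing. It is not the code from this PR: the function names `pack_tensor_np` and `unpack_tensor_np`, the row-wise layout, the little-end-first bit order, and the 32-bit container width are all assumptions made for the example.

```python
import numpy as np


def pack_tensor_np(raw: np.ndarray, bits: int = 4) -> np.ndarray:
    """Pack unsigned `bits`-wide values along axis 1 into uint32 words.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    n_pack = 32 // bits                      # values stored per 32-bit word
    mask = np.uint32((1 << bits) - 1)
    rows, cols = raw.shape
    pad = (-cols) % n_pack                   # zero-pad columns to a multiple of n_pack
    values = np.pad(raw.astype(np.uint32) & mask, ((0, 0), (0, pad)))
    grouped = values.reshape(rows, -1, n_pack)
    shifts = np.arange(n_pack, dtype=np.uint32) * bits
    # Shift every value to its slot, then OR the slots of each word together.
    return np.bitwise_or.reduce(grouped << shifts, axis=2)


def unpack_tensor_np(packed: np.ndarray, bits: int = 4, orig_cols=None) -> np.ndarray:
    """Inverse of pack_tensor_np; `orig_cols` trims the zero padding."""
    n_pack = 32 // bits
    mask = np.uint32((1 << bits) - 1)
    shifts = np.arange(n_pack, dtype=np.uint32) * bits
    unpacked = (packed[:, :, None] >> shifts) & mask
    return unpacked.reshape(packed.shape[0], -1)[:, :orig_cols]


# Round trip on random 4-bit values.
rng = np.random.default_rng(0)
weights = rng.integers(0, 16, size=(3, 10), dtype=np.uint32)
packed = pack_tensor_np(weights, bits=4)
restored = unpack_tensor_np(packed, bits=4, orig_cols=weights.shape[1])
```

The whole operation is expressed as a few array-wide shifts and one reduction, so there is no Python-level loop over columns or bit positions, which is the usual source of slowness in a naive pack routine.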
With this change, RTN quantization of microsoft/Phi-3-mini-4k-instruct runs approximately four times faster than before. Because we found that CUDA is faster than CPU in some cases, we use the behavior below as the default.
Expected Behavior & Potential Risk
the expected behavior that triggered by this PR
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed