[ARM CPU] Add Fp16 kernels for MatMulNBits #22651
Conversation
There seem to be conflicts in …
Force-pushed from 3e095fc to 98b1e5f
initial review
    return CompInt8;
}
// Fallback to fp16. If fp16 optimized path is not available, it will further fall back to fp32.
return CompFp16;
so this will return CompFp16 even if accuracy_level_attr is CompFp32?
I don't see the point of using CompFp32 for fp16 input if CompFp16 is available. Converting fp16 to fp32 does not bring more precision, and the casting only makes performance worse.
I agree that it doesn't make sense for fp16 input. For fp16 input, what do you think about treating the default accuracy level value (unset) as CompFp16 and treating an explicit accuracy level of CompFp32 as an error?
If accuracy level 1 is given for fp16 input, maybe show a warning and use CompFp16?
Sure, a warning is good too.
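For illustration, a minimal sketch of the behavior agreed on in this thread, assuming hypothetical names and enum values (the real accuracy-level enum and logging facilities live in the onnxruntime/MLAS sources):

```cpp
#include <cstdint>
#include <iostream>

// Illustrative values only; the actual enum is defined in the onnxruntime sources.
enum CompType : int64_t { CompFp32 = 1, CompFp16 = 2, CompInt8 = 4 };

// Hypothetical helper showing the discussed mapping for fp16 input:
// - an explicit int8 request is honored,
// - an explicit fp32 request is warned about and treated as fp16,
// - the default (unset) maps to fp16, which may still fall back to fp32
//   inside the kernel if no optimized fp16 path exists.
CompType ComputeTypeForFp16Input(int64_t accuracy_level_attr) {
  if (accuracy_level_attr == CompInt8) {
    return CompInt8;
  }
  if (accuracy_level_attr == CompFp32) {
    std::cerr << "accuracy_level=1 (fp32) has no benefit for fp16 input; using fp16 instead.\n";
  }
  return CompFp16;
}
```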
You can commit the suggested changes from lintrunner.
PackedQuantBData = reinterpret_cast<std::byte*>(MlasAlignAddress(PackedQuantBWorkspace, 32));
QuantBBlkSum = reinterpret_cast<T*>(PackedQuantBData + PackedQuantBDataSize);
QuantBBlkSum = reinterpret_cast<T*>(MlasAlignAddress(QuantBBlkSum, MlasQNBitQuantBBlkSumAlignment()));
PackedQuantBScale = reinterpret_cast<T*>(reinterpret_cast<std::byte*>(QuantBBlkSum) + BlkSumSize);
Check failure: Code scanning / CodeQL, Suspicious pointer scaling (High).
This pointer might have type MLFloat16, but the pointer arithmetic here is done with type float.
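Not the fix used in the PR, but one common way to satisfy this kind of CodeQL finding is to keep all offset arithmetic on std::byte* (where every size is a byte count) and cast to the element type only once at the end. A rough, self-contained sketch under those assumptions; names are placeholders and the alignment calls from the snippet above are omitted:

```cpp
#include <cstddef>

// Sketch: lay out sub-buffers inside a workspace using byte-typed pointers,
// so no arithmetic is scaled by sizeof(T) behind the scenes.
template <typename T>
void LayOutPackedBuffers(void* workspace,
                         std::size_t packed_quant_b_data_size,  // bytes
                         std::size_t blk_sum_size,               // bytes
                         std::byte*& packed_quant_b_data,
                         T*& quant_b_blk_sum,
                         T*& packed_quant_b_scale) {
  std::byte* base = static_cast<std::byte*>(workspace);
  packed_quant_b_data = base;
  // Offsets are computed on std::byte*, then cast to T* at the end.
  std::byte* blk_sum_begin = packed_quant_b_data + packed_quant_b_data_size;
  quant_b_blk_sum = reinterpret_cast<T*>(blk_sum_begin);
  packed_quant_b_scale = reinterpret_cast<T*>(blk_sum_begin + blk_sum_size);
}
```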
looks good. had a few comments.
template <typename ElementType>
std::vector<ElementType> RandomVectorUniform(
    typename std::enable_if_t<!std::is_same_v<ElementType, MLAS_FP16>, std::vector<ElementType>>
nit: would it be simpler to have a specialization for MLAS_FP16 instead of two enable_ifs?
Since this is a .h file included in multiple .cpp files, using specialization would trigger redefinition errors, so I chose to use enable_if.
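A minimal self-contained sketch of the enable_if overload pattern under discussion. It uses a stand-in Half type instead of MLAS_FP16 and constrains a default template argument rather than the return type, so it illustrates the idea rather than the actual test helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <type_traits>
#include <vector>

// Stand-in for the real half-precision type, so the example compiles on its own.
struct Half {
  uint16_t bits;
  explicit Half(float f) : bits(static_cast<uint16_t>(f)) {}  // placeholder conversion
};

// Both overloads remain function templates, so defining them in a header that
// is included from several .cpp files does not violate the one-definition rule.
// A full explicit specialization would be an ordinary function definition and
// would need to be marked inline to live in a header.
template <typename ElementType,
          std::enable_if_t<!std::is_same_v<ElementType, Half>, int> = 0>
std::vector<ElementType> RandomVectorUniform(std::size_t n, float lo, float hi) {
  std::mt19937 gen{1234};
  std::uniform_real_distribution<float> dist(lo, hi);
  std::vector<ElementType> v(n);
  for (auto& x : v) x = static_cast<ElementType>(dist(gen));
  return v;
}

template <typename ElementType,
          std::enable_if_t<std::is_same_v<ElementType, Half>, int> = 0>
std::vector<ElementType> RandomVectorUniform(std::size_t n, float lo, float hi) {
  std::mt19937 gen{1234};
  std::uniform_real_distribution<float> dist(lo, hi);
  std::vector<ElementType> v(n);
  for (auto& x : v) x = Half(dist(gen));
  return v;
}
```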
Force-pushed from 57ef96b to 037db3f
Force-pushed from 037db3f to 8050f0a
### Description
A break-down PR of #22651. Add fp16 kernels.

### Description
A break-down PR of #22651. Op API change only.
- add template to functions and classes that support fp32 and fp16
- rename functions, classes and files that support fp32 and fp16 from SQNBxxx to QNBxxx

### Description
A break-down PR of #22651.
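As a rough illustration of the templating and SQNBxxx-to-QNBxxx rename items listed in the break-down PR descriptions above (every name below is a hypothetical stand-in, not the real MLAS/onnxruntime API):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the real half-precision type used by onnxruntime.
struct Fp16Stub { uint16_t val; };

// Hypothetical templated entry point: one signature serves both float and
// fp16 activations instead of a separate fp32-only ("S"-prefixed) function.
template <typename T>
void QNBitGemmStub(const T* A, const void* packed_quant_b, T* C,
                   std::size_t M, std::size_t N, std::size_t K) {
  // Dispatch to an fp32 or fp16 kernel based on T; kernel bodies elided.
  (void)A; (void)packed_quant_b; (void)C; (void)M; (void)N; (void)K;
}

// What used to be an fp32-only SQNBitGemm-style call becomes
//   QNBitGemmStub<float>(...)
// while fp16 activations can use
//   QNBitGemmStub<Fp16Stub>(...).
```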
Description
Add Fp16 kernels for MatMulNBits.
Support fp16 A; computation uses accuracy level 2.
BlkLen:128/Symmetric:0/HasBias:1
Motivation and Context
Add cross-device data type support.