Update SIMD sections #21
Conversation
Thank you @jan-wassenberg, I will review it in the upcoming days.
Thank you, @jan-wassenberg, for the write-up.
There are no major rewrites, so I expect this PR to be merged quickly.
Do you mind if I make cosmetic changes myself (maybe in a subsequent commit)?
chapters/10-Optimizing-Computations/10-4 Compiler Intrinsics.md
@@ -4,7 +4,7 @@ typora-root-url: ..\..\img

 ## Compiler Intrinsics {#sec:secIntrinsics}

-There are types of applications that have very few hotspots that call for tuning them heavily. However, compilers do not always do what we want in terms of generated code in those hot places. For example, a program does some computation in a loop which the compiler vectorizes in a suboptimal way. It usually involves some tricky or specialized algorithms, for which we can come up with a better sequence of instructions. It can be very hard or even impossible to make the compiler generate the desired assembly code using standard constructs of the C and C++ languages.
+There are types of applications that have hotspots worth tuning heavily. However, compilers do not always do what we want in terms of generated code in those hot places. For example, a program does some computation in a loop which the compiler vectorizes in a suboptimal way. It usually involves some tricky or specialized algorithms, for which we can come up with a better sequence of instructions. It can be very hard or even impossible to make the compiler generate the desired assembly code using standard constructs of the C and C++ languages.
[TODO for Denis] Example has a bug if the trip count is not a multiple of 4.
Might be worth adding a section on Intel AMX?
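For context on the trip-count remark: the chapter's actual example is not shown in this thread, so the following is only a minimal sketch of the usual fix, assuming a 4-wide SSE loop over `float` arrays. The function name `add_arrays` and the element-wise add are hypothetical stand-ins. The vector loop handles full groups of four, and a scalar loop finishes the 0 to 3 leftover elements, so the function stays correct when `n` is not a multiple of 4.

```cpp
#include <cstddef>

// Use SSE intrinsics only where the target supports them; otherwise the
// scalar loop below processes every element.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical kernel (not the book's example): a[i] += b[i] for i in [0, n).
void add_arrays(float* a, const float* b, std::size_t n) {
  std::size_t i = 0;
#if SKETCH_HAS_SSE
  // Main loop: 4 floats per iteration. The condition i + 4 <= n stops
  // before a partial group, so no load or store runs past the arrays.
  for (; i + 4 <= n; i += 4) {
    __m128 va = _mm_loadu_ps(a + i);  // unaligned load of a[i..i+3]
    __m128 vb = _mm_loadu_ps(b + i);  // unaligned load of b[i..i+3]
    _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
  }
#endif
  // Scalar remainder: the 0..3 leftover elements (or all of them when
  // SSE is unavailable).
  for (; i < n; ++i) {
    a[i] += b[i];
  }
}
```

Other remedies exist, such as padding the arrays or using masked loads and stores on ISAs that provide them; the scalar tail is just the simplest one to show.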
Yes, please feel free to make cosmetic changes in a subsequent commit :)
Hm, from my perspective it seems a bit early to write about; personally, I have not gathered much experience with it yet. There would also be SME and the various RISC-V extensions, plus POWER10. (BTW, you mean it is available in SPR, right?)
I haven't used AMX myself yet. :) Though it's on my TODO list. I have seen a short manual about programming with AMX intrinsics; I will try to find it. I wonder if compilers generate AMX code... I've heard AMX requires a special data layout (or transposition). Let's not make a whole section about it, but I think mentioning matrix extensions is worth it.
OK, I've added a brief mention of the two AMX plus SME :)
Thanks @jan-wassenberg, I have made a few cosmetic updates and I'm ready to merge this PR.
Very nice, thanks for making the changes. Looks good to me 👍