Update SIMD sections #21

Merged
dendibakh merged 5 commits into dendibakh:main on Jul 18, 2023

Conversation

jan-wassenberg
Contributor

No description provided.

@dendibakh
Owner

Thank you @jan-wassenberg, I will review it in the upcoming days.

Owner

@dendibakh dendibakh left a comment

Thank you @jan-wassenberg, for the write-up.
There are no major rewrites, so I expect this PR to be merged quickly.
Do you mind if I make cosmetic changes myself (maybe in a subsequent commit)?

chapters/3-CPU-Microarchitecture/3-7 SIMD.md
chapters/10-Optimizing-Computations/10-3 Vectorization.md (outdated)
@@ -4,7 +4,7 @@ typora-root-url: ..\..\img

## Compiler Intrinsics {#sec:secIntrinsics}

- There are types of applications that have very few hotspots that call for tuning them heavily. However, compilers do not always do what we want in terms of generated code in those hot places. For example, a program does some computation in a loop which the compiler vectorizes in a suboptimal way. It usually involves some tricky or specialized algorithms, for which we can come up with a better sequence of instructions. It can be very hard or even impossible to make the compiler generate the desired assembly code using standard constructs of the C and C++ languages.
+ There are types of applications that have hotspots worth tuning heavily. However, compilers do not always do what we want in terms of generated code in those hot places. For example, a program does some computation in a loop which the compiler vectorizes in a suboptimal way. It usually involves some tricky or specialized algorithms, for which we can come up with a better sequence of instructions. It can be very hard or even impossible to make the compiler generate the desired assembly code using standard constructs of the C and C++ languages.
Owner

[TODO for Denis] Example has a bug if the trip count is not a multiple of 4.

chapters/10-Optimizing-Computations/10-3 Vectorization.md (outdated)
@cf-natali
Contributor

Might be worth adding a section on Intel AMX?
It's not available in Sapphire Rapids and not that well known yet.

@jan-wassenberg
Contributor Author

> Thank you @jan-wassenberg, for the write-up. There are no major rewrites, so I expect this PR to be merged quickly. Do you mind if I make cosmetic changes myself (maybe in a subsequent commit)?

Yes, please feel free to make cosmetic changes in a subsequent commit :)

@jan-wassenberg
Contributor Author

> Might be worth adding a section on Intel AMX? It's not available in Sapphire Rapids and not that well known yet.

hm, from my perspective it seems a bit early to write about, personally I have not gathered much experience with it yet. There would also be SME and the various RISC-V extensions, plus POWER10. (BTW you mean it is available in SPR, right?)

@dendibakh
Owner

> > Might be worth adding a section on Intel AMX? It's not available in Sapphire Rapids and not that well known yet.
>
> hm, from my perspective it seems a bit early to write about, personally I have not gathered much experience with it yet. There would also be SME and the various RISC-V extensions, plus POWER10. (BTW you mean it is available in SPR, right?)

I haven't used AMX myself yet. :) Though it's on my TODO list. I have seen a short manual about programming with AMX intrinsics -- will try to find it. I wonder if compilers generate AMX code... I've heard AMX requires a special data layout (or transposition).

Let's not make a whole section about it, but I think mentioning matrix extensions is worth it.

@jan-wassenberg
Contributor Author

OK, I've added a brief mention of the two AMX plus SME :)

@dendibakh
Owner

Thanks @jan-wassenberg, I have done a few cosmetic updates and I'm ready to merge this PR.
Let me know if you have anything else to change. If not, then we're done here.

@jan-wassenberg
Contributor Author

Very nice, thanks for making the changes. Looks good to me 👍

@dendibakh dendibakh merged commit b21ed3b into dendibakh:main Jul 18, 2023