Update SIMD sections #21

jan-wassenberg · 2023-06-19T12:15:45Z

No description provided.

dendibakh · 2023-06-20T22:21:14Z

Thank you @jan-wassenberg, I will review it in the upcoming days.

dendibakh

Thank you @jan-wassenberg, for the write-up.
There are no major rewrites, so I expect this PR to be merged quickly.
Do you mind if I make cosmetic changes myself (maybe in a subsequent commit)?

chapters/3-CPU-Microarchitecture/3-7 SIMD.md

chapters/10-Optimizing-Computations/10-3 Vectorization.md

chapters/10-Optimizing-Computations/10-4 Compiler Intrinsics.md

dendibakh · 2023-06-21T21:07:24Z

chapters/10-Optimizing-Computations/10-4 Compiler Intrinsics.md

@@ -4,7 +4,7 @@ typora-root-url: ..\..\img

 ## Compiler Intrinsics {#sec:secIntrinsics}

-There are types of applications that have very few hotspots that call for tuning them heavily. However, compilers do not always do what we want in terms of generated code in those hot places. For example, a program does some computation in a loop which the compiler vectorizes in a suboptimal way. It usually involves some tricky or specialized algorithms, for which we can come up with a better sequence of instructions. It can be very hard or even impossible to make the compiler generate the desired assembly code using standard constructs of the C and C++ languages.
+There are types of applications that have hotspots worth tuning heavily. However, compilers do not always do what we want in terms of generated code in those hot places. For example, a program does some computation in a loop which the compiler vectorizes in a suboptimal way. It usually involves some tricky or specialized algorithms, for which we can come up with a better sequence of instructions. It can be very hard or even impossible to make the compiler generate the desired assembly code using standard constructs of the C and C++ languages.


[TODO for Denis] Example has a bug if the trip count is not a multiple of 4.
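The book's example is not reproduced in this thread, so what follows is only a hypothetical sketch of the usual fix, assuming a loop that handles four floats per iteration with SSE intrinsics. The function name `add_arrays` and the element-wise add are made up for illustration. The point is the loop bound `i + 4 <= n` plus a scalar remainder loop, which keeps the code correct when the trip count is not a multiple of 4.

```cpp
#include <cstddef>

// Use SSE intrinsics only where the target supports them; otherwise the
// scalar loop below handles every element.
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical example (not the book's code): a[i] += b[i] for i in [0, n).
void add_arrays(float* a, const float* b, std::size_t n) {
  std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
  // Vector loop: four floats per iteration. The bound `i + 4 <= n` stops
  // before a partial vector would read or write past the end of the arrays.
  for (; i + 4 <= n; i += 4) {
    __m128 va = _mm_loadu_ps(a + i);  // unaligned load of a[i..i+3]
    __m128 vb = _mm_loadu_ps(b + i);  // unaligned load of b[i..i+3]
    _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
  }
#endif
  // Scalar remainder: the last n % 4 elements (or all of them without SSE).
  for (; i < n; ++i) {
    a[i] += b[i];
  }
}
```

Calling `add_arrays(a, b, 7)` runs one vector iteration and then three scalar iterations. A version whose vector loop runs while `i < n` and has no remainder loop would touch memory past element 6.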

cf-natali · 2023-07-16T10:53:45Z

Might be worth adding a section on Intel AMX?
It's not available in Sapphire Rapids and not that well known yet.

jan-wassenberg · 2023-07-17T09:44:59Z

> Thank you @jan-wassenberg, for the write-up. There are no major rewrites, so I expect this PR to be merged quickly. Do you mind if I make cosmetic changes myself (maybe in a subsequent commit)?

Yes, please feel free to make cosmetic changes in a subsequent commit :)

jan-wassenberg · 2023-07-17T10:37:17Z

> Might be worth adding a section on Intel AMX? It's not available in Sapphire Rapids and not that well known yet.

Hm, from my perspective it seems a bit early to write about; personally, I have not gathered much experience with it yet. There would also be SME and the various RISC-V extensions, plus POWER10. (BTW you mean it is available in SPR, right?)

dendibakh · 2023-07-17T11:30:56Z

> > Might be worth adding a section on Intel AMX? It's not available in Sapphire Rapids and not that well known yet.
>
> Hm, from my perspective it seems a bit early to write about; personally, I have not gathered much experience with it yet. There would also be SME and the various RISC-V extensions, plus POWER10. (BTW you mean it is available in SPR, right?)

I haven't used AMX myself yet. :) Though it's on my TODO list. I have seen a short manual about programming with AMX intrinsics -- will try to find it. I wonder if compilers generate AMX code... I've heard AMX requires a special data layout (or transposition).

Let's not make a whole section about it, but I think mentioning matrix extensions is worth it.

jan-wassenberg · 2023-07-17T12:05:36Z

OK, I've added a brief mention of the two AMX plus SME :)

chapters/3-CPU-Microarchitecture/3-7 SIMD.md

dendibakh · 2023-07-17T19:01:58Z

Thanks @jan-wassenberg, I have done a few cosmetic updates and I'm ready to merge this PR.
Let me know if you have anything else to change. If not, then we're done here.

jan-wassenberg · 2023-07-18T07:48:16Z

Very nice, thanks for making the changes. Looks good to me 👍

Update SIMD sections

ab4b085

dendibakh reviewed Jun 21, 2023

Update SIMD sections

a25ba9d

Update SIMD sections

dd5c4d8

Cosmetic updates to SIMD uarch section

4a0d022

dendibakh reviewed Jul 17, 2023

Cosmetic updates. part2

850ac32

dendibakh merged commit b21ed3b into dendibakh:main Jul 18, 2023