Symmetric QGEMM kernel for ARMv8 A55 chip #10754

Merged: 4 commits, Mar 7, 2022


@chenfucn (Contributor) commented Mar 3, 2022

The Arm Cortex-A55 microarchitecture (which includes the dot-product instructions) is, like the A53, widely used for the little cores in big.LITTLE configurations. The A55 has narrower load/store hardware: a 128-bit load instruction blocks the pipeline for two full cycles, during which no other instruction can execute. A 64-bit load, on the other hand, can be dual-issued with many other instructions.

This change adds a symmetric QGEMM kernel for the A55 microarchitecture, in which we replace

ldr q4,[x1],#16        // one 128-bit load into v4; x1 += 16

with

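ldr d4,[x1],#8         // low 64 bits into v4.d[0]; x1 += 8
ldr x11,[x1],#8        // high 64 bits into general-purpose x11; x1 += 8
ins v4.d[1],x11        // insert x11 into the upper half of v4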

so that we can try to hide the memory-load cycles behind the kernel's compute cycles.

@yufenglee (Member) previously approved these changes Mar 3, 2022

:shipit:

@yihonglyu (Contributor) commented:

Do you have a performance number with the change?

@chenfucn merged commit 50a6f09 into microsoft:master Mar 7, 2022
@chenfucn deleted the cfu_symma55 branch March 7, 2022 16:41
lavanyax pushed a commit to intel/onnxruntime that referenced this pull request Mar 29, 2022
Co-authored-by: Chen Fu <[email protected]>