Create branch according to cpu core uarch #10521

Merged: chenfucn merged 1 commit into microsoft:master from cfu_uarch on Feb 14, 2022


chenfucn
Contributor

Description:

This is a preparation change for a bigger goal.

On ARM64 CPUs with big.LITTLE, all cores implement the same architecture but differ in micro-architecture. In particular, little cores often have narrow memory buses that make 128-bit loads very slow. If we instead always use 64-bit loads in our kernels, the code runs slower on big cores. As a result, we need to run different code on different cores to get the best performance.

This change constructs a branch point (a manifold) that pivots on the micro-architecture of the core the code is currently running on, so that we can develop and call different kernels accordingly.
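
To illustrate the idea, here is a minimal sketch of a per-micro-architecture dispatch table. It is not the code from this PR: `CoreUarch`, `Kernel64`, `Kernel128`, and `SelectKernel` are hypothetical names, the kernels are plain byte copies standing in for real 64-bit-load and 128-bit-load kernels, and mapping an unknown core to the 64-bit variant is an assumption made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical core classes; the PR description does not list its categories.
enum class CoreUarch : std::size_t { Unknown = 0, Little = 1, Big = 2 };

// Each kernel copies n bytes and returns how many load steps it issued.
using Kernel = std::size_t (*)(const std::uint8_t* src, std::uint8_t* dst, std::size_t n);

// Copies in chunks of Width bytes, then finishes the tail byte by byte.
template <std::size_t Width>
std::size_t CopyInChunks(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    std::size_t loads = 0;
    std::size_t i = 0;
    for (; i + Width <= n; i += Width) {
        std::memcpy(dst + i, src + i, Width);
        ++loads;
    }
    for (; i < n; ++i) {
        dst[i] = src[i];
        ++loads;
    }
    return loads;
}

// Stand-in for a kernel written around 64-bit loads (little cores).
std::size_t Kernel64(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    return CopyInChunks<8>(src, dst, n);
}

// Stand-in for a kernel written around 128-bit loads (big cores).
std::size_t Kernel128(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    return CopyInChunks<16>(src, dst, n);
}

// The branch point: one table lookup keyed by the core's micro-architecture.
// Assumption: an unknown core gets the 64-bit variant.
Kernel SelectKernel(CoreUarch uarch) {
    static const std::array<Kernel, 3> table = {Kernel64, Kernel64, Kernel128};
    const auto index = static_cast<std::size_t>(uarch);
    assert(index < table.size());
    return table[index];
}
```

A caller would obtain the class of the current core from some detection routine (not shown here) and then invoke `SelectKernel(uarch)(src, dst, n)`. Keeping the choice in a table means a new kernel variant only needs a new table entry.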

Member

@yufenglee yufenglee left a comment

:shipit:

@chenfucn chenfucn merged commit 58f80c1 into microsoft:master Feb 14, 2022
@chenfucn chenfucn deleted the cfu_uarch branch February 14, 2022 23:16
chenfucn added a commit that referenced this pull request Feb 25, 2022
The previously merged pull request has a bug:

#10521

It aimed to detect the micro-architecture of the current CPU core and select the best-suited kernel. Unfortunately, it assumed that a thread can never migrate from one core to another.

This change tries to fix that problem. It introduces a performance degradation of about 2-5% on symmetric quantized matmul.
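
The migration problem can be reproduced in a small sketch. This is not the actual fix: `CachedSelector`, `LiveSelector`, `SimulatedCoreId`, and the per-core table are hypothetical, and the core id comes from a simulated global so the example runs anywhere. The cached variant remembers the micro-architecture seen on its first call; the live variant asks again on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Uarch { Little, Big };

// Hypothetical source of the current core id; a real build would ask the OS.
using CoreIdFn = std::size_t (*)();

// Simulated core id so that a "migration" can be triggered by hand.
std::size_t g_simulated_core = 0;
std::size_t SimulatedCoreId() { return g_simulated_core; }

// Buggy pattern: looks up the core once and reuses the answer forever.
class CachedSelector {
public:
    CachedSelector(const std::vector<Uarch>& map, CoreIdFn core_id)
        : map_(map), core_id_(core_id) {}

    Uarch Current() {
        if (!cached_) {
            const std::size_t id = core_id_();
            assert(id < map_.size());
            value_ = map_[id];
            cached_ = true;
        }
        return value_;
    }

private:
    const std::vector<Uarch>& map_;
    CoreIdFn core_id_;
    bool cached_ = false;
    Uarch value_ = Uarch::Little;
};

// Migration-safe pattern: looks up the core on every call.
class LiveSelector {
public:
    LiveSelector(const std::vector<Uarch>& map, CoreIdFn core_id)
        : map_(map), core_id_(core_id) {}

    Uarch Current() const {
        const std::size_t id = core_id_();
        assert(id < map_.size());
        return map_[id];
    }

private:
    const std::vector<Uarch>& map_;
    CoreIdFn core_id_;
};
```

After the simulated core id changes, `CachedSelector` still reports the old core while `LiveSelector` follows the thread. The live lookup puts extra work on every kernel call, so a design like this trades some speed for correctness.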

Co-authored-by: Chen Fu <[email protected]>