[CPU][ARM][x64]Snippets MatMul via brgemm emitter and executor #28304

Open
wants to merge 13 commits into
base: master

Conversation

chenhu-wang
Contributor

@chenhu-wang chenhu-wang commented Jan 8, 2025

Details:

  • Snippets MatMul via brgemm emitter and executor on aarch64 with TPP
  • Snippets MatMul via brgemm emitter and executor on x64 with TPP

Tickets:

@chenhu-wang chenhu-wang requested review from a team as code owners January 8, 2025 06:59
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jan 8, 2025
@chenhu-wang chenhu-wang marked this pull request as draft January 8, 2025 07:28
@github-actions github-actions bot added the category: build OpenVINO cmake script / infra label Jan 9, 2025
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch 3 times, most recently from a5b829d to 6ca4f1b Compare January 9, 2025 08:09
@chenhu-wang chenhu-wang marked this pull request as ready for review January 13, 2025 06:37
@chenhu-wang chenhu-wang requested a review from a team as a code owner January 13, 2025 06:37
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch 15 times, most recently from 982e2c2 to 6e05cb1 Compare January 15, 2025 05:33
@chenhu-wang
Contributor Author

chenhu-wang commented Jan 15, 2025

@a-sidorova, could you please review this as well, since you are already reviewing #28229? The snippets MatMul test cases pass on ARM. Thank you!

src/plugins/intel_cpu/src/nodes/subgraph.cpp (outdated, resolved)
void jit_brgemm_emitter::emit_impl(const std::vector<size_t>& in, const std::vector<size_t>& out) const {
validate_arguments(in, out);
std::unordered_set<size_t> exclude = {};
store_context(exclude);
Contributor

Please note that we will merge #27391 soon. That PR makes register spills more efficient: we will be able to spill only the needed (live) registers.

Just for information, and to align with our other activities 😊

Contributor

Could you please create a ticket to support optimized register spills in the jit_brgemm emitter on ARM and leave a TODO comment here?

Contributor Author

CVS-162498 has been created and a comment has been added in the code. Thanks, Alexandra!

if (ENABLE_SNIPPETS_LIBXSMM_TPP)
ov_add_compiler_flags(-Wno-missing-declarations)
Contributor

Could you elaborate on why you need to add this flag? Can we avoid it?

Contributor Author

@chenhu-wang chenhu-wang Jan 26, 2025

This suppresses a warning that is treated as an error when compiling libxsmm. Without it there are compile errors such as "intel_cpu/thirdparty/libxsmm/src/generator_common_aarch64.c:60:6: error: no previous declaration for ‘libxsmm_generator_vcvt_f32i8_aarch64_sve’ [-Werror=missing-declarations]".
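
For context, here is a minimal C++ sketch of what `-Wmissing-declarations` reports, based on GCC's documented behavior. The failing libxsmm file is C rather than C++, and all function names below are hypothetical, not taken from libxsmm.

```cpp
#include <cassert>

// Declared before it is defined (normally via a header): not reported.
int scaled_with_declaration(int x);
int scaled_with_declaration(int x) { return 2 * x; }

// Global function defined with no previous declaration. With
// -Wmissing-declarations GCC reports "no previous declaration for ...",
// and -Werror turns that warning into a build failure.
int scaled_without_declaration(int x) { return 3 * x; }

// Functions in an anonymous namespace are not reported in C++.
namespace {
int local_helper(int x) { return x + 1; }
}  // namespace

// Internal linkage (static), so this one is not a global function either.
static int run_all(int x) {
    assert(x >= 0);
    return scaled_with_declaration(x) + scaled_without_declaration(x) + local_helper(x);
}
```

Since the offending definitions live in third-party sources, disabling the warning for that target is the less invasive option compared with patching the library.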

Contributor

Could you please leave a comment explaining that we have to use this flag to avoid compilation errors in the third-party code?

Contributor Author

Comment added!

cmake/features.cmake (resolved)
@a-sidorova a-sidorova self-assigned this Jan 15, 2025
@v-Golubev v-Golubev self-assigned this Jan 16, 2025
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch from 6e05cb1 to 96e274c Compare January 22, 2025 06:36
Contributor

@a-sidorova a-sidorova left a comment

No more major comments from my side 👍🏼

LGTM

Comment on lines 20 to 22
return BrgemmGenericKernelConfig::operator==(rhs) &&
(get_static_params() == rhs.get_static_params() ||
*get_static_params() == *(rhs.get_static_params()));
Contributor

m_compile_flags is the info combined from m_static_params->m_compile_flags and beta

Oh, I see. Then I agree - beta is already handled. Thank you for the explanation!
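
For readers following the comparison quoted above, a rough sketch of the "same pointer or equal pointee" pattern it uses. The types and fields here are hypothetical stand-ins, not the actual BrgemmGenericKernelConfig classes, and the sketch assumes the shared pointer is never null.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the static (shape-independent) parameters.
struct StaticParams {
    int K;
    int compile_flags;
    bool operator==(const StaticParams& rhs) const {
        return K == rhs.K && compile_flags == rhs.compile_flags;
    }
};

// Hypothetical config holding a dynamic field plus shared static parameters.
class KernelConfig {
public:
    KernelConfig(int M, std::shared_ptr<const StaticParams> params)
        : m_M(M), m_static_params(std::move(params)) {
        assert(m_static_params != nullptr);  // assumption of this sketch
    }

    bool operator==(const KernelConfig& rhs) const {
        // Cheap pointer comparison first; fall back to comparing the
        // pointed-to values only when the pointers differ.
        return m_M == rhs.m_M &&
               (m_static_params == rhs.m_static_params ||
                *m_static_params == *rhs.m_static_params);
    }

private:
    int m_M;
    std::shared_ptr<const StaticParams> m_static_params;
};
```

Configs that share one parameter object compare equal without touching its fields, while configs built from separate but identical parameter objects still compare equal through the value comparison.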

@a-sidorova a-sidorova added this to the 2025.1 milestone Feb 14, 2025
@a-sidorova a-sidorova added the platform: arm OpenVINO on ARM / ARM64 label Feb 14, 2025
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch 5 times, most recently from ce991bb to 2a7d14e Compare February 14, 2025 14:41
Comment on lines +146 to +155
gemm_p.a.primary = in1;
gemm_p.b.primary = in0;
Contributor

in0 and in1 look mixed up: I'd say that the A input should be in0, not in1. However, the x64 implementation does the same thing... Do you have any idea why it is done this way?
@IvanNovoselov, or maybe you have some secret TPP knowledge about why we form the runtime args this way? :)

Contributor Author

@chenhu-wang chenhu-wang Feb 17, 2025

Data is row-major in OpenVINO, while MatMul in libxsmm assumes column-major data. Swapping in0 and in1 avoids a data repack. M/N, lda/ldb and the in0/in1 precisions are also swapped in libxsmm_create_gemm_shape(). @IvanNovoselov, could you confirm this, or correct me if I have misunderstood? Thank you!
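
To illustrate the operand swap described above, a small self-contained sketch. `gemm_col_major` is a hypothetical naive reference kernel, not the libxsmm API. A row-major matrix read as column-major is its transpose, so row-major C = A * B can be computed as column-major C^T = B^T * A^T by passing B first, A second, and swapping M and N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference kernel: column-major C(m x n) = A(m x k) * B(k x n),
// with leading dimensions lda, ldb, ldc.
void gemm_col_major(std::size_t m, std::size_t n, std::size_t k,
                    const float* a, std::size_t lda,
                    const float* b, std::size_t ldb,
                    float* c, std::size_t ldc) {
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = acc;
        }
    }
}

// Row-major C(M x N) = A(M x K) * B(K x N) using the column-major kernel.
// No data is repacked: only the operand order, M/N and the leading
// dimensions are swapped.
std::vector<float> matmul_row_major(std::size_t M, std::size_t N, std::size_t K,
                                    const std::vector<float>& A,
                                    const std::vector<float>& B) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    // The kernel sees B as B^T (N x K, ld = N) and A as A^T (K x M, ld = K),
    // and writes C^T (N x M, ld = N), which is row-major C in memory.
    gemm_col_major(N, M, K, B.data(), N, A.data(), K, C.data(), N);
    return C;
}
```

For example, with A = [[1, 2, 3], [4, 5, 6]] and B = [[7, 8], [9, 10], [11, 12]] stored row-major, `matmul_row_major(2, 2, 3, A, B)` yields the row-major product [[58, 64], [139, 154]].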

Contributor

Aha, got it, thanks for the explanation! Maybe we could leave an explanatory comment then to avoid potential questions in the future?

Contributor Author

comment added!

@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch 3 times, most recently from ce3e097 to 5411abb Compare February 17, 2025 06:35
Contributor

@v-Golubev v-Golubev left a comment

LGTM 👍

@v-Golubev v-Golubev assigned IvanNovoselov and unassigned v-Golubev Feb 17, 2025
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch from 5411abb to 3560cd9 Compare February 18, 2025 09:19
Labels
category: build OpenVINO cmake script / infra category: CPU OpenVINO CPU plugin platform: arm OpenVINO on ARM / ARM64
4 participants