Add kleidiai as thirdparty #27331
Conversation
Force-pushed from f05c178 to 4b52286
```diff
@@ -175,6 +176,11 @@ if(DNNL_USE_ACL)
     set(OV_CPU_WITH_ACL ON)
 endif()

+if(ENABLE_KLEIDIAI_FOR_CPU)
+    add_definitions(-DOV_CPU_WITH_KLEIDIAI)
+    set(OV_CPU_WITH_KLEIDIAI ON)
```
looks like it's not used
I plan to use it later
ok, why is `ENABLE_KLEIDIAI_FOR_CPU` not enough?
```diff
@@ -218,8 +218,8 @@ void CPUTestsBase::CheckPluginRelatedResultsImpl(const std::shared_ptr<const ov:
     auto primType = getExecValue(ov::exec_model_info::IMPL_TYPE);

-    ASSERT_TRUE(primTypeCheck(primType))
-        << "primType is unexpected : " << primType << " Expected : " << selectedType;
+    // ASSERT_TRUE(primTypeCheck(primType))
```
Should be removed?
This PR will be closed in a week because of 2 weeks of no activity.
Hi. I was trying to use these changes locally to see if KleidiAI gets used for fp32 inference. I see that the ACL executor still gets used and not Kleidi's. Does OpenVINO have to be built with any special flags for this to work, or is the integration not complete yet? I have detailed my experiment setup below for reference.

**Setup**

I replicated the changes in my fork, along with the following additions:

```cpp
using LayoutConfig = std::vector<LayoutType>;
static const LayoutConfig dnnlFCLayoutConfig{LayoutType::ncsp, LayoutType::ncsp, LayoutType::ncsp, LayoutType::ncsp};
static const LayoutConfig aclFCLayoutConfig{LayoutType::ncsp, LayoutType::ncsp, LayoutType::ncsp, LayoutType::ncsp};
// <<< ADDED BY ME >>>
static const LayoutConfig kleidiaiFCLayoutConfig{LayoutType::ncsp, LayoutType::ncsp, LayoutType::ncsp, LayoutType::ncsp};

template <dnnl::impl::cpu::x64::cpu_isa_t ISA>
struct Require {
    bool operator()() {
        return dnnl::impl::cpu::x64::mayiuse(ISA);
    }
};

// clang-format off
static const TypeMapping dnnlFCTypeMapping {
    // {src, wei, bia, dst}                        pt<src, wei, bias, dst>
    {{_bf16, _bf16 | _f32, _any, _bf16 | _f32},    pt(bypass(), bypass(), use<3>(), bypass())},
    {{_f16, _f16, _any, _f16 | _f32},              pt(bypass(), bypass(), use<3>(), bypass())},
    // integer precision outputs are not supported for float precision inputs
    {{_f32 | _bf16 | _f16, _any, _any, _i8 | _u8}, pt(bypass(), bypass(), use<0>(), use<0>())},
    // compresses float weights which do not match input data precision
    {{_f32, _half_float, _any, _any | _any},       pt(bypass(), bypass(), use<0>(), use<0>())},
    {{_bf16, _f16, _any, _any | _any},             pt(bypass(), bypass(), use<0>(), use<0>())},
    {{_f16, _bf16, _any, _any | _any},             pt(bypass(), bypass(), use<0>(), use<0>())},
    // quantization configuration
    // int8 inner_product does not support f16 output and bias
    {{_u8 | _i8, _i8, _u8 | _i8 | _i32 | _bf16 | _f32 | _undefined, _u8 | _i8 | _i32 | _bf16 | _f32}, pt(bypass(), bypass(), bypass(), bypass())},
    {{_u8 | _i8, _i8, _f16, _u8 | _i8 | _i32 | _bf16 | _f32}, pt(bypass(), bypass(), just<f32>(), bypass())},
    {{_u8 | _i8, _i8, _any, _any},                 pt(bypass(), bypass(), just<f32>(), just<f32>())},
    // compresses int weights (@todo more strict requrements for output precision?)
    {{_bf16, _u8 | _i8 | _nf4 | _u4 | _i4 | _f4e2m1, _any, _any}, pt(bypass(), bypass(), use<0>(), use<0>()),
     Require<dnnl::impl::cpu::x64::avx512_core_bf16>()}, // Ticket 122347
    {{_bf16, _u8 | _i8 | _nf4 | _u4 | _i4 | _f4e2m1, _any, _any}, pt(just<f32>(), bypass(), just<f32>(), just<f32>())},
    {{_f32, _u8 | _i8 | _nf4 | _u4 | _i4 | _f4e2m1, _any, _any},  pt(bypass(), bypass(), use<0>(), use<0>())},
    // @todo should we fallback to FPXX instead of _f32?
    {{_any, _any, _any, _any},                     pt(just<f32>(), just<f32>(), just<f32>(), just<f32>())},
    // @todo explicitly cover configuration limitations for oneDNN on ARM
};

static const TypeMapping aclFCTypeMapping {
    // {src, wei, bia, dst}                  pt<src, wei, bias, dst>
    {{_f32 | _f16, _f32 | _f16, _any, _any}, pt(bypass(), bypass(), use<0>(), use<0>())},
    {{_any, _any, _any, _any},               pt(just<f32>(), just<f32>(), just<f32>(), just<f32>())}
};

// <<< ADDED BY ME >>>
static const TypeMapping kleidiaiFCTypeMapping {
    // {src, wei, bia, dst}     pt<src, wei, bias, dst>
    {{_f32, _f32, _any, _f32},  pt(bypass(), bypass(), use<0>(), bypass())},
    {{_any, _any, _any, _any},  pt(just<f32>(), just<f32>(), just<f32>(), just<f32>())}
};

static const TypeMapping aclLowpFCTypeMapping {
    // {src, wei, bia, dst}     pt<src, wei, bias, dst>
    {{_i8, _i8, _any, _f32},    pt(bypass(), bypass(), use<3>(), bypass())}
};

static const MappingNotation dnnlConvolutionMappingNotation {
    ARG_SRC, ARG_WEI, ARG_BIAS, ARG_DST
};

static const MappingNotation aclFullyConnectedMappingNotation {
    ARG_SRC, ARG_WEI, ARG_BIAS, ARG_DST
};

// <<< ADDED BY ME >>>
static const MappingNotation kleidiaiFullyConnectedMappingNotation {
    ARG_SRC, ARG_WEI, ARG_BIAS, ARG_DST
};
```

and this change:

```cpp
// requiresFallback
[](const FCConfig& config) -> ov::optional<executor::Config<FCAttrs>> {
    return requiresFallbackCommon(config,
                                  kleidiaiFCTypeMapping,
                                  kleidiaiFCLayoutConfig,
                                  kleidiaiFullyConnectedMappingNotation);
},
```

Then I built OpenVINO with the usual commands:

```shell
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_PYTHON=ON -DENABLE_WHEEL=ON ..
cmake --build . --parallel 32
```

and installed the generated wheel.
@dmitry-gorokhov Any insights on the above?
@NishantPrabhuFujitsu I just tried to build this PR with tests enabled (openvino/src/plugins/intel_cpu/src/nodes/executors/fullyconnected_implementations.cpp, lines 224 to 228 in e3424a0).
I am not sure which workload you are trying to run or what the difference in the graph patterns is. I would recommend checking which condition in the code linked above returns false.
```cpp
}

bool MatMulKleidiAIExecutor::supports(const FCConfig& config) {
    if (!config.attrs.weightsNonTransposed)
```
@dmitry-gorokhov I investigated further and found that this check (line 34) fails, causing the Kleidi executor not to be called. Is this behaviour expected? I was running inference for an LLM in exactly the same way as for my past contributions.
I will try running the tests in the meantime.
It is a gap in the current `MatMulKleidiAIExecutor` coverage.
@alvoron generously agreed to help: he will extend `MatMulKleidiAIExecutor` to support the `!config.attrs.weightsNonTransposed` case, so Kleidi will be used on regular LLMs.
Meanwhile, I would recommend working with `ov_cpu_func_tests` as the most convenient way to extend `MatMulKleidiAIExecutor` coverage to new precisions.
Sounds good. Thanks @alvoron, looking forward to getting this to work soon. In the meantime, I'll work on integrating the int8 microkernels.
@NishantPrabhuFujitsu I made some changes to support weights transpose. I picked the current PR changes, rebased to the latest master and applied the weights transpose changes. Could you please try my PR? #28830
I checked that all `smoke_FC_KLEIDIAI_2D` tests passed. They include several tests with `weightsNonTransposed` that are executed by kleidiai, so I assume you can try the `weightsNonTransposed` cases as well.
Please let me know if any issues are observed; I'll fix them.
@alvoron I tried your PR, and matmuls in LLM inference (the `weightsNonTransposed` case) are now executed by Kleidi. Thanks for helping!
However, I have noticed the following drawbacks.

- Inference with kleidi is really slow. Please find below some benchmarking results where I compare kleidi with `gemm:acl` for `f32:f32:f32` single-prompt inference. To generate these results, I exported `TinyLlama-1.1B-Chat-v1.0` with `optimum` in `fp32` weight format and used the `f32` precision hint during inference for both cases.
- Inference with kleidi consumes a lot of memory. While running the above benchmark, inference with ACL needed <6 GB of RAM while kleidi consumed >100 GB (and was going to consume even more); I had to cut the benchmarking short to prevent the process from getting killed. I am currently not sure what the cause of this is.

Let me know if you have any insights on the above. I'll investigate further from my end as well, while working on integrating the int8 microkernels.
Thanks @NishantPrabhuFujitsu, glad to know it works now.
I left a couple of comments in #28830. These recommendations should help dramatically improve the performance and avoid the memory leaks.
Also, I hit a build error when trying to compile with my setup.
This is something unknown; I haven't seen it before.
I am compiling with GCC 12.3.0 on Ubuntu 22.04.5 LTS, kernel version 6.8.0-1021-aws. The machine is an AWS Graviton3 with 32 cores. The exact build commands used (after installing the required dependencies) are:

```shell
openvino/build$ cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_PYTHON=ON -DENABLE_WHEEL=ON -DENABLE_TESTS=ON ..
openvino/build$ cmake --build . --parallel 32
```
I'll try to reproduce it on AWS.
UPD: I was not able to reproduce the issue using GCC 11.4.0 (Ubuntu 22.04.5 LTS / 6.8.0-1021-aws); the build completed successfully using your commands. To avoid the issue, I'd suggest downgrading to gcc-11, taking into account that Ubuntu 22.04 comes with GCC 11 by default.
@alvoron I was able to compile successfully using gcc-11, so I'll stick with that for now. There's no requirement to use gcc-12 specifically.
Since we will not merge this PR, I would suggest moving all further work/discussion into #28830
### Details:
- `kleidiai` is added as a git submodule
- `kleidiai` is built statically and linked into the CPU plugin library
- MatMul kleidiai executor is added
- weights transpose is supported in the MatMul kleidiai executor
- Initial implementation is inherited from #27331

### Tickets:
- *ticket-id*