
SVE optimised float WSSJ kernel #2917

Merged

Conversation

@rakshithgb-fujitsu (Contributor) commented on Sep 26, 2024:

This PR introduces performance optimizations for the float WSSJ kernel using SVE intrinsics, resulting in significant improvements for the SVM algorithms on ARM.

Key Improvements:
- Boser method: 22% performance gain, leading to faster computation and better resource utilization.
- Thunder method: 5% performance gain, enhancing efficiency in scenarios where this method is used.

Changes:
- Code updates: a new SVE-intrinsics-based float WSSJ kernel.

Impact:
- Performance: faster processing times and improved efficiency for SVM algorithms, observed and documented on a single core.

Performance on a single core (AWS instance: c7g.8xl, Graviton3):

[Figure: svm-perf benchmark chart]
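
For context, working-set selection (WSS) is essentially a masked argmax/argmin search over per-sample gradient values, which is the kind of loop that maps naturally onto SVE's predicated lanes. Below is a minimal sketch of that pattern, assuming a hypothetical `grad` array and ignoring the working-set masking the real kernel performs; it is not the PR's code.

```cpp
#include <arm_sve.h>
#include <cfloat>
#include <cstdint>

// Hedged sketch of a predicated argmax, the core pattern behind WSS.
// 'grad' and 'n' are illustrative; the real kernel also applies set masks.
static uint32_t argMaxSve(const float * grad, uint32_t n)
{
    svfloat32_t vMax       = svdup_n_f32(-FLT_MAX); // running per-lane maxima
    svuint32_t vIdx        = svdup_n_u32(0);        // global index of each lane's max
    const svuint32_t vLane = svindex_u32(0, 1);     // lane offsets 0, 1, 2, ...
    for (uint32_t i = 0; i < n; i += (uint32_t)svcntw())
    {
        const svbool_t pg     = svwhilelt_b32_u32(i, n);     // tail predicate
        const svfloat32_t v   = svld1_f32(pg, grad + i);
        const svbool_t pGt    = svcmpgt_f32(pg, v, vMax);    // improving lanes
        const svuint32_t vCur = svadd_n_u32_x(pg, vLane, i); // global indices
        vMax = svsel_f32(pGt, v, vMax);                      // keep better value
        vIdx = svsel_u32(pGt, vCur, vIdx);                   // and its index
    }
    const float best     = svmaxv_f32(svptrue_b32(), vMax);  // horizontal max
    const svbool_t pBest = svcmpeq_n_f32(svptrue_b32(), vMax, best);
    return svminv_u32(pBest, vIdx); // lowest index among tied lanes
}
```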

@rakshithgb-fujitsu (Contributor Author) commented:

@keeranroth have a look at this.

@keeranroth (Contributor) left a comment:

I can't comment too much on the algorithm, as I'm not familiar with the SVM code, but it looks believable. There are just some style points I picked up on; I'll let someone knowledgeable about the application area give more guidance.

I can't help but feel that some instruction-level parallelism is being left on the table. A lot of the instructions are dependent on each other. Since you already have the masking, you might be able to duplicate the work and simply select the result you want at the end. I suspect this implementation is only using one pipeline at the moment, though profiling would be needed to confirm.
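
To make the suggestion concrete, here is a minimal sketch of the compute-both-and-select pattern (one reading of the comment, not the PR's code; `vGrad`, `vDelta`, and `vSign` are hypothetical stand-ins):

```cpp
#include <arm_sve.h>

// Hedged sketch: both candidate updates are computed unconditionally,
// forming two independent dependency chains that can issue on separate
// pipelines; svsel picks the wanted lanes only at the end, with no branch.
static svfloat32_t selectBothPaths(svbool_t pg, svfloat32_t vGrad,
                                   svfloat32_t vDelta, svfloat32_t vSign)
{
    const svfloat32_t vLow = svadd_f32_x(pg, vGrad, vDelta); // hypothetical "low" path
    const svfloat32_t vUp  = svsub_f32_x(pg, vGrad, vDelta); // hypothetical "up" path
    const svbool_t pSel    = svcmpgt_n_f32(pg, vSign, 0.0f); // per-lane choice
    return svsel_f32(pSel, vLow, vUp);                       // late select
}
```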

(A review thread on cpp/daal/src/algorithms/svm/svm_train_common_sve_impl.i was marked outdated and resolved.)
```cpp
}
else
{
    DAAL_ASSERT((sign & (sign - 1)) == 0) // used to make sure sign is always having 1 bit set
```
@keeranroth (Contributor) commented:

I'm not sure what getSign returns, but this assert is also true when sign == 0, so the comment isn't correct. I suspect this might not be what you want to be checking on the result of getSign.

@rakshithgb-fujitsu (Contributor Author) replied:

The idea was to keep this optimization under a check: since low = 0x2, if that were ever to change, this debug assert would help catch it. getSign is defined here: https://github.com/oneapi-src/oneDAL/blob/f8a395311b4b19be1859545d7a19ee1969d8d9ee/cpp/daal/src/algorithms/svm/svm_train_common.h#L92
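
For reference, the reviewer's point is that `(x & (x - 1)) == 0` also holds for `x == 0`, not only for powers of two. A stricter assert matching the comment's stated intent ("exactly one bit set") would be, as a suggestion:

```cpp
// Excludes zero as well as multi-bit values; fires if getSign ever
// returns 0 or a value with more than one bit set.
DAAL_ASSERT(sign != 0 && ((sign & (sign - 1)) == 0));
```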

(Another review thread on cpp/daal/src/algorithms/svm/svm_train_common_sve_impl.i was marked outdated and resolved.)
@rakshithgb-fujitsu (Contributor Author) commented on Sep 26, 2024:

> I can't comment too much on the algorithm, as I'm not familiar with the SVM code, but it looks believable. There are just some style points I picked up on; I'll let someone knowledgeable about the application area give more guidance.
>
> I can't help but feel that some instruction-level parallelism is being left on the table. A lot of the instructions are dependent on each other. Since you already have the masking, you might be able to duplicate the work and simply select the result you want at the end. I suspect this implementation is only using one pipeline at the moment, though profiling would be needed to confirm.

We do think there is more room for improvement and are exploring further ideas, but we've finalized this version for now; if we optimize it further, we'll raise another PR. If you do see any instruction-level bottlenecks, please point them out.

@napetrov (Contributor) commented:

Great to see specialization for the algorithm.

@napetrov (Contributor) commented:

/intelci: run

@samir-nasibli (Contributor) commented:

@rakshithgb-fujitsu thank you! Please address the formatter check hints.

@rakshithgb-fujitsu (Contributor Author) commented:

> @rakshithgb-fujitsu thank you! Please address the formatter check hints.

Not sure what is failing; clang-format.sh shows there is nothing to format. Also, I'm unable to add labels.

@Alexsandruss added the enhancement and perf (Performance optimization) labels on Oct 3, 2024.
@Alexsandruss merged commit 84c9bbb into uxlfoundation:main on Oct 4, 2024; 17 checks passed.