-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SVE optimised float WSSJ kernel #2917
SVE optimised float WSSJ kernel #2917
Conversation
@keeranroth have a look at this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't comment too much on the algorithm, as I'm not familiar with SVM code. But it looks believable. There are just some style points I picked up on. Will let someone knowledgeable about the application area give some more guidance.
I can't help but feel that there is some instruction level parallelism being left on the table. A lot of the instructions are dependent. Having the masking, you might be able to duplicate the work, and simply select the result that you want at the end. I feel as though this implementation is going to be using only one pipeline at the moment. Would need profiling to confirm, though
} | ||
else | ||
{ | ||
DAAL_ASSERT((sign & (sign - 1)) == 0) // used to make sure sign is always having 1 bit set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what get sign returns, but this assert is also true when sign = 0, so the comment isn't correct. I suspect this might not be what you want to be checking on the result of getSign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to keep this optimization done under check, since low = 0x2
, if it were to ever change this debug assert would help. Where getSign
is defined here - https://github.com/oneapi-src/oneDAL/blob/f8a395311b4b19be1859545d7a19ee1969d8d9ee/cpp/daal/src/algorithms/svm/svm_train_common.h#L92
We do think that there is more room for improvement, and we are exploring more ideas, we've finalized on this version for now and if we do optimize it further will raise another PR for it. If you do see any instruction level bottlenecks, please do point it out. |
Great to see specialization for algo |
/intelci: run |
@rakshithgb-fujitsu thank you! please address formatter check hints |
5bc8689
to
7dee7d0
Compare
not sure what is failing, |
Co-authored-by: Alexander Andreev <[email protected]>
This PR introduces performance optimizations for the WSSJ Float using SVE intrinsics, resulting in significant improvements for the SVM algorithms on ARM.
Key Improvements:
Boser Method: 22% performance gain, leading to faster computation and better resource utilization.
Thunder Method: 5% performance gain, enhancing efficiency in scenarios where this method is used.
Changes:
Code Updates: New SVE intrinsics based WSSJ Float kernel.
Impact:
Performance: Faster processing times and improved efficiency for SVM algorithms observed and documented on single core.
Performance on single core (aws instance: c7g.8xl graviton3):