Implementation of masked functions using native masked intrinsic functions #142
fpetrogalli pushed a commit that referenced this issue on Mar 13, 2018:
This commit adds support for the AArch64 Scalable Vector Extension (SVE) [1]. The vector functions are provided to target Vector Length Agnostic (VLA) execution [2].

To build SLEEF with SVE support, a compiler that supports the SVE Arm C Language Extensions (ACLE) [2] must be used. At the time of publishing this patch, the only compiler with SVE ACLE support is Arm Compiler for HPC [3].

The CMake configuration expects Arm Instruction Emulator (ArmIE) [4] to execute the tests on native AArch64 hardware without SVE support.

The SVE target is built without taking advantage of the native masking capabilities of SVE. This will be targeted in an upcoming release of SLEEF, together with the AVX512F native masking capabilities [5].

Additional changes introduced in this patch are:

1. The mkrename* scripts have been modified to support VLA names in the functions. In particular, 'x' is used to represent the vector length of the SVE symbols.
2. '__sizeless_struct' is a prototype language extension only implemented by Arm Compiler for HPC [3] to allow the declaration of SVE tuple types as described in section 3.4 of Arm C Language Extensions for SVE [2].
3. A new 'iutsve' executable is generated to test the SVE functions.

[1] https://developer.arm.com/products/software-development-tools/hpc/sve
[2] https://developer.arm.com/docs/100987/0000
[3] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc
[4] https://developer.arm.com/products/software-development-tools/hpc/arm-instruction-emulator
[5] #142
I made a pull request that implements masked functions by combining the current unmasked functions with a selection (blending) function.
#139
However, there is a concern about the performance of this implementation, since the ALUs for the unused elements all remain active. This could increase the power consumption and heat generation of the computer. It is considered better to implement the masked functions so that they use native masked intrinsic functions.
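To make the concern concrete, here is a minimal sketch of the "unmasked function plus blend" pattern. It is an illustration only, not the code from the pull request: the types `vdouble_t` and `vmask_t`, the lane count `VLEN`, and the functions `toy_unmasked`, `vblend` and `toy_masked_by_blend` are all hypothetical, and plain C loops stand in for SIMD instructions.

```c
#include <stddef.h>

#define VLEN 4 /* hypothetical lane count, chosen only for this sketch */

/* Plain-C stand-ins for a SIMD register and a per-lane mask (1 = active). */
typedef struct { double v[VLEN]; } vdouble_t;
typedef struct { unsigned char m[VLEN]; } vmask_t;

/* Hypothetical unmasked function: computes x*x + 1 in every lane. */
static vdouble_t toy_unmasked(vdouble_t x) {
    vdouble_t r;
    for (size_t i = 0; i < VLEN; i++) r.v[i] = x.v[i] * x.v[i] + 1.0;
    return r;
}

/* Selection (blend): lane i comes from a if active, otherwise from b. */
static vdouble_t vblend(vmask_t k, vdouble_t a, vdouble_t b) {
    vdouble_t r;
    for (size_t i = 0; i < VLEN; i++) r.v[i] = k.m[i] ? a.v[i] : b.v[i];
    return r;
}

/* Masked function built from the two pieces above.
 * Note that toy_unmasked() still does the arithmetic for every lane;
 * the mask only decides which results are kept afterwards. */
static vdouble_t toy_masked_by_blend(vmask_t k, vdouble_t passthru, vdouble_t x) {
    return vblend(k, toy_unmasked(x), passthru);
}
```

With the mask `{1, 0, 1, 0}`, lanes 1 and 3 are computed and then discarded in favour of `passthru`, which is the wasted work described above.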
My plan is to approve the above PR, and after the release of version 3.2, we will start implementing masked functions using native masked intrinsic functions in the following way.
All existing FP functions in the helper files will be converted to masked functions. For example,
for an unmasked intrinsic function, and
for a masked intrinsic function.
Then, the implementation of each math function would be like the following.
The mask argument is assumed to be optimized away if it is not used.
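The plan can be sketched roughly as follows. This is an illustration under stated assumptions, not SLEEF's helper code: `vdouble_t`, `vmask_t`, `VLEN`, `vadd_nomask`, `vadd_mask`, `vmul_mask` and `toy_mask` are hypothetical names, plain C loops stand in for intrinsics, and inactive lanes are assumed to be zeroed. On a real target, each masked helper would presumably wrap a single native masked intrinsic.

```c
#include <stddef.h>

#define VLEN 4 /* hypothetical lane count, chosen only for this sketch */

/* Plain-C stand-ins for a SIMD register and a per-lane mask (1 = active). */
typedef struct { double v[VLEN]; } vdouble_t;
typedef struct { unsigned char m[VLEN]; } vmask_t;

/* Helper for a target WITHOUT native masking: the mask is accepted so that
 * all targets share one signature, but it is unused and can be optimized
 * away by the compiler after inlining. */
static inline vdouble_t vadd_nomask(vmask_t k, vdouble_t a, vdouble_t b) {
    vdouble_t r;
    (void)k; /* deliberately ignored */
    for (size_t i = 0; i < VLEN; i++) r.v[i] = a.v[i] + b.v[i];
    return r;
}

/* Helper for a target WITH native masking: only active lanes do the
 * arithmetic; inactive lanes are set to zero (an assumption of this sketch). */
static inline vdouble_t vadd_mask(vmask_t k, vdouble_t a, vdouble_t b) {
    vdouble_t r;
    for (size_t i = 0; i < VLEN; i++) r.v[i] = k.m[i] ? a.v[i] + b.v[i] : 0.0;
    return r;
}

static inline vdouble_t vmul_mask(vmask_t k, vdouble_t a, vdouble_t b) {
    vdouble_t r;
    for (size_t i = 0; i < VLEN; i++) r.v[i] = k.m[i] ? a.v[i] * b.v[i] : 0.0;
    return r;
}

/* Hypothetical math function computing x*x + x.
 * The mask is simply forwarded to every helper call. */
static vdouble_t toy_mask(vmask_t k, vdouble_t x) {
    return vadd_mask(k, vmul_mask(k, x, x), x);
}
```

Because every helper has the same `(mask, operands...)` shape, the body of a math function such as `toy_mask` would not need to change between targets; only the helper definitions differ.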