Implementation of masked functions using native masked intrinsic functions #142

Open
shibatch opened this issue Jan 7, 2018 · 1 comment

Comments

@shibatch
Owner

shibatch commented Jan 7, 2018

I made a pull request that implements masked functions by combining the current unmasked functions with a selection (blending) function.

#139

However, there is a concern about the performance of this implementation, since the ALUs for the unused elements all remain active. This could increase the power consumption and heat generation of the computer. It would be better to implement the masked functions so that they use native masked intrinsic functions.
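To make the concern concrete, here is a minimal sketch in portable C that emulates a 4-lane vector with scalar loops. It is not SLEEF code: `vdouble_emu`, `vopmask_emu`, `lane_ops` and the `square_*` functions are hypothetical names, squaring stands in for a real math function, and passing the input through on inactive lanes is an assumption made only for this illustration. The counter shows that the blend-based variant does arithmetic on every lane, while a natively masked variant only touches the active ones.

```c
#include <stdbool.h>

#define LANES 4 /* hypothetical vector width for this scalar emulation */

typedef struct { double v[LANES]; } vdouble_emu; /* stand-in for vdouble */
typedef struct { bool m[LANES]; } vopmask_emu;   /* stand-in for vopmask */

int lane_ops = 0; /* number of lanes that actually performed arithmetic */

/* Unmasked function: every lane is computed. */
vdouble_emu square_unmasked(vdouble_emu x) {
  vdouble_emu r;
  for (int i = 0; i < LANES; i++) {
    r.v[i] = x.v[i] * x.v[i];
    lane_ops++;
  }
  return r;
}

/* Selection (blend): take a where the mask is set, b elsewhere. */
vdouble_emu vsel_emu(vopmask_emu m, vdouble_emu a, vdouble_emu b) {
  vdouble_emu r;
  for (int i = 0; i < LANES; i++) r.v[i] = m.m[i] ? a.v[i] : b.v[i];
  return r;
}

/* Blend-based masking: compute all lanes, then discard the inactive ones. */
vdouble_emu square_masked_by_blend(vdouble_emu x, vopmask_emu m) {
  return vsel_emu(m, square_unmasked(x), x);
}

/* Emulated native masking: inactive lanes do no arithmetic at all. */
vdouble_emu square_masked_native(vdouble_emu x, vopmask_emu m) {
  vdouble_emu r;
  for (int i = 0; i < LANES; i++) {
    if (m.m[i]) {
      r.v[i] = x.v[i] * x.v[i];
      lane_ops++;
    } else {
      r.v[i] = x.v[i]; /* assumption: inactive lanes pass the input through */
    }
  }
  return r;
}
```

With two of the four lanes active, both variants return the same vector, but the blend-based one increments `lane_ops` four times and the emulated native one only twice.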

My plan is to approve the above PR, and after the release of version 3.2, we will start implementing masked functions using native masked intrinsic functions in the following way.

All existing FP functions in the helper files will be converted to masked functions. For example,

vdouble vadd_vd_vd_vd_vo(vdouble x, vdouble y, vopmask m) {
  return vaddq_f64(x, y);
}

for an unmasked intrinsic function, and

vdouble vadd_vd_vd_vd_vo(vdouble x, vdouble y, vopmask m) {
  return svadd_f64_x(m, x, y);
}

for a masked intrinsic function.

Then, the implementation of each math function would be like the following.

static const inline vdouble xsin(vdouble arg, vopmask mask) { ... }

EXPORT const vdouble Sleef_sindX_u35YYY(vdouble arg) {
  return xsin(arg, SLEEF_OPMASK_ALLONE);
}

EXPORT const vdouble Sleef_mask_sindX_u35YYY(vdouble arg, vopmask mask) {
  return xsin(arg, mask);
}

The mask argument is expected to be optimized away by the compiler when it is not used.
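The layering described above can be sketched in portable C as follows. This is an illustration under stated assumptions, not the actual SLEEF implementation: the `_emu` types and helpers, `OPMASK_ALLONE_EMU` (a stand-in for `SLEEF_OPMASK_ALLONE`), and the toy kernel `x*x + x` are all hypothetical, and letting inactive lanes take the first operand is just one possible convention.

```c
#include <stdbool.h>

#define LANES 4 /* hypothetical vector width for this scalar emulation */

typedef struct { double v[LANES]; } vdouble_emu; /* stand-in for vdouble */
typedef struct { bool m[LANES]; } vopmask_emu;   /* stand-in for vopmask */

/* Hypothetical counterpart of SLEEF_OPMASK_ALLONE: every lane active. */
static const vopmask_emu OPMASK_ALLONE_EMU = {{true, true, true, true}};

/* Masked helpers; inactive lanes take the first operand (assumption). */
vdouble_emu vadd_emu(vdouble_emu x, vdouble_emu y, vopmask_emu m) {
  vdouble_emu r;
  for (int i = 0; i < LANES; i++) r.v[i] = m.m[i] ? x.v[i] + y.v[i] : x.v[i];
  return r;
}

vdouble_emu vmul_emu(vdouble_emu x, vdouble_emu y, vopmask_emu m) {
  vdouble_emu r;
  for (int i = 0; i < LANES; i++) r.v[i] = m.m[i] ? x.v[i] * y.v[i] : x.v[i];
  return r;
}

/* Single base implementation, written once against the masked helpers. */
static inline vdouble_emu xpoly_base(vdouble_emu x, vopmask_emu m) {
  return vadd_emu(vmul_emu(x, x, m), x, m); /* x*x + x on active lanes */
}

/* Unmasked entry point: forwards a constant all-ones mask. */
vdouble_emu poly_unmasked(vdouble_emu x) {
  return xpoly_base(x, OPMASK_ALLONE_EMU);
}

/* Masked entry point: forwards the caller's mask. */
vdouble_emu poly_masked(vdouble_emu x, vopmask_emu m) {
  return xpoly_base(x, m);
}
```

Because the unmasked entry point passes a compile-time constant mask into an inlinable base function, a compiler has the information it needs to fold the mask tests away, which is the optimization the proposal relies on.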

@fpetrogalli

fpetrogalli commented Mar 8, 2018

static const inline vdouble xfdim_base(vdouble arg1, vdouble arg2, vopmask mask) { ... }

EXPORT const vdouble xfdim(vdouble arg1, vdouble arg2) {
  return xfdim_base(arg1, arg2, SLEEF_OPMASK_ALLONE);
}

EXPORT const vdouble xfdim_mask(vdouble arg1, vdouble arg2, vopmask mask) {
  return xfdim_base(arg1, arg2, mask);
}

  1. add the masked intrinsics that are needed in the AVX512F header
  2. use tester for the unmasked version, and tester3 for the masked one to do bit-to-bit testing of the masked version versus the unmasked one
  3. and then, after doing one function (fdim), we go wide to the rest, not for the whole library at once but for groups of functions (say, first all the vfloat(vfloat) signatures, then the vfloat(vfloat,vfloat) ones, and so on)
  4. when we are done with all the functions, we remove the unmasked intrinsics from the helper files.
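The bit-to-bit comparison in step 2 can be sketched in portable C as below. This does not use SLEEF's tester or tester3 programs: the `_emu` types, `fdim_lane`, `fdim_base_emu` and `count_bit_mismatches` are hypothetical names for a scalar emulation, and inactive lanes passing through the first argument is an assumption. The comparison goes through `memcpy` into integers so that differences such as `-0.0` versus `+0.0`, which `==` on doubles would hide, are detected.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LANES 4 /* hypothetical vector width for this scalar emulation */

typedef struct { double v[LANES]; } vdouble_emu; /* stand-in for vdouble */
typedef struct { bool m[LANES]; } vopmask_emu;   /* stand-in for vopmask */

const vopmask_emu ALLONE_EMU = {{true, true, true, true}};

/* Scalar positive difference: x - y if x > y, +0 otherwise, NaN if any NaN. */
double fdim_lane(double x, double y) {
  if (x != x || y != y) return x + y; /* propagate NaN */
  return x > y ? x - y : 0.0;
}

/* Base version; inactive lanes pass through the first argument (assumption). */
vdouble_emu fdim_base_emu(vdouble_emu a, vdouble_emu b, vopmask_emu m) {
  vdouble_emu r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = m.m[i] ? fdim_lane(a.v[i], b.v[i]) : a.v[i];
  return r;
}

vdouble_emu fdim_emu(vdouble_emu a, vdouble_emu b) {
  return fdim_base_emu(a, b, ALLONE_EMU);
}

vdouble_emu fdim_mask_emu(vdouble_emu a, vdouble_emu b, vopmask_emu m) {
  return fdim_base_emu(a, b, m);
}

/* Count lanes selected by m whose bit patterns differ between two results. */
int count_bit_mismatches(vdouble_emu p, vdouble_emu q, vopmask_emu m) {
  int mismatches = 0;
  for (int i = 0; i < LANES; i++) {
    if (!m.m[i]) continue; /* only lanes selected by m are compared */
    uint64_t pb, qb;
    memcpy(&pb, &p.v[i], sizeof pb);
    memcpy(&qb, &q.v[i], sizeof qb);
    if (pb != qb) mismatches++;
  }
  return mismatches;
}
```

A test would call `fdim_mask_emu` and `fdim_emu` on the same inputs and require `count_bit_mismatches` to return zero for the mask that was used; lanes outside the mask are deliberately excluded from the check.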

fpetrogalli pushed a commit that referenced this issue Mar 13, 2018
This commit adds support for the AArch64 Scalable Vector Extension
(SVE) [1]. The vector functions are provided to target Vector Length
Agnostic (VLA) execution [2].

To build SLEEF with SVE support, a compiler that supports the SVE Arm C
Language Extensions (ACLE) [2] must be used.

At the time of publishing this patch, the only compiler with SVE ACLE
support is Arm Compiler for HPC [3].

The CMake configuration expects Arm Instruction Emulator (ArmIE) [4]
to execute the tests on native AArch64 hardware without SVE support.

The SVE target is built without taking advantage of the native masking
capabilities of SVE. This will be targeted in an upcoming release of
SLEEF, together with the AVX512F native masking capabilities [5].

Additional changes introduced in this patch are:

1. The mkrename* scripts have been modified to support VLA names in the
   functions. In particular, 'x' is used to represent the vector
   length of the SVE symbols.

2. '__sizeless_struct' is a prototype language extension only
   implemented by Arm Compiler For HPC [3] to allow the declaration of
   SVE tuple types as described in section 3.4 of Arm C Language
   Extensions for SVE [2].

3. A new 'iutsve' executable is generated to test the SVE functions.

[1] https://developer.arm.com/products/software-development-tools/hpc/sve
[2] https://developer.arm.com/docs/100987/0000
[3] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc
[4] https://developer.arm.com/products/software-development-tools/hpc/arm-instruction-emulator
[5] #142