* add single- and double-precision vmaxnum()/vminnum() adhering to IE… #109

blu · 2017-11-26T18:23:57Z

…EE-754-2008 maxNum()/minNum() semantics

make xfmaxf(), xfminf(), xfmax() and xfmin() use vmaxnum() and vminnum()
add vminnum()/vmaxnum() capabilities to arch SSE2, SSE3, SSE4.1, AVX, AVX2 and AArch64

blu · 2017-11-26T18:25:56Z

This is a redux of PR #106, implementing the changes requested by @fpetrogalli-arm

blu · 2017-11-26T18:45:55Z

A few extra remarks about the change:

change provides generic vminnum()/vmaxnum() in src/libm/dd.h and src/libm/df.h -- those are really generic, making no assumptions about the NaN semantics of the underlying vmin/vmax!
all x86-64 targets (sans AVX512, sorry) get vminnum()/vmaxnum() implementations close to the original xfmin/xfmax in src/libm/sleefsimddp.c and src/libm/sleefsimdsp.c but this time using vmin()/vmax() instead of the original vgt()+vsel() combo.
AArch64 gets the single-instruction vminnum()/vmaxnum() from PR * make xfmaxf(), xfminf(), xfmax() and xfmin() aware of ISAs with min… #106.

fpetrogalli · 2017-11-27T14:55:14Z

src/arch/helperadvsimd.h

@@ -28,6 +28,16 @@

 #define ISANAME "AArch64 AdvSIMD"

+#ifdef SLEEF_SINGLE_MINMAXNUM_AVAILABLE
+#undef SLEEF_SINGLE_MINMAXNUM_AVAILABLE


This #ifdef-guarded #undef is not what we want here. If for any reason the SLEEF_SINGLE_MINMAXNUM_AVAILABLE macro is already defined when the C preprocessor reaches this point, we do want to have an error here as it means that there is something wrong in the code, as the SLEEF_SINGLE_MINMAXNUM_AVAILABLE macro should be defined only once in all the compilation.
The preprocessor fires a warning in case of duplicate macro, but I think that an #error preprocessor invocation would be more appropriate.

…EE-754-2008 maxNum()/minNum() semantics * make xfmaxf(), xfminf(), xfmax() and xfmin() use vmaxnum() and vminnum() * add vminnum()/vmaxnum() capabilities to arch SSE2, SSE3, SSE4.1, AVX, AVX2 and AArch64

blu · 2017-11-27T22:15:40Z

@fpetrogalli-arm, change's been updated as per your review

fpetrogalli · 2017-11-28T08:18:17Z

src/libm/df.h

+}
+
+static INLINE CONST vfloat vminnum_vf_vf_vf(vfloat x, vfloat y) {
+  return vmin_vf_vf_vf(vsel_vf_vo_vf_vf(visnan_vo_vf(x), y, x), vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, y));


Apologies for raising another question. This is the original implementation of the code that is currently in sleefsimdsp.c. For the x86 helper files, you have changed such implementation with return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vmin_vf_vf_vf(x, y));. The new one seems to be better, would it be possible to replace this old one with the new one? If the answer is yes, I would suggest you to make the generic implementation in df.h as return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vmin_vf_vf_vf(x, y)); and remove the code (#ifdefs and function definitions) you have added in the x86 helper files. This reasoning about vmin holds also for the vmax function and the double precision versions in dd.h.

I'm not sure I follow. This is the original (i.e. currently in master) implementation in sleefsimdsp.c:

EXPORT CONST vfloat xfminf(vfloat x, vfloat y) { return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vsel_vf_vo_vf_vf(vgt_vo_vf_vf(y, x), x, y)); }

The original version is not really generic -- it's tuned to the semantics of the x86 minps/maxps/vminps/vmaxps instructions; that's why the new x86-specific versions of vminnum/vmaxnum are slightly-optimized versions of the original, with vsel(vgt()) changed to vmin().

The proposed in this PR generic versions of vminnum/vmaxnum found in df.h/dd.h are actually generic, i.e. agnostic of the peculiarities of the underlying min/max vector ops. The proposed implementation of generic vminnum/vmaxnum in df.h/dd.h and specialized versions in the respective ISA helpers is the most logical implementation of this functionality.

The original implementation is optimized in terms of the number of operations. When min and max instructions are available, I think NAN handling is not required. So, it just needs one instruction, like the following.

EXPORT CONST vdouble xfmax(vdouble x, vdouble y) { return vmax_vd_vd_vd(x, y); }

@shibatch, aren't xfmax()/xfmin() supposed to implement IEEE754-2008 minNum/maxNum semantics? The way the original is implemented, it tries to adhere to minNum/maxNum assuming x86 unerlying semantics (which are not natively minNum/maxNum-compliant).

Here's a quick example of how the original would behave:

x: num, y: nan original_xfmin(x, y) -> x x: nan, y: num original_xfmin(x, y) -> y

In contrast to a x86 vmin():

x: num, y: nan x86_vmin(x, y) -> nan x: nan, y: num x86_vmin(x, y) -> y

NAN handling should be implemented in those instructions.

http://www.felixcloutier.com/x86/MAXPD.html

But the instructions in ARM is IEEE754-compliant.

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/CIHFCJCF.html

If we seriously try to optimize this, it is better to make it choose the implementation according to IEEE754-compliance of the instructions.
I think this is a good opportunity to do this.

@shibatch , are you saying that we should get completely rid of the old implementation and use the ones that @blu added in the x86 helper header files as generic implementation to be used in [dd|df].h?

This makes sense to me too.

@fpetrogalli-arm, looking again at the new generic version, I don't think it would schedule better on any of the current popular uarchs than the original xfmin()/xfmax(). So I think the original xfmin()/xfmax() should become the generic version; the one issue with the new x86 version becoming the generic is that it won't work on ISAs with fmin/fmax semantics returning NaN when either of the args is NaN.

In a situation like this, I usually write the exported functions (xfmin()/xfmax() in this case) in a generic way. Here, being generic means that the function returns the correct value regardless of the architecture. If I write an optimized version that works only for specific vector extensions, it is chosen with #if macro.

Basically, the exported functions (those functions beginning with x) should not be called by other implementation of functions. I also implement functions for internal usage, which do not handle non-numbers.

@shibatch @fpetrogalli-arm Ok, give me a couple of hours (currently preoccupied) to show you what I mean.

fpetrogalli · 2017-11-28T11:30:38Z

@shibatch , are you happy for me to merge this?

…ession from pre-vminnum/vmaxnum xfmin()/xfmax() for better performance

shibatch

Yeah, looks good.

vmaxnum_v*_v*_v* and vminnum_v*_v*_v* are helper functions for utilizing instructions for choosing larger and smaller elements from two given vectors, and they are introduced in the following PR. #109 It is said that vmaxnmq and vminnmq on aarch64 are IEEE754-conformant, but I recently found that handling of signalling NaN by these instructions is not conforming to the specification of fmin and fmax functions in the ANSI C standard. This patch fixes that problem. It also adds regression test for checking signaling nan handling in fmin and fmax.

fpetrogalli requested changes Nov 27, 2017

View reviewed changes

* add single- and double-precision vmaxnum()/vminnum() adhering to IE…

010189a

…EE-754-2008 maxNum()/minNum() semantics * make xfmaxf(), xfminf(), xfmax() and xfmin() use vmaxnum() and vminnum() * add vminnum()/vmaxnum() capabilities to arch SSE2, SSE3, SSE4.1, AVX, AVX2 and AArch64

fpetrogalli requested changes Nov 28, 2017

View reviewed changes

fpetrogalli approved these changes Nov 28, 2017

View reviewed changes

refactor generic vminnum()/vmaxnum() in dd.h and df.h to use the expr…

a68faf5

…ession from pre-vminnum/vmaxnum xfmin()/xfmax() for better performance

shibatch approved these changes Nov 29, 2017

View reviewed changes

fpetrogalli merged commit 0ff74a0 into shibatch:master Nov 29, 2017

fpetrogalli mentioned this pull request Nov 29, 2017

* make xfmaxf(), xfminf(), xfmax() and xfmin() aware of ISAs with min… #106

Closed

blu deleted the minmaxNum branch November 29, 2017 16:39

shibatch mentioned this pull request Jan 5, 2018

[aarch64] Fix fmin and fmax on aarch64 #140

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

* add single- and double-precision vmaxnum()/vminnum() adhering to IE… #109

* add single- and double-precision vmaxnum()/vminnum() adhering to IE… #109

blu commented Nov 26, 2017

blu commented Nov 26, 2017

blu commented Nov 26, 2017 •

edited

Loading

fpetrogalli Nov 27, 2017

blu commented Nov 27, 2017

fpetrogalli Nov 28, 2017

blu Nov 28, 2017

shibatch Nov 28, 2017

blu Nov 28, 2017 •

edited

Loading

shibatch Nov 28, 2017

shibatch Nov 28, 2017 •

edited

Loading

fpetrogalli Nov 28, 2017

blu Nov 28, 2017

shibatch Nov 28, 2017

blu Nov 28, 2017

fpetrogalli commented Nov 28, 2017

shibatch left a comment

* add single- and double-precision vmaxnum()/vminnum() adhering to IE… #109

* add single- and double-precision vmaxnum()/vminnum() adhering to IE… #109

Conversation

blu commented Nov 26, 2017

blu commented Nov 26, 2017

blu commented Nov 26, 2017 • edited Loading

Choose a reason for hiding this comment

blu commented Nov 27, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blu Nov 28, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shibatch Nov 28, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fpetrogalli commented Nov 28, 2017

shibatch left a comment

Choose a reason for hiding this comment

blu commented Nov 26, 2017 •

edited

Loading

blu Nov 28, 2017 •

edited

Loading

shibatch Nov 28, 2017 •

edited

Loading