-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
* add single- and double-precision vmaxnum()/vminnum() adhering to IE… #109
Conversation
This is a redux of PR #106, implementing the changes requested by @fpetrogalli-arm |
A few extra remarks about the change:
|
src/arch/helperadvsimd.h
Outdated
@@ -28,6 +28,16 @@ | |||
|
|||
#define ISANAME "AArch64 AdvSIMD" | |||
|
|||
#ifdef SLEEF_SINGLE_MINMAXNUM_AVAILABLE | |||
#undef SLEEF_SINGLE_MINMAXNUM_AVAILABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This #ifdef
-guarded #undef
is not what we want here. If for any reason the SLEEF_SINGLE_MINMAXNUM_AVAILABLE
macro is already defined when the C preprocessor reaches this point, we do want to have an error here as it means that there is something wrong in the code, as the SLEEF_SINGLE_MINMAXNUM_AVAILABLE
macro should be defined only once in all the compilation.
The preprocessor fires a warning in case of duplicate macro, but I think that an #error
preprocessor invocation would be more appropriate.
…EE-754-2008 maxNum()/minNum() semantics * make xfmaxf(), xfminf(), xfmax() and xfmin() use vmaxnum() and vminnum() * add vminnum()/vmaxnum() capabilities to arch SSE2, SSE3, SSE4.1, AVX, AVX2 and AArch64
@fpetrogalli-arm, change's been updated as per your review |
src/libm/df.h
Outdated
} | ||
|
||
static INLINE CONST vfloat vminnum_vf_vf_vf(vfloat x, vfloat y) { | ||
return vmin_vf_vf_vf(vsel_vf_vo_vf_vf(visnan_vo_vf(x), y, x), vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, y)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for raising another question. This is the original implementation of the code that is currently in sleefsimdsp.c
. For the x86 helper files, you have changed such implementation with return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vmin_vf_vf_vf(x, y));
. The new one seems to be better, would it be possible to replace this old one with the new one? If the answer is yes, I would suggest you to make the generic implementation in df.h
as return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vmin_vf_vf_vf(x, y));
and remove the code (#ifdef
s and function definitions) you have added in the x86 helper files. This reasoning about vmin holds also for the vmax function and the double precision versions in dd.h
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow. This is the original (i.e. currently in master) implementation in sleefsimdsp.c
:
EXPORT CONST vfloat xfminf(vfloat x, vfloat y) {
return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vsel_vf_vo_vf_vf(vgt_vo_vf_vf(y, x), x, y));
}
The original version is not really generic -- it's tuned to the semantics of the x86 minps/maxps/vminps/vmaxps
instructions; that's why the new x86-specific versions of vminnum/vmaxnum
are slightly-optimized versions of the original, with vsel(vgt())
changed to vmin()
.
The proposed in this PR generic versions of vminnum/vmaxnum
found in df.h/dd.h
are actually generic, i.e. agnostic of the peculiarities of the underlying min/max vector ops. The proposed implementation of generic vminnum/vmaxnum
in df.h/dd.h
and specialized versions in the respective ISA helpers is the most logical implementation of this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original implementation is optimized in terms of the number of operations. When min and max instructions are available, I think NAN handling is not required. So, it just needs one instruction, like the following.
EXPORT CONST vdouble xfmax(vdouble x, vdouble y) {
return vmax_vd_vd_vd(x, y);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shibatch, aren't xfmax()/xfmin()
supposed to implement IEEE754-2008 minNum/maxNum semantics? The way the original is implemented, it tries to adhere to minNum/maxNum assuming x86 unerlying semantics (which are not natively minNum/maxNum-compliant).
Here's a quick example of how the original would behave:
x: num, y: nan
original_xfmin(x, y) -> x
x: nan, y: num
original_xfmin(x, y) -> y
In contrast to a x86 vmin()
:
x: num, y: nan
x86_vmin(x, y) -> nan
x: nan, y: num
x86_vmin(x, y) -> y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAN handling should be implemented in those instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the instructions in ARM is IEEE754-compliant.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/CIHFCJCF.html
If we seriously try to optimize this, it is better to make it choose the implementation according to IEEE754-compliance of the instructions.
I think this is a good opportunity to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fpetrogalli-arm, looking again at the new generic version, I don't think it would schedule better on any of the current popular uarchs than the original xfmin()/xfmax()
. So I think the original xfmin()/xfmax()
should become the generic version; the one issue with the new x86 version becoming the generic is that it won't work on ISAs with fmin/fmax semantics returning NaN when either of the args is NaN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a situation like this, I usually write the exported functions (xfmin()/xfmax() in this case) in a generic way. Here, being generic means that the function returns the correct value regardless of the architecture. If I write an optimized version that works only for specific vector extensions, it is chosen with #if macro.
Basically, the exported functions (those functions beginning with x) should not be called by other implementation of functions. I also implement functions for internal usage, which do not handle non-numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shibatch @fpetrogalli-arm Ok, give me a couple of hours (currently preoccupied) to show you what I mean.
@shibatch , are you happy for me to merge this? |
…ession from pre-vminnum/vmaxnum xfmin()/xfmax() for better performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, looks good.
vmaxnum_v*_v*_v* and vminnum_v*_v*_v* are helper functions for utilizing instructions for choosing larger and smaller elements from two given vectors, and they are introduced in the following PR. #109 It is said that vmaxnmq and vminnmq on aarch64 are IEEE754-conformant, but I recently found that handling of signalling NaN by these instructions is not conforming to the specification of fmin and fmax functions in the ANSI C standard. This patch fixes that problem. It also adds regression test for checking signaling nan handling in fmin and fmax.
…EE-754-2008 maxNum()/minNum() semantics