Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

* add single- and double-precision vmaxnum()/vminnum() adhering to IE… #109

Merged
merged 2 commits into from
Nov 29, 2017
Merged

Conversation

blu
Copy link
Contributor

@blu blu commented Nov 26, 2017

…EE-754-2008 maxNum()/minNum() semantics

  • make xfmaxf(), xfminf(), xfmax() and xfmin() use vmaxnum() and vminnum()
  • add vminnum()/vmaxnum() capabilities to arch SSE2, SSE3, SSE4.1, AVX, AVX2 and AArch64

@blu
Copy link
Contributor Author

blu commented Nov 26, 2017

This is a redux of PR #106, implementing the changes requested by @fpetrogalli-arm

@blu
Copy link
Contributor Author

blu commented Nov 26, 2017

A few extra remarks about the change:

  • change provides generic vminnum()/vmaxnum() in src/libm/dd.h and src/libm/df.h -- those are really generic, making no assumptions about the NaN semantics of the underlying vmin/vmax!
  • all x86-64 targets (sans AVX512, sorry) get vminnum()/vmaxnum() implementations close to the original xfmin/xfmax in src/libm/sleefsimddp.c and src/libm/sleefsimdsp.c but this time using vmin()/vmax() instead of the original vgt()+vsel() combo.
  • AArch64 gets the single-instruction vminnum()/vmaxnum() from PR * make xfmaxf(), xfminf(), xfmax() and xfmin() aware of ISAs with min… #106.

@@ -28,6 +28,16 @@

#define ISANAME "AArch64 AdvSIMD"

#ifdef SLEEF_SINGLE_MINMAXNUM_AVAILABLE
#undef SLEEF_SINGLE_MINMAXNUM_AVAILABLE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This #ifdef-guarded #undef is not what we want here. If for any reason the SLEEF_SINGLE_MINMAXNUM_AVAILABLE macro is already defined when the C preprocessor reaches this point, we do want to have an error here as it means that there is something wrong in the code, as the SLEEF_SINGLE_MINMAXNUM_AVAILABLE macro should be defined only once in all the compilation.
The preprocessor fires a warning in case of duplicate macro, but I think that an #error preprocessor invocation would be more appropriate.

…EE-754-2008 maxNum()/minNum() semantics

* make xfmaxf(), xfminf(), xfmax() and xfmin() use vmaxnum() and vminnum()
* add vminnum()/vmaxnum() capabilities to arch SSE2, SSE3, SSE4.1, AVX, AVX2 and AArch64
@blu
Copy link
Contributor Author

blu commented Nov 27, 2017

@fpetrogalli-arm, change's been updated as per your review

src/libm/df.h Outdated
}

static INLINE CONST vfloat vminnum_vf_vf_vf(vfloat x, vfloat y) {
return vmin_vf_vf_vf(vsel_vf_vo_vf_vf(visnan_vo_vf(x), y, x), vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, y));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for raising another question. This is the original implementation of the code that is currently in sleefsimdsp.c. For the x86 helper files, you have changed such implementation with return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vmin_vf_vf_vf(x, y));. The new one seems to be better, would it be possible to replace this old one with the new one? If the answer is yes, I would suggest you to make the generic implementation in df.h as return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vmin_vf_vf_vf(x, y)); and remove the code (#ifdefs and function definitions) you have added in the x86 helper files. This reasoning about vmin holds also for the vmax function and the double precision versions in dd.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. This is the original (i.e. currently in master) implementation in sleefsimdsp.c:

EXPORT CONST vfloat xfminf(vfloat x, vfloat y) {
  return vsel_vf_vo_vf_vf(visnan_vo_vf(y), x, vsel_vf_vo_vf_vf(vgt_vo_vf_vf(y, x), x, y));
}

The original version is not really generic -- it's tuned to the semantics of the x86 minps/maxps/vminps/vmaxps instructions; that's why the new x86-specific versions of vminnum/vmaxnum are slightly-optimized versions of the original, with vsel(vgt()) changed to vmin().

The proposed in this PR generic versions of vminnum/vmaxnum found in df.h/dd.h are actually generic, i.e. agnostic of the peculiarities of the underlying min/max vector ops. The proposed implementation of generic vminnum/vmaxnum in df.h/dd.h and specialized versions in the respective ISA helpers is the most logical implementation of this functionality.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original implementation is optimized in terms of the number of operations. When min and max instructions are available, I think NAN handling is not required. So, it just needs one instruction, like the following.

EXPORT CONST vdouble xfmax(vdouble x, vdouble y) {
   return vmax_vd_vd_vd(x, y);
}

Copy link
Contributor Author

@blu blu Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shibatch, aren't xfmax()/xfmin() supposed to implement IEEE754-2008 minNum/maxNum semantics? The way the original is implemented, it tries to adhere to minNum/maxNum assuming x86 unerlying semantics (which are not natively minNum/maxNum-compliant).

Here's a quick example of how the original would behave:

x: num, y: nan
original_xfmin(x, y) -> x

x: nan, y: num
original_xfmin(x, y) -> y

In contrast to a x86 vmin():

x: num, y: nan
x86_vmin(x, y) -> nan

x: nan, y: num
x86_vmin(x, y) -> y

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAN handling should be implemented in those instructions.

http://www.felixcloutier.com/x86/MAXPD.html

Copy link
Owner

@shibatch shibatch Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the instructions in ARM is IEEE754-compliant.

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/CIHFCJCF.html

If we seriously try to optimize this, it is better to make it choose the implementation according to IEEE754-compliance of the instructions.
I think this is a good opportunity to do this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shibatch , are you saying that we should get completely rid of the old implementation and use the ones that @blu added in the x86 helper header files as generic implementation to be used in [dd|df].h?

This makes sense to me too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fpetrogalli-arm, looking again at the new generic version, I don't think it would schedule better on any of the current popular uarchs than the original xfmin()/xfmax(). So I think the original xfmin()/xfmax() should become the generic version; the one issue with the new x86 version becoming the generic is that it won't work on ISAs with fmin/fmax semantics returning NaN when either of the args is NaN.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a situation like this, I usually write the exported functions (xfmin()/xfmax() in this case) in a generic way. Here, being generic means that the function returns the correct value regardless of the architecture. If I write an optimized version that works only for specific vector extensions, it is chosen with #if macro.

Basically, the exported functions (those functions beginning with x) should not be called by other implementation of functions. I also implement functions for internal usage, which do not handle non-numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shibatch @fpetrogalli-arm Ok, give me a couple of hours (currently preoccupied) to show you what I mean.

@fpetrogalli
Copy link
Collaborator

@shibatch , are you happy for me to merge this?

…ession from pre-vminnum/vmaxnum xfmin()/xfmax() for better performance
Copy link
Owner

@shibatch shibatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looks good.

@fpetrogalli fpetrogalli merged commit 0ff74a0 into shibatch:master Nov 29, 2017
@blu blu deleted the minmaxNum branch November 29, 2017 16:39
shibatch added a commit that referenced this pull request Jan 16, 2018
vmaxnum_v*_v*_v* and vminnum_v*_v*_v* are helper functions for utilizing instructions for choosing larger and smaller elements from two given vectors, and they are introduced in the following PR.

#109

It is said that vmaxnmq and vminnmq on aarch64 are IEEE754-conformant, but I recently found that handling of signalling NaN by these instructions is not conforming to the specification of fmin and fmax functions in the ANSI C standard.

This patch fixes that problem. It also adds regression test for checking signaling nan handling in fmin and fmax.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants