Discuss: -ffast-math and other math optimization flags #782
Comments
I think most if not all flags would be safe to use. For
|
if (!(val > INT32_MIN && val < INT32_MAX + 1LL)) And I think that |
I think both of our uses of The docs on The most common change is that The largest difference is surprising: the compiler seems to be much less aggressive about inlining floating-point math routines, and inserts calls to
<_change_alpha>:
ucvtf d1, w1
ldr b2, [x0]
ucvtf d2, d2
fmov d3, #1.00000000
fsub d3, d3, d0
fmul d0, d1, d0
fmadd d0, d3, d2, d0
mov x8, #-0x3e20000000000000
fmov d1, x8
fcmp d0, d1
fccmp d0, d0, #0x1, hi
mov x8, #0x41e0000000000000
fmov d1, x8
fccmp d0, d1, #0x0, vc
fcvtzs w8, d0
csel w8, wzr, w8, ge
strb w8, [x0]
ret
->
<_change_alpha>:
stp d9, d8, [sp, #-0x30]!
stp x20, x19, [sp, #0x10]
stp x29, x30, [sp, #0x20]
add x29, sp, #0x20
mov x19, x0
ldr w8, [x0]
and w20, w8, #0xffffff00
and w8, w8, #0xff
ucvtf d1, w1
ucvtf d2, w8
fmov d3, #1.00000000
fsub d3, d3, d0
fmul d0, d1, d0
fmadd d8, d3, d2, d0
fmov d0, d8
bl 0x2a09c [rcombs note: symbol stub for isnand]
mov x8, #0x41e0000000000000
fmov d0, x8
fcmp d8, d0
mov x8, #-0x3e20000000000000
fmov d0, x8
fccmp d8, d0, #0x4, lt
ccmp w0, #0x0, #0x0, gt
fcvtzs w8, d8
and w8, w8, #0xff
csel w8, wzr, w8, ne
orr w8, w8, w20
str w8, [x19]
ldp x29, x30, [sp, #0x20]
ldp x20, x19, [sp, #0x10]
ldp d9, d8, [sp], #0x30
ret
I also saw a few changes like this, which seem like a genuine improvement (on Firestorm, fcmp+fcsel is 7 cycles, vs 2 cycles for fmax[nm]):
- fcmp d0, d6
- fcsel d0, d0, d6, gt
+ fmaxnm d0, d0, d6
However, this same improvement can be had with a change like this:
diff --git a/libass/ass_utils.h b/libass/ass_utils.h
index a1beb769..ebc44e78 100644
--- a/libass/ass_utils.h
+++ b/libass/ass_utils.h
@@ -43,8 +43,17 @@
#define MSGL_V 6
#define MSGL_DBG2 7
-#define FFMAX(a,b) ((a) > (b) ? (a) : (b))
-#define FFMIN(a,b) ((a) > (b) ? (b) : (a))
+#define FFMAX(a, b) _Generic(((a)+(b)), \
+ double: fmax((a), (b)), \
+ float: fmaxf((a), (b)), \
+ default: ((a) > (b) ? (a) : (b)) \
+)
+#define FFMIN(a, b) _Generic(((a)+(b)), \
+ double: fmin((a), (b)), \
+ float: fminf((a), (b)), \
+ default: ((a) > (b) ? (b) : (a)) \
+)
+
#define FFMINMAX(c,a,b) FFMIN(FFMAX(c, a), b)
#define ASS_PI 3.14159265358979323846
|
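To make the effect of the diff above easier to see, here is a small self-contained sketch that puts the old ternary macros next to the _Generic versions. The FFMAX_TERNARY and FFMIN_TERNARY names are hypothetical and exist only so both definitions can live in one file. The FFMAX and FFMIN bodies follow the diff.

```c
#include <math.h>

/* The existing definitions, renamed (hypothetical names) so they can
 * be compared against the type-generic versions below. */
#define FFMAX_TERNARY(a, b) ((a) > (b) ? (a) : (b))
#define FFMIN_TERNARY(a, b) ((a) > (b) ? (b) : (a))

/* (a)+(b) is never evaluated; it only selects a branch by the type
 * the two operands promote to. Integer operands take the default
 * branch and keep the plain ternary. */
#define FFMAX(a, b) _Generic(((a)+(b)), \
    double:  fmax((a), (b)),            \
    float:   fmaxf((a), (b)),           \
    default: ((a) > (b) ? (a) : (b))    \
)
#define FFMIN(a, b) _Generic(((a)+(b)), \
    double:  fmin((a), (b)),            \
    float:   fminf((a), (b)),           \
    default: ((a) > (b) ? (b) : (a))    \
)
```

For ordinary inputs the two versions agree. They differ on NaN: FFMAX_TERNARY(1.0, NAN) yields NaN because the comparison is false and selects the second operand, while fmax(1.0, NAN) yields 1.0. That difference, together with the treatment of signed zeros, is presumably why the compiler cannot turn the ternary form into fmaxnm without the relaxed math flags.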
re Except |
Are you sure you haven't flipped the results?
change_alpha:
ucvtf d31, w1
ldr w1, [x0]
fmov d30, 1.0e+0
and w2, w1, 255
and w1, w1, -256
fsub d30, d30, d0
fmul d0, d31, d0
scvtf d31, w2
fmadd d0, d31, d30, d0
fcmp d0, d0
bvs .L2
mov x2, -4476578029606273024
fmov d31, x2
fcmpe d0, d31
bls .L2
mov x2, 4746794007248502784
fmov d31, x2
fcmpe d0, d31
bge .L2
fcvtzs w2, d0
and w2, w2, 255
orr w1, w1, w2
.L2:
str w1, [x0]
ret
change_alpha:
ucvtf d31, w1
mov x2, -4476578029606273024
ldr w1, [x0]
fmov d29, x2
mov x2, 4746794007248502784
fmov d28, 1.0e+0
fmov d30, x2
and w2, w1, 255
fsub d28, d28, d0
fmul d0, d31, d0
scvtf d31, w2
and w1, w1, -256
mov w3, w1
fmadd d0, d31, d28, d0
fcmpe d0, d29
fcvtzs w2, d0
fccmpe d0, d30, 0, hi
and w2, w2, 255
orr w1, w1, w2
csel w1, w3, w1, ge
str w1, [x0]
ret
It is one thing to show intent to the compiler and another to let it do its job. Indeed, for floating-point values it is generally better to use fmax, but in practice the compiler should recognize the pattern. |
I thought I might've, but I double-checked!
The pattern as written ( |
On x86 at least it is compiled directly to
double foo(double a, double b) {
    return ((a) > (b) ? (a) : (b));
}
foo: # @foo
vmaxsd xmm0, xmm0, xmm1
ret
If this behavior is indeed not universal across platforms, then using fmax directly would help. |
This is exactly the behavior of |
Seeing as this has been brought up again in #806, FWIW I continue to think that even NaNs don’t actually matter to libass and we could/should just be using Of course, an abundance of caution is always good and a careful audit of the relevant code before the switch is flipped won’t hurt. But I don’t think “libass depends on NaNs for correctness” is an accurate statement: I don’t think we ever intended to do this, and we should have no reason to do so. |
I agree. I only mentioned it before, because there are |
This was brought up in #780 (comment).
-ffast-math is shorthand for several flags:
- -fno-math-errno (we decided in #780, "build: add -fno-math-errno to allow inlining of math functions", that we definitely want this)
- -funsafe-math-optimizations, which is shorthand for:
  - -fno-signed-zeros (probably fine?)
  - -fno-trapping-math (unclear what this would mean for us?)
  - -fassociative-math (unsure)
  - -freciprocal-math (unsure)
- -ffinite-math-only (we have a couple isnans; not sure if this is safe wrt those?)
- -fno-rounding-math (this is actually the default anyway?)
- -fno-signaling-nans (this is also the default)
- -fcx-limited-range (we don't use complex numbers)
- -fexcess-precision=fast (this is the default)

So this leaves -fno-signed-zeros, -fno-trapping-math, -fassociative-math, -freciprocal-math, and -ffinite-math-only up for discussion. One simple test would be to build with and without each flag, and see what code changes and what perf impact they have.

I think if we have cases where -fassociative-math or -freciprocal-math help (and are safe), we should probably be explicitly adjusting our floating-point code to allow the compiler to produce optimal output even without those flags.
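As an illustration of what "explicitly adjusting our floating-point code" could look like for -freciprocal-math, here is a minimal sketch. Both function names are hypothetical and neither is taken from libass. The first version divides per element. The second hoists the reciprocal by hand, which is the transformation the flag would otherwise permit the compiler to make.

```c
#include <stddef.h>

/* Hypothetical helper: one division per element. Without
 * -freciprocal-math the compiler has to keep these divisions,
 * because x / d and x * (1.0 / d) can round differently. */
static void scale_div(double *v, size_t n, double d)
{
    for (size_t i = 0; i < n; i++)
        v[i] /= d;
}

/* Hypothetical helper: the reciprocal is computed once and the loop
 * body becomes a multiplication, regardless of compiler flags. */
static void scale_recip(double *v, size_t n, double d)
{
    const double inv = 1.0 / d;
    for (size_t i = 0; i < n; i++)
        v[i] *= inv;
}
```

The two versions can differ in the last bit of the result for some inputs, so writing the second form by hand is a deliberate precision tradeoff made per call site rather than globally.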