-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
128 bit arithmetic upgrades #4886
128 bit arithmetic upgrades #4886
Conversation
@@ -724,27 +724,27 @@ public static void negate(Slice decimal) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that bit shift is slightly better
What do you mean better
? faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
faster obviously. good catch.
f99e7c1
to
f4b838b
Compare
} | ||
|
||
private static boolean isNegative(int lastRawHigh) | ||
{ | ||
return (lastRawHigh & SIGN_INT_MASK) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the SIGN_INT_MASK
still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used 7 times
long intermediateResult = l0 + r0; | ||
long z0 = intermediateResult; | ||
// Unsigned compare | ||
int overflow = intermediateResult + Long.MIN_VALUE < l0 + Long.MIN_VALUE ? 1 : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why it works? Why do we need to add MIN_VALUE
to both sides? Please add a comment in code also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Long.compareUnsigned() inlined.
I added a comment
presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java
Outdated
Show resolved
Hide resolved
|
||
int z0 = (int) intermediateResult; | ||
intermediateResult = l1 + r1 + overflow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediateResult
is not need until this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
long z0 = l0 - r0; | ||
// Unsigned compare | ||
int overflow = z0 + Long.MIN_VALUE > l0 + Long.MIN_VALUE ? 1 : 0; | ||
long z1 = l1 - r1 - overflow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously we added (intermediateResult >> 32)
, but now we remove overflow
. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of subtraction is different than before
The previous one was just simple 4x subtraction on two 32-bit values within 64-bit placeholder. The intermediateResult >> 32 was the remainder, not the overflow.
Now the overflow is just a flag indicating whether r0 > l0 so that we need to "borrow" 2^64 from l1.
It appears that bit shift is slightly faster that &
Addition benchmark relied only on values that needed rescaling, which is a multiplication by a power of 10. Benchmarking unscaled values makes addition benchmark more synthetic.
f4b838b
to
800e684
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % small comment
long intermediateResult; | ||
intermediateResult = toUnsignedLong(l0) + toUnsignedLong(r0); | ||
long z0 = l0 + r0; | ||
// Long.unsignedCompare() inlined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not inlined method as unsignedCompare
can return -1, 0 or 1
.
Could you extract unsignedIsSmaller
method and add a comment what it's based on?
presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java
Outdated
Show resolved
Hide resolved
long z0 = l0 - r0; | ||
// Long.unsignedCompare() inlined | ||
// (unsigned) z0 > (unsigned) l0 | ||
int overflow = z0 + Long.MIN_VALUE > l0 + Long.MIN_VALUE ? 1 : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be called underflow
?
800e684
to
aa89bf9
Compare
As opposed to 4*32 bit values. Done for subtraction as well
aa89bf9
to
53ce46e
Compare
merged, thanks! |
Hi @skrzypo987 , thank you for raising this PR, we want to use the feature, but for some reasons we cannot update presto to the latest version, so can you offer me some test cases if you could receive this message. My email is [email protected], or you can contact me on slack, you can search wupeng in slack. :) |
Addition throughput improved by ~7% by using 2x64 instead of 4x32 bit operations.
Everything improved 0.1-0.5% by changing & to >>>