128 bit arithmetic upgrades #4886

Merged
merged 4 commits into from
Sep 9, 2020

Conversation

Member

@skrzypo987 skrzypo987 commented Aug 19, 2020

Addition throughput improved by ~7% by using 2x64-bit instead of 4x32-bit operations.
Everything improved by 0.1-0.5% by replacing & with >>>.
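
The `&` to `>>>` change is not shown in full in this thread. As a rough illustration only, the sketch below compares a mask-based and a shift-based sign test on a 32-bit word. The class name, the value of `SIGN_INT_MASK`, and the shift-based variant are assumptions for the example, not code taken from the PR.

```java
// Sketch only: names and the mask value are assumptions, not the PR's code.
class SignBitSketch
{
    // Assumed value: only the top bit of a 32-bit int is set.
    static final int SIGN_INT_MASK = 1 << 31;

    // Mask-based test: isolates the top bit with &.
    static boolean isNegativeMask(int rawHigh)
    {
        return (rawHigh & SIGN_INT_MASK) != 0;
    }

    // Shift-based test: >>> moves the top bit to bit 0 and zero-fills the rest.
    static boolean isNegativeShift(int rawHigh)
    {
        return (rawHigh >>> 31) != 0;
    }

    public static void main(String[] args)
    {
        int[] samples = {0, 1, -1, Integer.MAX_VALUE, Integer.MIN_VALUE, 0x8000_0001};
        for (int value : samples) {
            if (isNegativeMask(value) != isNegativeShift(value)) {
                throw new AssertionError("mismatch for " + value);
            }
        }
        System.out.println("mask and shift agree on all samples");
    }
}
```

Both forms give the same answer for every input; any speed difference between them would have to be measured with a benchmark, as the description above reports.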

@@ -724,27 +724,27 @@ public static void negate(Slice decimal)

Member

> It appears that bit shift is slightly better

What do you mean by better? Faster?

Member Author

Faster, obviously. Good catch.

@skrzypo987 skrzypo987 force-pushed the 128-bit-arithmetic-upgrades branch from f99e7c1 to f4b838b Compare August 20, 2020 07:04
@findepi findepi requested a review from dain August 20, 2020 10:31
@skrzypo987
Member Author

@kokosing @dain @sopel39 anyone?

}

private static boolean isNegative(int lastRawHigh)
{
return (lastRawHigh & SIGN_INT_MASK) != 0;
Member

Is SIGN_INT_MASK still needed?

Member Author

Yes, it is still used 7 times.

long intermediateResult = l0 + r0;
long z0 = intermediateResult;
// Unsigned compare
int overflow = intermediateResult + Long.MIN_VALUE < l0 + Long.MIN_VALUE ? 1 : 0;
Member

Could you explain why this works? Why do we need to add MIN_VALUE to both sides? Please also add a comment in the code.

Member Author

This is Long.compareUnsigned() inlined.
I added a comment.
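
To spell out why adding `Long.MIN_VALUE` to both sides works: it flips the sign bit of each operand, which maps unsigned ordering onto signed ordering. The sketch below checks that against `Long.compareUnsigned()` and shows how the comparison detects a carry out of a 64-bit addition. The class and method names are hypothetical, chosen for the example.

```java
// Sketch only: class and method names are hypothetical.
class UnsignedCompareSketch
{
    // Adding Long.MIN_VALUE flips the sign bit, so a signed "<" on the
    // shifted values orders the originals as unsigned 64-bit numbers.
    static boolean unsignedLess(long a, long b)
    {
        return a + Long.MIN_VALUE < b + Long.MIN_VALUE;
    }

    // Carry out of an unsigned 64-bit addition: if the sum wrapped around,
    // it ends up smaller (unsigned) than either operand.
    static int carry(long l0, long r0)
    {
        long sum = l0 + r0;
        return unsignedLess(sum, l0) ? 1 : 0;
    }

    public static void main(String[] args)
    {
        long[] samples = {0L, 1L, -1L, Long.MAX_VALUE, Long.MIN_VALUE, 42L, -42L};
        for (long a : samples) {
            for (long b : samples) {
                boolean expected = Long.compareUnsigned(a, b) < 0;
                if (unsignedLess(a, b) != expected) {
                    throw new AssertionError(a + " vs " + b);
                }
            }
        }
        System.out.println(carry(-1L, 1L)); // 0xFFFF_FFFF_FFFF_FFFF + 1 wraps to 0, prints 1
        System.out.println(carry(1L, 2L));  // no wrap, prints 0
    }
}
```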


int z0 = (int) intermediateResult;
intermediateResult = l1 + r1 + overflow;
Member

intermediateResult is not needed until this point

Member Author

Corrected.

long z0 = l0 - r0;
// Unsigned compare
int overflow = z0 + Long.MIN_VALUE > l0 + Long.MIN_VALUE ? 1 : 0;
long z1 = l1 - r1 - overflow;
Member

Previously we added (intermediateResult >> 32), but now we subtract overflow. Is this correct?

Member Author

The subtraction logic is different from before.
The previous version performed four simple subtractions of 32-bit values, each held in a 64-bit placeholder. There, intermediateResult >> 32 was the remainder, not the overflow.
Now overflow is just a flag indicating that r0 > l0 (compared as unsigned values), in which case we need to "borrow" 2^64 from l1.
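
The borrow logic described above can be sketched as follows. This is a minimal illustration, not the merged code: the class name, the array return type, and the `toBig` helper are assumptions, and the result is checked against `BigInteger` modulo 2^128.

```java
import java.math.BigInteger;

// Sketch only: names and the long[] return type are hypothetical.
class BorrowSubtractSketch
{
    // Returns {low, high} of (l1:l0) - (r1:r0), wrapping modulo 2^128.
    static long[] subtract(long l0, long l1, long r0, long r1)
    {
        long z0 = l0 - r0;
        // (unsigned) z0 > (unsigned) l0 means the low word wrapped,
        // which happens exactly when r0 > l0 as unsigned values.
        int borrow = z0 + Long.MIN_VALUE > l0 + Long.MIN_VALUE ? 1 : 0;
        long z1 = l1 - r1 - borrow;
        return new long[] {z0, z1};
    }

    // Interprets (high:low) as an unsigned 128-bit integer.
    static BigInteger toBig(long low, long high)
    {
        BigInteger lowPart = new BigInteger(Long.toUnsignedString(low));
        BigInteger highPart = new BigInteger(Long.toUnsignedString(high));
        return highPart.shiftLeft(64).add(lowPart);
    }

    public static void main(String[] args)
    {
        BigInteger modulus = BigInteger.ONE.shiftLeft(128);
        long[] samples = {0L, 1L, -1L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long l0 : samples) {
            for (long r0 : samples) {
                long[] z = subtract(l0, 5L, r0, 3L);
                BigInteger expected = toBig(l0, 5L).subtract(toBig(r0, 3L)).mod(modulus);
                if (!toBig(z[0], z[1]).equals(expected)) {
                    throw new AssertionError(l0 + " - " + r0);
                }
            }
        }
        System.out.println("subtraction matches BigInteger on all samples");
    }
}
```

For example, subtracting 1 from 2^64 (low word 0, high word 1) wraps the low word to all ones, sets the borrow, and leaves a high word of 0.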

skrzypo987 added 2 commits September 8, 2020 14:15
It appears that bit shift is slightly faster than &
Addition benchmark relied only on values that needed rescaling,
which is a multiplication by a power of 10.
Benchmarking unscaled values makes addition benchmark more synthetic.
@skrzypo987 skrzypo987 force-pushed the 128-bit-arithmetic-upgrades branch from f4b838b to 800e684 Compare September 8, 2020 12:33
@skrzypo987 skrzypo987 requested a review from sopel39 September 8, 2020 12:34
Member

@sopel39 sopel39 left a comment

lgtm % small comment

long intermediateResult;
intermediateResult = toUnsignedLong(l0) + toUnsignedLong(r0);
long z0 = l0 + r0;
// Long.unsignedCompare() inlined
Member

It's not really the inlined method, as unsignedCompare can return -1, 0 or 1.

Could you extract an unsignedIsSmaller method and add a comment explaining what it's based on?
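
One possible shape for the suggested helper is sketched below. The name `unsignedIsSmaller` comes from the comment above; its signature, the surrounding class, and the `add` method are assumptions for illustration and may differ from what was merged.

```java
import java.util.Arrays;

// Sketch only: the class, signatures, and add() are hypothetical.
class UnsignedIsSmallerSketch
{
    // Based on Long.compareUnsigned(), which compares both operands after
    // adding Long.MIN_VALUE; reduced here to a boolean "less than".
    static boolean unsignedIsSmaller(long first, long second)
    {
        return first + Long.MIN_VALUE < second + Long.MIN_VALUE;
    }

    // Returns {low, high} of (l1:l0) + (r1:r0), wrapping modulo 2^128.
    static long[] add(long l0, long l1, long r0, long r1)
    {
        long z0 = l0 + r0;
        // The low word overflowed if the wrapped sum is smaller than an operand.
        int overflow = unsignedIsSmaller(z0, l0) ? 1 : 0;
        long z1 = l1 + r1 + overflow;
        return new long[] {z0, z1};
    }

    public static void main(String[] args)
    {
        // (2^64 - 1) + 1 = 2^64: low word wraps to 0, carry moves into the high word.
        System.out.println(Arrays.toString(add(-1L, 0L, 1L, 0L))); // prints [0, 1]
    }
}
```

Returning a boolean keeps the call site as a plain conditional and avoids the three-way result of the full comparison.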

long z0 = l0 - r0;
// Long.unsignedCompare() inlined
// (unsigned) z0 > (unsigned) l0
int overflow = z0 + Long.MIN_VALUE > l0 + Long.MIN_VALUE ? 1 : 0;
Member

Should this be called underflow?

@skrzypo987 skrzypo987 force-pushed the 128-bit-arithmetic-upgrades branch from 800e684 to aa89bf9 Compare September 9, 2020 06:21
skrzypo987 added 2 commits September 9, 2020 10:02
As opposed to 4*32 bit values.
Done for subtraction as well
@skrzypo987 skrzypo987 force-pushed the 128-bit-arithmetic-upgrades branch from aa89bf9 to 53ce46e Compare September 9, 2020 08:03
@sopel39 sopel39 merged commit 4e97cb0 into trinodb:master Sep 9, 2020
@sopel39
Member

sopel39 commented Sep 9, 2020

merged, thanks!

@sopel39 sopel39 mentioned this pull request Sep 9, 2020
9 tasks
@skrzypo987 skrzypo987 deleted the 128-bit-arithmetic-upgrades branch September 10, 2020 07:18
@martint martint added this to the 342 milestone Sep 24, 2020
@Crossoverrr
Member

Hi @skrzypo987, thank you for raising this PR. We would like to use this feature, but for some reasons we cannot update Presto to the latest version. Could you share some test cases if you receive this message? My email is [email protected], or you can contact me on Slack by searching for wupeng. :)

5 participants