128 bit arithmetic upgrades #4886

skrzypo987 · 2020-08-19T08:18:19Z

Addition throughput improved by ~7% by using 2x64-bit instead of 4x32-bit operations.
Everything improved by 0.1-0.5% by changing & to >>> in the sign check.
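Here is a minimal, hypothetical sketch of the difference between the two approaches. It assumes an unsigned 128-bit magnitude split into little-endian limbs. The real UnscaledDecimal128Arithmetic works on Slice values and also tracks the sign, which is omitted here. The class and method names are illustrative only.

```java
/**
 * Hypothetical sketch, not the Presto implementation: unsigned 128-bit
 * addition modulo 2^128, once with two 64-bit limbs and once with four
 * 32-bit limbs for comparison.
 */
final class Add128Sketch
{
    private Add128Sketch() {}

    // Returns {low, high} of (lHigh:lLow) + (rHigh:rLow) modulo 2^128.
    static long[] addUnsigned(long lLow, long lHigh, long rLow, long rHigh)
    {
        long z0 = lLow + rLow;
        // The low limb carried out iff the wrapped sum is unsigned-smaller than lLow.
        long carry = Long.compareUnsigned(z0, lLow) < 0 ? 1 : 0;
        long z1 = lHigh + rHigh + carry;
        return new long[] {z0, z1};
    }

    // Same operation on four little-endian 32-bit limbs, each summed in a long.
    static int[] addUnsigned32(int[] l, int[] r)
    {
        int[] z = new int[4];
        long carry = 0;
        for (int i = 0; i < 4; i++) {
            long sum = Integer.toUnsignedLong(l[i]) + Integer.toUnsignedLong(r[i]) + carry;
            z[i] = (int) sum;     // keep the low 32 bits
            carry = sum >>> 32;   // 0 or 1, propagated to the next limb
        }
        return z;
    }
}
```

The 64-bit version does two additions and one unsigned comparison instead of four additions with a carry chain. That is plausibly where the reported ~7% gain comes from.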

kokosing · 2020-08-19T10:14:17Z

presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java

@@ -724,27 +724,27 @@ public static void negate(Slice decimal)



It appears that bit shift is slightly better

What do you mean by better? Faster?

Faster, obviously. Good catch.

skrzypo987 · 2020-09-07T09:16:53Z

@kokosing @dain @sopel39 anyone?

sopel39 · 2020-09-08T10:38:32Z

presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java

    }

    private static boolean isNegative(int lastRawHigh)
    {
-        return (lastRawHigh & SIGN_INT_MASK) != 0;


Is SIGN_INT_MASK still needed?

Yes, it is still used 7 times.
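The excerpt shows only the removed mask-based line of isNegative, not its replacement. Below is a minimal sketch of two equivalent sign checks. It assumes SIGN_INT_MASK is the int sign bit (1 << 31). The shift form is one plausible rewrite, not necessarily the exact line from the PR, and the class and method names are hypothetical.

```java
final class SignCheckSketch
{
    private SignCheckSketch() {}

    // Assumption: the mask selects bit 31, the sign bit of an int.
    static final int SIGN_INT_MASK = 1 << 31;

    // Mask-based check, as in the removed line of isNegative.
    static boolean isNegativeMask(int lastRawHigh)
    {
        return (lastRawHigh & SIGN_INT_MASK) != 0;
    }

    // Shift-based check: >>> moves bit 31 into bit 0 and fills the rest with zeros.
    static boolean isNegativeShift(int lastRawHigh)
    {
        return (lastRawHigh >>> 31) != 0;
    }
}
```

Both methods return true exactly when bit 31 is set. Any speed difference between them depends on the JIT; the PR reports a 0.1-0.5% gain for the shift.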

sopel39 · 2020-09-08T10:53:54Z

presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java

+        long intermediateResult = l0 + r0;
+        long z0 = intermediateResult;
+        // Unsigned compare
+        int overflow = intermediateResult + Long.MIN_VALUE < l0 + Long.MIN_VALUE ? 1 : 0;


Could you explain why it works? Why do we need to add MIN_VALUE to both sides? Please also add a comment in the code.

This is Long.compareUnsigned() inlined.
I added a comment.
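Adding Long.MIN_VALUE to a long flips its sign bit. This maps the unsigned range [0, 2^64) onto the signed range while preserving order, so an ordinary signed < then yields the unsigned comparison. OpenJDK's Long.compareUnsigned source uses the same expression, compare(x + MIN_VALUE, y + MIN_VALUE). A minimal sketch follows; the helper name is hypothetical.

```java
final class UnsignedCompareSketch
{
    private UnsignedCompareSketch() {}

    // Flipping the sign bit sends unsigned 0 to Long.MIN_VALUE and
    // unsigned 2^64 - 1 to Long.MAX_VALUE, so signed order equals unsigned order.
    static boolean unsignedLessThan(long a, long b)
    {
        return a + Long.MIN_VALUE < b + Long.MIN_VALUE;
    }
}
```

With such a helper, the carry in the addition hunk above would be written as unsignedLessThan(intermediateResult, l0) ? 1 : 0.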


sopel39 · 2020-09-08T10:56:45Z

presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java


-        int z0 = (int) intermediateResult;
+        intermediateResult = l1 + r1 + overflow;


intermediateResult is not needed until this point

sopel39 · 2020-09-08T10:58:14Z

presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java

+        long z0 = l0 - r0;
+        // Unsigned compare
+        int overflow = z0 + Long.MIN_VALUE > l0 + Long.MIN_VALUE ? 1 : 0;
+        long z1 = l1 - r1 - overflow;


Previously we added (intermediateResult >> 32), but now we subtract overflow. Is this correct?

The subtraction logic is different from before.
The previous version performed four subtractions of 32-bit values, each held in a 64-bit long. intermediateResult >> 32 carried the remainder into the next 32-bit limb; it was not an overflow flag.
Now overflow is just a flag indicating whether r0 > l0 (as unsigned values), in which case we need to "borrow" 2^64 from l1.
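The borrow logic described above can be sketched as follows. This is a minimal, hypothetical version that assumes unsigned values split into 64-bit low/high limbs, with sign handling out of scope. The class and method names are illustrative, not from the PR.

```java
final class Sub128Sketch
{
    private Sub128Sketch() {}

    // Returns {low, high} of (lHigh:lLow) - (rHigh:rLow) modulo 2^128.
    static long[] subtractUnsigned(long lLow, long lHigh, long rLow, long rHigh)
    {
        long z0 = lLow - rLow;
        // Borrow ("overflow" in the PR) is 1 iff rLow > lLow as unsigned values,
        // which is exactly when the wrapped difference is unsigned-greater than lLow.
        int borrow = z0 + Long.MIN_VALUE > lLow + Long.MIN_VALUE ? 1 : 0;
        // Taking 2^64 from the high limb means subtracting 1 there.
        long z1 = lHigh - rHigh - borrow;
        return new long[] {z0, z1};
    }
}
```

For example, 2^64 - 1 computed as (1:0) - (0:1) borrows from the high limb and yields low = -1 (all ones), high = 0.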



sopel39

lgtm % small comment

sopel39 · 2020-09-08T13:53:02Z

presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java

-        long intermediateResult;
-        intermediateResult = toUnsignedLong(l0) + toUnsignedLong(r0);
+        long z0 = l0 + r0;
+        // Long.unsignedCompare() inlined


It's not an inlined method, as Long.compareUnsigned() can return -1, 0, or 1.

Could you extract an unsignedIsSmaller method and add a comment explaining what it's based on?


sopel39 · 2020-09-08T13:57:31Z

presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java

+        long z0 = l0 - r0;
+        // Long.unsignedCompare() inlined
+        // (unsigned) z0 > (unsigned) l0
+        int overflow = z0 + Long.MIN_VALUE > l0 + Long.MIN_VALUE ? 1 : 0;


should this be called underflow?


sopel39 · 2020-09-09T11:33:39Z

merged, thanks!

Crossoverrr · 2020-11-19T08:18:42Z

Hi @skrzypo987, thank you for raising this PR. We want to use this feature, but for various reasons we cannot upgrade Presto to the latest version. Could you share some test cases if you see this message? My email is [email protected], or you can contact me on Slack (search for wupeng). :)

cla-bot bot added the cla-signed label Aug 19, 2020

skrzypo987 requested review from sopel39, kokosing and wendigo August 19, 2020 08:18

kokosing reviewed Aug 19, 2020


skrzypo987 force-pushed the 128-bit-arithmetic-upgrades branch from f99e7c1 to f4b838b August 20, 2020 07:04

findepi requested a review from dain August 20, 2020 10:31

kokosing approved these changes Sep 7, 2020


sopel39 reviewed Sep 8, 2020


skrzypo987 added 2 commits September 8, 2020 14:15

Change way of checking sign in 128-bit arithmetic

71a8fcf

It appears that bit shift is slightly faster than &

Add unscaled values to decimal operators addition benchmark

eae6fbc

The addition benchmark relied only on values that needed rescaling, which is a multiplication by a power of 10. Benchmarking unscaled values makes the addition benchmark more synthetic.

skrzypo987 force-pushed the 128-bit-arithmetic-upgrades branch from f4b838b to 800e684 September 8, 2020 12:33

skrzypo987 requested a review from sopel39 September 8, 2020 12:34

sopel39 approved these changes Sep 8, 2020


sopel39 reviewed Sep 8, 2020


presto-spi/src/main/java/io/prestosql/spi/type/UnscaledDecimal128Arithmetic.java (outdated, resolved)

sopel39 reviewed Sep 8, 2020


skrzypo987 force-pushed the 128-bit-arithmetic-upgrades branch from 800e684 to aa89bf9 September 9, 2020 06:21

skrzypo987 added 2 commits September 9, 2020 10:02

Make 128-bit addition use 2*64 bit values

c7cb2f0

As opposed to 4*32 bit values. Done for subtraction as well

Extract local variable

53ce46e

skrzypo987 force-pushed the 128-bit-arithmetic-upgrades branch from aa89bf9 to 53ce46e September 9, 2020 08:03

sopel39 merged commit 4e97cb0 into trinodb:master Sep 9, 2020

sopel39 mentioned this pull request Sep 9, 2020

Release notes for 342 #5111 (closed, 9 tasks)

skrzypo987 deleted the 128-bit-arithmetic-upgrades branch September 10, 2020 07:18

martint added this to the 342 milestone Sep 24, 2020
