Allow rolling aggregations for window functions #464

kokosing
Member

No description provided.

@cla-bot

cla-bot bot commented Mar 12, 2019

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need the Corporate CLA signed.

@kokosing
Member Author

It is WIP because I haven't run any tests.

I ran a quick, naive benchmark on a single-node Presto server. Results are below; the second completed run was executed on Presto without this patch:

presto:sf1> select count(a) from (SELECT sum(quantity) OVER (ROWS BETWEEN 2000 PRECEDING AND 2000 FOLLOWING) from lineitem) t(a);
  _col0
---------
 6001215
(1 row)

Query 20190312_185636_00020_9axmd, FINISHED, 1 node
Splits: 38 total, 38 done (100.00%)
0:02 [6M rows, 0B] [2.53M rows/s, 0B/s]

presto:sf1> select count(a) from (SELECT sum(quantity) OVER (ROWS BETWEEN 2000 PRECEDING AND 2000 FOLLOWING) from lineitem) t(a);
Query 20190312_185752_00000_ysa4v failed: No nodes available to run query

presto:sf1> select count(a) from (SELECT sum(quantity) OVER (ROWS BETWEEN 2000 PRECEDING AND 2000 FOLLOWING) from lineitem) t(a);
  _col0
---------
 6001215
(1 row)

Query 20190312_185756_00001_ysa4v, FINISHED, 1 node
Splits: 38 total, 38 done (100.00%)
5:03 [6M rows, 0B] [19.8K rows/s, 0B/s]

@findepi
Member

findepi commented Mar 12, 2019

@kokosing

  • Extracted-From: https://github.com/Teradata/presto would be more appropriate
  • first commit has a typo: "Prever"

@sopel39
Member

sopel39 commented Mar 13, 2019

@kokosing ping me when you consider this ready for review

@kokosing
Member Author

Travis says:

2019-04-09T03:37:04.650-0500	INFO	pool-109-thread-2	io.prestosql.sql.gen.TestExpressionCompiler	FINISHED testBinaryOperatorsIntegerDecimal in 13.76s verified 0 expressions
Terminating due to java.lang.OutOfMemoryError: Java heap space
The 

@kokosing kokosing added WIP and removed WIP labels May 8, 2019
@kokosing kokosing force-pushed the origin/master/108_port_8974_from_legacy_presto branch from 8bd5a0c to bea1565 Compare May 8, 2019 11:08
@cla-bot

cla-bot bot commented May 8, 2019

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/prestosql/cla.

@cla-bot cla-bot bot removed the WIP label May 8, 2019
@dain dain added the WIP label May 9, 2019
@kokosing kokosing force-pushed the origin/master/108_port_8974_from_legacy_presto branch from bea1565 to cb58f28 Compare July 4, 2019 13:18
@cla-bot cla-bot bot removed the WIP label Jul 4, 2019
@kokosing kokosing force-pushed the origin/master/108_port_8974_from_legacy_presto branch from c4f7fdb to 2defb0b Compare July 15, 2019 08:04
@kokosing kokosing added the WIP label Aug 4, 2019
@kokosing kokosing changed the title from "[WIP] Allow rolling aggregations for window functions" to "Allow rolling aggregations for window functions" Aug 4, 2019
@kokosing kokosing force-pushed the origin/master/108_port_8974_from_legacy_presto branch from 2defb0b to a51c9af Compare August 4, 2019 19:13
@cla-bot cla-bot bot removed the WIP label Aug 4, 2019
@kokosing kokosing force-pushed the origin/master/108_port_8974_from_legacy_presto branch from a51c9af to 600dfbc Compare August 4, 2019 21:10

@electrum
Member

electrum commented Aug 4, 2019

The "Minor fixes" commit looks good.

@sopel39 sopel39 self-requested a review August 6, 2019 08:56
@kokosing kokosing force-pushed the origin/master/108_port_8974_from_legacy_presto branch from 600dfbc to a28b298 Compare August 6, 2019 09:23
@electrum electrum self-requested a review September 9, 2019 16:19
@electrum
Member

Add a test for rolling sum that includes nulls

Member

@electrum electrum left a comment

Looks good to me. @kokosing I'm assuming you did a complete review as well.

{
// Only include methods which take the same parameters as the corresponding input function
List<Method> removeInputFunctions = FunctionsParserHelper.findPublicStaticMethodsWithAnnotation(clazz, RemoveInputFunction.class).stream()
.filter(method ->
Member

This would be simpler as two filter calls
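
A minimal sketch of what splitting the predicate into two `filter` calls could look like. This is not the code from the PR: the class `TwoFilterDemo`, the `@State` annotation, and the `input`/`remove...` methods are hypothetical stand-ins, and an extra name filter is added only so the demo ignores its own helper methods.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TwoFilterDemo
{
    // Hypothetical parameter annotation, standing in for annotations such as @AggregationState.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface State {}

    // Hypothetical input function and candidate remove-input functions.
    static void input(@State long[] state, long value) {}
    static void removeSame(@State long[] state, long value) {}
    static void removeOtherType(@State long[] state, double value) {}
    static void removeNoAnnotation(long[] state, long value) {}

    static Method lookupInput()
    {
        try {
            return TwoFilterDemo.class.getDeclaredMethod("input", long[].class, long.class);
        }
        catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> matchingRemoveFunctions()
    {
        Method inputFunction = lookupInput();
        return Arrays.stream(TwoFilterDemo.class.getDeclaredMethods())
                // Demo-only filter: keep just the candidate remove functions.
                .filter(method -> method.getName().startsWith("remove"))
                // First condition: same parameter types as the input function.
                .filter(method -> Arrays.equals(method.getParameterTypes(), inputFunction.getParameterTypes()))
                // Second condition: same parameter annotations as the input function.
                .filter(method -> Arrays.deepEquals(method.getParameterAnnotations(), inputFunction.getParameterAnnotations()))
                .map(Method::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        System.out.println(matchingRemoveFunctions()); // [removeSame]
    }
}
```

Each `filter` states one condition, so a reader can see at a glance which requirement rejected a candidate.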

.filter(method ->
Arrays.equals(method.getParameterTypes(), inputFunction.getParameterTypes())
&& Arrays.deepEquals(method.getParameterAnnotations(), inputFunction.getParameterAnnotations()))
.collect(toImmutableList());
Member

The rest of this can be replaced with toOptional() from MoreCollectors

@@ -86,6 +86,12 @@ public InternalAggregationFunction specialize(BoundVariables variables, int arit

// Bind provided dependencies to aggregation method handlers
MethodHandle inputHandle = bindDependencies(concreteImplementation.getInputFunction(), concreteImplementation.getInputDependencies(), variables, metadata);
Optional<MethodHandle> removeInputHandle = concreteImplementation.getRemoveInputFunction().map(
Member

Let's format this like

Optional<MethodHandle> removeInputHandle = concreteImplementation.getRemoveInputFunction().map(
        removeInputFunction -> bindDependencies(removeInputFunction, concreteImplementation.getRemoveInputDependencies(), variables, metadata));

This visually aligns the bindDependencies call with the others, allowing us to easily see that it's similar.

currentStart = 0;
currentEnd = -1;
}
int overlapStart = Integer.max(frameStart, currentStart);
Member

Use Math.max/min and static import
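
For illustration, a small sketch of the overlap computation written with statically imported `max`/`min`. The class `FrameOverlap` and the method `overlapSize` are hypothetical; only the variable names `frameStart`, `currentStart`, and `currentEnd` and the `0`/`-1` empty-range convention come from the snippet under review.

```java
import static java.lang.Math.max;
import static java.lang.Math.min;

class FrameOverlap
{
    // Number of rows shared by two inclusive ranges, or 0 when they do not overlap.
    static int overlapSize(int frameStart, int frameEnd, int currentStart, int currentEnd)
    {
        int overlapStart = max(frameStart, currentStart);
        int overlapEnd = min(frameEnd, currentEnd);
        return max(0, overlapEnd - overlapStart + 1);
    }

    public static void main(String[] args)
    {
        System.out.println(overlapSize(3, 10, 0, 5)); // rows 3..5 overlap: 3
        System.out.println(overlapSize(6, 9, 0, 5));  // disjoint ranges: 0
        System.out.println(overlapSize(0, 4, 0, -1)); // empty current range (start 0, end -1): 0
    }
}
```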

return;
}
}
// We couldn't or didn't want to modify the accumulation: instead, discard the current accumulation and start fresh.
Member

Formatting nits:

  • add newline before this
  • use single space after colon

@@ -57,6 +57,25 @@ public void testCountRowsOrdered()
.build());
}

@Test
public void testCountRowsRolling()
Member

Let's add a test for average as well, since it might be able to catch a problem that count would not.

Member Author

I also added tests for PARTITION BY

Add a test for the WindowNode use case for Accumulators. Fix a few
bugs in the aggregation tests uncovered by the additional coverage. The
new tests didn't uncover any product bugs.

Add a removeInput() function to some Accumulators, and when it exists,
use it in aggregate window functions to roll the aggregation forward
incrementally. Dramatically speeds up queries such as:
SELECT COUNT(quantity) OVER (ROWS BETWEEN 2000 PRECEDING AND 2000 FOLLOWING)

Extracted from: prestodb/presto#8974

Implement removeInput() in some SUM aggregations, to speed up rolling
window functions. This requires additional storage in the
AggregationState for the input count, so that the aggregator knows when
its result should become null.

Extracted from: prestodb/presto#8974
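
The idea in the commit messages above can be sketched as follows. This is a simplified, standalone illustration and not the PR's implementation: the class `RollingSum` and its methods `addInput`, `output`, and `rollingSum` are hypothetical names, and the state is a plain object rather than a real AggregationState. It shows why a SUM that supports `removeInput()` also needs a count of non-null inputs: once every non-null value has rolled out of the frame, the result must become null again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RollingSum
{
    // Hypothetical state: running sum plus the number of non-null inputs currently in the frame.
    private long sum;
    private long count;

    void addInput(Long value)
    {
        if (value != null) {
            sum += value;
            count++;
        }
    }

    void removeInput(Long value)
    {
        if (value != null) {
            sum -= value;
            count--;
        }
    }

    // SUM over zero non-null inputs is NULL, which is why the count is tracked.
    Long output()
    {
        return count == 0 ? null : sum;
    }

    // SUM(value) OVER (ROWS BETWEEN preceding PRECEDING AND following FOLLOWING),
    // rolled forward incrementally instead of recomputed for every row.
    static List<Long> rollingSum(List<Long> values, int preceding, int following)
    {
        RollingSum state = new RollingSum();
        List<Long> result = new ArrayList<>();
        int currentStart = 0;
        int currentEnd = -1;
        for (int row = 0; row < values.size(); row++) {
            int frameStart = Math.max(0, row - preceding);
            int frameEnd = Math.min(values.size() - 1, row + following);
            // Add rows that entered the frame.
            while (currentEnd < frameEnd) {
                state.addInput(values.get(++currentEnd));
            }
            // Remove rows that left the frame.
            while (currentStart < frameStart) {
                state.removeInput(values.get(currentStart++));
            }
            result.add(state.output());
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Frame of two rows (1 PRECEDING to CURRENT ROW); the third frame holds only nulls.
        System.out.println(rollingSum(Arrays.asList(1L, null, null, 4L, 5L), 1, 0)); // [1, 1, null, 4, 9]
    }
}
```

With a frame of width w over n rows, each row is added once and removed at most once, so the work is proportional to n rather than n times w.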
@kokosing kokosing force-pushed the origin/master/108_port_8974_from_legacy_presto branch from a28b298 to e6b97d0 Compare September 30, 2019 11:11

Member Author

@kokosing kokosing left a comment

Comments addressed.

@kokosing kokosing merged commit e8cf714 into trinodb:master Oct 1, 2019
@kokosing kokosing deleted the origin/master/108_port_8974_from_legacy_presto branch October 1, 2019 13:31
@kokosing
Member Author

kokosing commented Oct 1, 2019

@alandpost Thank you! It took us a while to merge your work.

@kokosing kokosing mentioned this pull request Oct 1, 2019
6 tasks
@martint martint added this to the 320 milestone Oct 5, 2019
@@ -31,27 +32,34 @@
private LongSumAggregation() {}

@InputFunction
-public static void sum(@AggregationState NullableLongState state, @SqlType(StandardTypes.BIGINT) long value)
+public static void sum(@AggregationState LongLongState state, @SqlType(StandardTypes.BIGINT) long value)
Member

This actually causes a significant perf regression, as LongLongState is much bigger to serialize. cc @martint @dain

Contributor

I agree, and just filed an issue to showcase the problem: #18818

@jlpetz

jlpetz commented Sep 18, 2023

Hi Trino,

This change appears to have resulted in a substantial performance regression for certain queries we have. For example, a query that was taking 9 min is now taking 57 min to run, a 6.3x increase, per #18818.

Is this something that could be looked into?

9 participants