-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inefficient and redundant implementation of aggregation states. #11820
Comments
Option 1 is reasonable. Option 2 is too complicated. Also, we're going to need to make the group id explicit in the calls anyway to be able to support enhanced aggregation implementations that can use vectorization APIs, GPU, etc. |
Option 1 is not reasonable. This is not how aggregations work and would rewrite redesigning the framework. If your goal is to reduce the extra calls in generated aggregation code, then then you could generate a single aggregation state the encodes all nested states into one object. This could help slightly but honestly, I don't think this minor performance change is worth any effort. |
BTW, my guess is the performance change is more related to explicitly passing the groupId value than it is to merging the two states together. Under the covers these grouped states are just arrays, and passing the array index value via a parameter is likely to be faster then dereferencing the array index from a field in the object.... BTW, just a guess |
That's exactly what option 1 is about -- pass the group id explicitly as an argument to the method instead of calling |
but only for some operators. It is doeable by kind of extension and changes in compilers. I am aware that it is the change to the aggregation framework and this is why I've created that issue. |
If I understand you correctly, this what |
I'm not sure you need two methods and two interfaces. It could be that for non partitioned case, group id is just 0. |
Yes, I was thinking about it. Then, you have the inteface like:
I decided that it is too confusing for our users of API (in the future). For internal purpose it is acceptable for me (and good approach). But, it is a debatable issue, that approach is simpler and more elegant. What do you think @dain , @martint ? |
I have been thinking about this for a while now, and think there is an approach that is backward and not too complex (maybe you all have already thought of this). I believe the proposal is to convert the function signature from:
to
We could adapt the old interface to the new interface using basic method handle combinators. Basically we use combinators to process the args as follows:
This can be done completely without byte code generation. So, assuming we want to make this change, we would change the AggregationMetadata to expect the new method signature, and obviously change the aggregation compiler to expect invoke the new signature. Finally, the AggregationFromAnnotationsParser would be changed to detect the old and new patterns. When the old is detected the method handle would be adapted to the new signature (using combinators). Finally, aggregations would be switched to the new style. This work should be based on the PR i have up that rewrites all (except for reduce) aggregations to annotated forms, which will make this easier. Also, that PR adds a number of new features that would conflict with this change. |
Please go to: #11820 (comment) It solves the problem of Now, we have to imagine how our "atomic" states interfaces should look like. We have two methods:
Potentially, we would have the interface:
and generated implementation of
This method causes the These are my rationale and this is why I decided that explicit states for group-id aware states should be separated states. |
Why would it fail? Maybe it could just assert group id is 0 |
It is one option. However, SummaryIt looks like we have a kind of common version. To sum up our discussion, our proposal changes could be expressed as: Make aggregation operators in Trino group oriented instead of not-group oriented. Then, basically we have:
and similarly for serializer / deserializers. Then |
Yes. Also since we are effectively introducing a v2 of the aggregation annotations, we should do a review of the APIs here since anyone using v2 has to rewrite their code. I say this because processing the very flexible super-legacy annotated aggregations makes the code very complex, so the more we can do to say "it must be exactly like this" makes this stuff easier to maintain. Specifically, we should require full annotation of the parameters aggregation, as I did rewrite PR I mentioned above. In that PR, I added support for having multiple accumulator states, and for the aggregations that switched to that mode, I added tighter controls over annotation requirements, and argument ordering. For example, I would require the argument for input method to be:
|
Sketch of the APIv2: AggregationFunction
Serializer / Deserializer
AccumulatorState
@dain , is it ok for you? |
@dain do you mean #11477? Is #12588 also something that should land first? |
@radek-starburst some notes:
@sopel39 Yes I meant #11477. My new PR #12588 moves the function interface to the SPI. Before we make the new interfaces non-experimental, we need to finish this work because there is no way to make this a backwards compatible change for the low level APIs.... so to be super clear, if we decide to land #12588 before this work is done, we must make clear that the function APIs are experimental and will change. Once this work is finalized and landed, then we could make function APIs final (non-experimental). |
edited
On my eye, we should keep the annotation here. At the beginnig we are going to support for
Yeah, I see. We would need to make a comparator groupId-aware as well. But, as you said, we could run in legacy mode and then go further to expanse our change. It should much easier to process that change step by step :). So, as you can see I edited to have the best possible version. It looks like that we:
Right? |
do @dain 's recent changes to aggregate states make this issue obsolete? |
I do not think so - I do not see how they improve partial representation of aggregation states. |
The problem
There are some redundant calls for aggregating operators that use more than one state.
Let's take the real example -
io.trino.operator.aggregation.RealAverageAggregation
This aggregation function bases on two states -
LongState
andDoubleState
. Let's call them atomic states.Currently, for an aggregation functions that uses more than one atomic state,
AccumulatorCompiler
wraps that set of atomic states toRowType
. As a result, to make groupped accumulator stategroupId
aware, the compiler generatessetGroupId
call (memory store) andgetGroupId
(memory load) for every atomic state.It could be not so expensive but it is important to note that:
addInput
method is called per every row. Let's how looks like generated code for:addIntermediate
method method is called per every partial group:evaluateIntermediate
,evaluateFinal
are similar to theaddIntermediate
Especially, that problem rises when we are trying to rewrite decimal operators (sum, avg) to use atomic state. In that case we need to replace the dedicated state with three atomic states. As a result it is impossible to rewrite this without performance degredation.
Some results
Currently, the decmial average function has a dedicated accumulator state (that means - not generated by a compiler):
LongDecimalWithOverflowAndLongState
. It was replaced by me with atomic states and collected microbenchmarkDecimalAverageAggregation.benchmark
. Actually, that metod benchmarksaddInput
method. This one is very performance sensitive because - let's say it one more time - it is executed per every row.The result is following:
Let's take the "atomic" version of
avg
decimal function and apply change that can be describedas: "Do not
setGroupId
andgetGroupId
for every atomic state. Instead of that, just passgroupId
as an argument to theDecimalAverageAggregation.inputLongDecimal
(and, respectively,add specialized methods for getting / setting state value, e.g.. By specialized metod I mean the additional methods that takes the
groupId
as an argument to get / save the value from / to an atomic state.It is just only a demonstration version to show an negative impact of setting / getting
groupId
for every atomic state).Let's look at result now:
To sum up:
*There are some actually redundant memory stores and memory loads executed per row / partial groups. That redundancy is not free - there is a perofmance degradation. This is why that issue was created. In the next section there are presentend sketeches of some possible approaches.
Propostion
There could be some sketch of the approaches we can take here:
@GroupId
and passgroupId
as an argument to methodsaddInput
,addCombine
and so on.That way makes operators
groupId
-aware. Now, operators aregroupId
agnostic.However, this change can be added without any change to existing operators. We could just rewrite some operators that are performance critical - like decimal operators.
AccumulatorMetadata
. That object could track thegroupId
for all atomic states.Moreover, that
AccumulatorMetadata
object could keep the information about potential emptiness of the group - now even for empty (null) groupcombine
,serialize
,desarialize
methods are called. In a result thecombine
method is complicated - bigger, slower, more hardly inlineable - for decimal sum it was proved that it could much faster (like 1.5x - 2x times).This issue should just open the discussion and draw our attention to it.
The text was updated successfully, but these errors were encountered: