Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rate aggregation function #106703

Merged
merged 6 commits into from
Mar 25, 2024
Merged

Add rate aggregation function #106703

merged 6 commits into from
Mar 25, 2024

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Mar 23, 2024

This PR introduces a rate aggregation function to ESQL, with two main changes:

  1. Define of the grouping state for the rate aggregation function:
  • For raw input, the function expects data to arrive in descending timestamp order without gaps; hence, perform a reduction with each incoming entry. Each grouping should consist of at most two entries: one for the starting time and one for the ending time.
  • For intermediate input, the function buffers data as they can arrive out of order, although non-overlapping. This shouldn't have significant issues, as we expect at most two entries per participating pipeline.
  • The intermediate output consists of three blocks: timestamps, values, and compensation. Both timestamps and values can contain multiple entries sorted in descending order by timestamp.
  • This rate function does not support non-grouping aggregation. However, I can enable it if we think otherwise.
  1. Modifies the GroupingAggregatorImplementer code generator to include the timestamp vector block. I explored several options to generate multiple input blocks. However, both generated and code generator are much more complicated in a generic solution. And it's unlikely that we will need another function requires multiple input blocks. Hence, I decided to tweak this class to append the timestamps long vector block when specified.

I have left out some parts of this PR to minimize its PR to make reviewing easier. I plan to follow-up these items:

  • Supporting Ordinal Grouping.
  • Tracking individual states with the CircuitBreaker
  • Testing with GroupingAggregatorFunctionTestCase: requires some changes around null inserting and input channels.
  • Integrating into the language.

Relates #106415

@dnhatn dnhatn force-pushed the rate-function branch 2 times, most recently from ab2c377 to c24b30f Compare March 23, 2024 22:33
@dnhatn dnhatn force-pushed the rate-function branch 9 times, most recently from e4071c3 to 576ddfe Compare March 24, 2024 05:25
@dnhatn dnhatn marked this pull request as ready for review March 24, 2024 18:12
@dnhatn dnhatn requested a review from a team as a code owner March 24, 2024 18:12
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine labels Mar 24, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Contributor

@breskeby breskeby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small suggestion to keep idiomatic gradle syntaix in build scripts

dnhatn and others added 2 commits March 24, 2024 13:30
Co-authored-by: Rene Groeschke <[email protected]>
Co-authored-by: Rene Groeschke <[email protected]>
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice @dnhatn! I've scrolled through this change and I think this looks in the direction of how rate would be implemented in compute engine.

@@ -261,29 +261,20 @@ void consume() throws IOException {
}
} else {
int previousTsidOrd = leaf.timeSeriesHashOrd;
boolean breakOnNextTsidChange = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be fixed when the other pr you opened is merged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have merged this fix in another PR.

@nik9000
Copy link
Member

nik9000 commented Mar 25, 2024

Integrating into the language.

The magic that makes it so you don't have to test it all. Neat.

@IntermediateState(name = "value", type = "DOUBLE_BLOCK"),
@IntermediateState(name = "compensation", type = "DOUBLE") }
)
public class RateDoubleAggregator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somehow totally unsurpised we've started to use one code generation tool to kick off another code generation tool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somehow totally unsurpised we've started to use one code generation tool to kick off another code generation tool.

Yeah, I was inspired by your values function and I found it's less error-prone. Do you think we should convert existing functions and tests to use templates as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the next time we find something interesting to do there. I'm not sure it's worth doing without some kind of excuse.


private int dv(int v0, int v1) {
return v0 > v1 ? v1 : v1 - v0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if we'd be better off assuming that a reset is a wrap instead. I'm sure there's literature on the topic. It can wait for later though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

value = {
@IntermediateState(name = "timestamp", type = "LONG_BLOCK"),
@IntermediateState(name = "value", type = "INT_BLOCK"),
@IntermediateState(name = "compensation", type = "DOUBLE") }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a compensation for int and long rates? It looks a little like a Kahan sum we shouldn't need that for int and long.

I was going to ask if we need a rate flavor for int and long at all but I think it's right and proper to have one, at least for long. Lots of stuff is going to read it's values as long. And if we can avoid having compensation, all the better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a compensation for int and long rates?

I might be missing something here. But the compensation is the sum of all reset values within this group's range. Perhaps, the term compensation is confusing? Should we consider renaming it to totalReset?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! Let's rename it so I don't think it's kahan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed compensation to reset in 89880d4.

@dnhatn
Copy link
Member Author

dnhatn commented Mar 25, 2024

@breskeby @martijnvg @nik9000 Thanks for reviews.

@dnhatn dnhatn merged commit 49cf3cb into elastic:main Mar 25, 2024
14 checks passed
@dnhatn dnhatn deleted the rate-function branch March 25, 2024 19:24
dnhatn added a commit that referenced this pull request Mar 26, 2024
We should track the memory usage of the individual state in the rate 
aggregation function.

Relates #106703
dnhatn added a commit that referenced this pull request Mar 26, 2024
Add support for ordinal grouping in the rate aggregation function.

Relates #106703
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants