Rewrite most aggregations as annotated functions #11477

Merged
merged 27 commits into trinodb:master from generic-annotated-aggregation
May 27, 2022

Conversation

dain
Member

@dain dain commented Mar 14, 2022

Description

NOTE: This is based on #11476, so skip the first few commits that overlap

Convert all of our aggregations to annotated functions (except for reduce, which uses lambdas). To do this, I extended the annotation system with:

  • Generic state variables, so a single function can be used for any type regardless of stack type
  • An in-out calling convention, so generic state variables can be used in operators like equals, hashcode, and comparison
  • Support for injection into state factories and serializers, so they can access types and operators
  • Support for multiple state variables in annotated aggregate functions
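
To make the last point concrete, here is a minimal, self-contained sketch of an annotation-driven aggregation that uses two state variables (a count and a sum). Everything in it is an assumption made for illustration: `Input`, `Combine`, `Output`, `LongState`, `DoubleState`, `AverageSketch`, and `AggregationDriver` are hypothetical stand-ins, not the annotations or state classes in this PR, and the reflective driver only approximates how an engine might bind such methods.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical marker annotations; stand-ins only, not the real annotation system.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Input {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Combine {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Output {}

// Two independent state variables, as an average needs (a count and a sum).
class LongState { long value; }
class DoubleState { double value; }

class AverageSketch
{
    @Input
    static void input(LongState count, DoubleState sum, double value)
    {
        count.value++;
        sum.value += value;
    }

    @Combine
    static void combine(LongState count, DoubleState sum, LongState otherCount, DoubleState otherSum)
    {
        count.value += otherCount.value;
        sum.value += otherSum.value;
    }

    @Output
    static Double output(LongState count, DoubleState sum)
    {
        // An empty group has no average, so return null in that case.
        return count.value == 0 ? null : sum.value / count.value;
    }
}

class AggregationDriver
{
    // Finds the single method carrying the given marker annotation.
    static Method find(Class<?> function, Class<? extends Annotation> marker)
    {
        return Arrays.stream(function.getDeclaredMethods())
                .filter(method -> method.isAnnotationPresent(marker))
                .findFirst()
                .orElseThrow();
    }

    // Aggregates each partition into its own states, combines them, then produces the output.
    static Double average(List<double[]> partitions)
    {
        Method input = find(AverageSketch.class, Input.class);
        Method combine = find(AverageSketch.class, Combine.class);
        Method output = find(AverageSketch.class, Output.class);
        LongState count = new LongState();
        DoubleState sum = new DoubleState();
        try {
            for (double[] partition : partitions) {
                LongState partialCount = new LongState();
                DoubleState partialSum = new DoubleState();
                for (double value : partition) {
                    input.invoke(null, partialCount, partialSum, value);
                }
                combine.invoke(null, count, sum, partialCount, partialSum);
            }
            return (Double) output.invoke(null, count, sum);
        }
        catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `AggregationDriver.average(List.of(new double[] {1.0, 2.0}, new double[] {3.0}))` runs the input method per partition and then the combine method, which mirrors the partial and final aggregation steps in a very reduced form.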

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

I'm not sure how much of this is visible to the SPI yet.

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Mar 14, 2022
@sopel39
Member

sopel39 commented Mar 14, 2022

cc @radek-starburst @lukasz-stec @skrzypo987

@dain dain force-pushed the generic-annotated-aggregation branch 4 times, most recently from 561c20f to e14d83a Compare May 15, 2022 03:17
@dain dain force-pushed the generic-annotated-aggregation branch from e14d83a to 1a6dbd6 Compare May 15, 2022 05:13
Member

@electrum electrum left a comment

Commits through "Convert qdigest_agg and merge aggregations to annotated functions" look good

Member

@electrum electrum left a comment

Commits through "Convert avg(REAL) aggregation to annotated function" look good

@dain dain force-pushed the generic-annotated-aggregation branch 2 times, most recently from a363a9c to e3af9f6 Compare May 27, 2022 01:31
@dain dain force-pushed the generic-annotated-aggregation branch from e3af9f6 to 7c24aa0 Compare May 27, 2022 04:14
@dain dain merged commit 2b9734d into trinodb:master May 27, 2022
@dain dain deleted the generic-annotated-aggregation branch May 27, 2022 18:32
@github-actions github-actions bot added this to the 383 milestone May 27, 2022
Comment on lines +451 to +459
Optional<Class<?>> nativeContainerType = Arrays.stream(annotations)
        .filter(SqlType.class::isInstance)
        .map(SqlType.class::cast)
        .findFirst()
        .map(SqlType::nativeContainerType);
// Note: this cannot be done as a chain due to strange generic type mismatches
if (nativeContainerType.isPresent() && !nativeContainerType.get().equals(Object.class)) {
    parameterType = nativeContainerType.get();
}
Contributor

Challenge accepted ;)

Suggested change
-Optional<Class<?>> nativeContainerType = Arrays.stream(annotations)
-        .filter(SqlType.class::isInstance)
-        .map(SqlType.class::cast)
-        .findFirst()
-        .map(SqlType::nativeContainerType);
-// Note: this cannot be done as a chain due to strange generic type mismatches
-if (nativeContainerType.isPresent() && !nativeContainerType.get().equals(Object.class)) {
-    parameterType = nativeContainerType.get();
-}
+parameterType = Arrays.stream(annotations)
+        .filter(SqlType.class::isInstance)
+        .map(SqlType.class::cast)
+        .findFirst()
+        .<Class<?>>map(SqlType::nativeContainerType)
+        .filter(not(Object.class::equals))
+        .orElse(parameterType);
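
For anyone who wants to try the single-chain form in isolation, here is a small self-contained sketch. It assumes a hypothetical `SqlTypeSketch` annotation that models only the `nativeContainerType` member with an `Object.class` default; the `ParameterTypeResolver` class and its three sample methods are likewise made up for the demonstration. As far as I can tell, the explicit `<Class<?>>` type witness on `map` is what keeps the chain typed as `Optional<Class<?>>`, so the final `orElse` accepts any `Class<?>`.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;

import static java.util.function.Predicate.not;

// Hypothetical stand-in for @SqlType; only the member used in the chain is modeled.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface SqlTypeSketch
{
    Class<?> nativeContainerType() default Object.class;
}

class ParameterTypeResolver
{
    // Returns the annotation's container type when one is set explicitly,
    // otherwise falls back to the declared parameter type.
    static Class<?> resolve(Annotation[] annotations, Class<?> declaredType)
    {
        return Arrays.stream(annotations)
                .filter(SqlTypeSketch.class::isInstance)
                .map(SqlTypeSketch.class::cast)
                .findFirst()
                // Explicit type witness so the Optional stays Optional<Class<?>>.
                .<Class<?>>map(SqlTypeSketch::nativeContainerType)
                // Object.class is the "not set" default, so treat it as absent.
                .filter(not(Object.class::equals))
                .orElse(declaredType);
    }

    // Sample methods whose first parameter is inspected reflectively.
    static void withOverride(@SqlTypeSketch(nativeContainerType = long.class) Object value) {}

    static void withDefault(@SqlTypeSketch String value) {}

    static void unannotated(String value) {}

    static Class<?> resolveFirstParameter(String methodName)
    {
        Method method = Arrays.stream(ParameterTypeResolver.class.getDeclaredMethods())
                .filter(candidate -> candidate.getName().equals(methodName))
                .findFirst()
                .orElseThrow();
        return resolve(method.getParameterAnnotations()[0], method.getParameterTypes()[0]);
    }
}
```

Only `withOverride` should resolve to the annotation's type; the other two fall back to the declared `String` parameter type, which matches the behavior of the original `isPresent()` check.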

@sopel39
Member

sopel39 commented May 30, 2022

@dain this does not split aggregation states into smaller states, right? So there shouldn't be any performance difference?

@przemekak
Member

Here is the report comparing results before and after this change. The results look stable, with no regressions (maybe even slightly better).
Aggregations refactor comparison.pdf
