-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adding SUM0 definition for aggregate functions #465
Conversation
ACTION NEEDED Substrait follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
@@ -922,6 +922,69 @@ aggregate_functions: | |||
decomposable: MANY | |||
intermediate: fp64? | |||
return: fp64? | |||
- name: "sum0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An analog could be the multiply function returning one for a set of zero elements. Do we want to make a similar change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you’re correct. We should probably add that in another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, no need to make this PR do too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
extensions/functions_arithmetic.yaml
Outdated
@@ -922,6 +922,69 @@ aggregate_functions: | |||
decomposable: MANY | |||
intermediate: fp64? | |||
return: fp64? | |||
- name: "sum0" | |||
description: Sum a set of values. The sum of zero elements yields zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is to have one sum function with an option for how to handle the set of zero inputs. Potential complications of course arise if we want to add an option after the fact but is that something we are interested in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. Let’s discuss further.
Options feature is interesting and that could simplify things, but not sure how to represent nullability in return statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also pretty easy to get the behavior already with COALESCE(SUM(x), 0)
(though this is not exactly the same because it would return 0 for the sum of all null input where this function would hopefully return null). However, I don't think we need to shy away from functions that can be represented elsewhere. So I think adding a new function or option is still a good idea.
I'm not sure where exactly to draw the line between "new function" and "option to an existing function". I don't have a really strong opinion either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think it would behave the same as COALESCE(SUM(x), 0)
. It is my understanding that if you had a set of all null values and you do a SUM0() on them, you would get 0, not null.
This is a description I found for SUM0 that I found here https://www.tabnine.com/code/java/classes/org.apache.calcite.sql.fun.SqlSumEmptyIsZeroAggFunction
Sum0 is an aggregator which returns the sum of the values which go into it like Sum. It differs in that when no non null values are applied zero is returned instead of null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's update the description to something like...
Sum a set of values. The sum of zero elements yields zero. Null values are ignored.
We should probably clarify this for sum
as well but that can be left for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonpace And I think we should probably leave the return type without nulls. Is that okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch. Given this updated understanding I think the return type should be DECALRED_OUTPUT
nullability and a non-nullable return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I will update the PR. Thanks @wmalpica for your feedback as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given:
sum0([1, 2, 3]) = 6
sum0([]) = 0
What does this result in:
sum0([null, null, null]) = ?
Is it null
or 0
? I think it should be null
. If you agree we should note that in the description. In this case, I think the return
attribute still needs to be i64?
/fp64?
.
sqlite has |
Ah, so total is more of "add all the values that are present" where sum is treating null as unknown. (So for sum adding a known value to an unknown one is still unknown.) |
Note that #401 is probably related |
Yes I think SUM0 would be like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small change.
extensions/functions_arithmetic.yaml
Outdated
values: [ SILENT, SATURATE, ERROR ] | ||
nullability: DECLARED_OUTPUT | ||
decomposable: MANY | ||
intermediate: i64? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediate: i64? | |
intermediate: i64 |
The intermediate is also non-nullable.
extensions/functions_arithmetic.yaml
Outdated
values: [ SILENT, SATURATE, ERROR ] | ||
nullability: DECLARED_OUTPUT | ||
decomposable: MANY | ||
intermediate: i64? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediate: i64? | |
intermediate: i64 |
extensions/functions_arithmetic.yaml
Outdated
values: [ SILENT, SATURATE, ERROR ] | ||
nullability: DECLARED_OUTPUT | ||
decomposable: MANY | ||
intermediate: i64? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediate: i64? | |
intermediate: i64 |
extensions/functions_arithmetic.yaml
Outdated
values: [ SILENT, SATURATE, ERROR ] | ||
nullability: DECLARED_OUTPUT | ||
decomposable: MANY | ||
intermediate: i64? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediate: i64? | |
intermediate: i64 |
extensions/functions_arithmetic.yaml
Outdated
values: [ SILENT, SATURATE, ERROR ] | ||
nullability: DECLARED_OUTPUT | ||
decomposable: MANY | ||
intermediate: fp64? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediate: fp64? | |
intermediate: fp64 |
extensions/functions_arithmetic.yaml
Outdated
values: [ SILENT, SATURATE, ERROR ] | ||
nullability: DECLARED_OUTPUT | ||
decomposable: MANY | ||
intermediate: fp64? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediate: fp64? | |
intermediate: fp64 |
This PR includes a resolution to #259