Fix inconsistencies in aggregations and query library functions. #5368
Initial changes to sum to avoid double conversions.
if (isNaN(c)) {
    return ${pt.boxed}.NaN;
    return Double.NaN;
Should we handle infinities the way aggs do?
private double currentValueWithSum(long totalNormalCount, long totalNanCount, long totalPositiveInfinityCount,
        long totalNegativeInfinityCount, double newSum) {
    if (totalNanCount > 0 || (totalPositiveInfinityCount > 0 && totalNegativeInfinityCount > 0)) {
        return Double.NaN;
    }
    if (totalNegativeInfinityCount > 0) {
        return Double.NEGATIVE_INFINITY;
    }
    if (totalPositiveInfinityCount > 0) {
        return Double.POSITIVE_INFINITY;
    }
    if (totalNormalCount == 0) {
        return QueryConstants.NULL_DOUBLE;
    }
    return (double) newSum;
}
With the new short circuit code, we have:
if (isNaN(c) || isNaN(sum)) {
return Double.NaN;
}
I think this is as good as we can do. We can only short circuit on `NaN`. The behavior should be the same as the aggs.
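To illustrate why short-circuiting only on `NaN` can still match the aggregation logic quoted above, here is a minimal sketch for the `double` case. It is not the PR's generated code: the class name `SumSketch` is hypothetical, and `NULL_DOUBLE` is a local stand-in that assumes Deephaven's null sentinel is `-Double.MAX_VALUE`. Infinities need no special branch because IEEE 754 addition already yields the right signed infinity, and `+Infinity + -Infinity` yields `NaN`.

```java
final class SumSketch {
    // Local stand-in for QueryConstants.NULL_DOUBLE (assumed to be -Double.MAX_VALUE).
    static final double NULL_DOUBLE = -Double.MAX_VALUE;

    // Sums non-null values, returning early once the result can only be NaN.
    static double sum(double[] values) {
        double sum = 0;
        long count = 0;
        for (double c : values) {
            // NaN is absorbing: once seen in the input or the running sum, stop.
            if (Double.isNaN(c) || Double.isNaN(sum)) {
                return Double.NaN;
            }
            if (c != NULL_DOUBLE) {
                count++;
                sum += c; // +Inf + -Inf becomes NaN here, per IEEE 754
            }
        }
        return count == 0 ? NULL_DOUBLE : sum;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(sum(new double[] {1.0, inf}));       // Infinity
        System.out.println(sum(new double[] {inf, -inf, 1.0})); // NaN
        System.out.println(sum(new double[] {-inf, 2.0}));      // -Infinity
    }
}
```

A running sum that has become infinite cannot short-circuit, since a later opposite-signed infinity or `NaN` would still change the answer to `NaN`.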
Unit test added.
    return NULL_DOUBLE;
}

return hasZero ? 0 : prod;
I think we're missing a `hasInf` case here. Presumably we need that to be `Double.NaN` or the correct sided-infinity, I'm honestly not sure which is the most consistent.
Including the new short circuit logic, this is the code.
try ( final ${pt.vectorIterator} vi = values.iterator() ) {
    while ( vi.hasNext() ) {
        final ${pt.primitive} c = vi.${pt.iteratorNext}();
        if (isNaN(c) || isNaN(prod)) {
            return Double.NaN;
        } else if (Double.isInfinite(c)) {
            if (hasZero) {
                return Double.NaN;
            }
            hasInf = true;
        } else if (c == 0) {
            if (hasInf) {
                return Double.NaN;
            }
            hasZero = true;
        }
        if (!isNull(c)) {
            count++;
            prod *= c;
        }
    }
}
if (count == 0) {
    return NULL_DOUBLE;
}
return hasZero ? 0 : prod;
Infinite values behave as expected.
So the code here should be behaving properly.
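For reference, the template above can be instantiated by hand for `double[]` to check the infinity cases. This is a sketch under assumptions, not the generated source: the class name `ProductSketch` is hypothetical, `NULL_DOUBLE` is a local stand-in assumed to equal `-Double.MAX_VALUE`, and the initial values of `prod`, `count`, `hasZero`, and `hasInf` are guesses, since the excerpt does not show their declarations.

```java
final class ProductSketch {
    // Local stand-in for QueryConstants.NULL_DOUBLE (assumed to be -Double.MAX_VALUE).
    static final double NULL_DOUBLE = -Double.MAX_VALUE;

    static double product(double[] values) {
        double prod = 1;         // assumed initial value
        long count = 0;
        boolean hasZero = false;
        boolean hasInf = false;
        for (double c : values) {
            if (Double.isNaN(c) || Double.isNaN(prod)) {
                return Double.NaN;
            } else if (Double.isInfinite(c)) {
                if (hasZero) {
                    return Double.NaN; // 0 * Inf is undefined
                }
                hasInf = true;
            } else if (c == 0) {
                if (hasInf) {
                    return Double.NaN; // Inf * 0 is undefined
                }
                hasZero = true;
            }
            if (c != NULL_DOUBLE) {
                count++;
                prod *= c; // sign of an infinite product follows IEEE 754
            }
        }
        if (count == 0) {
            return NULL_DOUBLE;
        }
        return hasZero ? 0 : prod;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(product(new double[] {-2.0, inf})); // -Infinity
        System.out.println(product(new double[] {0.0, inf}));  // NaN
        System.out.println(product(new double[] {0.0, 5.0}));  // 0.0
    }
}
```

In this sketch an infinity combined with a zero gives `NaN` in either order, and an infinity on its own keeps the correct sign, which is the behavior the reply describes.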
Unit test added.
/**
 * Returns the cumulative sum. Null values are excluded.
 *
 * @param values values.
 * @return cumulative sum of non-null values.
 */
public static ${pt.primitive}[] cumsum(${pt.vector} values) {
<#if pt.valueType.isFloat >
public static double[] cumsum(${pt.vector} values) {
We probably need the same kind of infinity handling as in `sum`.
As in the other cases, the code should be handling infinity fine. Better short-circuiting was added.
Unit test added.
if (isNaN(v)) {
    Arrays.fill(result, i, n, Double.NaN);
    return result;
} else if (isNull(result[i - 1])) {
Maybe missing infinity handling?
Better short-circuiting was added. Infinity handling should be fine.
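A rough sketch of how the `Arrays.fill` short circuit could look in a cumulative sum over `double[]` follows. Everything outside the quoted diff lines is an assumption: the class name `CumSumSketch`, the local `NULL_DOUBLE` stand-in (assumed `-Double.MAX_VALUE`), the handling of the first element, and the extra check on the previous result.

```java
import java.util.Arrays;

final class CumSumSketch {
    // Local stand-in for QueryConstants.NULL_DOUBLE (assumed to be -Double.MAX_VALUE).
    static final double NULL_DOUBLE = -Double.MAX_VALUE;

    static double[] cumsum(double[] values) {
        final int n = values.length;
        final double[] result = new double[n];
        if (n == 0) {
            return result;
        }
        if (Double.isNaN(values[0])) {
            Arrays.fill(result, Double.NaN);
            return result;
        }
        result[0] = values[0]; // may be the null sentinel
        for (int i = 1; i < n; i++) {
            final double v = values[i];
            if (Double.isNaN(v) || Double.isNaN(result[i - 1])) {
                // Every later prefix includes the NaN, so fill the tail and stop.
                Arrays.fill(result, i, n, Double.NaN);
                return result;
            } else if (result[i - 1] == NULL_DOUBLE) {
                result[i] = v;             // nothing accumulated yet
            } else if (v == NULL_DOUBLE) {
                result[i] = result[i - 1]; // nulls are excluded
            } else {
                result[i] = result[i - 1] + v;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        // prints [1.0, Infinity, NaN, NaN]
        System.out.println(Arrays.toString(cumsum(new double[] {1.0, inf, -inf, 2.0})));
    }
}
```

Opposite-signed infinities turn the running value into `NaN` through ordinary addition, after which the tail is filled, so no separate infinity branch is needed in this sketch.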
Unit test added.
            vsum += (double) c * w;
<#else>
            vsum += c * (double) w;
</#if>
        }
    }
}

return vsum;
Do we need some infinity handling here?
Better short-circuiting was added. Infinity handling should be fine.
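The same reasoning can be sketched for a weighted sum over `double` values and weights. The names `WsumSketch` and the local `NULL_DOUBLE` (assumed `-Double.MAX_VALUE`) are hypothetical, and skipping a pair when either element is null is an assumption. The sketch returns `vsum` directly, as the excerpt does, and does not attempt to show any all-null handling.

```java
final class WsumSketch {
    // Local stand-in for QueryConstants.NULL_DOUBLE (assumed to be -Double.MAX_VALUE).
    static final double NULL_DOUBLE = -Double.MAX_VALUE;

    static double wsum(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights differ in length");
        }
        double vsum = 0;
        for (int i = 0; i < values.length; i++) {
            final double c = values[i];
            final double w = weights[i];
            // NaN in either input or in the running sum decides the result.
            if (Double.isNaN(c) || Double.isNaN(w) || Double.isNaN(vsum)) {
                return Double.NaN;
            }
            if (c != NULL_DOUBLE && w != NULL_DOUBLE) {
                vsum += c * w; // Inf * 0 and +Inf + -Inf become NaN per IEEE 754
            }
        }
        return vsum;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(wsum(new double[] {1.0, 2.0}, new double[] {3.0, 4.0})); // 11.0
        System.out.println(wsum(new double[] {inf, 1.0}, new double[] {2.0, 3.0})); // Infinity
        System.out.println(wsum(new double[] {inf, 1.0}, new double[] {0.0, 1.0})); // NaN
    }
}
```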
Unit test added.
Labels indicate documentation is required. Issues for documentation have been opened: Community: deephaven/deephaven-docs-community#206
Aggregation operations in query library functions and built-in query aggregations are inconsistent. This PR makes them consistent. Query library functions were changed.
- `percentile` now returns the primitive type.
- `sum` returns a widened type of `double` for floating point inputs or `long` for integer inputs.
- `product` returns a widened type of `double` for floating point inputs or `long` for integer inputs.
- `cumsum` returns a widened type of `double[]` for floating point inputs or `long[]` for integer inputs.
- `cumprod` returns a widened type of `double[]` for floating point inputs or `long[]` for integer inputs.
- `wsum` returns a widened type of `long` for all integer inputs and `double` for inputs containing floating points.

Note: Because the types have changed, the NULL return values have changed as well.
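As a concrete reading of the widening and of the note about NULL values, here is a small sketch of an `int` sum that returns `long`. It is illustrative only: `WidenedSumSketch` is a hypothetical name, and `NULL_INT` and `NULL_LONG` are local stand-ins that assume the Deephaven sentinels are `Integer.MIN_VALUE` and `Long.MIN_VALUE`.

```java
final class WidenedSumSketch {
    // Local stand-ins for the QueryConstants sentinels (assumed values).
    static final int NULL_INT = Integer.MIN_VALUE;
    static final long NULL_LONG = Long.MIN_VALUE;

    // int inputs are accumulated in, and returned as, a long.
    static long sum(int[] values) {
        long sum = 0;
        long count = 0;
        for (int c : values) {
            if (c != NULL_INT) {
                count++;
                sum += c; // widening avoids int overflow
            }
        }
        // The null result is now the long sentinel, not the int one.
        return count == 0 ? NULL_LONG : sum;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {Integer.MAX_VALUE, 1}));     // 2147483648
        System.out.println(sum(new int[] {NULL_INT}) == NULL_LONG);    // true
    }
}
```

Callers that previously compared an integer result against the `int` null sentinel would need to compare against the `long` one instead, which is what the note above points out.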
Resolves #4023