Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistencies in aggregations and query library functions. #5368

Merged
merged 14 commits into from
May 6, 2024

Conversation

chipkent
Copy link
Member

@chipkent chipkent commented Apr 16, 2024

Aggregation operations in query library functions and built-in query aggregations are inconsistent. This PR makes them consistent. Query library functions were changed.

  • percentile now returns the primitive type.
  • sum returns a widened type of double for floating point inputs or long for integer inputs.
  • product returns a widened type of double for floating point inputs or long for integer inputs.
  • cumsum returns a widened type of double[] for floating point inputs or long[] for integer inputs.
  • cumprod returns a widened type of double[] for floating point inputs or long[] for integer inputs.
  • wsum returns a widened type of long for all integer inputs and double for inputs containing floating points.

Note: Because the types have changed, the NULL return values have changed as well.

Resolves #4023

engine/function/src/templates/Numeric.ftl Outdated Show resolved Hide resolved
engine/function/src/templates/Numeric.ftl Show resolved Hide resolved
engine/function/src/templates/Numeric.ftl Show resolved Hide resolved
engine/function/src/templates/Numeric.ftl Outdated Show resolved Hide resolved
@chipkent chipkent requested a review from lbooker42 April 16, 2024 16:57
lbooker42
lbooker42 previously approved these changes Apr 16, 2024
if (isNaN(c)) {
return ${pt.boxed}.NaN;
return Double.NaN;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle infinities the way aggs do?

    private double currentValueWithSum(long totalNormalCount, long totalNanCount, long totalPositiveInfinityCount,
            long totalNegativeInfinityCount, double newSum) {
        if (totalNanCount > 0 || (totalPositiveInfinityCount > 0 && totalNegativeInfinityCount > 0)) {
            return Double.NaN;
        }
        if (totalNegativeInfinityCount > 0) {
            return Double.NEGATIVE_INFINITY;
        }
        if (totalPositiveInfinityCount > 0) {
            return Double.POSITIVE_INFINITY;
        }
        if (totalNormalCount == 0) {
            return QueryConstants.NULL_DOUBLE;
        }
        return (double) newSum;
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should give the same result, and the current impl has less branching. The current impl could be improved with a short circuit.

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new short circuit code, we have:

                if (isNaN(c) || isNaN(sum)) {
                    return Double.NaN;
                }

I think this is as good as we can do. We can only short circuit on NaN. The behavior should be the same as the aggs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added.

return NULL_DOUBLE;
}

return hasZero ? 0 : prod;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing a hasInf case here. Presumably we need that to be Double.NaN or the correct sided-infinity, I'm honestly not sure which is the most consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including the new short circuit logic, this is the code.

        try ( final ${pt.vectorIterator} vi = values.iterator() ) {
            while ( vi.hasNext() ) {
                final ${pt.primitive} c = vi.${pt.iteratorNext}();

                if (isNaN(c) || isNaN(prod)) {
                    return Double.NaN;
                } else if (Double.isInfinite(c)) {
                    if (hasZero) {
                        return Double.NaN;
                    }
                    hasInf = true;
                } else if (c == 0) {
                    if (hasInf) {
                        return Double.NaN;
                    }
                    hasZero = true;
                }

                if (!isNull(c)) {
                    count++;
                    prod *= c;
                }
            }
        }

        if (count == 0) {
            return NULL_DOUBLE;
        }

        return hasZero ? 0 : prod;

Infinite values behave as expected.
image
So the code here should be behaving properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added.


/**
* Returns the cumulative sum. Null values are excluded.
*
* @param values values.
* @return cumulative sum of non-null values.
*/
public static ${pt.primitive}[] cumsum(${pt.vector} values) {
<#if pt.valueType.isFloat >
public static double[] cumsum(${pt.vector} values) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need the same kind of infinity handling as in sum.

Copy link
Member Author

@chipkent chipkent Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in the other cases, the code should be handling infinity fine. Better short-circuiting was added.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added.

engine/function/src/templates/Numeric.ftl Outdated Show resolved Hide resolved
engine/function/src/templates/Numeric.ftl Outdated Show resolved Hide resolved
engine/function/src/templates/Numeric.ftl Outdated Show resolved Hide resolved
if (isNaN(v)) {
Arrays.fill(result, i, n, Double.NaN);
return result;
} else if (isNull(result[i - 1])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe missing infinity handling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better short-circuiting was added. Infinity handling should be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added.

engine/function/src/templates/Numeric.ftl Outdated Show resolved Hide resolved
vsum += (double) c * w;
<#else>
vsum += c * (double) w;
</#if>
}
}
}

return vsum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need some infinity handling here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better short-circuiting was added. Infinity handling should be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test added.

kosak
kosak previously approved these changes Apr 16, 2024
@chipkent chipkent dismissed stale reviews from kosak and lbooker42 via aea35f6 April 16, 2024 20:33
@chipkent chipkent requested review from rcaudy, lbooker42 and kosak April 16, 2024 21:32
@chipkent chipkent added the devrel-watch DevRel team is watching label Apr 17, 2024
@chipkent chipkent merged commit 3020365 into deephaven:main May 6, 2024
17 checks passed
@chipkent chipkent deleted the 4023_agg_inconsitencies branch May 6, 2024 17:49
@github-actions github-actions bot locked and limited conversation to collaborators May 6, 2024
@deephaven-internal
Copy link
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

Community: deephaven/deephaven-docs-community#206

@chipkent chipkent removed the devrel-watch DevRel team is watching label Aug 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vector/Formula Aggregation Inconsistency
5 participants