-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent default values for null equality and NaN equality #11329
Comments
Partially resolved by #11236, which removes |
This issue has been labeled |
I caught another place where cudf/cpp/include/cudf/aggregation.hpp Line 592 in 536ddd0
I will make a follow-up PR that can be merged after #11621. |
How about renaming |
Partially resolves #11329. This helps to align our default behaviors for null and NaN equality across APIs, specifically for `make_collect_set_aggregation` in this PR. All functions should default to treating null values as equal to one another and NaN values as equal to one another. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) URL: #11621
Not quite done yet. @bdice be careful with the wording on your PRs :) GH sees "partially resolves X" as ".... resolves X" and still closes the issue AFAICT |
Partially resolves #11329. This helps to align our default behaviors for null and NaN equality across APIs, specifically for `make_merge_sets_aggregation` in this PR. All functions should default to treating null values as equal to one another and NaN values as equal to one another. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) - David Wendt (https://github.com/davidwendt) URL: #11952
In
cudf::distinct
, we default to equal nulls and equal NaNs. This aligns with the defaults ofintersect_distinct
,union_distinct
, anddifference_distinct
in PR #11043 (as of this writing - not yet merged).However, there are competing standards:
make_collect_set_aggregation
, we default to equal nulls and unequal NaNs.drop_list_duplicates
, we default to equal nulls and unequal NaNs.We should strive for a consistent set of defaults for how null and NaN equality are handled. I think I would recommend a change in
make_collect_set_aggregation
anddrop_list_duplicates
to align with the other functions mentioned above.Originally posted by @bdice in #11043 (comment)
The text was updated successfully, but these errors were encountered: