[FEA] Make calling to purge_nonempty_nulls optional in various places #12567
Comments
Targeting 23.04 release.
I would like to connect this issue and #12786. I would like to identify any libcudf algorithms that generate nonempty nulls and only add sanitization where it is needed due to implementation details of particular algorithms.
To add some context to this work, we recently added null sanitization checks in #14559, and also started simplifying null checking in #13312. It's also worth mentioning that we should continue to minimize the usage of
Closing this, as it is replaced by #17356.
There were reported performance regressions in make_lists_column and make_structs_column recently, after calls to purge_nonempty_nulls were added to these factory functions. We need to sanitize the input data (i.e., remove non-empty nulls), but both checking for and removing non-empty nulls may incur some (even significant) overhead.

I propose adding a parameter, such as bool sanitize_input, to the callers of purge_nonempty_nulls. With such a parameter, we can make the calls to purge_nonempty_nulls optional. In some places, such as data IO or some custom kernels, we know for sure that all the nulls are empty, so we would not have to pay the overhead of checking for non-empty nulls.
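
To illustrate the idea, here is a minimal host-side sketch of a factory with an opt-out flag. It is not libcudf code: toy_lists_column, has_nonempty_nulls, purge_nonempty_nulls_toy and make_toy_lists_column are hypothetical stand-ins that model a lists column with plain std::vector, and the default value of sanitize_input is an assumption. The real factories work on device memory and have different signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy host-side model of a lists column (hypothetical, NOT the libcudf type).
struct toy_lists_column {
  std::vector<int> offsets;  // size == num_rows + 1
  std::vector<bool> valid;   // size == num_rows; false means the row is null
  std::vector<int> child;    // flattened list elements
};

// Returns true if any null row still spans a non-empty range of the child.
bool has_nonempty_nulls(toy_lists_column const& col)
{
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (!col.valid[i] && col.offsets[i + 1] > col.offsets[i]) { return true; }
  }
  return false;
}

// Rebuilds offsets and child so that every null row covers zero elements.
toy_lists_column purge_nonempty_nulls_toy(toy_lists_column const& col)
{
  toy_lists_column out;
  out.valid = col.valid;
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (col.valid[i]) {
      // Keep the elements of valid rows; elements under null rows are dropped.
      out.child.insert(out.child.end(),
                       col.child.begin() + col.offsets[i],
                       col.child.begin() + col.offsets[i + 1]);
    }
    out.offsets.push_back(static_cast<int>(out.child.size()));
  }
  return out;
}

// Hypothetical factory: callers that know their nulls are already empty
// (e.g. data IO) pass sanitize_input = false to skip both the check and the purge.
toy_lists_column make_toy_lists_column(std::vector<int> offsets,
                                       std::vector<bool> valid,
                                       std::vector<int> child,
                                       bool sanitize_input = true)
{
  assert(offsets.size() == valid.size() + 1);
  toy_lists_column col{std::move(offsets), std::move(valid), std::move(child)};
  if (sanitize_input && has_nonempty_nulls(col)) { return purge_nonempty_nulls_toy(col); }
  return col;
}
```

With this shape, a caller such as make_toy_lists_column(offsets, valid, child, false) pays nothing for sanitization, while the default path still guarantees that the result has no non-empty nulls. The trade-off is that a caller who passes false with unsanitized input gets a column that violates that guarantee.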