[BUG] stop using mergeAndSetValidity for any nested type #7485
Comments
Do you want to fix |
Tried to search for |
How we do it is not that important. It would be nice to be able to think about performance too. The code as it is written right now makes a copy of the input column and sets the null mask on it. If we were to call Yes generally we have a lot of places where we might be doing really bad things and we need to actually start putting in place checks to detect these and avoid them before they happen. Because |
In C++, we allow potential pass-through for APIs like this (set null mask). That means we only make a copy if we have to (i.e., when we have to call |
This is fixed because |
Describe the bug
I filed this as a bug, but it really might just be a task.
There are multiple places where we are using `mergeAndSetValidity` on potentially nested types. This operation will overwrite the null mask on a column.
https://github.com/rapidsai/cudf/blob/d24bf11838863739da014d828156e8bf638430ff/java/src/main/native/src/ColumnViewJni.cpp#L1762
This is not safe for any nested type, including strings. CUDF does not want to support inputs for LISTS or STRINGS where the value is NULL but the offsets are not empty. Calling `mergeAndSetValidity` does not guarantee this and can cause some very subtle bugs in the future with CUDF.

We really should change the API in CUDF to throw an exception if we try to do this on anything that is not a fixed-width type. We should also switch all of the code that calls this to do the right thing in these cases. That does not typically mean calling into a ColumnView or other similar API and trying to do something similar manually. If we want to do this correctly we need to do an `ifElse` call or something similar. It should not be too bad because that API already makes a copy of the input, so it may not actually be any slower.
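
To make the failure mode concrete, here is a minimal sketch in plain Java. It does not use the cudf API: `ToyStringColumn`, `withMaskOverwritten`, `withNullsRebuilt`, and `nullRowsAreEmpty` are hypothetical names invented for illustration, and the column is modeled as simple offsets, chars, and validity arrays. It only shows why overwriting the null mask can leave a NULL row with non-empty offsets, while a copy-based rebuild (in the spirit of an `ifElse` against a null value) does not.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Toy model of a STRINGS column. Hypothetical illustration only; this is NOT the cudf API.
final class ToyStringColumn {
    final int[] offsets;    // rows + 1 entries; row i spans chars[offsets[i], offsets[i + 1])
    final byte[] chars;     // concatenated UTF-8 bytes of all rows
    final boolean[] valid;  // true means the row is non-null

    ToyStringColumn(int[] offsets, byte[] chars, boolean[] valid) {
        this.offsets = offsets;
        this.chars = chars;
        this.valid = valid;
    }

    // Builds a column from Java strings; a null element becomes a null row with an empty range.
    static ToyStringColumn fromStrings(String... values) {
        int[] offsets = new int[values.length + 1];
        boolean[] valid = new boolean[values.length];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i++) {
            if (values[i] != null) {
                byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
                valid[i] = true;
            }
            offsets[i + 1] = out.size();
        }
        return new ToyStringColumn(offsets, out.toByteArray(), valid);
    }

    // Unsafe pattern: only the validity changes; offsets and chars are reused untouched.
    ToyStringColumn withMaskOverwritten(boolean[] mask) {
        return new ToyStringColumn(offsets, chars, mergeAnd(mask));
    }

    // Copy-based pattern: rebuild offsets and chars so every null row has zero length.
    ToyStringColumn withNullsRebuilt(boolean[] mask) {
        boolean[] merged = mergeAnd(mask);
        int[] newOffsets = new int[merged.length + 1];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < merged.length; i++) {
            if (merged[i]) {
                out.write(chars, offsets[i], offsets[i + 1] - offsets[i]);
            }
            newOffsets[i + 1] = out.size();
        }
        return new ToyStringColumn(newOffsets, out.toByteArray(), merged);
    }

    // The invariant discussed above: a NULL row must not own any bytes.
    boolean nullRowsAreEmpty() {
        for (int i = 0; i < valid.length; i++) {
            if (!valid[i] && offsets[i + 1] != offsets[i]) {
                return false;
            }
        }
        return true;
    }

    // ANDs the existing validity with the supplied mask (assumed to have one entry per row).
    private boolean[] mergeAnd(boolean[] mask) {
        boolean[] merged = new boolean[valid.length];
        for (int i = 0; i < valid.length; i++) {
            merged[i] = valid[i] && mask[i];
        }
        return merged;
    }

    public static void main(String[] args) {
        ToyStringColumn col = fromStrings("a", "bcd", "ef");
        boolean[] mask = {true, false, true};
        System.out.println("mask overwritten: " + col.withMaskOverwritten(mask).nullRowsAreEmpty());
        System.out.println("rebuilt copy:     " + col.withNullsRebuilt(mask).nullRowsAreEmpty());
    }
}
```

In this toy model the overwritten column still has offsets `[0, 1, 4, 6]` for a row that is now NULL, so the invariant check fails, whereas the rebuilt copy collapses that row to an empty range. The real fix in the plugin would go through whatever copy-producing cudf call is appropriate; this sketch only models the data layout.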