NullArray::is_null() returns false incorrectly #4835
Comments
I think this is a good one for someone who wants to make a contribution to Arrow -- it is straightforward and well specified |
This was actually an intentional change in https://github.com/apache/arrow-rs/pull/4691/files#r1292317464 |
I think it is a bug -- having Practically it means we can't use Maybe the problem is that |
The difficulty is the arrow spec very clearly states that a |
Do you agree, from a user's perspective, that |
No, as per the standard a NullArray is not nullable, its values are just null. I agree it is deeply confusing and it would be better if arrow didn't have logical nullability, but I don't really see an obvious way around this |
I don't understand what "is not nullable, its values are just null" means. https://arrow.apache.org/docs/format/Columnar.html#null-layout says
The arrow-rs https://docs.rs/arrow/latest/arrow/array/trait.Array.html#method.is_null says
Therefore shouldn't I didn't find anything contradicting this position in the .fbs format |
The problem is doing this would be inconsistent with the values returned by
Similarly for the corresponding methods on We have the exact same challenge for I agree this is unfortunate and represents a rough edge, but I'm not sure how to coherently handle this |
If this property doesn't hold for some arrays it isn't an invariant, is it? It is not clear that making |
It does hold, what you are proposing would break this 😄
I don't disagree, but I also don't agree that making it return an incorrect value to make it fit people's logical models is a good solution either 😄 |
I think we disagree on what the correct value is 😆 |
Currently the Array methods consistently return nullability solely as determined by the null buffer, irrespective of the array's logical value, i.e. if a dictionary value is null, this won't be reflected in is_null. We could change this, but doing so would significantly regress performance, especially for RunArray. The trade-off in #4691 was to not do this, and instead allow amortising this overhead computing logical_nulls as an explicit "kernel". I agree the current situation is unfortunate, but I do feel strongly that we should be consistent about what these methods mean. |
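To make the distinction in the comment above concrete, here is a minimal toy sketch of "physical" versus "logical" nullability. Everything in it (`ToyArray`, `ToyNullArray`, `ToyDictionary`, `Validity`, `logical_null_count`) is a hypothetical stand-in invented for illustration and is not the arrow-rs API; it only assumes the semantics described in this thread, namely that `is_null` consults the null buffer alone while `logical_nulls` is computed separately.

```rust
// Toy model only: these types are hypothetical and are NOT the arrow-rs API.

/// Physical validity mask: `true` marks a valid slot, `None` means "no null buffer".
type Validity = Option<Vec<bool>>;

trait ToyArray {
    fn len(&self) -> usize;

    /// The physical null buffer, if the array has one.
    fn nulls(&self) -> Validity;

    /// Logical validity; by default identical to the physical buffer.
    fn logical_nulls(&self) -> Validity {
        self.nulls()
    }

    /// Answers from the physical null buffer only.
    fn is_null(&self, i: usize) -> bool {
        match self.nulls() {
            Some(valid) => !valid[i],
            None => false,
        }
    }
}

/// Every value is null, yet there is no null buffer to say so.
struct ToyNullArray {
    len: usize,
}

impl ToyArray for ToyNullArray {
    fn len(&self) -> usize {
        self.len
    }
    fn nulls(&self) -> Validity {
        None
    }
    fn logical_nulls(&self) -> Validity {
        Some(vec![false; self.len])
    }
}

/// Keys index into `values`; a key may be null, and so may the value it points at.
struct ToyDictionary {
    keys: Vec<Option<usize>>,
    values: Vec<Option<i32>>,
}

impl ToyArray for ToyDictionary {
    fn len(&self) -> usize {
        self.keys.len()
    }
    fn nulls(&self) -> Validity {
        // Only null keys are recorded in the physical buffer.
        if self.keys.iter().all(|k| k.is_some()) {
            None
        } else {
            Some(self.keys.iter().map(|k| k.is_some()).collect())
        }
    }
    fn logical_nulls(&self) -> Validity {
        // A slot is logically valid only if both the key and the referenced value are valid.
        Some(
            self.keys
                .iter()
                .map(|k| match k {
                    Some(i) => self.values[*i].is_some(),
                    None => false,
                })
                .collect(),
        )
    }
}

/// Counts logically null slots; this is the work paid once, up front, instead of per `is_null` call.
fn logical_null_count(a: &dyn ToyArray) -> usize {
    a.logical_nulls()
        .map_or(0, |valid| valid.iter().filter(|v| !**v).count())
}
```

In this model a `ToyNullArray { len: 3 }` reports `is_null(0) == false` even though `logical_null_count` returns 3, and a dictionary key pointing at a null value is likewise not reported by `is_null`. That mirrors the consistency argument made above; it does not settle which behavior users should expect.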
Maybe we can solve this with some additional documentation Or perhaps we can add a function like I can't imagine we could do better performance-wise than |
I'm open to suggestions, the current docs are pretty clear imo - https://docs.rs/arrow-array/latest/arrow_array/array/trait.Array.html#method.logical_nulls
But what should we do for the other arrays, like dictionaries and run arrays? We need to be consistent about what these methods mean. We can't return logical nullability for some arrays and not others |
PR is #4838 |
Describe the bug
Found by @jonahgao in apache/datafusion#7568
To Reproduce
Prints
Expected behavior
NullArray::is_null() should return true for all elements
Additional context