Change Array::logical_nulls to only copy when necessary
#5209
pub use binary_array::*;
mod boolean_array;
I'm not a fan of this re-organisation; it makes it harder to notice if you've missed a re-export.
I didn't do it on purpose; my editor must have done it. I will revert it if we proceed with this PR.
I'm curious whether this shows up in benchmarks as overhead; a dyn dispatch is already quite expensive. In general, something is a bit off if you're calling logical_nulls in a hot loop, as you should probably extract the null mask outside of the loop.
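To illustrate the hoisting suggestion, here is a minimal sketch. It assumes simplified, hypothetical stand-ins (`NullMask`, `MaskedArray`, `Int32Column`) rather than the real arrow-rs `NullBuffer` and `Array` types. The function calls `logical_nulls` once, before the loop, so the dyn dispatch and the clone happen once per array instead of once per row.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's NullBuffer: shared validity flags
/// (one bool per slot here, rather than a packed bitmap).
#[derive(Clone, Debug)]
struct NullMask {
    validity: Arc<Vec<bool>>,
}

impl NullMask {
    fn is_null(&self, i: usize) -> bool {
        !self.validity[i]
    }
}

/// Hypothetical stand-in for the dyn-dispatched `Array` trait.
trait MaskedArray {
    fn len(&self) -> usize;
    /// Returns an owned copy of the mask, like `logical_nulls` before this PR.
    fn logical_nulls(&self) -> Option<NullMask>;
}

struct Int32Column {
    values: Vec<i32>,
    nulls: Option<NullMask>,
}

impl MaskedArray for Int32Column {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn logical_nulls(&self) -> Option<NullMask> {
        // Cloning increments the Arc's atomic reference count.
        self.nulls.clone()
    }
}

/// Counts null slots, extracting the mask once rather than once per row.
fn count_nulls(array: &dyn MaskedArray) -> usize {
    // One dyn call and one clone, outside the hot loop.
    match array.logical_nulls() {
        None => 0,
        Some(mask) => (0..array.len()).filter(|&i| mask.is_null(i)).count(),
    }
}

fn main() {
    let col = Int32Column {
        values: vec![1, 2, 3, 4],
        nulls: Some(NullMask { validity: Arc::new(vec![true, false, true, false]) }),
    };
    println!("nulls = {}", count_nulls(&col)); // prints "nulls = 2"
}
```

With the mask hoisted like this, the per-call cost of `logical_nulls` is paid once per array, which is why it rarely matters outside a hot loop.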
🤔 @tustvold and @Dandandan have pointed out that the overhead of copying NullBuffers is probably small. The size of a NullBuffer is 48 bytes, as printed by:
println!("sizeof null buffer: {}", std::mem::size_of::<NullBuffer>());
So this PR would save copying 48 bytes and one atomic increment in the common case where the null buffer does not need to be computed.
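To make that cost concrete, here is a small sketch using a hypothetical `NullBufferLike` struct. Its fields are intended to approximate the layout of arrow-rs's `NullBuffer`: an `Arc` to the shared allocation, a raw data pointer, and four `usize` fields. That mapping is an assumption, not the crate's definition. On a 64-bit target the struct is 48 bytes. Cloning it performs one atomic reference-count increment, and returning a `Cow::Borrowed` does neither.

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Hypothetical stand-in approximating NullBuffer's fields:
/// Buffer { data, ptr, length } + BooleanBuffer { offset, len } + null_count.
#[derive(Clone)]
#[allow(dead_code)]
struct NullBufferLike {
    data: Arc<Vec<u8>>, // shared allocation; clone() bumps an atomic count
    ptr: *const u8,
    length: usize,
    offset: usize,
    len: usize,
    null_count: usize,
}

/// Old shape: always hand back an owned copy (48-byte copy + atomic increment).
fn nulls_owned(nulls: &NullBufferLike) -> NullBufferLike {
    nulls.clone()
}

/// Proposed shape: borrow when nothing has to be computed.
fn nulls_cow(nulls: &NullBufferLike) -> Cow<'_, NullBufferLike> {
    Cow::Borrowed(nulls)
}

fn main() {
    let data = Arc::new(vec![0b1011_u8]);
    let nulls = NullBufferLike {
        ptr: data.as_ptr(),
        data: Arc::clone(&data),
        length: 1,
        offset: 0,
        len: 4,
        null_count: 1,
    };
    // 48 on a 64-bit target: six pointer-sized fields, no padding.
    println!("size = {} bytes", std::mem::size_of::<NullBufferLike>());

    let before = Arc::strong_count(&data);
    let owned = nulls_owned(&nulls);
    println!("after clone: +{}", Arc::strong_count(&data) - before); // prints "after clone: +1"
    drop(owned);

    let borrowed = nulls_cow(&nulls);
    println!("after borrow: +{}", Arc::strong_count(&data) - before); // prints "after borrow: +0"
    assert!(matches!(borrowed, Cow::Borrowed(_)));
}
```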
No, this is only a theoretical concern.
It is hard for me to approve a breaking change without a strong performance justification. It is also worth noting that Cow adds a branch on access, which might actually be far worse than a few atomic increments.
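The branch mentioned here comes from how `Cow` implements `Deref`: std's impl matches on the `Borrowed`/`Owned` variant on every dereference. The sketch below uses a hypothetical `MaybeOwned` enum that mirrors the shape of that impl so the match is visible. It also shows one way to pay the branch once per call instead of once per element: dereference to a slice before looping. Whether the branch actually costs more than the atomic increment is exactly what benchmarks would have to show.

```rust
use std::ops::Deref;

/// Hypothetical mirror of std::borrow::Cow, to make the branch explicit.
enum MaybeOwned<'a, T> {
    Borrowed(&'a T),
    Owned(T),
}

impl<T> Deref for MaybeOwned<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // Every access through the wrapper evaluates this match.
        match self {
            MaybeOwned::Borrowed(r) => *r,
            MaybeOwned::Owned(v) => v,
        }
    }
}

/// Counts valid slots in a mask held behind the wrapper.
fn count_valid(mask: &MaybeOwned<'_, Vec<bool>>) -> usize {
    // Deref once up front so the variant match runs a single time,
    // then iterate over a plain slice.
    let slice: &[bool] = mask;
    slice.iter().filter(|&&v| v).count()
}

fn main() {
    let stored = vec![true, false, true];
    let borrowed = MaybeOwned::Borrowed(&stored);
    let owned = MaybeOwned::Owned(vec![false, false, true]);
    println!("{} {}", count_valid(&borrowed), count_valid(&owned)); // prints "2 1"
}
```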
I agree that to merge this we need performance benchmarks. I will try to find time to gather some data one way or the other.
Sorry for not closing this sooner. BTW, @andygrove added an automatic workflow to close stale PRs (like this one) in DataFusion: apache/datafusion#10046. It may be worth considering something like that for this repo too.
Which issue does this PR close?
Closes #5208
Rationale for this change
I would like to avoid copying the NullBuffer when it is unnecessary.
What changes are included in this PR?
Change Array::logical_nulls to return Cow<NullBuffer> rather than NullBuffer (working with Cows requires finagling that is sometimes obtuse)
Are there any user-facing changes?
Yes, this is an API change.
I tried to make the migration as easy as possible with documentation.
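A minimal sketch of what the proposed API shape and caller migration might look like. It uses hypothetical stand-ins (`NullMask`, `ArrayLike`, `PrimitiveLike`, `AllNullLike`), not the actual signatures in this PR. By default `logical_nulls` borrows the stored mask (`Cow::Borrowed`). A type with no stored mask whose logical nulls must be computed returns `Cow::Owned`; arrow's NullArray is one such case. Read-only callers just dereference the `Cow`, and callers that still need an owned value call `Cow::into_owned`.

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Hypothetical stand-in for NullBuffer: shared validity flags (true = valid).
#[derive(Clone, Debug, PartialEq)]
struct NullMask(Arc<Vec<bool>>);

impl NullMask {
    fn new_null(len: usize) -> Self {
        NullMask(Arc::new(vec![false; len]))
    }
    fn null_count(&self) -> usize {
        self.0.iter().filter(|v| !**v).count()
    }
}

trait ArrayLike {
    /// Physical nulls stored in the array, if any.
    fn nulls(&self) -> Option<&NullMask>;

    /// Proposed shape: borrow the stored mask by default and only
    /// allocate (Cow::Owned) when the logical nulls must be computed.
    fn logical_nulls(&self) -> Option<Cow<'_, NullMask>> {
        self.nulls().map(Cow::Borrowed)
    }
}

/// Array with a stored mask: the default impl borrows it, no copy.
struct PrimitiveLike {
    nulls: Option<NullMask>,
}

impl ArrayLike for PrimitiveLike {
    fn nulls(&self) -> Option<&NullMask> {
        self.nulls.as_ref()
    }
}

/// Array with no stored mask that is logically all-null: must compute one.
struct AllNullLike {
    len: usize,
}

impl ArrayLike for AllNullLike {
    fn nulls(&self) -> Option<&NullMask> {
        None
    }
    fn logical_nulls(&self) -> Option<Cow<'_, NullMask>> {
        Some(Cow::Owned(NullMask::new_null(self.len)))
    }
}

fn main() {
    let prim = PrimitiveLike { nulls: Some(NullMask(Arc::new(vec![true, false]))) };
    let all_null = AllNullLike { len: 3 };

    // Read-only callers deref through the Cow; nothing is cloned.
    let n = prim.logical_nulls().map(|m| m.null_count()).unwrap_or(0);
    // Callers that need the old owned type call into_owned().
    let owned: Option<NullMask> = all_null.logical_nulls().map(Cow::into_owned);

    println!("{} {:?}", n, owned.map(|m| m.null_count())); // prints "1 Some(3)"
}
```

`Cow::into_owned` on a `Borrowed` value clones, so callers that always convert back to an owned mask see the same cost as before the change.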