-
Notifications
You must be signed in to change notification settings - Fork 850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Parquet] Reuse buffer in ByteViewArrayDecoderPlain
#6930
Conversation
ByteViewArrayDecoderPlain
ByteViewArrayDecoderPlain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XiangpengHao -- this looks great to me
arrow-buffer/src/buffer/immutable.rs
Outdated
@@ -79,6 +79,21 @@ impl Buffer { | |||
} | |||
} | |||
|
|||
/// Auxiliary method to create a new Buffer | |||
/// | |||
/// This is convenient for converting a [`bytes::Bytes`] to a [`Buffer`] without copying. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
FYI @kylebarron
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this instead be a From implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to recommend a From
implementation in addition to the explicit function
My rationale is that Rust experts are likely to find the From
implementation, but for beginners/intermediate, having an explicit method is easier to find
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name is very confusing, as it isn't clear what external means. This confusion is not helped by the fact arrow has a Bytes type that isn't currently public but the from_bytes method is.
Just adding a From implementation is a way to decouple this PR from cleaning this mess up #6754
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some ideas of how to document / make this easier to find -- be back shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a proposal to add From
impls and documentation:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great! Let's wait #6939 to be merged and I'll rebase to that commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rebase to that commit
Done!
Co-authored-by: Raphael Taylor-Davies <[email protected]>
bc838a4
to
0111ae6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @XiangpengHao
Thanks again everyone! |
* reuse buffer in view array * Update parquet/src/arrow/array_reader/byte_view_array.rs Co-authored-by: Raphael Taylor-Davies <[email protected]> * use From<Bytes> instead --------- Co-authored-by: Raphael Taylor-Davies <[email protected]>
* reuse buffer in view array * Update parquet/src/arrow/array_reader/byte_view_array.rs Co-authored-by: Raphael Taylor-Davies <[email protected]> * use From<Bytes> instead --------- Co-authored-by: Raphael Taylor-Davies <[email protected]>
* reuse buffer in view array * Update parquet/src/arrow/array_reader/byte_view_array.rs Co-authored-by: Raphael Taylor-Davies <[email protected]> * use From<Bytes> instead --------- Co-authored-by: Raphael Taylor-Davies <[email protected]>
Which issue does this PR close?
Part of #6921
Rationale for this change
Avoid adding a new buffer to the stringview buffer if the buffer-to-be-add is the same as the last buffer in the buffer list.
This is typically not a problem as we usually have large continuous reads. But it becomes a problem with row-level filtering where we can have hundreds/thousands of small reads over the same buffer.
What changes are included in this PR?
Are there any user-facing changes?