Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Parquet] Reuse buffer in ByteViewArrayDecoderPlain #6930

Merged
merged 3 commits into from
Jan 8, 2025

Conversation

XiangpengHao
Copy link
Contributor

Which issue does this PR close?

Part of #6921

Rationale for this change

Avoid adding a new buffer to the stringview buffer if the buffer-to-be-add is the same as the last buffer in the buffer list.

This is typically not a problem as we usually have large continuous reads. But it becomes a problem with row-level filtering where we can have hundreds/thousands of small reads over the same buffer.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate labels Jan 2, 2025
@XiangpengHao XiangpengHao changed the title [Parqet] Reuse buffer in ByteViewArrayDecoderPlain [Parquet] Reuse buffer in ByteViewArrayDecoderPlain Jan 2, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @XiangpengHao -- this looks great to me

@@ -79,6 +79,21 @@ impl Buffer {
}
}

/// Auxiliary method to create a new Buffer
///
/// This is convenient for converting a [`bytes::Bytes`] to a [`Buffer`] without copying.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

FYI @kylebarron

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this instead be a From implementation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to recommend a From implementation in addition to the explicit function

My rationale is that Rust experts are likely to find the From implementation, but for beginners/intermediate, having an explicit method is easier to find

Copy link
Contributor

@tustvold tustvold Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name is very confusing, as it isn't clear what external means. This confusion is not helped by the fact arrow has a Bytes type that isn't currently public but the from_bytes method is.

Just adding a From implementation is a way to decouple this PR from cleaning this mess up #6754

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some ideas of how to document / make this easier to find -- be back shortly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! Let's wait #6939 to be merged and I'll rebase to that commit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rebase to that commit

Done!

@github-actions github-actions bot removed the arrow Changes to the arrow crate label Jan 6, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. Let's wait a day in case @tustvold or @etseidl would like a chance to review as well

Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @XiangpengHao

@alamb alamb merged commit a47d996 into apache:main Jan 8, 2025
16 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 8, 2025

Thanks again everyone!

CurtHagenlocher pushed a commit to CurtHagenlocher/arrow-rs that referenced this pull request Jan 13, 2025
* reuse buffer in view array

* Update parquet/src/arrow/array_reader/byte_view_array.rs

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* use From<Bytes> instead

---------

Co-authored-by: Raphael Taylor-Davies <[email protected]>
svencowart pushed a commit to elastiflow/arrow-rs that referenced this pull request Jan 14, 2025
* reuse buffer in view array

* Update parquet/src/arrow/array_reader/byte_view_array.rs

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* use From<Bytes> instead

---------

Co-authored-by: Raphael Taylor-Davies <[email protected]>
totoroyyb pushed a commit to totoroyyb/arrow-rs that referenced this pull request Jan 20, 2025
* reuse buffer in view array

* Update parquet/src/arrow/array_reader/byte_view_array.rs

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* use From<Bytes> instead

---------

Co-authored-by: Raphael Taylor-Davies <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants