Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Buffer documentation, deprecate Buffer::from_bytes add From<Bytes> and From<bytes::Bytes> impls #6939

Merged
merged 5 commits into from
Jan 6, 2025

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 4, 2025

Which issue does this PR close?

Rationale for this change

As described by @tv42 #6754 and @kylebarron in #6930 creating Buffers in a zero copy way is confusing, I have also been personally confused by Bytes and bytes::Bytes difference

Also, the fact it is not clear how to make a Buffer in a zero copy manner lead to unintended copies as described by @XiangpengHao in

At least part of this confusion is that Bytes is an internal (non pub) struct that is named the same as bytes::Bytes but appears in the public API

I would like to make the whole situation less confusing

What changes are included in this PR?

  1. Add documentation to Bytes making it clear it is internal and how it is related to bytes::Bytes
  2. Improve documentation on Buffer and add examples of how to construct Buffer from Vec and bytes::Bytes
  3. Add From impls to make it easier to create Buffer in zero copy manner (used in examples)
  4. Deprecate from_bytes to close Public API using private types: Buffer::from_bytes takes unexported Bytes #6754
  5. Update parquet reader to use new From impls
  6. Misc other documentation improvements

Are there any user-facing changes?

  1. Better Docs
  2. From impls for Buffer

@github-actions github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate labels Jan 4, 2025
@alamb alamb changed the title Alamb/better bytes docs Improve Buffer documentation, add From<Bytes> and From<bytes::Bytes> impls Jan 4, 2025
/// `Buffer`s can be sliced and cloned without copying the underlying data and can
/// be created from memory allocated by non-Rust sources such as C/C++.
///
/// # Example: Create a `Buffer` from a `Vec` (without copying)
Copy link
Contributor Author

@alamb alamb Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of this PR is driven by the examples: converting to/from bytes::Bytes and Vec. I think it will be easier to do this operation now and not get hung up on Bytes vs bytes::Bytes

They are basically an expansion of the nice example @kylebarron added in #6920

}

/// Convert from [`bytes::Bytes`], not internal [`Bytes`] to `Buffer`
impl From<bytes::Bytes> for Buffer {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the From impl proposed by @tustvold in #6930 (comment)

@@ -28,14 +28,18 @@ use crate::buffer::dangling_ptr;

/// A continuous, fixed-size, immutable memory region that knows how to de-allocate itself.
///
/// This structs' API is inspired by the `bytes::Bytes`, but it is not limited to using rust's
/// global allocator nor u8 alignment.
/// Note that this structure is an internal implementation detail of the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this will reduce confusion about Bytes (I have also been confused by the name before too, especially when not in an editor/IDE)

// Then we convert `arrow_buffer::Bytes` into `arrow_buffer:Buffer`, which is also zero copy
let buf = arrow_buffer::Buffer::from_bytes(self.buf.clone().into());
// Zero copy convert `bytes::Bytes` into `arrow_buffer:Buffer`
let buf = arrow_buffer::Buffer::from(self.buf.clone());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to use the From impls

@alamb alamb changed the title Improve Buffer documentation, add From<Bytes> and From<bytes::Bytes> impls Improve Buffer documentation, deprecate Buffer::from_bytes add From<Bytes> and From<bytes::Bytes> impls Jan 4, 2025
@alamb
Copy link
Contributor Author

alamb commented Jan 4, 2025

After thinking about this more I am also going to propose deprecating Buffer::from_bytes in favor of the From impl which I think will close #6754

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me; some minor comments on wording/formatting

arrow-buffer/src/buffer/immutable.rs Outdated Show resolved Hide resolved
}
}

/// Convert from [`bytes::Bytes`], not internal `Bytes` to `Buffer`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Convert from [`bytes::Bytes`], not internal `Bytes` to `Buffer`
/// Convert from [`bytes::Bytes`] (not internal `Bytes`) to `Buffer`

parquet/src/arrow/array_reader/byte_view_array.rs Outdated Show resolved Hide resolved
parquet/src/arrow/array_reader/byte_view_array.rs Outdated Show resolved Hide resolved
arrow-buffer/src/buffer/immutable.rs Outdated Show resolved Hide resolved
arrow-buffer/src/buffer/immutable.rs Outdated Show resolved Hide resolved
@alamb
Copy link
Contributor Author

alamb commented Jan 5, 2025

Thank you for the review @Jefffrey -- I will leave this PR open for another day to allow time for additional comments

@alamb
Copy link
Contributor Author

alamb commented Jan 6, 2025

Thanks @mbrobbel as well for the look. I am merging this one in to keep the train moving and unblock #6930

@alamb alamb merged commit fc245b5 into apache:main Jan 6, 2025
28 checks passed
@alamb alamb deleted the alamb/better_bytes_docs branch January 6, 2025 21:03
CurtHagenlocher pushed a commit to CurtHagenlocher/arrow-rs that referenced this pull request Jan 13, 2025
…rom<Bytes>` and `From<bytes::Bytes>` impls (apache#6939)

* Improve Bytes documentation

* Improve Buffer documentation, add From<Bytes> and From<bytes::Bytes> impls

* avoid linking to private docs

* Deprecate `Buffer::from_bytes`

* Apply suggestions from code review

Co-authored-by: Jeffrey Vo <[email protected]>

---------

Co-authored-by: Jeffrey Vo <[email protected]>
svencowart pushed a commit to elastiflow/arrow-rs that referenced this pull request Jan 14, 2025
…rom<Bytes>` and `From<bytes::Bytes>` impls (apache#6939)

* Improve Bytes documentation

* Improve Buffer documentation, add From<Bytes> and From<bytes::Bytes> impls

* avoid linking to private docs

* Deprecate `Buffer::from_bytes`

* Apply suggestions from code review

Co-authored-by: Jeffrey Vo <[email protected]>

---------

Co-authored-by: Jeffrey Vo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Public API using private types: Buffer::from_bytes takes unexported Bytes
3 participants