Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: vortex-buffer #1742

Merged
merged 86 commits into from
Dec 30, 2024
Merged

refactor: vortex-buffer #1742

merged 86 commits into from
Dec 30, 2024

Conversation

gatesn
Copy link
Contributor

@gatesn gatesn commented Dec 20, 2024

Just a small one for y'all.

vortex-buffer/src/lib.rs is a decent starting point for this change.

This PR introduces Buffer<T> and BufferMut<T> for as (im)mutable zero-copy runtime-aligned buffers.
You can think of BufferMut as a Vec, but every time it re-allocates to resize, it ensures the new buffer remains aligned as requested.

This is neat. Unlike Vec<T>, it allows you to build buffers with much larger alignment, for example always 128 bytes for FastLanes, or 4096/8192 for page-aligned buffers.

Limitation: no zero-copy from Vec<T>

The implementation essentially wraps Bytes and BytesMut to provide this functionality. As a result, there is one major annoying limitation: we cannot go from Vec<T> to Buffer<T> and back to Vec<T> with zero-copy. Bytes::from_owner is able to take an externally allocated buffer (i.e. Vec<T>), but it will not return it back to you.

I've decided to entirely disallow going from Vec<T> to Buffer<T> to avoid this odd performance foot-gun. Although in doing this, I force a copy Buffer::<T>::copy_from_vec... whereas if we did allow conversion from Vec<T> then we would only pay for the copy if we tried to turn it into a BufferMut<T>. So.... not sure.

We can see an example of this when encoding ALP arrays. The result our library returns is a Vec<T>. So we are forced to copy. One way around this is to allow zero-copy into Buffer<T>, the other way is to make the ALP library generic over V: Default + Extend<T> to allow the library to initialize and append to a collection of elements in some arbitrary container type.

Perhaps an argument for allowing From<Vec<T>> is that we would rarely benefit from into_mut for arrays constructed by hand by users in-memory (in-memory arrays constructed by us should use Buffer<T>). And any arrays loaded from disk, would be loaded into a Buffer<T>...

Wire Break

This PR adds an alignment: u16 field to every flatbuffer Array's buffers. This lets us know the desired alignment for a given buffer deep within the I/O stack, helping to avoid copies later on.

encodings/bytebool/src/array.rs Outdated Show resolved Hide resolved

let arr = match validity {
Validity::AllValid | Validity::NonNullable => {
let bools = match_each_integer_ptype!(indices.ptype(), |$I| {
indices.maybe_null_slice::<$I>()
indices.as_slice::<$I>()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the name maybe_null_slice was better. Seems to easy for a caller to mistakenly expect as_slice to return the semantic values rather than a slice over underlying buffer that ignores validity

encodings/fastlanes/src/bitpacking/mod.rs Show resolved Hide resolved
/// Write the flatbuffer into a [`Buffer`].
fn write_flatbuffer_bytes(&self) -> Buffer;
/// Write the flatbuffer into a [`Bytes`].
fn write_flatbuffer_bytes(&self) -> Bytes;
Copy link
Member

@lwwmanning lwwmanning Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also write into the FlatBuffer alias of ConstBuffer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know...

So FlatBuffers don't have any prescribed alignment, so long as the individual values are sufficiently aligned. e.g. if you store a int64 it must be 8 byte aligned.

So the vec that comes out of FlatBufferBuilder might not be 8 byte aligned. We could force it to be (with a possible copy), but the flatbuffer isn't wrong or broken. So it seems an odd API to force the copy when it may not be necessary.

Does that make sense?

@@ -92,7 +92,7 @@ impl VortexReadAt for ObjectStoreReadAt {
// it's coming from a network stream. Internally they optimize the File implementation
// to only perform a single allocation when calling `.bytes().await`, which we
// replicate here by emitting the contents directly into our aligned buffer.
let mut buffer = Vec::with_capacity(len);
let mut buffer = BytesMut::with_capacity(len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should put this into an aligned thing, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment about bytes::BufMut. Vortex IO can be built almost entirely around the bytes crate. vortex-buffer provides an implementation of BufMut trait that internally preserves alignment. So the caller can choose to pass in a BytesMut or a BufferMut and get the behavior they want.

/// Collects the IPC bytes into a single `Bytes`.
pub fn collect_to_buffer(self) -> VortexResult<Bytes> {
let buffers: Vec<Bytes> = self.try_collect()?;
let mut buffer = BytesMut::with_capacity(buffers.iter().map(|b| b.len()).sum());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we're allocating these big buffers, should use our aligned version, no? (to at least 8 byte align everything)

vortex-ipc/src/messages/decoder.rs Show resolved Hide resolved
vortex-ipc/src/messages/decoder.rs Outdated Show resolved Hide resolved
@lwwmanning lwwmanning disabled auto-merge December 30, 2024 16:23
@lwwmanning lwwmanning enabled auto-merge (squash) December 30, 2024 16:23
@lwwmanning lwwmanning changed the title vortex-buffer refactor: vortex-buffer Dec 30, 2024
@lwwmanning lwwmanning disabled auto-merge December 30, 2024 16:27
@lwwmanning lwwmanning enabled auto-merge (squash) December 30, 2024 16:27
@lwwmanning lwwmanning merged commit 920b2d2 into develop Dec 30, 2024
20 checks passed
@lwwmanning lwwmanning deleted the ngates/buffers branch December 30, 2024 16:34
@gatesn gatesn restored the ngates/buffers branch December 30, 2024 16:43
@gatesn gatesn mentioned this pull request Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wire-break Includes a break to the serialized IPC or file format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants