-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: vortex-buffer #1742
refactor: vortex-buffer #1742
Conversation
|
||
let arr = match validity { | ||
Validity::AllValid | Validity::NonNullable => { | ||
let bools = match_each_integer_ptype!(indices.ptype(), |$I| { | ||
indices.maybe_null_slice::<$I>() | ||
indices.as_slice::<$I>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the name maybe_null_slice
was better. Seems to easy for a caller to mistakenly expect as_slice
to return the semantic values rather than a slice over underlying buffer that ignores validity
/// Write the flatbuffer into a [`Buffer`]. | ||
fn write_flatbuffer_bytes(&self) -> Buffer; | ||
/// Write the flatbuffer into a [`Bytes`]. | ||
fn write_flatbuffer_bytes(&self) -> Bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this also write into the FlatBuffer alias of ConstBuffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know...
So FlatBuffers don't have any prescribed alignment, so long as the individual values are sufficiently aligned. e.g. if you store a int64 it must be 8 byte aligned.
So the vec that comes out of FlatBufferBuilder might not be 8 byte aligned. We could force it to be (with a possible copy), but the flatbuffer isn't wrong or broken. So it seems an odd API to force the copy when it may not be necessary.
Does that make sense?
@@ -92,7 +92,7 @@ impl VortexReadAt for ObjectStoreReadAt { | |||
// it's coming from a network stream. Internally they optimize the File implementation | |||
// to only perform a single allocation when calling `.bytes().await`, which we | |||
// replicate here by emitting the contents directly into our aligned buffer. | |||
let mut buffer = Vec::with_capacity(len); | |||
let mut buffer = BytesMut::with_capacity(len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should put this into an aligned thing, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment about bytes::BufMut
. Vortex IO can be built almost entirely around the bytes
crate. vortex-buffer
provides an implementation of BufMut
trait that internally preserves alignment. So the caller can choose to pass in a BytesMut
or a BufferMut
and get the behavior they want.
/// Collects the IPC bytes into a single `Bytes`. | ||
pub fn collect_to_buffer(self) -> VortexResult<Bytes> { | ||
let buffers: Vec<Bytes> = self.try_collect()?; | ||
let mut buffer = BytesMut::with_capacity(buffers.iter().map(|b| b.len()).sum()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we're allocating these big buffers, should use our aligned version, no? (to at least 8 byte align everything)
Just a small one for y'all.
vortex-buffer/src/lib.rs
is a decent starting point for this change.This PR introduces
Buffer<T>
andBufferMut<T>
for as (im)mutable zero-copy runtime-aligned buffers.You can think of
BufferMut
as aVec
, but every time it re-allocates to resize, it ensures the new buffer remains aligned as requested.This is neat. Unlike
Vec<T>
, it allows you to build buffers with much larger alignment, for example always 128 bytes for FastLanes, or 4096/8192 for page-aligned buffers.Limitation: no zero-copy from
Vec<T>
The implementation essentially wraps
Bytes
andBytesMut
to provide this functionality. As a result, there is one major annoying limitation: we cannot go fromVec<T>
toBuffer<T>
and back toVec<T>
with zero-copy.Bytes::from_owner
is able to take an externally allocated buffer (i.e.Vec<T>
), but it will not return it back to you.I've decided to entirely disallow going from
Vec<T>
toBuffer<T>
to avoid this odd performance foot-gun. Although in doing this, I force a copyBuffer::<T>::copy_from_vec
... whereas if we did allow conversion fromVec<T>
then we would only pay for the copy if we tried to turn it into aBufferMut<T>
. So.... not sure.We can see an example of this when encoding ALP arrays. The result our library returns is a
Vec<T>
. So we are forced to copy. One way around this is to allow zero-copy intoBuffer<T>
, the other way is to make the ALP library generic overV: Default + Extend<T>
to allow the library to initialize and append to a collection of elements in some arbitrary container type.Perhaps an argument for allowing
From<Vec<T>>
is that we would rarely benefit frominto_mut
for arrays constructed by hand by users in-memory (in-memory arrays constructed by us should useBuffer<T>
). And any arrays loaded from disk, would be loaded into aBuffer<T>
...Wire Break
This PR adds an
alignment: u16
field to every flatbuffer Array's buffers. This lets us know the desired alignment for a given buffer deep within the I/O stack, helping to avoid copies later on.