Eager/early serialize of components to arrow in Rust and C++ #7245
Labels
🌊 C++ API
C/C++ API specific
💬 discussion
🪵 Log & send APIs
Affects the user-facing API for all languages
🦀 Rust API
Rust logging API
Comments
Wumpf added the 💬 discussion, 🦀 Rust API, 🌊 C++ API, and 🪵 Log & send APIs labels on Aug 20, 2024
teh-cmc added a commit that referenced this issue on Aug 23, 2024
It doesn't make any sense for a `ComponentBatch` to have any say in what the final `ArrowField` should look like. An `ArrowField` is a `Chunk`/`RecordBatch`/`Schema`-level concern that only makes sense during IO/transport/FFI/storage/etc, and which requires external context that a single `ComponentBatch` on its own has no idea of.

---

Part of a lot of clean up I want to do while we head towards:
* #7245
* #3741
This was referenced Aug 23, 2024
emilk changed the title from "Eagerly serialize components upon `Archetype` & `ComponentBatch` serialization in Rust and C++" to "Eager/early serialize of components to arrow in Rust and C++" on Nov 4, 2024
This was referenced Jan 10, 2025
teh-cmc added a commit that referenced this issue on Jan 13, 2025
This introduces `SerializedComponentBatch`, which will become the main type we use to carry user data around internally.

```rust
/// The serialized contents of a [`ComponentBatch`] with associated [`ComponentDescriptor`].
///
/// This is what gets logged into Rerun:
/// * See [`ComponentBatch`] to easily serialize component data.
/// * See [`AsComponents`] for logging serialized data.
///
/// [`AsComponents`]: [crate::AsComponents]
#[derive(Debug, Clone)]
pub struct SerializedComponentBatch {
    pub array: arrow::array::ArrayRef,

    // TODO(cmc): Maybe Cow<> this one if it grows bigger. Or intern descriptors altogether, most likely.
    pub descriptor: ComponentDescriptor,
}
```

The goal is to keep the `ComponentBatch` trait isolated at the edge, where it is used as a means of easily converting any data into arrow arrays, instead of simultaneously being used as a means of transporting data around through the internals.

`ComponentBatch` is here to stay, if only for its conversion capabilities.

This opens a lot of opportunities for improvement in terms of DX, UX and future features (e.g. generics).

The two code paths will co-exist for the foreseeable future, until all archetypes have been made eager.

* Part of #7245
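To illustrate the "conversion at the edge" idea from the commit message above, here is a minimal sketch that builds without the `arrow` crate. Everything in it is an assumption made for illustration: `ArrayRef` is a plain `Arc<[f32]>` stand-in for `arrow::array::ArrayRef`, `ComponentDescriptor` is reduced to a name, and the trait methods `descriptor`, `to_array` and `serialized`, the `Radius` type and the `total_values` helper are hypothetical and not taken from the Rerun codebase.

```rust
use std::sync::Arc;

// ASSUMPTION: stand-in for `arrow::array::ArrayRef`, so the sketch has no dependencies.
pub type ArrayRef = Arc<[f32]>;

/// ASSUMPTION: heavily simplified stand-in for Rerun's `ComponentDescriptor`.
#[derive(Debug, Clone, PartialEq)]
pub struct ComponentDescriptor {
    pub component_name: String,
}

/// Simplified mirror of the struct shown in the commit message.
#[derive(Debug, Clone)]
pub struct SerializedComponentBatch {
    pub array: ArrayRef,
    pub descriptor: ComponentDescriptor,
}

/// Edge-only conversion trait (hypothetical signatures, not the real trait).
pub trait ComponentBatch {
    fn descriptor(&self) -> ComponentDescriptor;
    fn to_array(&self) -> ArrayRef;

    /// Converts user data once; everything downstream only sees the result.
    fn serialized(&self) -> SerializedComponentBatch {
        SerializedComponentBatch {
            array: self.to_array(),
            descriptor: self.descriptor(),
        }
    }
}

/// Hypothetical user-facing component type.
pub struct Radius(pub f32);

impl ComponentBatch for Vec<Radius> {
    fn descriptor(&self) -> ComponentDescriptor {
        ComponentDescriptor { component_name: "Radius".to_owned() }
    }

    fn to_array(&self) -> ArrayRef {
        // Copy the values out of the user types into an owned buffer.
        self.iter().map(|r| r.0).collect()
    }
}

/// Hypothetical internal function: it never touches `ComponentBatch`,
/// only already-serialized data.
pub fn total_values(batches: &[SerializedComponentBatch]) -> usize {
    batches.iter().map(|b| b.array.len()).sum()
}
```

In this sketch a caller would write `vec![Radius(0.5), Radius(1.0)].serialized()` once at the API boundary and pass the resulting `SerializedComponentBatch` around from there.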
teh-cmc added a commit that referenced this issue on Jan 13, 2025
This introduces a new `attr.rust.archetype_eager` codegen attribute. When toggled, the associated archetype will now only carry raw Arrow data around, and go through the new eager logging APIs.

The attribute has been set on `Points3D`:
![image](https://github.com/user-attachments/assets/cb520e0c-5160-4ff6-b6a3-4bf10b4ac045)

Legacy and eagerly-serialized archetypes can co-exist, making it possible to migrate everything incrementally.

* DNM: requires #8644
* Part of #7245
teh-cmc added a commit that referenced this issue on Jan 14, 2025
### Related
* Part of #7245

### What
Use the new eager serialization & update API for transforms.
The only breaking change here is that `Transform3D` is no longer `Copy`; otherwise it's fully compatible.

---------

Co-authored-by: Clement Rey
This was referenced Jan 15, 2025
teh-cmc pushed a commit that referenced this issue on Jan 16, 2025
#8697)
### Related
* Part of #7245

### What
What it says on the tin!
Commit by commit: the first commit does all the easy ones, followed by the trickier ones (just two).

Tested by...
* mess with tensor
* mess with time series in plots example
* run `docs/snippets/all/views/timeseries.py` snippet (uses explicit time series)

* [x] full check passed
This was referenced Jan 16, 2025
This was referenced Jan 22, 2025
Wumpf added a commit that referenced this issue on Jan 23, 2025
Wumpf added a commit that referenced this issue on Jan 24, 2025
Wumpf added a commit that referenced this issue on Jan 24, 2025
teh-cmc pushed a commit that referenced this issue on Jan 24, 2025
### Related
* sister PR to..
  * #8789
  * #8785
  * #8793
* missed piece of #7245

### What
Ports the `Tensor` archetype in Rust to the new eager serialized interface.
Unfortunately this meant I had to remove some direct access methods of the underlying tensor data. Curiously, this didn't affect any of our test/snippet/example code.

While doing so I also fixed some wording issues in the (very similar) C++ implementation of `with_dim_names`.
teh-cmc pushed a commit that referenced this issue on Jan 24, 2025
As we'll soon introduce tagged components and simple multi-datatype components, it gets harder and harder to represent archetypes (and concrete `ComponentBatch`es) as collections of concrete types.
Let's take the example of a generalized `rotation` component/archetype field which may be represented by various datatypes: we can no longer store concrete types on an archetype and have to type-erase them right away instead.
Note that this way C++ and Rust get much closer to the Python SDK in this regard.
This fits very well into our desire to get rid of concrete component types in the SDK languages, which today almost always take the form of `struct ComponentType(pub datatypes::TheDataType)` together with a myriad of constructors, trait impls and utilities, i.e. a lot of forwarding code.
Eager serialization allows us to implement component semantics on archetypes instead, with concrete construction methods. E.g. `with_quaternion` and `with_axis_angle` would both populate the multi-datatype `rotation` component, which gets tagged appropriately.
When logging raw component batches/columns this would become more explicit, as you're expected to supply a datatype array/collection together with the appropriate component tag (which will still be provided by the SDK, but more in registry fashion rather than as a `class`/`struct` per component). This follows the exact same mechanism by which an archetype constructs its internal `ComponentBatches`.
A drawback of this approach is that most accesses of archetypes require deserialization back into the source datatypes, which can be cumbersome in some cases. However, this is what we expect to do when a user reads back data from the store, so this is something that may soon become commonplace anyway.
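The following sketch shows one way the `with_quaternion` / `with_axis_angle` idea and the read-back drawback could look in Rust. It is not the SDK's API: the `Transform` archetype, the `RotationDatatype` tag, the `SerializedRotation` struct (a `Vec<f32>` standing in for an Arrow array) and the `quaternion` / `rotation_datatype` accessors are all hypothetical names chosen for illustration.

```rust
/// ASSUMPTION: hypothetical tag recording which datatype a serialized batch holds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RotationDatatype {
    Quaternion,
    AxisAngle,
}

/// ASSUMPTION: type-erased, already-serialized component data.
/// A `Vec<f32>` stands in for an Arrow array here.
#[derive(Debug, Clone, PartialEq)]
pub struct SerializedRotation {
    pub datatype: RotationDatatype,
    pub values: Vec<f32>,
}

/// Hypothetical eager archetype: it stores no concrete component types,
/// only the serialized `rotation` field.
#[derive(Debug, Clone, Default)]
pub struct Transform {
    rotation: Option<SerializedRotation>,
}

impl Transform {
    pub fn new() -> Self {
        Self::default()
    }

    /// Serializes the quaternion immediately and tags it.
    pub fn with_quaternion(mut self, xyzw: [f32; 4]) -> Self {
        self.rotation = Some(SerializedRotation {
            datatype: RotationDatatype::Quaternion,
            values: xyzw.to_vec(),
        });
        self
    }

    /// Populates the same `rotation` field from a different datatype.
    pub fn with_axis_angle(mut self, axis: [f32; 3], angle_rad: f32) -> Self {
        let mut values = axis.to_vec();
        values.push(angle_rad);
        self.rotation = Some(SerializedRotation {
            datatype: RotationDatatype::AxisAngle,
            values,
        });
        self
    }

    pub fn rotation_datatype(&self) -> Option<RotationDatatype> {
        self.rotation.as_ref().map(|r| r.datatype)
    }

    /// Reading back means deserializing, and it can fail if the field
    /// currently holds a different datatype.
    pub fn quaternion(&self) -> Option<[f32; 4]> {
        let rot = self.rotation.as_ref()?;
        if rot.datatype != RotationDatatype::Quaternion {
            return None;
        }
        match rot.values.as_slice() {
            &[x, y, z, w] => Some([x, y, z, w]),
            _ => None,
        }
    }
}
```

Both constructors write to the same field, so the last call wins, and the `quaternion` accessor shows the cost mentioned above: a check of the tag plus a conversion back from the serialized form.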
Another nice side effect is that the "ephemeral `rerun::Collection` hazard" goes away, as we'd no longer store pointers to user data, making the API a lot safer to use (`rerun::Collection` becomes a pure pass-through type, as it should be):
`rerun::Collection` borrows data too eagerly, making it very easy to cause segfaults & read of invalid data #7081
This ticket is a meetup discussion outcome of @jleibs and @Wumpf with some additional input by @emilk.
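The hazard described above concerns the C++ `rerun::Collection`. As a rough analogy only, the Rust sketch below contrasts a borrowing archetype with an eagerly serialized one. `BorrowedPoints`, `EagerPoints` and `build_from_temporary` are hypothetical names, and an `Arc<[f32]>` stands in for an Arrow array. In Rust the borrow checker would reject the dangling case at compile time, whereas in C++ it would compile and read invalid memory.

```rust
use std::sync::Arc;

/// Borrowing design: the archetype points into user memory,
/// so it is tied to the lifetime of the user's buffer.
pub struct BorrowedPoints<'a> {
    pub positions: &'a [[f32; 3]],
}

/// Eager design: the data is copied into an owned, shareable buffer
/// at construction time, so nothing refers back to user memory.
#[derive(Debug, Clone)]
pub struct EagerPoints {
    positions: Arc<[f32]>,
}

impl EagerPoints {
    pub fn new(positions: &[[f32; 3]]) -> Self {
        // "Serialize" right away by flattening into an owned buffer.
        let flat: Arc<[f32]> = positions
            .iter()
            .flat_map(|p| p.iter().copied())
            .collect();
        Self { positions: flat }
    }

    pub fn num_points(&self) -> usize {
        self.positions.len() / 3
    }
}

/// Building from a temporary is fine with the eager design.
/// Returning `BorrowedPoints { positions: &scratch }` from here would not compile.
pub fn build_from_temporary() -> EagerPoints {
    let scratch: Vec<[f32; 3]> = vec![[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]];
    EagerPoints::new(&scratch)
    // `scratch` is dropped here; `EagerPoints` already owns its copy.
}
```

The trade-off in this sketch is an upfront copy in exchange for an archetype without lifetime parameters.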
Advantages
`f32` or `f64` (see https://github.com/rerun-io/rerun/blob/main/design/component_datatypes.md)
Related
After #8703, the following types need eager serialization in Rust
(checked means there's a branch where it's fixed):