Switch deserialization to an iterator-based interface to avoid allocations #462

Closed
1 task done
piodul opened this issue Jun 24, 2022 · 2 comments · Fixed by #1119
Labels: API-breaking (This might introduce incompatible API changes), area/deserialization, cpp-rust-driver-p0 (Functionality required by cpp-rust-driver), performance (Improves performance of existing features)

Comments

@piodul
Collaborator

piodul commented Jun 24, 2022

Currently, after receiving a Rows response from the database, the driver unconditionally parses the row data into an equivalent of Vec<Vec<Option<CqlValue>>>. This results in at least N + 1 allocations (where N is the number of rows), and possibly more if the result contains composite CqlValues such as collections or UDTs.

The most common way to consume the data (or so I suppose) is to convert the rows into some typed representation first. For example, if the data is fetched from a table which has three int columns, then each row can be converted to (i32, i32, i32). Note that this representation holds all data inline without any allocation, so Vec<Option<CqlValue>> is quite a costly intermediate representation.

I propose to cut out the middleman and adjust our deserialization framework to avoid allocations. Instead of eagerly materializing all rows into a vector, let's keep them in their serialized form and introduce an iterator interface over them.

Here are some ideas on what the new interface could look like:

pub struct QueryResult {
    pub rows: Option<RawRows>,
    // We will probably need to include information about the result type
    // ...
}

// RawRows represents raw rows, still in their serialized form
struct RawRows {
    frame_data: Bytes,
    row_data_offset: usize,
    row_data_size: usize,
    // ...
}

impl RawRows {
    pub fn iter(&self) -> ResultIterator<'_> { /* ... */ }
}

// Iterates over rows in result
struct ResultIterator<'rows> { /* ... */ }

impl<'rows> Iterator for ResultIterator<'rows> {
    type Item = RowIterator<'rows>;
    fn next(&mut self) -> Option<Self::Item> { /* ... */ }
}

// Iterates over columns in a row
struct RowIterator<'row> { /* ... */ }

impl<'row> Iterator for RowIterator<'row> {
    type Item = &'row [u8];
    fn next(&mut self) -> Option<Self::Item> { /* ... */ }
}

// Helper iterators for consuming compound types
// We can consider typed variants of those, i.e. such that they automatically convert items before returning
struct SequenceIterator<'column> { /* ... */ } // Item = &'column [u8], could probably be used for lists and sets
struct MapIterator<'column> { /* ... */ } // Item = (&'column [u8], &'column [u8])
struct UDTIterator<'col_type, 'column> { /* ... */ } // Item = (&'col_type str, &'column [u8])
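
To make the RowIterator idea above more concrete, here is a minimal sketch of how it could walk a serialized row without allocating. It rests on assumptions that are not part of the proposal: cells are taken to use the CQL native protocol `[bytes]` encoding (a 4-byte big-endian signed length followed by that many bytes, with a negative length meaning NULL), the item type is `Option<&[u8]>` so that NULL cells can be represented, and `encode_cell` and `decode_three_ints` are hypothetical helpers added only for illustration.

```rust
/// Iterates over the cells of one serialized row without allocating.
/// Assumption: each cell is a 4-byte big-endian signed length followed by
/// that many bytes; a negative length denotes NULL.
pub struct RowIterator<'row> {
    remaining: &'row [u8],
    columns_left: usize,
}

impl<'row> RowIterator<'row> {
    pub fn new(row_data: &'row [u8], column_count: usize) -> Self {
        RowIterator { remaining: row_data, columns_left: column_count }
    }
}

impl<'row> Iterator for RowIterator<'row> {
    // The inner `None` stands for a NULL cell (a choice of this sketch).
    type Item = Option<&'row [u8]>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.columns_left == 0 {
            return None;
        }
        self.columns_left -= 1;
        let data = self.remaining;
        // Stop iterating if the input is truncated.
        if data.len() < 4 {
            return None;
        }
        let (prefix, rest) = data.split_at(4);
        let len = i32::from_be_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]);
        if len < 0 {
            self.remaining = rest;
            return Some(None);
        }
        let len = len as usize;
        if rest.len() < len {
            return None;
        }
        let (cell, rest) = rest.split_at(len);
        self.remaining = rest;
        Some(Some(cell))
    }
}

/// Hypothetical helper: appends one cell in the assumed encoding.
pub fn encode_cell(out: &mut Vec<u8>, value: Option<&[u8]>) {
    match value {
        Some(bytes) => {
            out.extend_from_slice(&(bytes.len() as i32).to_be_bytes());
            out.extend_from_slice(bytes);
        }
        None => out.extend_from_slice(&(-1i32).to_be_bytes()),
    }
}

/// Hypothetical helper: decodes a row of three non-null `int` columns
/// straight into a tuple, with no intermediate `Vec`.
pub fn decode_three_ints(row_data: &[u8]) -> Option<(i32, i32, i32)> {
    let mut cells = RowIterator::new(row_data, 3);
    let mut next_int = || -> Option<i32> {
        match cells.next()?? {
            [a, b, c, d] => Some(i32::from_be_bytes([*a, *b, *c, *d])),
            _ => None,
        }
    };
    Some((next_int()?, next_int()?, next_int()?))
}
```

Every item returned by the iterator borrows from the row buffer, so the only allocation in a typical use would be whatever the caller decides to build from the cells.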

We should also introduce a new trait which allows deserializing values directly from a byte slice:

// The lifetime in the trait definition will allow us to return types which borrow from the data,
// which will be useful e.g. for MapIterator
trait WipName<'cell> {
    // 'cell comes from the trait; redeclaring it on the method would be a compile error
    fn deserialize(col_type: &ColType, data: &'cell [u8]) -> Self;
}

The ColType describes the type of the column and may aid in deserialization. It will be absolutely necessary if we implement the deserialization trait for CqlValue, but may be useful for UDTIterator as well. It is meant as a placeholder here; we can probably look over the existing types in the driver and choose one.
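
As a rough illustration of the trait, the sketch below implements it for `i32` and for a borrowed `&str`. Several things here are assumptions rather than part of the proposal: `ColType` is reduced to a hypothetical two-variant enum, `DeserError` is an invented error type, and `deserialize` returns a `Result` instead of `Self` so that malformed input can be reported.

```rust
/// Hypothetical stand-in for the `ColType` placeholder.
#[derive(Debug, PartialEq)]
pub enum ColType {
    Int,
    Text,
}

/// Hypothetical error type; the proposal does not define one.
#[derive(Debug, PartialEq)]
pub enum DeserError {
    TypeMismatch,
    BadLength,
    InvalidUtf8,
}

/// Sketch of the proposed trait (still under its work-in-progress name).
pub trait WipName<'cell>: Sized {
    fn deserialize(col_type: &ColType, data: &'cell [u8]) -> Result<Self, DeserError>;
}

impl<'cell> WipName<'cell> for i32 {
    fn deserialize(col_type: &ColType, data: &'cell [u8]) -> Result<Self, DeserError> {
        if *col_type != ColType::Int {
            return Err(DeserError::TypeMismatch);
        }
        // A CQL int is four big-endian bytes.
        match data {
            [a, b, c, d] => Ok(i32::from_be_bytes([*a, *b, *c, *d])),
            _ => Err(DeserError::BadLength),
        }
    }
}

// The returned string borrows from the cell, which is what the `'cell`
// lifetime on the trait makes possible.
impl<'cell> WipName<'cell> for &'cell str {
    fn deserialize(col_type: &ColType, data: &'cell [u8]) -> Result<Self, DeserError> {
        if *col_type != ColType::Text {
            return Err(DeserError::TypeMismatch);
        }
        std::str::from_utf8(data).map_err(|_| DeserError::InvalidUtf8)
    }
}
```

The `&str` impl performs no allocation at all: it only validates UTF-8 and reinterprets the bytes in place.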

For existing types which implement FromCqlValue, we can try introducing a blanket impl: impl<T> WipName<'static> for T where T: FromCqlValue. This is a good starting point for this task. Because of the blanket impl, I think it won't be possible to implement both FromCqlValue and WipName on the same type, but I don't think it is a big problem. After some time, we can consider deprecating and removing FromCqlValue.
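
A blanket impl along these lines might look like the sketch below. All types here are simplified, hypothetical stand-ins (a two-variant `CqlValue`, a two-variant `ColType`, an `Option`-returning `FromCqlValue`), not the driver's real definitions. The sketch also makes the impl generic over `'cell` rather than fixing it to `'static`, so that it can be called on borrowed frame data; that is a choice of this sketch, not something the proposal specifies.

```rust
/// Simplified, hypothetical stand-in for the driver's `CqlValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum CqlValue {
    Int(i32),
    Text(String),
}

/// Hypothetical stand-in for the `ColType` placeholder.
pub enum ColType {
    Int,
    Text,
}

/// Simplified stand-in for the existing conversion trait.
pub trait FromCqlValue: Sized {
    fn from_cql(value: CqlValue) -> Option<Self>;
}

impl FromCqlValue for i32 {
    fn from_cql(value: CqlValue) -> Option<Self> {
        match value {
            CqlValue::Int(v) => Some(v),
            _ => None,
        }
    }
}

impl FromCqlValue for String {
    fn from_cql(value: CqlValue) -> Option<Self> {
        match value {
            CqlValue::Text(v) => Some(v),
            _ => None,
        }
    }
}

/// Sketch of the new trait, returning `Option` to keep the example short.
pub trait WipName<'cell>: Sized {
    fn deserialize(col_type: &ColType, data: &'cell [u8]) -> Option<Self>;
}

/// Parses raw cell bytes into the intermediate `CqlValue`.
fn parse_cql_value(col_type: &ColType, data: &[u8]) -> Option<CqlValue> {
    match col_type {
        ColType::Int => match data {
            [a, b, c, d] => Some(CqlValue::Int(i32::from_be_bytes([*a, *b, *c, *d]))),
            _ => None,
        },
        ColType::Text => std::str::from_utf8(data)
            .ok()
            .map(|s| CqlValue::Text(s.to_owned())),
    }
}

// Blanket impl: every `FromCqlValue` type gets `WipName` for free, at the
// cost of going through the intermediate `CqlValue`.
// A hand-written `WipName` impl for a type that also implements
// `FromCqlValue` would overlap with this impl and be rejected (E0119).
impl<'cell, T: FromCqlValue> WipName<'cell> for T {
    fn deserialize(col_type: &ColType, data: &'cell [u8]) -> Option<Self> {
        parse_cql_value(col_type, data).and_then(T::from_cql)
    }
}
```

This route keeps old types working but does not save allocations for them, since each value is still materialized as a `CqlValue` first.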

We should introduce procedural macros for WipName as we already have for FromCqlValue.

Doing all of this requires introducing a breaking change by modifying QueryResult. However, I suspect that not many users use QueryResult::rows directly; most use the typed API. For the typed API, I think it is possible to preserve the interface as it is without compromising on the allocation counts and just change how it works under the hood. For those who really need the old Vec<Vec<Option<CqlValue>>>, we can introduce helper methods that convert the data on demand.

As this change will break much of the existing code, we should make it possible for users to keep using the old way of deserializing things. It should be possible to convert the raw rows into Vec<Vec<Option<CqlValue>>> and then impose types using FromCqlValue and its friends. I think it is OK if using the old way requires some adjustments when upgrading, e.g. requiring users to add some method calls, provided that such changes are easy to make. After some time, we can deprecate those methods as well.
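
The on-demand conversion to the old representation could be sketched roughly as follows. Everything here is hypothetical: `to_legacy_rows` is an invented method name, `RawRows` is simplified to hold a `Vec<u8>` plus column types (the proposal uses `Bytes` and offsets), `CqlValue` and `ColType` are two-variant stand-ins, and cells are assumed to be length-prefixed with a 4-byte big-endian signed length where a negative value means NULL.

```rust
/// Simplified, hypothetical stand-in for the driver's `CqlValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum CqlValue {
    Int(i32),
    Text(String),
}

/// Hypothetical stand-in for the `ColType` placeholder.
pub enum ColType {
    Int,
    Text,
}

/// Simplified raw rows: serialized cells plus the metadata needed to parse them.
pub struct RawRows {
    pub row_data: Vec<u8>,
    pub col_types: Vec<ColType>,
    pub row_count: usize,
}

/// Reads one length-prefixed cell; returns the cell (None = NULL) and the rest.
fn read_cell(data: &[u8]) -> Option<(Option<&[u8]>, &[u8])> {
    if data.len() < 4 {
        return None;
    }
    let (prefix, rest) = data.split_at(4);
    let len = i32::from_be_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]);
    if len < 0 {
        return Some((None, rest));
    }
    let len = len as usize;
    if rest.len() < len {
        return None;
    }
    let (cell, rest) = rest.split_at(len);
    Some((Some(cell), rest))
}

fn parse_value(col_type: &ColType, data: &[u8]) -> Option<CqlValue> {
    match col_type {
        ColType::Int => match data {
            [a, b, c, d] => Some(CqlValue::Int(i32::from_be_bytes([*a, *b, *c, *d]))),
            _ => None,
        },
        ColType::Text => std::str::from_utf8(data)
            .ok()
            .map(|s| CqlValue::Text(s.to_owned())),
    }
}

impl RawRows {
    /// Materializes the old representation; returns None on malformed data.
    /// This is where the N + 1 allocations happen, but only if requested.
    pub fn to_legacy_rows(&self) -> Option<Vec<Vec<Option<CqlValue>>>> {
        let mut data = &self.row_data[..];
        let mut rows = Vec::with_capacity(self.row_count);
        for _ in 0..self.row_count {
            let mut row = Vec::with_capacity(self.col_types.len());
            for col_type in &self.col_types {
                let (cell, rest) = read_cell(data)?;
                data = rest;
                row.push(match cell {
                    Some(bytes) => Some(parse_value(col_type, bytes)?),
                    None => None,
                });
            }
            rows.push(row);
        }
        Some(rows)
    }
}
```

With a helper like this, existing code that expects the nested vectors would only need one extra method call after upgrading.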

Tasks

  1. 5 of 5
    wprzytula
@mykaul
Contributor

mykaul commented Mar 23, 2023

@avelanarius - is this complete for 0.8? If not, please re-target to 0.8.1 or 0.9.

@wprzytula
Collaborator

Ref: #571

@avelanarius avelanarius modified the milestones: 0.13.0, 0.14.0 Apr 30, 2024
@wprzytula wprzytula added area/deserialization cpp-rust-driver-p0 Functionality required by cpp-rust-driver labels Jul 9, 2024
@wprzytula wprzytula modified the milestones: 0.14.0, 0.15.0 Aug 20, 2024