coldata: optimize serialization of Bytes
This commit changes how we convert `coldata.Bytes` to and from the arrow format by introducing our own "arrow-like" format (which is arrow-compatible). We call it "arrow-like" because we're abusing the arrow format to get the best speed, possibly at the cost of increased allocations (when a Bytes vector has been modified in place many times via `Set`s at arbitrary positions with values of different lengths).

In particular, the arrow format represents bytes values via two slices: the flat `[]byte` buffer and the offsets, where `len(offsets) = n + 1` (`n` is the number of elements). The i-th element is then `buffer[offsets[i]:offsets[i+1]]`. However, we squash `[]element` and the buffer for non-inlined values into that flat byte slice, and we only need two positions in `offsets` to indicate the boundary between the two as well as the total data length. As a result, we have the following representation (which defeats the spirit of the arrow format but doesn't cause any issues anywhere):
```
buffer = [<b.elements as []byte><b.buffer>]
offsets = [0, 0, ..., 0, len(<b.elements as []byte>), len(<b.elements as []byte>) + len(<b.buffer>)]
```
This speeds up the conversion by 2-10x; the full benchmark results are [here](https://gist.github.com/yuzefovich/2474e806663ed5ba8cf31ef8a426962c).

Epic: None

Release note: None
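To make the layout described above concrete, here is a minimal sketch in Go. It is not the code from this commit: `arrowBytes`, `toArrowLike`, and `fromArrowLike` are hypothetical names, the `elements` slice is treated as opaque bytes (the real `[]element` reinterpretation is not shown), and the handling of `n == 0` is an assumption made only to keep the sketch safe.

```go
package main

import "fmt"

// arrowBytes is a simplified, hypothetical stand-in for an arrow binary
// array: one flat data buffer plus n+1 offsets.
type arrowBytes struct {
	buffer  []byte
	offsets []int32
}

// get returns the i-th value under the standard arrow interpretation:
// buffer[offsets[i]:offsets[i+1]].
func (a arrowBytes) get(i int) []byte {
	return a.buffer[a.offsets[i]:a.offsets[i+1]]
}

// toArrowLike packs the element bytes and the non-inlined buffer into one
// flat slice. Only the last two offsets carry information: the boundary
// between the two parts and the total data length.
func toArrowLike(elements, nonInlined []byte, n int) arrowBytes {
	if n == 0 {
		// Assumption for this sketch: an empty vector has no data.
		return arrowBytes{offsets: []int32{0}}
	}
	buf := make([]byte, 0, len(elements)+len(nonInlined))
	buf = append(buf, elements...)
	buf = append(buf, nonInlined...)
	offsets := make([]int32, n+1) // all zeros except the last two slots
	offsets[n-1] = int32(len(elements))
	offsets[n] = int32(len(buf))
	return arrowBytes{buffer: buf, offsets: offsets}
}

// fromArrowLike splits the flat buffer back into the two parts using the
// last two offsets. The returned slices alias a.buffer (no copy).
func fromArrowLike(a arrowBytes) (elements, nonInlined []byte) {
	n := len(a.offsets) - 1
	if n < 1 {
		return nil, nil
	}
	split, end := a.offsets[n-1], a.offsets[n]
	return a.buffer[:split], a.buffer[split:end]
}

func main() {
	// Standard arrow layout: three values "ab", "", "cde".
	std := arrowBytes{buffer: []byte("abcde"), offsets: []int32{0, 2, 2, 5}}
	for i := 0; i < len(std.offsets)-1; i++ {
		fmt.Printf("value %d: %q\n", i, std.get(i))
	}

	// Arrow-like layout: placeholder element bytes plus a non-inlined buffer.
	packed := toArrowLike([]byte{1, 2, 3, 4}, []byte("xyz"), 3)
	fmt.Println("offsets:", packed.offsets)
	elems, buf := fromArrowLike(packed)
	fmt.Println("elements:", elems, "buffer:", string(buf))
}
```

Note that under this layout a generic arrow reader calling `get(i)` would see empty values for all but the last two slots, which is why the message says the representation defeats the spirit of the format while remaining structurally compatible.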
1 parent 107e78c, commit b353d18
Showing 4 changed files with 130 additions and 49 deletions.