Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VarZeroLengthlessSlice, use in MultiFieldsULE to optimize #5593

Merged
merged 10 commits into from
Sep 25, 2024

Conversation

Manishearth
Copy link
Member

@Manishearth Manishearth commented Sep 24, 2024

Fixes #5378, #5240

@Manishearth Manishearth requested review from sffc and a team as code owners September 24, 2024 21:16
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deceptively simple

Comment on lines +464 to +465
let len = self.fields.len();
if len == 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Praise: Not many changes in this file making it easy to review

/// # Panics
///
/// Panics if the buffer is not exactly the correct length.
pub fn write_serializable_bytes<T, A, F>(elements: &[A], output: &mut [u8])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: It might be nice to refactor this code into an EncodeAsVarULE impl, which has well-defined safety requirements and reduces indirection (not now, maybe later)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that makes sense, I think this is the thing that underlies the EncodeAsVarULE infrastructure. But maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is VarZeroSlice, which is one layer above EncodeAsVarULE, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is VarZeroVecComponents; which is extremely low level. components.rs is basically where all of the actual VZV format algorithms live.

utils/zerovec/src/varzerovec/lengthless.rs Outdated Show resolved Hide resolved
utils/zerovec/src/varzerovec/lengthless.rs Outdated Show resolved Hide resolved
utils/zerovec/src/varzerovec/lengthless.rs Show resolved Hide resolved
utils/zerovec/src/varzerovec/lengthless.rs Outdated Show resolved Hide resolved
@Manishearth Manishearth requested a review from sffc September 25, 2024 07:54
@Manishearth Manishearth merged commit ff9f691 into unicode-org:main Sep 25, 2024
28 checks passed
@Manishearth Manishearth deleted the vzv-lensplit branch September 25, 2024 17:55
Manishearth added a commit that referenced this pull request Sep 25, 2024
It is impossible for an IndexN array to need more than a length integer
of size N, anyway, the max index is always `>=` the length.


Part of #5523

Builds on #5593


We could in theory just have a `VZVFormatCombo<Index, Len>` type that
allows free selection, however I'm trying to keep this minimal. Overall
the main use case for that is picking things like "a small array of
;argely-sized elements" and we could just expose Index16Len8 for that. I
can see that being useful for things like
#5580, though it also feels
like a data microoptimization.


The "total" lines in fingerprints.csv are interspersed in giant diffs,
and this basically only gets a max of 2-6 byte wins per data, but the
overall data size went down by 200KB. Not amazing, not terrible.

```rust
[18:26:22] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^ 
$ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}' 
23501369
[18:26:08] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^ 
$ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}' 
23391499
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better abstractions for splitting lengths out from VarZeroVec
2 participants