Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing baked data to use zerotrie #5064

Merged
merged 13 commits into from
Jul 4, 2024

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Jun 14, 2024

No description provided.

@sffc
Copy link
Member

sffc commented Jun 14, 2024

Have you measured impact on binary size and performance?

@sffc
Copy link
Member

sffc commented Jun 16, 2024

While I think we should explore this, there are some factors at play that might make ZeroTrie less compelling for baked data than for blob data:

  1. The compiler is able to deduplicate locale strings across DataProvider lookup tables, which it did not do in blob1
  2. binary_search is a primitive that the compiler could figure out how to optimize on its own, whereas ZeroTrie provides only one algorithm for the lookup

My suspicion is that despite these factors, ZeroTrie will still be faster and smaller overall, although to a lesser degree than it was in blob1 vs blob2. I would however like to see some figures.

@robertbastian robertbastian force-pushed the zerotrie branch 2 times, most recently from 9305cee to 32f6a00 Compare June 18, 2024 16:39
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ Pending metrics.

@sffc
Copy link
Member

sffc commented Jul 1, 2024

Do you have metrics?

@robertbastian
Copy link
Member Author

Yes in the diffs

@sffc

This comment was marked as outdated.

@robertbastian
Copy link
Member Author

You should review #5161 first, as it says in the description.

provider/baked/tests/data/hello_world_v1_marker.rs.data Outdated Show resolved Hide resolved
utils/zerotrie/src/reader.rs Outdated Show resolved Hide resolved
@@ -628,24 +629,28 @@ impl_zerotrie_subtype!(
ZeroTrieSimpleAscii,
String,
reader::get_iter_ascii_or_panic,
core::iter::Map<reader::ZeroTrieIterator<'_>, fn((Vec<u8>, usize)) -> (String, usize)>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm in support of changing the impl Iterator to a concrete type, but I don't want to restrict the concrete type to being a specific map composition like this. Please introduce a new type such as ZeroTrieAsciiIterator or something like that so we can change the internals in the future. Safest would be to have a different type for all four concrete impls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And these four types redirect all 76 iterator methods to the types they wrap?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, we can clean this up later before zerotrie 1.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a #[doc(hidden)] type alias, should be non-clickable in the docs

@robertbastian robertbastian requested a review from nciric as a code owner July 2, 2024 20:28
@robertbastian robertbastian force-pushed the zerotrie branch 3 times, most recently from 2368556 to ec8a5f7 Compare July 3, 2024 11:22
@robertbastian robertbastian requested a review from sffc July 3, 2024 16:02
sffc
sffc previously approved these changes Jul 3, 2024
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Praise: Impressive size reductions, and this is a fair comparison when converting TinyStr to ZeroTrie since those are both reference-less structures.

@robertbastian robertbastian requested a review from sffc July 3, 2024 17:12
@robertbastian robertbastian removed the request for review from nciric July 3, 2024 17:41
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash map instability was creeping in somewhere?

@robertbastian robertbastian merged commit 4fc37e9 into unicode-org:main Jul 4, 2024
28 checks passed
@robertbastian robertbastian deleted the zerotrie branch July 4, 2024 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants