Moving AuxiliaryKeys to DataRequest as DataKeyAttributes #4981

Merged: 27 commits, May 31, 2024

@robertbastian robertbastian commented May 31, 2024


@sffc sffc left a comment

I reviewed the changes in icu_provider, icu_provider_blob, and parts of icu_datagen. I assume the rest of the changes are mechanical.

Praise: nice work! Thanks for keeping the output data the same in order to make this a more standalone change.

provider/core/src/request.rs
Comment on lines +590 to +592
/// TODO
#[derive(Clone, Default)]
pub struct DataKeyAttributes {
Member

Issue: the `/// TODO` doc comment still needs to be filled in (here and elsewhere).

provider/core/src/marker.rs
provider/core/src/datagen/mod.rs
key_attributes: &DataKeyAttributes,
) -> Result<bool, DataError> {
self.supported_requests()
.map(|v| v.contains(&(locale.clone(), key_attributes.clone())))
Member

Thought/Issue: This is a hot path and it's a pity you need to clone; there should be a way to do the lookup without cloning.

Member Author

yes there should be...

Member Author

Could work with HashMap<DataKeyAttributes, HashSet<DataLocale>> instead?
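
For readers following along, here is a minimal sketch of what that nested layout could look like. It is not the code from this PR: `SupportedRequests` is a hypothetical wrapper, and plain `String`s stand in for `DataKeyAttributes` and `DataLocale`. The point is only that both levels can be queried through borrowed `&str` keys, so the lookup itself allocates nothing.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical index of supported (locale, key attributes) pairs.
/// `String` is a stand-in for the real `DataKeyAttributes` / `DataLocale` types.
#[derive(Debug, Default)]
pub struct SupportedRequests {
    // Outer key: key attributes. Inner set: locales available for them.
    by_attributes: HashMap<String, HashSet<String>>,
}

impl SupportedRequests {
    /// Registers a pair. Allocation happens here, at build time only.
    pub fn insert(&mut self, locale: &str, key_attributes: &str) {
        self.by_attributes
            .entry(key_attributes.to_owned())
            .or_default()
            .insert(locale.to_owned());
    }

    /// Hot-path lookup: both maps accept `&str` via `Borrow<str>`,
    /// so nothing is cloned.
    pub fn supports(&self, locale: &str, key_attributes: &str) -> bool {
        self.by_attributes
            .get(key_attributes)
            .map_or(false, |locales| locales.contains(locale))
    }
}
```

The trade-off is one `HashSet` per distinct attribute value, which is cheap when attributes are few and locales are many, and wasteful in the opposite case.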

Member

There's some discussion of options in https://stackoverflow.com/questions/45786717/how-to-implement-hashmap-with-two-keys

Doubly-nested HashMap is interesting, but that's a helluva lot of hashmaps...

Member

Or, since you need this tuple everywhere, maybe just export a helper struct called LocaleWithAttributes (a standalone helper used in IterableDataProvider but not DataRequest) with all the impls you need.

Member Author

Can I do this in a follow-up? I don't want this to get stale.

Member

Follow-up is okay
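
As a note for that follow-up, one possible shape for the helper struct is sketched below. Everything here is an assumption: the fields are `Cow<str>` stand-ins rather than the real `DataLocale` and `DataKeyAttributes`, and `borrowed`, `into_owned`, and `supports` are hypothetical names. The idea is that the set stores owned values while a lookup builds a borrowed value of the same type, so `contains` needs no clone.

```rust
use std::borrow::Cow;
use std::collections::HashSet;

/// Hypothetical helper pairing a locale with key attributes.
/// `Cow<str>` stands in for the real ICU4X types.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct LocaleWithAttributes<'a> {
    pub locale: Cow<'a, str>,
    pub attributes: Cow<'a, str>,
}

impl<'a> LocaleWithAttributes<'a> {
    /// Builds a lookup key that only borrows its inputs.
    pub fn borrowed(locale: &'a str, attributes: &'a str) -> Self {
        Self {
            locale: Cow::Borrowed(locale),
            attributes: Cow::Borrowed(attributes),
        }
    }

    /// Converts into an owned value suitable for long-term storage.
    pub fn into_owned(self) -> LocaleWithAttributes<'static> {
        LocaleWithAttributes {
            locale: Cow::Owned(self.locale.into_owned()),
            attributes: Cow::Owned(self.attributes.into_owned()),
        }
    }
}

/// Lookup without cloning: `Cow::Borrowed` and `Cow::Owned` hash and
/// compare by their string contents, so a borrowed key finds an owned entry.
pub fn supports(
    set: &HashSet<LocaleWithAttributes<'static>>,
    locale: &str,
    attributes: &str,
) -> bool {
    set.contains(&LocaleWithAttributes::borrowed(locale, attributes))
}
```

This keeps a single flat set and avoids the nested maps, at the cost of a lifetime parameter on the helper type.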

provider/blob/src/export/blob_exporter.rs (outdated)
provider/blob/src/blob_schema.rs
@robertbastian
Member Author

Please also look at icu_datetime, and have a really close look at icu_datagen. I cannot figure out why the locale/aux set for datetime keys is changing.

@robertbastian robertbastian requested a review from sffc May 31, 2024 19:24
components/experimental/src/transliterate/compile/mod.rs (outdated)
provider/datagen/src/transform/cldr/datetime/neo.rs (outdated)
Comment on lines +114 to +119
write!(&mut path_buf, "/{key}").expect("infallible");
write!(&mut path_buf, "/{locale}").expect("infallible");
if !key_attributes.is_empty() {
write!(&mut path_buf, "-x-{}", key_attributes as &str).expect("infallible");
}
write!(&mut path_buf, ".{}", self.manifest.file_extension).expect("infallible");
Member

Thought for later: we probably want the fs provider to create another directory layer for the key attributes instead of using the -x- suffix.

Member Author

yep, we can improve the representation for all providers
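
To make the two layouts concrete, here is a small sketch. `flat_path` mirrors the `-x-` scheme from the snippet under review, using plain `&str` parameters instead of the real types. `nested_path` is purely an illustration of the "extra directory layer" idea: the order of the attributes and locale segments is a guess, since the thread does not settle on a layout.

```rust
use std::fmt::Write;

/// Current scheme from the diff: attributes are appended to the locale
/// with an `-x-` separator. Parameters are `&str` stand-ins.
pub fn flat_path(key: &str, locale: &str, key_attributes: &str, file_extension: &str) -> String {
    let mut path_buf = String::new();
    write!(&mut path_buf, "/{key}").expect("infallible");
    write!(&mut path_buf, "/{locale}").expect("infallible");
    if !key_attributes.is_empty() {
        write!(&mut path_buf, "-x-{key_attributes}").expect("infallible");
    }
    write!(&mut path_buf, ".{file_extension}").expect("infallible");
    path_buf
}

/// Hypothetical alternative: attributes get their own directory.
/// The segment order (attributes before locale) is an assumption.
pub fn nested_path(key: &str, locale: &str, key_attributes: &str, file_extension: &str) -> String {
    let mut path_buf = String::new();
    write!(&mut path_buf, "/{key}").expect("infallible");
    if !key_attributes.is_empty() {
        write!(&mut path_buf, "/{key_attributes}").expect("infallible");
    }
    write!(&mut path_buf, "/{locale}.{file_extension}").expect("infallible");
    path_buf
}
```

With the nested form, the file name is always a plain locale, so a reader of the directory never has to split on `-x-` to recover the attributes.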

@sffc
Member

sffc commented May 31, 2024

> Please also look at icu_datetime, and have a really close look at icu_datagen. I cannot figure out why the locale/aux set for datetime keys is changing.

The difference is in the deduplicate_payloads function.

The new code deduplicates more aggressively; for example, it finds the following match that the old code did not find:

2024-05-31T22:05:21.385Z TRACE [icu_datagen::driver] Deduplicating datetime/patterns/buddhist/date@1/"l"/en-GB (inherits from en-001)

I'm trying to investigate why.

@sffc
Member

sffc commented May 31, 2024

I added more printing to the old code, and I get this message:

2024-05-31T22:25:00.908Z TRACE [icu_datagen::driver] Not a patch: datetime/patterns/buddhist/date@1/en-GB-x-l (does not match en-x-l)

I don't know why it thinks en-GB falls back to en...

@sffc
Member

sffc commented May 31, 2024

Ok, it appears to be a bug in the LocaleFallbacker where it doesn't ignore the aux keys when it should, which is effectively fixed by removing aux keys from DataLocale. #4983
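
For context, the kind of inheritance-based deduplication discussed here can be sketched roughly as follows. This is not the `deduplicate_payloads` implementation: `toy_parent` is a hardcoded stand-in for the real `LocaleFallbacker`, payloads are plain strings, and the map is assumed to hold the entries for one key and one fixed set of key attributes. Because the attributes live outside the locale, the parent walk only ever sees clean locales such as `en-GB`, never `en-GB-x-l`.

```rust
use std::collections::BTreeMap;

/// Toy parent table (an assumption, not the real fallback data).
fn toy_parent(locale: &str) -> Option<&'static str> {
    match locale {
        "en-GB" => Some("en-001"),
        "en-001" => Some("en"),
        "en" => Some("und"),
        _ => None,
    }
}

/// Drops every locale whose payload equals the payload it would inherit
/// from its nearest ancestor present in the map. The map holds payloads
/// for a single key and a single, fixed set of key attributes.
pub fn deduplicate(by_locale: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    let mut kept = BTreeMap::new();
    for (locale, payload) in by_locale {
        // Walk up the parent chain until an ancestor with data is found.
        let mut ancestor = toy_parent(locale);
        let mut inherited: Option<&String> = None;
        while let Some(candidate) = ancestor {
            if let Some(found) = by_locale.get(candidate) {
                inherited = Some(found);
                break;
            }
            ancestor = toy_parent(candidate);
        }
        // Keep the entry only if it differs from what fallback would yield.
        if inherited != Some(payload) {
            kept.insert(locale.clone(), payload.clone());
        }
    }
    kept
}
```

In this toy model, an `en-GB` payload identical to the `en-001` payload is dropped, which matches the "inherits from en-001" trace above.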

@robertbastian
Member Author

Thanks for investigating.

@robertbastian robertbastian requested a review from sffc May 31, 2024 23:18
@@ -66,7 +66,7 @@ impl WeekCalculator {
 &provider.as_downcasting(),
 DataRequest {
 locale,
-metadata: Default::default(),
+..Default::default()
Member

Thought: using ..Default::default() means that if we ever refactor DataRequest again, we won't get compiler errors for missing fields, which I think was an important part of building this PR.

Member Author

@robertbastian robertbastian May 31, 2024

Didn't really need or use that. Searching for DataRequest { is exhaustive because there are no constructors.
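
The trade-off in this exchange can be shown with a toy struct. `ToyRequest` and `ToyMetadata` are hypothetical stand-ins, not the real `DataRequest`. Both constructors below produce the same value today; the difference only shows up when a field is added later.

```rust
/// Hypothetical stand-in for request metadata.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ToyMetadata {
    pub silent: bool,
}

/// Hypothetical stand-in for a request struct with public fields
/// and no constructor functions.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ToyRequest {
    pub locale: String,
    pub key_attributes: String,
    pub metadata: ToyMetadata,
}

/// Exhaustive literal: adding a field to `ToyRequest` later turns this
/// into a compile error until the new field is listed here.
pub fn exhaustive(locale: &str) -> ToyRequest {
    ToyRequest {
        locale: locale.to_string(),
        key_attributes: String::new(),
        metadata: ToyMetadata::default(),
    }
}

/// Struct update syntax: a newly added field silently takes its
/// `Default` value and this function keeps compiling unchanged.
pub fn with_defaults(locale: &str) -> ToyRequest {
    ToyRequest {
        locale: locale.to_string(),
        ..Default::default()
    }
}
```

Either style is searchable by the `ToyRequest {` literal, which is the point made in the reply above; only the exhaustive form makes the compiler do the search.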

@robertbastian robertbastian merged commit 0ab2630 into unicode-org:main May 31, 2024
28 checks passed
@robertbastian robertbastian deleted the attrs branch June 6, 2024 17:37