Remove AsRef and instead introduce Cow-returning canonicalize methods on locale/langid #5727

Merged
merged 6 commits into from
Oct 24, 2024

Manishearth
Member

Fixes #2748
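
For readers unfamiliar with the pattern named in the title, here is a minimal sketch of a Cow-returning canonicalization function. It is not the ICU4X API: `canonicalize_ascii_lowercase` is a hypothetical helper, and lowercasing stands in for real locale canonicalization, which is more involved. The point is only that input which is already canonical can be returned borrowed, without allocating a String.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of ICU4X): "canonicalizes" a tag by
/// ASCII-lowercasing it. Real locale canonicalization has more rules;
/// this only illustrates the Cow-returning shape.
fn canonicalize_ascii_lowercase(input: &str) -> Cow<'_, str> {
    if input.bytes().any(|b| b.is_ascii_uppercase()) {
        // Something must change, so allocate the corrected string.
        Cow::Owned(input.to_ascii_lowercase())
    } else {
        // Already in the target form: hand back the input, no allocation.
        Cow::Borrowed(input)
    }
}
```

Under these assumptions, calling the helper on "en-us" yields a borrowed value, while "EN-us" yields an owned "en-us".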

@Manishearth force-pushed the locale-langid-canonicalize branch from ea8ce7b to 1ab3fde on October 23, 2024 18:58
@Manishearth requested a review from a team as a code owner on October 23, 2024 18:58
@Manishearth removed the review request for a team and nciric on October 23, 2024 18:59
if let Ok(s) = core::str::from_utf8(input) {
    Ok(s.into())
} else {
    Ok(cow.into_owned().into())
Member

This is unreachable. try_from_utf8 succeeding should allow you to justify from_utf8_unchecked.

Member Author

It would be extremely bad unsafe-code hygiene to rely on the behavior of a complicated parse function spread across multiple files in the crate. We should use that justification only if we are willing to document try_from_utf8 as guaranteed to fail for invalid UTF-8 and add safety comments throughout it.

The justification of the bytes being equal is an easier one IMO. I'm just not sure if the unsafe is worth it.

I've gone ahead and added unsafe justified by byte equality.
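
A minimal sketch of what "unsafe justified by byte equality" can look like, for illustration only. This is not the code that was merged: `borrow_if_canonical` and its parameters are hypothetical names, and it assumes the canonical form has already been produced as a `str`. The safety argument is local to the function: the input bytes are compared against a value that is known to be valid UTF-8, so no property of the parser has to be relied on.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not the merged ICU4X code): returns `input` as a
/// borrowed `str` when it is byte-for-byte equal to `canonical`, and an
/// owned copy of `canonical` otherwise.
fn borrow_if_canonical<'a>(input: &'a [u8], canonical: &str) -> Cow<'a, str> {
    if input == canonical.as_bytes() {
        // SAFETY: `canonical` is a `str`, so its bytes are valid UTF-8,
        // and `input` was just checked to be equal to those bytes.
        Cow::Borrowed(unsafe { std::str::from_utf8_unchecked(input) })
    } else {
        // The input differs from the canonical form, so allocate.
        Cow::Owned(canonical.to_owned())
    }
}
```

The safe alternative is the `core::str::from_utf8(input)` check in the snippet under review, at the cost of re-validating the input and keeping an `else` branch that should never run.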

components/locale_core/src/langid.rs (outdated, resolved)
robertbastian previously approved these changes on Oct 23, 2024
@Manishearth merged commit d52f411 into unicode-org:main on Oct 24, 2024
28 checks passed
Successfully merging this pull request may close these issues.

Locale/LanguageIdentifier canonicalize can return Cow<str> instead of String