Stabilize the icu_casemap component #3234
Comments
Added a checklist, please append more items to it |
@robertbastian I'm not convinced we should be doing the heavy validation in serde: it's debug-assertions only, and the struct has GIGO behavior if you give it bad data. I think this is the right call, since there's a lot of work in properly validating this data otherwise. There are a couple of places that are currently relying on validate that I need to GIGO |
I don't care, as long as we don't do any validation in databake and I can make the constructor |
There shouldn't be right now, if there is you are welcome to remove it |
Useful note for later: the unfold data currently in use in icu4x [("aʾ", "ẚ"), ("ff", "ff"), ("ffi", "ffi"), ("ffl", "ffl"), ("fi", "fi"), ("fl", "fl"), ("h\u{331}", "ẖ"), ("i\u{307}", "İ"), ("j\u{30c}", "ǰ"), ("ss", "ßẞ"), ("st", "ſtst"), ("t\u{308}", "ẗ"), ("w\u{30a}", "ẘ"), ("y\u{30a}", "ẙ"), ("ʼn", "ʼn"), ("άι", "ᾴ"), ("ήι", "ῄ"), ("α\u{342}", "ᾶ"), ("α\u{342}ι", "ᾷ"), ("αι", "ᾳᾼ"), ("η\u{342}", "ῆ"), ("η\u{342}ι", "ῇ"), ("ηι", "ῃῌ"), ("ι\u{308}\u{300}", "ῒ"), ("ι\u{308}\u{301}", "ΐΐ"), ("ι\u{308}\u{342}", "ῗ"), ("ι\u{342}", "ῖ"), ("ρ\u{313}", "ῤ"), ("υ\u{308}\u{300}", "ῢ"), ("υ\u{308}\u{301}", "ΰΰ"), ("υ\u{308}\u{342}", "ῧ"), ("υ\u{313}", "ὐ"), ("υ\u{313}\u{300}", "ὒ"), ("υ\u{313}\u{301}", "ὔ"), ("υ\u{313}\u{342}", "ὖ"), ("υ\u{342}", "ῦ"), ("ω\u{342}", "ῶ"), ("ω\u{342}ι", "ῷ"), ("ωι", "ῳῼ"), ("ώι", "ῴ"), ("եւ", "և"), ("մե", "ﬔ"), ("մի", "ﬕ"), ("մխ", "ﬗ"), ("մն", "ﬓ"), ("վն", "ﬖ"), ("ἀι", "ᾀᾈ"), ("ἁι", "ᾁᾉ"), ("ἂι", "ᾂᾊ"), ("ἃι", "ᾃᾋ"), ("ἄι", "ᾄᾌ"), ("ἅι", "ᾅᾍ"), ("ἆι", "ᾆᾎ"), ("ἇι", "ᾇᾏ"), ("ἠι", "ᾐᾘ"), ("ἡι", "ᾑᾙ"), ("ἢι", "ᾒᾚ"), ("ἣι", "ᾓᾛ"), ("ἤι", "ᾔᾜ"), ("ἥι", "ᾕᾝ"), ("ἦι", "ᾖᾞ"), ("ἧι", "ᾗᾟ"), ("ὠι", "ᾠᾨ"), ("ὡι", "ᾡᾩ"), ("ὢι", "ᾢᾪ"), ("ὣι", "ᾣᾫ"), ("ὤι", "ᾤᾬ"), ("ὥι", "ᾥᾭ"), ("ὦι", "ᾦᾮ"), ("ὧι", "ᾧᾯ"), ("ὰι", "ᾲ"), ("ὴι", "ῂ"), ("ὼι", "ῲ")] |
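To make the shape of that data concrete, here is a minimal sketch of how an unfold table can be queried: each key is a folded string, and the value lists the characters whose full case folding produces it. The function names `sample_unfold_table` and `unfold` are hypothetical, the table holds only four of the pairs listed above, and this is not how icu_casemap stores the data internally.

```rust
use std::collections::BTreeMap;

/// A hand-copied subset of the unfold pairs quoted above (illustration only).
/// Key: folded string. Value: every character that folds to that string.
pub fn sample_unfold_table() -> BTreeMap<&'static str, &'static str> {
    BTreeMap::from([
        ("ss", "\u{df}\u{1e9e}"),   // U+00DF and U+1E9E (sharp s, capital sharp s)
        ("st", "\u{fb05}\u{fb06}"), // the two "st" ligatures
        ("i\u{307}", "\u{130}"),    // capital I with dot above
        ("h\u{331}", "\u{1e96}"),   // h with line below
    ])
}

/// Hypothetical lookup: returns the characters that fold to `folded`,
/// or an empty vector when the string has no unfold entry.
pub fn unfold(table: &BTreeMap<&str, &str>, folded: &str) -> Vec<char> {
    table
        .get(folded)
        .map(|s| s.chars().collect::<Vec<char>>())
        .unwrap_or_default()
}
```

Calling `unfold(&sample_unfold_table(), "ss")` yields the two sharp-s characters, which is the kind of lookup case closure needs when it expands a string to everything that case-folds to it.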
Should #3552 be a stabilization blocker? Feels like we can count it as a known bug. |
Also, ICU4C supports Also |
The
I also think this can be designed in a later release. |
question (@sffc , @robertbastian ): Do the current function names look good to you?
I somewhat feel like the stringy one should be the one with the shorter name, and the char one should be |
A change we should make is move the locale from the constructor to the methods. We can make them all take And how should it look over FFI (where locales are not free to instantiate). I guess we can expose that enum. Thoughts? |
The simple mappings and foldings (which you should practically never use unless you have specific compatibility requirements) having shorter and more default-looking names than the default ones seems like a bad idea. I would favour renaming all of the char-to-char functions to have
Something like that would be a good idea. It is very weird that the case foldings look like they depend on the locale, which they do not and must not. |
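As a rough illustration of why the char-to-char (simple) mappings are a poor default: a full mapping can expand one character into several, which a function returning a single `char` cannot express. The sketch below uses only the Rust standard library, whose `char::to_uppercase` applies the unconditional full mappings; the helper names are hypothetical and this is not the icu_casemap API.

```rust
/// Full (string-producing) uppercase of a single character, via the standard library.
pub fn full_uppercase(c: char) -> String {
    c.to_uppercase().collect()
}

/// True when the full uppercase mapping of `c` is not a single character,
/// i.e. when a char-to-char function could not return the same result.
pub fn needs_full_mapping(c: char) -> bool {
    full_uppercase(c).chars().count() != 1
}
```

For example, `full_uppercase('\u{df}')` (sharp s) gives `"SS"`, so `needs_full_mapping` returns true for it and false for a plain `'a'`.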
Maybe What is your idea for |
We might actually want to accept |
It's what's already used internally, it's a simple enum:
pub enum CaseMapLocale {
    Root,
    Turkish,
    Lithuanian,
    Greek,
    Dutch,
    Armenian,
}
and it basically covers the different casemapping special-case modes. ICU4C's API consumes a Locale; but it does feel potentially faster to not require a conversion each time, and exposing something that is This way we only require the actual subpart of the locale the algorithm cares about, instead of having clients treat it opaquely. |
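A minimal sketch of the idea under discussion, assuming the locale is resolved to the enum once and then passed to each method call. The enum is repeated so the block stands alone; `from_language` and `lowercase_char` are hypothetical names, the Turkish branch ignores the context-sensitive rules (such as a following combining dot above), and everything else falls back to the standard library rather than real casemapping data.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CaseMapLocale {
    Root,
    Turkish,
    Lithuanian,
    Greek,
    Dutch,
    Armenian,
}

impl CaseMapLocale {
    /// Hypothetical helper: pick the special-case mode from a language subtag.
    /// Azerbaijani shares the Turkic dotted/dotless i behavior.
    pub fn from_language(language: &str) -> Self {
        match language {
            "tr" | "az" => Self::Turkish,
            "lt" => Self::Lithuanian,
            "el" => Self::Greek,
            "nl" => Self::Dutch,
            "hy" => Self::Armenian,
            _ => Self::Root,
        }
    }
}

/// Hypothetical per-call API: the locale mode is an argument, not constructor state.
/// Only the context-free Turkic i mappings are special-cased here.
pub fn lowercase_char(c: char, locale: CaseMapLocale) -> String {
    match (locale, c) {
        (CaseMapLocale::Turkish, 'I') => "\u{131}".to_string(), // dotless i
        (CaseMapLocale::Turkish, '\u{130}') => "i".to_string(), // dotted capital I
        _ => c.to_lowercase().collect(),
    }
}
```

A caller would convert once, for example `let mode = CaseMapLocale::from_language("tr");`, and reuse `mode` across calls, which avoids a locale conversion per call and keeps the enum cheap to pass over FFI.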
Copying over Markus' feedback from the ICU4X team meeting notes
I do think I'm going to go along the path of doing titlecase with a segmentation trait. |
If I had to choose between full-string titlecase and Greek uppercase, I think Greek uppercase is more important since it is about i18n correctness. The Locale thing reminds me a lot of what happens with Collation and Segmentation tailorings. |
I've copied the actionable bits of Markus' feedback, as well as stuff discussed here, into the issue above. I'm not sure if it's either-or: full-string titlecase isn't that tricky, whereas Greek uppercase seems to involve reimplementing half of the uppercasing algorithm, without any spec for reference. It's a lot more work. |
Discussions to have:
|
Discuss with: |
Discussion: CaseMapLocale:
Titlecasing:
Simple case mapping:
Greek uppercasing:
Agreed: @Manishearth @sffc @robertbastian @eggrobin |
Graduation checklist
|
Only remaining stabilization blocker is Shane (or someone else) and I should go through everything. (and then move the folder) |
|
I added the new checkboxes from #3693 to the comment above. A few things I notice:
|
Yeah I was planning to do that later. But I'll just roll it into #3803 |
Final checkbox checked by #3843. We're done!
This issue tracks the work to release icu_casemap as a stable component.
Checklist (not exhaustive)
full_fold() functions should work with Writeable (More casemapping fixes #3544)
no_std (More casemapping fixes #3544)
deserialize to allow validation-free databake
titlecase_segment() (Add titlecase_segment and dutch titlecasing support #3593)
TitlecaseMapper<impl AsRef<CaseMapper>> type, ctors: new(), new_legacy(), and then the same ones _with_mapper (Add TitlecaseMapper type #3779)
TitleCaseOptions, fields HeadAdjustment { #[default] Adjust, NoAdjust }, TailCasing { #[default] Lowercase, PreserveCase } (Add TitlecaseMapper type #3779)
titlecase_segment_legacy() on CaseMapper if wanted (Add TitlecaseMapper type #3779)
CaseCloser wrapper for case closure things, similar ctors (Split unfold functionality into separate type #3759)
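The two option enums in the checklist can be sketched as follows. This is an assumption-laden illustration, not the icu_casemap implementation: the struct field names and `titlecase_segment_sketch` are hypothetical, "adjusting the head" is approximated as skipping to the first alphabetic character, and the standard library's uppercase mapping stands in for a true titlecase mapping.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum HeadAdjustment {
    #[default]
    Adjust,
    NoAdjust,
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum TailCasing {
    #[default]
    Lowercase,
    PreserveCase,
}

/// Field names are assumptions; the checklist only names the two enum types.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct TitleCaseOptions {
    pub head_adjustment: HeadAdjustment,
    pub tail_casing: TailCasing,
}

/// Hypothetical sketch: titlecase one segment according to `options`.
pub fn titlecase_segment_sketch(segment: &str, options: TitleCaseOptions) -> String {
    // Byte offset of the character that receives the titlecase treatment.
    let head = match options.head_adjustment {
        HeadAdjustment::Adjust => segment
            .char_indices()
            .find(|(_, c)| c.is_alphabetic())
            .map(|(i, _)| i)
            .unwrap_or(0),
        HeadAdjustment::NoAdjust => 0,
    };

    let mut out = String::with_capacity(segment.len());
    out.push_str(&segment[..head]); // leading punctuation is copied unchanged
    let mut chars = segment[head..].chars();
    if let Some(first) = chars.next() {
        // Approximation: uppercase instead of a dedicated titlecase mapping.
        out.extend(first.to_uppercase());
        let tail = chars.as_str();
        match options.tail_casing {
            TailCasing::Lowercase => out.push_str(&tail.to_lowercase()),
            TailCasing::PreserveCase => out.push_str(tail),
        }
    }
    out
}
```

With the defaults, a segment like `'twas` has its first letter capitalized after the apostrophe; with `NoAdjust` the apostrophe is treated as the head, so the letters are only lowercased.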