Add decomposition for more accented characters #3838
Hello!
Hopefully I've done everything right despite knowing next to nothing about C or Java.
The PR mostly targets Polish (which uses characters like "ż/Ż", "ą/Ą" and "ę/Ę"), but I have included every character in the Windows Character Map that has either a "dot above" or an "ogonek". The terminology comes from Character Map, but "ogonek" literally means "a little tail" and is also the word Polish speakers actually use to describe those two letters.
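For reference, the JDK's own `java.text.Normalizer` (not this project's code) shows the canonical Unicode decompositions behind the new entries. Each of these letters splits into a plain base letter plus U+0307 (combining dot above) or U+0328 (combining ogonek). The class name `DecompositionDemo` is hypothetical, and this is only a sketch; it is not how the project stores its mapping.

```java
import java.text.Normalizer;

// Sketch: uses the JDK's Unicode normalizer, not the project's own decomposition table.
public class DecompositionDemo {

    // Canonical decomposition (NFD): a precomposed letter becomes base letter + combining mark(s).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Formats each code point of s as "U+XXXX", separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Z/z with dot above, A/a with ogonek, E/e with ogonek (written as escapes).
        String[] letters = {"\u017B", "\u017C", "\u0104", "\u0105", "\u0118", "\u0119"};
        for (String letter : letters) {
            System.out.println(codePoints(letter) + " -> " + codePoints(decompose(letter)));
        }
    }
}
```

The first line printed is `U+017B -> U+005A U+0307` (Ż = Z + combining dot above), and the four ogonek letters all decompose to their base letter followed by `U+0328`.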
In the class, I inserted `dotabove` above `umlaut` because I was trying to preserve the Unicode order. Further down, in the `private static` part and in the mapping itself, I simply added the new entries at the end, so the grouping would hopefully still make sense.

There is a nonsensical sentence, `ZAŻÓŁĆ GĘŚLĄ JAŹŃ`, used to test whether a keyboard layout, a program, etc. can display all Polish diacritics. Right now, it ends up as `ZAÓĆ GŚL JAŹŃ`. After this PR is merged, it should be almost correct: `ZAŻÓĆ GĘŚLĄ JAŹŃ`, missing only the `Ł` character. Sadly, `Ł` itself can't be decomposed, because Unicode has no combining "upward stroke" character, so "ł/Ł" seems to be out of reach until one gets added (which it probably won't). I know that "combining short stroke overlay" and "combining long stroke overlay" exist, but even if they worked (I was not able to make them work properly), the result would still be a different character (`L` + `-` != `Ł`).

Since this PR only adds entries and doesn't change or remove anything, it should merge cleanly.
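The before/after behaviour of the test sentence can be reproduced with a small, hypothetical model. It assumes (this is not taken from the project's code) that a character is displayed only if its NFD form is an ASCII base letter followed by combining marks the program already knows, and is dropped otherwise. `PangramCheck` and `render` are illustrative names; the normalizer is the JDK's `java.text.Normalizer`.

```java
import java.text.Normalizer;
import java.util.Set;

// Hypothetical model: a precomposed letter survives only if it decomposes into an
// ASCII base letter plus combining marks from the supported set.
public class PangramCheck {

    static final char ACUTE = '\u0301';     // combining acute accent (as in O, C, S, Z, N with acute)
    static final char DOT_ABOVE = '\u0307'; // combining dot above (Z with dot above)
    static final char OGONEK = '\u0328';    // combining ogonek (A and E with ogonek)

    // Keeps each character whose NFD form is ASCII, optionally followed by supported marks.
    // Characters with no decomposition (such as L with stroke) are dropped.
    static String render(String text, Set<Character> supportedMarks) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            String ch = String.valueOf(text.charAt(i));
            String nfd = Normalizer.normalize(ch, Normalizer.Form.NFD);
            boolean ok = nfd.charAt(0) < 0x80;
            for (int j = 1; j < nfd.length(); j++) {
                ok &= supportedMarks.contains(nfd.charAt(j));
            }
            if (ok) {
                out.append(ch); // keep the original precomposed character
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Polish test sentence from the PR, written with escapes.
        String pangram = "ZA\u017B\u00D3\u0141\u0106 G\u0118\u015AL\u0104 JA\u0179\u0143";
        System.out.println(render(pangram, Set.of(ACUTE)));                    // before
        System.out.println(render(pangram, Set.of(ACUTE, DOT_ABOVE, OGONEK))); // after
        // L followed by a combining short stroke overlay does not compose into L with stroke.
        String stroked = Normalizer.normalize("L\u0335", Normalizer.Form.NFC);
        System.out.println(stroked.equals("\u0141"));
    }
}
```

Under these assumptions, `render` returns `ZAÓĆ GŚL JAŹŃ` with only the acute accent supported and `ZAŻÓĆ GĘŚLĄ JAŹŃ` once dot above and ogonek are added, and the last line prints `false`, since `Ł` has no canonical decomposition for NFC to recompose.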