[BUG] transliterating all-caps strings ends up with mixed case #675
Comments
Apologies for the delay; I've been away on leave. When I run your code, I don't get exactly your expected string, but at least all the characters are uppercase:
You mention:
Which translations file? You did not include it in your original message. Could you please share the file you're referring to?
@radar My apologies, I am using rails-i18n, which includes transliteration rules for many languages. The main problem is that some characters are transliterated as two characters. Here is a full working example of the first option we currently have, storing capitalized versions of the transliterated characters, which is what rails-i18n does:

# frozen_string_literal: true
require 'i18n'
I18n.config.enforce_available_locales = false
I18n.locale = :de
# capitalized transliterations, work only for capitalized words
I18n.backend.store_translations(
:de,
i18n: {
transliterate: {
rule: {
'ä' => 'ae',
'é' => 'e',
'ü' => 'ue',
'ö' => 'oe',
'Ä' => 'Ae',
'Ü' => 'Ue',
'Ö' => 'Oe',
'ß' => 'ss',
'ẞ' => 'SS'
}
}
}
)
puts I18n.transliterate('KANÜLE') # => 'KANUeLE' (bad)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (good, ẞ is by definition only used for all caps)
puts I18n.transliterate('Überfall') # => 'Ueberfall' (good)

As mentioned before, switching to all-caps versions will not help, because then we would break the cases where we actually want capitalized versions, such as the last example:

# frozen_string_literal: true
require 'i18n'
I18n.config.enforce_available_locales = false
I18n.locale = :de
# all caps transliterations, work only for all caps words
I18n.backend.store_translations(
:de,
i18n: {
transliterate: {
rule: {
'ä' => 'ae',
'é' => 'e',
'ü' => 'ue',
'ö' => 'oe',
'Ä' => 'AE', # all caps now
'Ü' => 'UE', # all caps now
'Ö' => 'OE', # all caps now
'ß' => 'ss',
'ẞ' => 'SS'
}
}
}
)
puts I18n.transliterate('KANÜLE') # => 'KANUELE' (good)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (still good)
puts I18n.transliterate('Überfall') # => 'UEberfall' (bad)
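Why neither table can work: judging by the outputs above, a Hash rule is applied one character at a time, so each non-ASCII character is looked up on its own and 'Ü' can map to only one string, whatever word it appears in. The sketch below is a plain-Ruby stand-in for that per-character behaviour (the name `per_char_transliterate` is hypothetical, and it does not call the i18n gem). Trying both candidate values for 'Ü' shows that each one breaks one of the two words.

```ruby
# Hypothetical stand-in for a Hash transliteration rule: every non-ASCII
# character is replaced on its own, without looking at its neighbours.
def per_char_transliterate(string, rule, replacement = '?')
  string.gsub(/[^\x00-\x7f]/u) { |char| rule.fetch(char, replacement) }
end

# The output the issue asks for: one all-caps word, one capitalized word.
EXPECTED = { 'KANÜLE' => 'KANUELE', 'Überfall' => 'Ueberfall' }.freeze

# Try both candidate values for 'Ü' and report which words come out wrong.
%w[Ue UE].each do |value|
  rule = { 'Ü' => value }
  failures = EXPECTED.reject { |input, want| per_char_transliterate(input, rule) == want }
  puts "'Ü' => '#{value}': wrong for #{failures.keys.join(', ')}"
end
# Prints:
# 'Ü' => 'Ue': wrong for KANÜLE
# 'Ü' => 'UE': wrong for Überfall
```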
My two cents on the topic as a passing observer: either of your configurations above will be sufficient for most use cases, but both are only approximations. A comprehensive solution cannot be a straightforward find-and-replace; it would need to look at the surrounding context of each word. From the documentation: "This library does not, currently, define or maintain transliteration rules across different locales. It simply supports flexible configuration options."
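One way to use that context is to look at the whole word: if every letter in it is uppercase, upcase the expansion; otherwise keep the capitalized form. The sketch below is a hypothetical plain-Ruby function built on that heuristic (`transliterate_with_context` and `RULE` are illustrative names, not part of i18n). Since I18n also accepts a Proc as the `rule` (handled by its `ProcTransliterator`), a function like this could presumably be registered as `rule: ->(string) { transliterate_with_context(string) }`; treat that wiring as an assumption to verify against your i18n version.

```ruby
# Hypothetical German rule with capitalized expansions, as in the first
# configuration above.
RULE = {
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'é' => 'e', 'ß' => 'ss',
  'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ẞ' => 'SS'
}.freeze

# Transliterates word by word. A word counts as all caps when it has at
# least two letters and none of them is lowercase; in that case each
# expansion is upcased ('Ue' -> 'UE'), otherwise it is used as stored.
def transliterate_with_context(string, rule = RULE, replacement = '?')
  # Split into runs of letters and runs of non-letters so that every
  # character is visited exactly once.
  string.gsub(/\p{L}+|\P{L}+/) do |token|
    all_caps = token.match?(/\A\p{L}{2,}\z/) && !token.match?(/\p{Ll}/)
    token.gsub(/[^\x00-\x7f]/u) do |char|
      expansion = rule.fetch(char, replacement)
      all_caps ? expansion.upcase : expansion
    end
  end
end

puts transliterate_with_context('KANÜLE')              # KANUELE
puts transliterate_with_context('Überfall')            # Ueberfall
puts transliterate_with_context('FUẞBALL')             # FUSSBALL
puts transliterate_with_context('KANÜLE und Überfall') # KANUELE und Ueberfall
```

This is still an approximation, as noted above: a single-letter word such as "Ü" is treated as capitalized, and an all-caps word spelled with a lowercase ß, such as "STRAßE", counts as mixed case and becomes "STRAssE".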
What I tried to do
I want to transliterate an all-caps string, such as "KANÜLE".
What I expected to happen
I expect all resulting characters to be uppercase:
#=> "KANUELE"
What actually happened
The resulting characters are mixed case:
#=> "KANUeLE"
Simply changing the entry in the translations file to
"Ü": "UE"
works for this case, but then, of course, mixed-case words are transliterated incorrectly. I would expect a solution that handles both cases gracefully.
Versions of i18n, rails, and anything else you think is necessary
All versions of i18n