Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve locale detection. #52969

Merged
merged 1 commit into from
Jan 18, 2022
Merged

Conversation

bruvzg
Copy link
Member

@bruvzg bruvzg commented Sep 23, 2021

Instead of using a hard-coded supported locale list and expecting OS locale to match, the engine now attempts to decompose whatever garbage is returned by the OS.get_locale(), infer missing parts and select the best matching locale.

Exposes compare_locales and standardize_locale methods to the scripts.

So far it's tested on macOS only, all supported editor translations seem to be detected correctly.

@akien-mga
Copy link
Member

akien-mga commented Sep 23, 2021

Thanks for working on this, it's been a sore point and the hardcoded list we had was far from exhaustive.

But I wonder - is there really any merit to limit what can be valid locales? In practice we mostly need locales to match translation identifiers, so if someone happens to have a locale of say de_RU (which is not "valid", but there could be German speakers in Russia ending up with something like this), they should simply be matched to de.po/de in the CSV. Or de_DE as fallback if de is missing.

So the only country-specific information we'd need to hardcode IMO is just what's the "main" country code to fall back to if there's no translation matching the exact locale:

  1. Check against exact user locale: de_RU
  2. Check for language code only: de
  3. Check for language code with "main" country: de_DE
  4. Check for any match to language code + anything else: de_<whatever>, first match?

@bruvzg bruvzg force-pushed the locale_detection branch 4 times, most recently from 6a17a75 to e8dad0a Compare September 29, 2021 07:03
@bruvzg bruvzg marked this pull request as ready for review September 29, 2021 07:04
@bruvzg bruvzg requested review from a team as code owners September 29, 2021 07:04
@bruvzg
Copy link
Member Author

bruvzg commented Sep 29, 2021

Now it should work with all supported locales on Linux (tested with the list in /usr/share/i18n/SUPPORTED on Ubuntu 21.04), Windows (https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/70feba9f-294e-491e-b6eb-56532684c37f) and macOS.

Changes in this PR:

  • Uses separate language, script and country lists (same lists are reused by font import options).
  • Modified standardize_locale function: return locale as "lng_Script_CN_variant" (some parts can be missing), it's doing following actions:
    • replace "-" with "_".
    • removes unsupported script and country codes (OS can add your current country code to the locale).
    • parses extra part of locale string after "@" to detect regional variant (e.g. ca_ES@valencia ) or script (e.g. uz_UZ@cyrillic) and remove the rest (currency and collation tags).
    • adds default values of the script and country, in some cases it might be ambiguous.
    • resolves country and language renames.
    • checks if language code is valid (and fallback to "en" if it's not).
  • Added compare_locales function: returns value from 0 to 10, where 10 is full match (after both strings passed to standardize_locale), 0 is no match (if language code is different the rest is not checked), values in between indicate number of matching elements (including language).
  • Removes is_locale_valid and error messages, since standardize_locale should always return something valid.

Comment on lines 238 to 239
// Additional script information to preferred scripts.
// Language code, script code, default country, supported countries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reference you used for those?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same references as language lists.

{ "US", "United States of America" },
{ "UY", "Uruguay" },
{ "UZ", "Uzbekistan" },
{ "VA", "Holy See" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, I never knew this was the official name.

Comment on lines 548 to 551
// - https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
// - https://lh.2xlibre.net/locales/
// - https://iso639-3.sil.org/
// - https://www.localeplanet.com/icu/index.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of these still used to make this list? If not, it would be good to narrow down to a single source so that we know how to sync potential changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used multiple sources to make sure every language supported by the OS is in the list, but some links the might be excessive and have the same list, I'll re-check it.

if (split == -1) {
split = p_locale.find("-");
String TranslationServer::standardize_locale(const String &p_locale) const {
// Replaces '-' with '_' for macOS Sierra-style locales.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reference to "Sierra" still valid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sierra is not relevant, it's used by all macOS versions.

@akien-mga
Copy link
Member

Looks good to me overall, though I'm not sure if we actually get useful information from a hardcoded list of valid countries for a given language.

That's obviously a decision ISO and various OSes followed so there might be a good reason, but nothing prevents me from using an OS in French while living in Denmark and thus ending up with a fr_DK locale which isn't standard.

Do we do anything specific with the valid country codes information?

@bruvzg
Copy link
Member Author

bruvzg commented Jan 3, 2022

Do we do anything specific with the valid country codes information?

It's used in two places:

  • to ignore non-standard variants,
  • and to generate output of get_all_locales used to add translations in the "Localization Editor" and locale specific overrides for the font import, but this part might be changed to the separate country + language code selector, which is likely more convenient than a single long list anyway.

Not sure if checking for the valid language and country code is necessary at all. Just using locale_scripts and renames to resolve ambiguous cases should be enough.

@bruvzg bruvzg marked this pull request as draft January 4, 2022 10:52
@bruvzg bruvzg force-pushed the locale_detection branch 2 times, most recently from b945923 to a9eec67 Compare January 4, 2022 11:10
@bruvzg
Copy link
Member Author

bruvzg commented Jan 4, 2022

I have removed all "supported" lists to allow custom codes, and added same locale comparison code to resource remaps (need to be tested).

Added custom locale selection dialog for the Locale remap editor.

With language and country lists by default:
loc_ed

And fully editable additional code components as an option:
loc_ed_adv

Locale filters are moved from the locale editor to the selection dialog.

Also added property hint for locale, which is using the same dialog as an editor:
prop_edit

@bruvzg bruvzg force-pushed the locale_detection branch 2 times, most recently from bcd4f13 to b4178cb Compare January 5, 2022 10:39
Use separate language, script and country lists.
Add locale selection dialog and property hint.
@bruvzg bruvzg marked this pull request as ready for review January 18, 2022 12:30
@bruvzg bruvzg requested review from a team as code owners January 18, 2022 12:30
@akien-mga akien-mga merged commit ce2b5bd into godotengine:master Jan 18, 2022
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants