Serbian Cyrillic to Serbian Latin transliteration (and possibly vice versa?) #9702

Open
Susexe opened this issue Jan 26, 2024 · 5 comments
Labels
🌐 i18n Regarding software localization Serbian Translations We use a non-standard version of GetText, lack language variants support translate.openfoodfacts.org 🌐 Translations

Comments

@Susexe

Susexe commented Jan 26, 2024

Problem

Currently, Serbian translations in the "ingredients.txt" file are a mess. There's no standard for which script to use: the "sr" language code holds a mix of Cyrillic and Latin words across different entries. The language codes are inconsistent too: "sr-la" and "sr-el" are both used for Serbian Latin, although those strings are commented out. Similarly, "sr" and "sr-ec" are both used for Serbian Cyrillic.

Proposed solution

My suggestion is to enter only Serbian Cyrillic translations of ingredients, additives, etc. under the "sr" language code and create an automatic Serbian Cyrillic-to-Latin conversion. Why Cyrillic to Latin, you may ask? Well, Serbian Latin contains so-called digraphs, which make Latin-to-Cyrillic transliteration prone to mistakes. Serbian Cyrillic has none of them: NJ is Њ, LJ is Љ, and colloquially DJ (Đ) is Ђ, so every Cyrillic letter maps to exactly one Latin letter or digraph and Cyrillic-to-Latin conversion is 100% correct. The opposite direction is problematic, though there are some tools that include exception dictionaries (most English words would probably still get transliterated). The tool could potentially be tested for product search too, as the majority of products are entered in Latin and cannot be found when the search query is in Cyrillic.
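
The Cyrillic-to-Latin direction described above can be sketched as a plain lookup table. This is only an illustration, not existing Open Food Facts code: the names `CYR_TO_LAT` and `cyr_to_lat` are hypothetical, and the handling of capital digraphs (Lj versus LJ) is an assumption about what the output should look like.

```python
# Each Serbian Cyrillic letter maps to exactly one Latin letter or digraph.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyr_to_lat(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    out = []
    for i, ch in enumerate(text):
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)  # not a Serbian Cyrillic letter: keep as-is
        elif ch.islower():
            out.append(lat)
        else:
            # Assumed convention: "LJ" inside an all-caps word, "Lj" otherwise.
            nxt = text[i + 1] if i + 1 < len(text) else ""
            prev = text[i - 1] if i > 0 else ""
            all_caps = nxt.isupper() or (not nxt.isalpha() and prev.isupper())
            out.append(lat.upper() if all_caps else lat.capitalize())
    return "".join(out)


print(cyr_to_lat("пшенично брашно"))  # pšenično brašno
print(cyr_to_lat("Њујорк"))           # Njujork
```

The reverse direction cannot be a simple table: in a word such as "injekcija" the letters n and j are separate (Cyrillic "инјекција"), so a naive Latin-to-Cyrillic pass would wrongly produce Њ. The same function could also be applied to a Cyrillic search query to match products stored in Latin.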

@teolemon
Member

We have some custom mappings in the Crowdin config we need to sort out, to see if they are helping or harming:
sr-CS: sr_CS
sr: sr_RS
https://github.com/openfoodfacts/openfoodfacts-server/blob/main/crowdin.yml

@alexgarel
Member

@Susexe do you think we could write a single script to transform the current taxonomies for that, and then add a check on taxonomy changes to avoid going back to ASCII?
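
A check of that kind might look like the sketch below. It assumes taxonomy entries are lines of the form `sr: ...` and that commented lines start with `#`; the function name `find_latin_in_sr` is hypothetical and nothing here reflects an existing check in the repository.

```python
import re

# Basic Latin letters plus the diacritic letters used by Serbian Latin.
LATIN_LETTER = re.compile(r"[A-Za-zČĆĐŠŽčćđšž]")


def find_latin_in_sr(lines):
    """Return (line_number, line) pairs for `sr:` lines containing Latin letters."""
    hits = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Only plain `sr:` entries are checked; `sr-la:`, `sr-el:` and
        # commented lines do not start with `sr:` and are skipped.
        if not stripped.startswith("sr:"):
            continue
        value = stripped[len("sr:"):]
        if LATIN_LETTER.search(value):
            hits.append((number, stripped))
    return hits


sample = [
    "en: sugar",
    "sr: шећер",
    "sr: secer",
    "# sr-la: šećer",
]
print(find_latin_in_sr(sample))  # [(3, 'sr: secer')]
```

A real check would likely need an allowlist, since entries such as E-numbers or brand names legitimately contain Latin characters.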

@Susexe
Author

Susexe commented Jan 29, 2024

"CS" was the former country code for the state union of Serbia and Montenegro until it was dissolved in 2006 and Serbia and Montenegro became separate countries. Some time ago, Microsoft deprecated the codes "sr-Cyrl-CS" and "sr-Latn-CS" in favor of "sr-Cyrl-RS" and "sr-Latn-RS". I'm not sure what Crowdin uses internally, but if it's "sr-CS", then it's misleading. If you ask me, "sr" for Cyrillic (since it's the official script in Serbia) and "sr-Latn" for Latin are the way to go.

I wanted to manually update the taxonomies anyway since they're inconsistent, and my preferred script would be Cyrillic for the reasons I mentioned above.

@Susexe
Author

Susexe commented Feb 8, 2024

I've just realized MediaWiki is deprecating sr-ec and sr-el codes, which is great news. Google uses sr and sr-Latn.
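
For illustration, folding the legacy codes mentioned in this thread into the proposed "sr" / "sr-Latn" pair could be as small as the mapping below. The table covers only the codes named in this issue, and `normalize_serbian_code` is a hypothetical helper, not something Crowdin or Open Food Facts provides.

```python
# Legacy codes seen in this thread, mapped to the proposed pair.
LEGACY_TO_PREFERRED = {
    "sr-ec": "sr",       # Cyrillic
    "sr-el": "sr-Latn",  # Latin
    "sr-la": "sr-Latn",  # Latin, as found commented out in ingredients.txt
}


def normalize_serbian_code(code: str) -> str:
    """Map a legacy Serbian code to "sr" or "sr-Latn"; leave other codes alone."""
    cleaned = code.strip()
    return LEGACY_TO_PREFERRED.get(cleaned.lower(), cleaned)


print(normalize_serbian_code("sr-el"))  # sr-Latn
print(normalize_serbian_code("sr-ec"))  # sr
```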

@alexgarel
Member

@Susexe, yes, sadly we don't handle language variants in taxonomies yet!

I think it's fair to consider Cyrillic here as there are more speakers writing this way.

So feel free to edit taxonomies to move entries to Cyrillic. (see https://wiki.openfoodfacts.org/Global_taxonomies and https://wiki.openfoodfacts.org/Taxonomy_Maintenance)

@teolemon teolemon added Serbian 🌐 i18n Regarding software localization Translations We use a non-standard version of GetText, lack language variants support translate.openfoodfacts.org 🌐 Translations labels Apr 1, 2024
@teolemon teolemon moved this to To discuss and validate in 🍊 Open Food Facts Server issues Apr 23, 2024