Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VPN-6778: fix English translations for addons w/ shared string #10123

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

mcleinman
Copy link
Collaborator

@mcleinman mcleinman commented Dec 11, 2024

Description

Short explanation: Deep in the translation build scripts, en is treated differently than en-US - it causes the script to think that en-US does not actually have any translations, and thus when switching to en language it disables the addon. (This has not come up before for reasons explained in the essay below.)


How does adding 3 characters allow us to use English addons that have shared strings? (This turned into an essay, sorry.)

  • Looking at and logging the client code for a while, I saw one of the times that we'll enable/disable addons is when the language changes
  • In these cases, it looks at the translations.completeness file to see if the addon is translated in the new language
  • The translations.completeness file for the addons using shared strings is showing 0.0 for en, when it should be showing 1.0
  • The translations.completeness files for addons are created in build.py`
  • From adding some logging to build.py, it's not being given a default of 0.0 or being skipped or anything - it's being calculated as 0.0
  • This calculation comes from xlifftool.py (which is called by build.py)
  • Adding logging to this file's completeness function, we find out it is correctly counting the number of source strings, but misreporting the number of translations as 0
  • I reviewed this output on the "what's new 2.25" message in french (using shared strings, and working), the "what's new 2.25" message in English (using shared strings, and not working), and the "what's new 2.24" message in English (not using shared strings, and working). This validates the miscount of translations.
  • For the 2.24 English (not using shared strings), the dictionary in this function shows the same English string for both the value and target. However in 2.25 English (using shared strings), it shows empty strings for values. This causes the miscount. But why!
  • Looking at the 2.24 addon's strings.xliff file (from the translation repo) and the 2.25 addon's strings.xliff file (built by build.py because it uses shared strings), nothing jumps out - in both cases, neither show target strings, just source strings.
  • Looking into the parse_filegroup function (still in xlifftool.py)... 2.24 English reports itself as a source language, 2.25 French reports itself as a target language... but 2.25 English reports itself as a target language (which is unexpected)!
  • The source/target distinction is based on the reported locale language (an argument given to this file), the target-language in the xliff file, and the source-language in the xliff file.
  • Looking at the headers from the 3 xliff files being fed into xlifftool.py:
    • 2.25 French (comes from i18n repo) - (fr) source-language="en" target-language="fr"
    • 2.24 English (comes from i18n repo) - (en) source-language="en" target-language="en-US"
    • 2.25 English (constructed from the shared strings yaml file) - (en) source-language="en" target-language="en"
  • This is the culprit!
    • In parse_filegroup, if target-language matches the reported locale, it reports the language is the target. Otherwise if the source-language it the reported locale, it reports the language is the source.
    • Thus 2.25 English is being reported as a target language because target-language is checked before source-language is checked, and it matches. Thus, the fact that 2.24 English reports a source language of en and target of en-US isn't just a happy accident, but it is essential to ensuring the translation pipeline works as expected for the base English locale. Somewhere we translate an en to an target of en-US.
    • Where? In VPN l10n repo's extract_source_strings.py - it manually sets the target-language to en-US when ingesting new strings.
  • So, how is the generated xliff file for shared strings getting its en target?
    • build.py takes the shared strings strings.xliff file (from the translation repo) and builds the an addon-specific temporary xliff file for each addon (this generated file should have identical formatting to the legacy addon-specific xliff files from the translations repo)
    • In build.pys transform_shared_strings function, it essentially copies the headers from the shared strings strings.xliff file. (If I manually change the strings.xliff file in 3rdparty/i18n/ to have a target language of en-US, it works as expected - so this is definitely the right path!)
  • Where does the shared string strings.xliff file get its (incorrect) headers?
    • In shared.py's write_en_language, we have this line: ts.set("language", "en”). And if you change that en to en-US the header's source language stays as en but the target language changes to en-US. 🎉🎉🎉

This ts.set("language", "en”) line formerly lived in build.py, and was copied to a new file as part of the initial shared strings work. This line is 2.5 years old, and seemingly never caused an issue until now. I was curious why it hadn't caused an issue until now?

  • In the l10n repo, in a Github action based on update.yaml, 4 sets of translation files are ingested by the l10n repo:
    • mozillavpn.xliff (most of translations for the client) - is built via extract_source_strings.py, which gives the proper en-US
    • extras.xliff (languages names and servers cities/countries) - this is a direct copy, but the bug doesn't arrise. For language names, there is a manual check for the en case in generate_language_names_map.py. For server, the code doesn't directly look at the target-language string, so there isn't an issue. (There is one more spot this file is used that I didn't investigate, as it didn't seem worth the time to dig even deeper here.)
    • a strings.xliff file for each addon (that doesn't use shared strings) - these are built via extract_source_strings.py, which gives the proper en-US
    • a single strings.xliff file for addon shared strings - this is a direct copy from what build.py outputs, and so it keeps those incorrect headers it was initially given... until this PR

From quick testing, this PR doesn't have any negative side effects on the existing client translations.

After this is merged, we'll need the following steps before the bug will appear fixed:

  1. The updated file needs to be ingested by the l10n repo.
  2. The client repo needs to pull this new file back down.
  3. And then, a new addons RC needs to be made.

Reference

VPN-6778

Checklist

  • My code follows the style guidelines for this project
  • I have not added any packages that contain high risk or unknown licenses (GPL, LGPL, MPL, etc. consult with DevOps if in question)
  • I have performed a self review of my own code
  • I have commented my code PARTICULARLY in hard to understand areas
  • I have added thorough tests where needed

@flodolo
Copy link
Collaborator

flodolo commented Dec 12, 2024

This might the highest ratio ever explanation vs code changes ;-)

I never noticed the difference in target-language within the en folder. The only files with en instead of en-US are extras.xliff and addons/strings.xliff. It would be relatively easy to fix that in the workflows (I'm already doing it for the main file).

I tried running automation (string extraction) against this branch, and the only difference is the target-language for addons/strings.xliff.

From quick testing, this PR doesn't have any negative side effects on the existing client translations.

Any risk of strange behavior, especially on mobile where you might have locales like en-IN?

@mcleinman
Copy link
Collaborator Author

This might the highest ratio ever explanation vs code changes ;-)

Sorry for the essay, I probably should have written a summary as well. (For what it's worth, it's probably the highest hours-put-in vs. characters-changed ratio I've ever done as well.)

I never noticed the difference in target-language within the en folder. The only files with en instead of en-US are extras.xliff and addons/strings.xliff. It would be relatively easy to fix that in the workflows (I'm already doing it for the main file).
I tried running automation (string extraction) against this branch, and the only difference is the target-language for addons/strings.xliff.

I don't think changing the extras.xliff files would affect anything, and it doesn't seem like using the en there is causing any issues now. Thus, I went more conservative and didn't touch it. Let me know if you think it should be updated as well.

From quick testing, this PR doesn't have any negative side effects on the existing client translations.

Any risk of strange behavior, especially on mobile where you might have locales like en-IN?

Just tested out en-IN, and the app looked okay.

@flodolo
Copy link
Collaborator

flodolo commented Dec 13, 2024

I don't think changing the extras.xliff files would affect anything, and it doesn't seem like using the en there is causing any issues now. Thus, I went more conservative and didn't touch it. Let me know if you think it should be updated as well.

I will add a step in the l10n repo to make sure the target-language of files in en is always en-US, at least we're consistent.

@mcleinman
Copy link
Collaborator Author

I don't think changing the extras.xliff files would affect anything, and it doesn't seem like using the en there is causing any issues now. Thus, I went more conservative and didn't touch it. Let me know if you think it should be updated as well.

I will add a step in the l10n repo to make sure the target-language of files in en is always en-US, at least we're consistent.

Sounds good, thanks.

Copy link
Collaborator

@oskirby oskirby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Thank you for the detailed notes.

@mcleinman mcleinman merged commit b69cb7e into main Dec 17, 2024
117 checks passed
@mcleinman mcleinman deleted the vpn-6778-fix-missing-english-language branch December 17, 2024 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants