[RFC] sanitize usage of gettext #1156
Wow, this is amazing. Thanks for the detailed explanation. When I first started sorting out the translations, I read about I copied all the "Plural-Forms" settings from the Gnu translation site, so they should be correct.
Well spotted.
I don't know, but as far as I can tell, the
I think we should keep the sig_list name, but change the message to:
I don't think so. The places that use tagged messages tend to do an operation on one email, many times. Fortunately, in this case we can use the There are probably more cases like this where we need to add a count parameter to functions.
3368831 to 1575e88
Ready for review. Besides some (more or less) trivial changes I needed to rewrite parts of the output routine for attachment headers (these There is also a small code snippet which counts messages ( I converted some of the more complex "gluing strings" to individual (complete) strings. Sometimes these "gluing strings" are format strings for a In case you speak another language sufficiently fluent (no expert level needed), you can check the merged translations and remove the
I added the signal number to the string and adapted the translations by a simple search and replace approach:
I've spent a few hours looking over this.
The corresponding code to the comment was changed in 49a037a.
Document the '%s' parameter for translators.
Document that the choice string (which contains the characters for the choice) must match up with the translated string of the question.
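The requirement that the choice string matches the translated question can be checked mechanically. The following is only a sketch: `choices_match_prompt` is a hypothetical helper (not part of NeoMutt) and it assumes single-byte choice letters that appear in the prompt as `(x)`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of NeoMutt: returns true if every letter
 * in `choices` appears in `prompt` in the form "(x)". Only single-byte
 * letters are handled. */
bool choices_match_prompt(const char *prompt, const char *choices)
{
  for (const char *c = choices; *c != '\0'; c++)
  {
    const char needle[] = { '(', *c, ')', '\0' };
    if (strstr(prompt, needle) == NULL)
      return false;
  }
  return true;
}
```

A translator who translates the question but leaves the choice string as "oac" would be caught by such a check, because the translated question no longer contains "(o)".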
Use _() directly instead of N_() followed by _() for strings where this is applicable, i.e.
    s = N_("string"); printf("...", _(s));
becomes
    s = _("string"); printf("...", s);
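As a rough illustration of where N_() is still useful and where _() alone is enough, here is a minimal sketch. The macro definitions follow the usual gettext convention; `SortNames`, `sort_name` and `sort_prompt` are made-up names for this example, not NeoMutt code.

```c
#include <libintl.h>
#include <stddef.h>

#define _(s) gettext(s)
#define N_(s) (s) /* marks the string for xgettext, translates nothing */

/* A static initialiser cannot call a function, so the strings are only
 * marked here and translated later, at the point of use. */
static const char *const SortNames[] = { N_("date"), N_("size") };

const char *sort_name(size_t i)
{
  return _(SortNames[i]); /* N_() at the definition, _() at the use */
}

const char *sort_prompt(void)
{
  /* No initialiser restriction here, so translate directly with _() */
  return _("Sort by date or size?");
}
```

Without a loaded message catalog, gettext() returns its argument unchanged, so both functions return the English text.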
Translatable strings should not be split in the middle but be one self-contained and complete unit of thought. In particular, do not split them arbitrarily at the end of a sentence. Replace
    s1 = _("sentence 1."); printf("%s sentence2.", s1);
with
    s = _("sentence 1. sentence 2."); printf("%s", s);
Add a comment for translators that the strings must be padded with spaces. Also do actually translate the strings (using '_()') and not only mark them for translation with 'N_()'.
Avoid the antipattern
    printf(n == 1 ? _("one thing") : _("%d things"), n);
instead use
    printf(ngettext("one thing", "%d things", n), n);
(note: n has to be passed to both ngettext and printf)

Although most Germanic languages (including English) use the singular form for n=1 and a (single) plural form for all other n (including n=0), this is not the case in general. An example from the GNU gettext manual [0] for the Polish word plik (file):
    1      plik
    2,3,4  pliki
    5-21   plików
    22-24  pliki
    25-31  plików
When using numbers in translatable strings, do not choose the translated plural form in the code. Instead let GNU gettext pick the correct translated plural (provided by a translator) depending on the number.

[0] https://www.gnu.org/software/gettext/manual/html_chapter/gettext.html#Plural-forms
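To make the Polish example concrete, the sketch below spells out in plain C the selection that gettext performs from a catalog's Plural-Forms header. The expression mirrors the one the GNU gettext manual lists for Polish; `polish_plural_index` is an illustrative name, and in real code this logic lives in the .po file, not in C.

```c
/* Index of the plural form for Polish:
 *   0 = "plik", 1 = "pliki", 2 = "plików"
 * Illustration only; gettext evaluates the equivalent expression from
 * the catalog's Plural-Forms header. */
unsigned int polish_plural_index(unsigned long n)
{
  if (n == 1)
    return 0;
  if ((n % 10 >= 2) && (n % 10 <= 4) && ((n % 100 < 10) || (n % 100 >= 20)))
    return 1;
  return 2;
}
```

A two-way `n == 1 ? ... : ...` choice in the C code cannot express the difference between index 1 and index 2, which is why the number has to be handed to ngettext.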
Convert some occurrences of printf("%d things", n); to printf(ngettext("%d thing", "%d things", n), n);
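A minimal sketch of the converted form, wrapped in a function so it can be tried out. `format_thing_count` is a hypothetical name; it relies on the documented behaviour that, without a loaded catalog, ngettext returns the first string for n == 1 and the second otherwise.

```c
#include <libintl.h>
#include <stdio.h>

/* Writes "1 thing", "5 things", ... into buf.
 * n is passed twice: to ngettext to select the plural form, and to
 * snprintf to fill in the %lu placeholder. */
int format_thing_count(char *buf, size_t buflen, unsigned long n)
{
  return snprintf(buf, buflen, ngettext("%lu thing", "%lu things", n), n);
}
```

With a translated catalog bound, the same call would return whichever of the translator's plural forms matches n.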
Choose the correct plural depending on the number of messages handled. Some choices are between one and not one as the precise number of messages is not known. A clarifying comment is added for the translator in these cases.
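Choosing the plural requires an actual count at the call site. The sketch below shows one way this could look; `struct Msg`, `count_tagged_msgs`, `format_save_prompt` and the message text are hypothetical stand-ins and not the NeoMutt data structures or strings.

```c
#include <libintl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, stripped-down message record */
struct Msg
{
  bool tagged;
};

/* Count the tagged messages so the number can be handed to ngettext */
unsigned long count_tagged_msgs(const struct Msg *msgs, size_t len)
{
  unsigned long count = 0;
  for (size_t i = 0; i < len; i++)
  {
    if (msgs[i].tagged)
      count++;
  }
  return count;
}

/* Build the prompt with the plural form matching `count` */
int format_save_prompt(char *buf, size_t buflen, unsigned long count)
{
  return snprintf(buf, buflen,
                  ngettext("Save %lu message to mailbox",
                           "Save %lu messages to mailbox", count),
                  count);
}
```

Where the precise count is not known, only "one" versus "not one" can be passed, which is the case the translator comments in this commit describe.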
Split up the horrible construction of choosing the plural and gluing translations together. This should make the job of a translator easier. Gluing translations is
    printf(_("Decode-save%s to mailbox"), h ? _("") : _(" tagged"));
instead of the better
    if (h) printf(_("Decode-save to mailbox")); else printf(_("Decode-save tagged to mailbox"));
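The un-glued form can be sketched as a small function. `decode_save_prompt` is a made-up name, and the `single` flag stands in for the `h` pointer of the original code. One more reason to avoid the glued form: gettext reserves the empty msgid for the catalog header, so `_("")` returns that header text and not an empty string once a catalog is loaded.

```c
#include <libintl.h>
#include <stdbool.h>

#define _(s) gettext(s)

/* Returns one complete, separately translatable sentence per case.
 * `single` is true when operating on one message, false when operating
 * on the tagged messages. */
const char *decode_save_prompt(bool single)
{
  if (single)
    return _("Decode-save to mailbox");
  return _("Decode-save tagged to mailbox");
}
```

Each branch gives the translator a full sentence, so word order and inflection can differ freely between the two cases.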
Do not use this antipattern:
    printf(_("some %s sentence."), b ? _("big") : _("small"));
as the translation of "big"/"small" depends on the sentence it is used in. Instead give the translator the complete sentence:
    if (b) printf(_("some big sentence.")); else printf(_("some small sentence."));
Pass the complete string "This %s/%s attachment has been deleted on %s" to gettext and not single parts gluing them together afterwards. Although it is tempting to toggle optional sentence parts at the end, this is a bad idea as this might not work in all languages since these might have special rules (compare "manner before place before time" rule for adverbs in the end-position or subject-verb-object order in English).
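Once the translator receives the complete string, POSIX positional conversions (`%1$s`, `%2$s`, ...) let a translation reorder the arguments. The sketch below is an assumption-laden illustration: `format_deleted` is a hypothetical wrapper, the format is passed in as a parameter to stand in for what gettext would return, and the reordered format in the usage note is invented, not taken from any NeoMutt .po file.

```c
#include <stdio.h>

/* Formats a "deleted attachment" notice. `fmt` stands in for the string
 * gettext would return; the argument order in the C code is fixed as
 * (type, subtype, date), whatever order the translation prints them in. */
int format_deleted(char *buf, size_t buflen, const char *fmt,
                   const char *type, const char *subtype, const char *date)
{
  return snprintf(buf, buflen, fmt, type, subtype, date);
}
```

Called with the invented format "[-- on %3$s: %1$s/%2$s attachment deleted --]" and the arguments "text", "plain", "Mon", this yields "[-- on Mon: text/plain attachment deleted --]" on a POSIX printf. A date glued on after translation could not be moved like this.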
Pass the complete string to the translator instead of chopping it into pieces and translating every piece.
Also include the signal number in the output.
Do not chop the help string into multiple parts; instead, pass the complete string on for translation. Partly reverts 9b7faee.
This partly reverses commit 9b7faee.
No changes to messages
Translations for the strings "s1" and "s2" which are now "s1 s2" were concatenated except for the cases where a translation of s1 or s2 was not present.
We merged the two translations of the singular and plural messages into one translation. If the language has more than one plural form, the old plural translation is used for every plural form of the language. This might not be accurate, but it is more precise than leaving the plural completely untranslated. All the new places were marked as fuzzy.
Combine the separated singular and plural translations into a single translation. For languages with more than a single plural form, the old plural form is used in any case. This might not be correct but is better than nothing (it also is the current behaviour).
Merge the old translation of "Character set changed to %s; %s." together with the translations of "not converting" or "converting" respectively, by replacing the second '%s' with the translated string. The resulting translation might not be correct, but it is better than nothing. It is also what NeoMutt currently displays, as this merge was previously done in the C code.
Join the old translations of
    "[-- Alternative Type #%d: "
    "[-- Type: "
    "%s/%s%s%s, Encoding: %s, Size: %s --]\n"
together into the translations of
    "[-- Alternative Type #%d: %s/%s%s%s, Encoding: %s, Size: %s --]\n"
    "[-- Type: %s/%s%s%s, Encoding: %s, Size: %s --]\n"
Some notes:
* Some translators translated "%s/%... --]" with "[-- %s/%... --]" (note the leading "[--" in the translation). Thus, the new translation is "[-- .. [-- ... --]", which is kind of odd but still the same string as would previously be displayed to the user.
* The new string "[-- Alternative Type #%d: ... --]" has an additional parameter "%d". Some translators ignored it in their translation of the old "[-- Alternative Type #%d: " (by not translating it), so that the index for the '%s' would be off. This applies to the majority of translations. However, most of them are out of date and do not have enough '%s' to even print the old string correctly, i.e. "A %s%s B %s" was translated with "A %s B %s". We ignored those languages, as a translator must take care of this. There were two exceptions, however: Polish (pl) and Turkish (tr). For these we used the untranslated version of "[-- Alternative Type #%d: " and "[-- Type: " and prepended it to the translation.
The old translation was for "[-- Attachment #%d" and was extended in the C code by either " --]" or ": %s --]". Now we translate the complete string. Update the po files to reflect this. Reuse the old translation and append " --]" or ": %s --]" to it, respectively.
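On the C side, passing the complete string could look roughly like the sketch below. `format_attachment_header` is a hypothetical function, and the two format strings are assembled from the fragments quoted above; any trailing newline used by the real code is left out here.

```c
#include <libintl.h>
#include <stdio.h>

#define _(s) gettext(s)

/* Formats the attachment marker, with or without a description.
 * Each variant is one complete translatable string; nothing is appended
 * after translation. `desc` may be NULL. */
int format_attachment_header(char *buf, size_t buflen, int num, const char *desc)
{
  if (desc != NULL)
    return snprintf(buf, buflen, _("[-- Attachment #%d: %s --]"), num, desc);
  return snprintf(buf, buflen, _("[-- Attachment #%d --]"), num);
}
```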
The old translations were built together in the C code. Now we pass the complete string to gettext. Update the po files and combine the old translations into new ones. In the old version the translated strings were
    "[-- This %s/%s attachment "
    "(size %s bytes) "
    "has been deleted --]\n"
    "[-- on %s --]\n"
The four new strings are
    "[-- This %s/%s attachment (size %s bytes) has been deleted --]\n" "[-- on %s --]\n"
    "[-- This %s/%s attachment (size %s bytes) has been deleted --]\n"
    "[-- This %s/%s attachment has been deleted --]\n" "[-- on %4$s --]\n"
    "[-- This %s/%s attachment has been deleted --]\n"
Combine the old translations of the chopped strings
    s1 = "[-- This %s/%s attachment is not included, --]\n"
    s2 = "[-- and the indicated external source has --]\n" "[-- expired. --]\n"
into one translation for the single string
    s_new = "[-- This %s/%s attachment is not included, --]\n" "[-- and the indicated external source has --]\n" "[-- expired. --]\n"
Four translators (Galician (gl), Lithuanian (lt), Slovak (sk), Taiwanese Mandarin (zh_TW)) seem to have translated the old s2 as "s1 s2", so the translation of s_new would have a doubling at the start. In these translations we removed the doubling.
Combine the old translations of the chopped strings
    s1 = "[-- This %s/%s attachment is not included, --]\n"
    s2 = "[-- and the indicated access-type %s is unsupported --]\n"
into one translation for the single string
    s_new = "[-- This %s/%s attachment is not included, --]\n" "[-- and the indicated access-type %s is unsupported --]\n"
Four translators (Galician (gl), Lithuanian (lt), Slovak (sk), Taiwanese Mandarin (zh_TW)) seem to have translated the old s2 as "s1 s2", so the translation of s_new would have a doubling at the start. In these translations we removed the doubling.
Combine the old translations of the chopped strings
    "[-- This is an attachment "
    "[-- %s/%s is unsupported "
    "(use '%s' to view this part)"
    "(need 'view-attachments' bound to key!)"
into translations for the following six strings. These strings also have a " --]\n" appended at the end.
    "[-- This is an attachment (use '%3$s' to view this part) --]\n"
    "[-- %s/%s is unsupported (use '%s' to view this part) --]\n"
    "[-- This is an attachment (need 'view-attachments' bound to key!) --]\n"
    "[-- %s/%s is unsupported (need 'view-attachments' bound to key!) --]\n"
    "[-- This is an attachment --]\n"
    "[-- %s/%s is unsupported --]\n"
The original string was changed from "Caught signal %s... Exiting.\n" to also include the signal number: "Caught signal %d (%s) ... Exiting.\n". We adapted the translation by inserting " (%s)" after "%d" in the translation of "Caught signal %d... Exiting.\n".
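A small sketch of how the new string might be filled in. `format_signal_message` is a hypothetical wrapper; where the signal name comes from (for example sys_siglist or strsignal()) is left to the caller and is an assumption of this example.

```c
#include <libintl.h>
#include <stdio.h>

#define _(s) gettext(s)

/* Formats the exit message with both the signal number and its name.
 * The caller supplies `name`; how it is obtained is platform dependent. */
int format_signal_message(char *buf, size_t buflen, int sig, const char *name)
{
  return snprintf(buf, buflen, _("Caught signal %d (%s) ... Exiting.\n"), sig, name);
}
```

Because the number and the name are both placeholders in one string, there is a single message for translators regardless of the operating system.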
We merged the translations of the individual strings into a translation of the new string. Note that for some strings the translation did not exist. In that case we used the original English version, which yields a mixture of English and the other language.
Merge the two old separate translations into a single translation of the new string.
tl;dr Discussion
I found one missing I, too, looked at removing the What I'm committing looks quite different to your branch, but it's not.

Code changes

The first block are just the code changes. I've removed the You added

    /* L10N:
       These three letters correspond to the choices in the string:
       File exists, (o)verwrite, (a)ppend, or (c)ancel?
     */
    switch (mutt_multi_choice(
        _("File exists, (o)verwrite, (a)ppend, or (c)ancel?"), _("oac")))

but unfortunately, they are placed next to the menu string because that comes first.

    switch (mutt_multi_choice(
        _("File exists, (o)verwrite, (a)ppend, or (c)ancel?"),
        // L10N: Options for: File exists, (o)verwrite, (a)ppend, or (c)ancel?
        _("oac")))

Now the message will be associated with the options.

Update-po

This one enormous commit only updates the file references.

Magic Changes

Now that all the code changes have been made and all the references updated,
Note: This is work in progress.
Marking strings for translation isn't trivial. There are two common mistakes/antipatterns
Translating parts of a sentence and gluing the translation together:
This should be
This antipattern is more common in its form
Believing that plural is as simple as changing the word for its plural form
In its extreme form this is written as
However, other languages might have more than one plural form, and the plural is seldom formed by just appending a suffix (like s) to the noun. See the Polish example in the GNU gettext manual [0].
The better way is to use ngettext(const char*, const char*, unsigned long int):
This pull request/branch tries to fix some of the above stated mistakes. It also adds some comments and explanations for the translators where needed. (One misuse of N_() instead of _() was also fixed.)
The changes introduce a lot of new strings to be translated. I did my best to "reuse" the old translations for the new ones (merging the separate translations for singular/plural into one translation for ngettext, ...). As an end user you should get the same translation as before.
[0] https://www.gnu.org/software/gettext/manual/html_chapter/gettext.html#Plural-forms
Currently I have several questions:
in signal.c:54
There are different strings to translate depending on the OS:
Does sys_siglist contain the Caught signal part on those OSes that miss it?
commands.c:283
I'd like to use ngettext() here. However, h is just a pointer and not the number of tagged messages.
Is there an easy way to get the number of tagged messages? (cf. the count_tagged() function in recvcmd.c).