[RFC] sanitize usage of gettext #1156

ghost · 2018-04-21T20:03:14Z

Note: This is work in progress.

What does this PR do?

Marking strings for translation isn't trivial. There are two common mistakes/antipatterns

Translating parts of a sentence and gluing the translation together:

s1 = _("part 1");
s2 = _("part 2");
printf("%s %s", s1, s2);

This should be

printf(_("part 1 part 2");

This antipattern is more common in its form

printf(_("string with a %s placeholder"), b ? _("big") : _("small"));

Believing that plural is as simple as changing the word for its plural form
```
if (i = 1)
   printf("one message");
else
   printf("%d messages", i);
```
In it's extreme form this is written as
```
printf("%d message%s", i, i == 1 ? "", "s");
```
However, other languages might have more than one plural form and seldom it is done by just appending a suffix (like s) to the noun. See the Polish example in the GNU gettext manual [0]

The better way is to use ngettext(const char*, const char*, unsigned long int):
```
printf(ngettext("one message", "%d messages", i), i);
```

This pull request/branch tries to fix some of the above stated mistakes. Also adding some comments and explanations for the translators where needed. (Also also one miss-usage of N_() instead of _() was fixed.)

The changes introduce a lot of new strings to be translated. I did my best to "reuse" the old translations for the new ones (using the seperate translations for singular/plural into one translation for ngettext, ...). As an end user you should get the same translation as before.

[0] https://www.gnu.org/software/gettext/manual/html_chapter/gettext.html#Plural-forms

Are there points in the code the reviewer needs to double check?

Currently I've several questions:

in signal.c:54
There is different strings to translate depending on the OS:

#if HAVE_DECL_SYS_SIGLIST
  printf(_("%s...  Exiting.\n"), sys_siglist[sig]);
#elif (defined(__sun__) && defined(__svr4__))
  printf(_("Caught %s...  Exiting.\n"), _sys_siglist[sig]);
#elif (defined(__alpha) && defined(__osf__))
  printf(_("Caught %s...  Exiting.\n"), __sys_siglist[sig]);
#else
  printf(_("Caught signal %d...  Exiting.\n"), sig);
#endif

Does sys_siglist contain the Caught signal part on those OSes that miss it?

commands.c:283
I'd like to use ngettext() here. However, h is just a pointer and not the number of tagged messages.
Is there an easy way to get the number of tagged messages? (cf in recvcmd.c the count_tagged() function).

flatcap · 2018-04-21T23:47:19Z

Wow, this is amazing. Thanks for the detailed explanation.

When I first started sorting out the translations, I read about ngettext().
It's very clever, but I don't remember ever finding any actual users of it :-)

I copied all the "Plural-Forms" settings from the Gnu translation site, so they should be correct.

Also also one mis-usage of N_() instead of _() was fixed

Well spotted.

in signal.c:54
There is different strings to translate depending on the OS:

I don't know, but as far as I can tell, the sig_list is just going to be a human-readable string. It's not a message we're going to see often, and when it does happen I'm not sure it's useful to translate the message...

NeoMutt just stopped with the message: "香蕉"

I think we should keep the sig_list name, but change the message to:

Use one message for all OSs
Include the signal number

Is there an easy way to get the number of tagged messages?

I don't think so. The places that use tagged messages tend to do an operation on one email, many times. Fortunately, in this case we can use the for loop above to count them. It's in an if (Context) clause, but I can't see how that could possibly be NULL. No Context, means no emails and no tagged emails.

There are probably more cases like this where we need to add a count parameter to functions.

ghost · 2018-04-22T19:15:48Z

Ready for review.

Besides some (more or less) trivial changes I needed to rewrite parts of the output routine for attachment headers (these [-- .... --] thingies). Please review them. They are located in handler.c in the functions external_body_handler() and mutt_body_handler().

There is also a small code snippet which counts messages (commands.c functions mutt_pipe_message(), mutt_print_message() and mutt_save_message_ctx()) which is new, so you might also have a look there.

I converted some of the more complex "gluing strings" to individual (complete) strings. Sometimes these "gluing strings" are format strings for a printf() call. Depending on how they are combined the resulting individual strings might have a different number of parameters. An example can be found in the function external_body_handler() in handler.c. To keep things simple (in the C code), I only called printf() once with the union of all parameters and used in those translatable strings which only need a subset of the parameters the syntax %2$s. When doing so, I added a comment for the translators that they are not surprised by this. I case you don't like this, I can change it at the cost of making the C code slightly more complex with a second printf() call (unless there is better solution).

In case you speak another language sufficiently fluent (no expert level needed), you can check the merged translations and remove the fuzzy flag if appropriate.

I think we should keep the sig_list name, but change the message to:

Use one message for all OSs

Include the signal number

I added the signal number to the string and adapted the translations by a simple search and replace approach: %d becomes %d (%s).

flatcap · 2018-04-23T22:11:18Z

Ready for review.

git diff master | wc -l
185221

I've spent a few hours looking over this.
I'll have some questions soon (hopefully).

The corresponding code to the comment was changed in 49a037a.

Document the '%s' parameter for translators.

Document that the choice string (which contains the characters for the choice) must match up with the translated string of the question.

Use _() directly instead of N_() followed by _() for strings where this is applicable, i.e. s = N_("string"); printf("...", _(s)); becomes s = _("string") printf("...", s);

Translatable strings should not be split in the middle but be one self-contained and complete unit of thought. In particular do not split them arbitrary at the end of a sentence. Replace s1 = _("sentence 1."); printf("%s sentence2.", s1); with s = _("sentence 1. sentence 2."); printf("%s", s);

Add a comment for translators that the strings must be padded with spaces. Also do actually translate the strings (using '_()') and not only mark them for translation with 'N_()'.

Avoid the antipattern printf(n == 1 ? _("one thing") : _("%d things"), n); instead use printf(ngettext("one thing", "%d things", n), n); (note: n has to be passed to ngettext and printf) Although most Germanic languages (including English) use the singular form for n=1 and a (single) plural form for all other n (including n=0), this is not the case in general. An example from the GNU gettext manual [0] for Polish work plik (file): 1 plik 2,3,4 pliki 5-21 pliko'w 22-24 pliki 25-31 pliko'w When using numbers in translatable strings, do not choose the translated plural form in the code. Instead let GNU gettext capability pick the correct translated plural (provided by a translator) depending on the number. [0] https://www.gnu.org/software/gettext/manual/html_chapter/gettext.html#Plural-forms

Convert some occurrences of printf("%d things", n); to printf(ngettext("%d thing", "%d things", n), n);

Choose the correct plural depending on the number of messages handled. Some choices are between one and not one as the precise number of messages is not known. A clarifying comment is added for the translator in these cases.

Split up the horrible construction of choosing the plural and gluing translations together. This should make the job of a translator easier. Gluing translations is printf(_("Decode-save%s to mailbox"), h ? _("") : _(" tagged")); instead of the better if (h) printf(_("Decode-save to mailbox")); else printf(_("Decode-save tagged to mailbox"));

Do not use this antipattern: printf(_("some %s sentence."), b ? _("big") : _("small")); as the translation of "big"/"small" and depends on the sentence it is used in. Instead give the translator the complete sentence: if (b) printf(_("some big sentence.")); else printf(_("some small sentence."));

Pass the complete string "This %s/%s attachment has been deleted on %s" to gettext and not single parts gluing them together afterwards. Although it is tempting to toggle optional sentence parts at the end, this is a bad idea as this might not work in all languages since these might have special rules (compare "manner before place before time" rule for adverbs in the end-position or subject-verb-object order in English).

Pass the complete string to the translator instead of chopping it into pieces and translate every piece.

Also include the signal number in the output.

Do not chop the help string into multiple parts instead pass the complete string on for translation. Reverts partly 9b7faee.

This partly reverses commit 9b7faee.

No changes to messages

Translations for the strings "s1" and "s2" which are now "s1 s2" were concatenated except for the cases where a translation of s1 or s2 was not present.

We merged the two translations of the singular message and plural messages into one translation. If the language has more than one plural the old plural translation is used for any plural form of the language. This might not be accurate but more precise than leaving the plural completely untranslated. All the new places were marked as fuzzy.

Combine the separated singular and plural translations into a single translation. For languages with more than a single plural form, the old plural form is used in any case. This might not be correct but is better than nothing (it also is the current behaviour).

Merge the old translation of "Character set changed to %s; %s." together with the translations of "not converting" or "converting" respectively, by replacing the second '%s' with the translated string. The resulting translation might not be correct but it is better than nothing. Although it is the current translation by NeoMutt as this merge was done previously in the C code.

Join the old translations of "[-- Alternative Type #%d: " "[-- Type: " "%s/%s%s%s, Encoding: %s, Size: %s --]\n" together to the translations of "[-- Alternative Type #%d: %s/%s%s%s, Encoding: %s, Size: %s --]\n" "[-- Type: %s/%s%s%s, Encoding: %s, Size: %s --]\n" Some notes: * Some translators translated "%s/%... ---]" with "[-- %s/%... --]" (note the leading "[--" in the translation). Thus, the new translation is "[-- .. [-- ... --]" which is kinda odd but still the same string as would previously be displayed to the user. * The new string "[-- Alternative Type #%d: ... --]" has an additional parameter "%d". Some translators ignored it in their translation of the old "[-- Alternative Type #%d: " (by not translating it), so that the index for the '%s' would be off. This applies to the majority of translations. However, most of them are out of date and do not have enough '%s', to event print the old string correctly, i.e. "A %s%s B %s" was translated with "A %s B %s". We ignored those languages as a translator must take care of this. There were two exceptions, however, Polish (pl) and Turkish (tr). We used the untranslated version of "[-- Alternative Type #%d: " and "[-- Type: " and appended it to the start of the translation.

The old translation was for "[-- Attachment #%d" and was extended in the C code by either " --]" or ": %s --]". Now we translate the complete string. Update the po files to reflect this. Reuse the old translation and append " --]" or ": %s --]" to it, respectively.

The old translations where build together in the C code. Now we pass the complete string to gettext. Update the po files and combine the old translations into new ones. In the old version the strings translated where "[-- This %s/%s attachment " "(size %s bytes) " "has been deleted --]\n" "[-- on %s --]\n" The four new strings are "[-- This %s/%s attachment (size %s bytes) has been deleted --]\n" "[-- on %s --]\n" "[-- This %s/%s attachment (size %s bytes) has been deleted --]\n" "[-- This %s/%s attachment has been deleted --]\n "[-- on %4$s --]\n" "[-- This %s/%s attachment has been deleted --]\n"

Combine the old translations of the chopped strings s1 = "[-- This %s/%s attachment is not included, --]\n" s2 = "[-- and the indicated external source has --]\n" "[-- expired. --]\n" into one translation for the single string s_new = "[-- This %s/%s attachment is not included, --]\n" "[-- and the indicated external source has --]\n" "[-- expired. --]\n" Four translators (Galician (gl), Lithuanian (lt), Slovak (sk), Taiwanese Mandarin (zh_WT)) seemed to have translated the old s2 as "s1 s2", so the translation of s_new would have a doubling at the start. In these versions we modified the translation to not have this doubling.

Combine the old translations of the chopped strings s1 = "[-- This %s/%s attachment is not included, --]\n" s2 = "[-- and the indicated access-type %s is unsupported --]\n" into one translation for the single string s_new = "[-- This %s/%s attachment is not included, --]\n" "[-- and the indicated access-type %s is unsupported --]\n" Four translators (Galician (gl), Lithuanian (lt), Slovak (sk), Taiwanese Mandarin (zh_WT)) seemed to have translated the old s2 as "s1 s2", so the translation of s_new would have a doubling at the start. In these versions we modified the translation to not have this doubling.

Combine the old translations of the chopped strings "[-- This is an attachment "] "[-- %s/%s is unsupported "] "(use '%s' to view this part)"] "(need 'view-attachments' bound to key!)"] into a translation for the following six strings. These strings have also a " --]\n" appended at the end. "[-- This is an attachment (use '%3$s' to view this part) --]\n" "[-- %s/%s is unsupported (use '%s' to view this part) --]\n" "[-- This is an attachment (need 'view-attachments' bound to key!) --]\n" "[-- %s/%s is unsupported (need 'view-attachments' bound to key!) --]\n" "[-- This is an attachment --]\n" "[-- %s/%s is unsupported --]\n"

The original string was changed from "Caught signal %s... Exiting.\n" to include also the signal number: "Caught signal %d (%s) ... Exiting.\n" We adapted the translation by inserting " (%s)" after "%d" in the translation of "Caught signal %d... Exiting.\n".

We merged the translations of the individual strings to a translation of the new string. Note that for some strings the translation did not exists. In that case we used the original English version, which yields a mixture between English and the other language.

Merge the two old separate translations into a single translation of the new string.

flatcap · 2018-04-24T23:32:15Z

tl;dr
Very impressive. Merged. Thanks.

Discussion

(these [-- .... --] thingies). Please review them

I found one missing state_mark_attach(), but otherwise they are all good.

I, too, looked at removing the [-- and --] from the messages,
but realised that I'd need to printf() twice in some cases.
It would save the translator a little effort, but perhaps we'll leave that for now.

What I'm committing looks quite different to your branch, but it's not.
I've rearranged the commits so that the changes are smaller and simpler to understand.

Code changes

The first block are just the code changes.
I've made a few small tweaks here.

I've removed the *s from the L10N comment blocks.
When using a standard C block comment, the *s turn up in the .po file

You added L10N comments for the menu option strings:

/* L10N:
   These three letters correspond to the choices in the string:
   File exists, (o)verwrite, (a)ppend, or (c)ancel?
 */
switch (mutt_multi_choice(
    _("File exists, (o)verwrite, (a)ppend, or (c)ancel?"), _("oac")))

but unfortunately, they are placed next to the menu string because that comes first.
I've fixed this by inserting a comment into the mutt_multi_choice() arguments:

switch (mutt_multi_choice(
    _("File exists, (o)verwrite, (a)ppend, or (c)ancel?"),
    // L10N: Options for: File exists, (o)verwrite, (a)ppend, or (c)ancel?
    _("oac")))

Now the message will be associated with the options.

Update-po

This one enormous commit only updates the file references.
There are none of your magic changes here.

Magic Changes

Now that all the code changes have been made and all the references updated,
your magic changes to add plurals, etc, are actually quite small commits.
Their meaning is clear.

ghost self-assigned this Apr 21, 2018

ghost added the status:in-progress label Apr 21, 2018

ghost force-pushed the devel/gettext branch 3 times, most recently from 3368831 to 1575e88 Compare April 22, 2018 18:55

ghost changed the title ~~[WIP/RFC] sanitize usage of gettext~~ [RFC] sanitize usage of gettext Apr 22, 2018

flatcap mentioned this pull request Apr 24, 2018

Re-sort resources for better documents #1105

Merged

Reis Radomil added 20 commits April 24, 2018 21:13

I18N: ncrypt/smime.c: Remove outdated comment

7af2d7b

The corresponding code to the comment was changed in 49a037a.

I18N: recvattach.c: Add comment to translation string

fc1c0d1

Document the '%s' parameter for translators.

I18N: Document the choice strings for translators

45e04d3

Document that the choice string (which contains the characters for the choice) must match up with the translated string of the question.

I18N: Use _() instead of N_() for marking strings for translation

49fa6f2

Use _() directly instead of N_() followed by _() for strings where this is applicable, i.e. s = N_("string"); printf("...", _(s)); becomes s = _("string") printf("...", s);

I18N: ncrypt/smime.c: Translate strings and document them

191449f

Add a comment for translators that the strings must be padded with spaces. Also do actually translate the strings (using '_()') and not only mark them for translation with 'N_()'.

I18N: Define ngettext macro when compiling without gettext support

44474de

I18N: Use ngettext to select plural form for a string

f543815

Convert some occurrences of printf("%d things", n); to printf(ngettext("%d thing", "%d things", n), n);

I18N: Choose correct plural form in recvcmd.c

2eb3e4a

I18N: No need to translate "%s: %s"

af5663c

I18N: Choose correct plural form in commands.c

e1a561b

Choose the correct plural depending on the number of messages handled. Some choices are between one and not one as the precise number of messages is not known. A clarifying comment is added for the translator in these cases.

I18N: handler.c: Pass the complete line to the translators

5252126

I18N: handler.c: Pass the complete string to the translator

6050fbb

Pass the complete string to the translator instead of chopping it into pieces and translate every piece.

I18N: handler.c: Pass the complete string to the translator

347c442

Pass the complete string to the translator instead of chopping it into pieces and translate every piece.

I18N: handler.c: Pass the complete string to the translator

3c9b0be

Pass the complete string to the translator instead of chopping it into pieces and translate every piece.

Reis Radomil and others added 21 commits April 24, 2018 21:31

I18N: Unify printed string on signal caught

85ecc55

Also include the signal number in the output.

I18N: Combine help string into a single string for translation

ed2a64f

Do not chop the help string into multiple parts instead pass the complete string on for translation. Reverts partly 9b7faee.

I18N: version.c: Pass the complete license string to the translator

dbfac70

This partly reverses commit 9b7faee.

I18N: po/*.po updated to match new master

ee5c474

No changes to messages

I18N: Update po files

75c2738

Translations for the strings "s1" and "s2" which are now "s1 s2" were concatenated except for the cases where a translation of s1 or s2 was not present.

I18N: Update po files

7d46801

Combine the separated singular and plural translations into a single translation. For languages with more than a single plural form, the old plural form is used in any case. This might not be correct but is better than nothing (it also is the current behaviour).

I18N: Update po files

d29ff4d

Combine the separated singular and plural translations into a single translation. For languages with more than a single plural form, the old plural form is used in any case. This might not be correct but is better than nothing (it also is the current behaviour).

I18N: Update po files

5548c60

I18N: Update po files

d24564e

We merged the translations of the individual strings to a translation of the new string. Note that for some strings the translation did not exists. In that case we used the original English version, which yields a mixture between English and the other language.

I18N: Update po files

c0138be

Merge the two old separate translations into a single translation of the new string.

L10N: Review British English translation

9c14f19

add missing state_mark_attach()

ae455b9

flatcap force-pushed the devel/gettext branch from 1575e88 to ae455b9 Compare April 24, 2018 23:18

ghost assigned flatcap Apr 24, 2018

flatcap approved these changes Apr 24, 2018

View reviewed changes

flatcap merged commit ae455b9 into master Apr 24, 2018

ghost removed the status:in-progress label Apr 24, 2018

flatcap deleted the devel/gettext branch April 27, 2018 15:29

flatcap added the topic:translation Translation label Aug 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] sanitize usage of gettext #1156

[RFC] sanitize usage of gettext #1156

ghost commented Apr 21, 2018

flatcap commented Apr 21, 2018

ghost commented Apr 22, 2018

flatcap commented Apr 23, 2018

flatcap commented Apr 24, 2018

[RFC] sanitize usage of gettext #1156

[RFC] sanitize usage of gettext #1156

Conversation

ghost commented Apr 21, 2018

flatcap commented Apr 21, 2018

ghost commented Apr 22, 2018

flatcap commented Apr 23, 2018

flatcap commented Apr 24, 2018

Discussion

Code changes

Update-po

Magic Changes