Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] sanitize usage of gettext #1156

Merged
merged 41 commits into from
Apr 24, 2018
Merged

[RFC] sanitize usage of gettext #1156

merged 41 commits into from
Apr 24, 2018

Conversation

ghost
Copy link

@ghost ghost commented Apr 21, 2018

Note: This is work in progress.

  • What does this PR do?

Marking strings for translation isn't trivial. There are two common mistakes/antipatterns

  1. Translating parts of a sentence and gluing the translation together:

    s1 = _("part 1");
    s2 = _("part 2");
    printf("%s %s", s1, s2);
    

    This should be

    printf(_("part 1 part 2");
    

    This antipattern is more common in its form

    printf(_("string with a %s placeholder"), b ? _("big") : _("small"));
    
  2. Believing that plural is as simple as changing the word for its plural form

    if (i = 1)
       printf("one message");
    else
       printf("%d messages", i);
    

    In it's extreme form this is written as

    printf("%d message%s", i, i == 1 ? "", "s");
    

    However, other languages might have more than one plural form and seldom it is done by just appending a suffix (like s) to the noun. See the Polish example in the GNU gettext manual [0]

    The better way is to use ngettext(const char*, const char*, unsigned long int):

    printf(ngettext("one message", "%d messages", i), i);
    

This pull request/branch tries to fix some of the above stated mistakes. Also adding some comments and explanations for the translators where needed. (Also also one miss-usage of N_() instead of _() was fixed.)

The changes introduce a lot of new strings to be translated. I did my best to "reuse" the old translations for the new ones (using the seperate translations for singular/plural into one translation for ngettext, ...). As an end user you should get the same translation as before.

[0] https://www.gnu.org/software/gettext/manual/html_chapter/gettext.html#Plural-forms

  • Are there points in the code the reviewer needs to double check?

Currently I've several questions:

  • in signal.c:54
    There is different strings to translate depending on the OS:

    #if HAVE_DECL_SYS_SIGLIST
      printf(_("%s...  Exiting.\n"), sys_siglist[sig]);
    #elif (defined(__sun__) && defined(__svr4__))
      printf(_("Caught %s...  Exiting.\n"), _sys_siglist[sig]);
    #elif (defined(__alpha) && defined(__osf__))
      printf(_("Caught %s...  Exiting.\n"), __sys_siglist[sig]);
    #else
      printf(_("Caught signal %d...  Exiting.\n"), sig);
    #endif
    

    Does sys_siglist contain the Caught signal part on those OSes that miss it?

  • commands.c:283
    I'd like to use ngettext() here. However, h is just a pointer and not the number of tagged messages.
    Is there an easy way to get the number of tagged messages? (cf in recvcmd.c the count_tagged() function).

@ghost ghost self-assigned this Apr 21, 2018
@ghost ghost added the status:in-progress label Apr 21, 2018
@flatcap
Copy link
Member

flatcap commented Apr 21, 2018

Wow, this is amazing. Thanks for the detailed explanation.

When I first started sorting out the translations, I read about ngettext().
It's very clever, but I don't remember ever finding any actual users of it :-)

I copied all the "Plural-Forms" settings from the Gnu translation site, so they should be correct.

Also also one mis-usage of N_() instead of _() was fixed

Well spotted.

in signal.c:54
There is different strings to translate depending on the OS:

I don't know, but as far as I can tell, the sig_list is just going to be a human-readable string. It's not a message we're going to see often, and when it does happen I'm not sure it's useful to translate the message...

NeoMutt just stopped with the message: "香蕉"

I think we should keep the sig_list name, but change the message to:

  • Use one message for all OSs
  • Include the signal number

Is there an easy way to get the number of tagged messages?

I don't think so. The places that use tagged messages tend to do an operation on one email, many times. Fortunately, in this case we can use the for loop above to count them. It's in an if (Context) clause, but I can't see how that could possibly be NULL. No Context, means no emails and no tagged emails.

There are probably more cases like this where we need to add a count parameter to functions.

@ghost ghost force-pushed the devel/gettext branch 3 times, most recently from 3368831 to 1575e88 Compare April 22, 2018 18:55
@ghost
Copy link
Author

ghost commented Apr 22, 2018

Ready for review.

Besides some (more or less) trivial changes I needed to rewrite parts of the output routine for attachment headers (these [-- .... --] thingies). Please review them. They are located in handler.c in the functions external_body_handler() and mutt_body_handler().

There is also a small code snippet which counts messages (commands.c functions mutt_pipe_message(), mutt_print_message() and mutt_save_message_ctx()) which is new, so you might also have a look there.

I converted some of the more complex "gluing strings" to individual (complete) strings. Sometimes these "gluing strings" are format strings for a printf() call. Depending on how they are combined the resulting individual strings might have a different number of parameters. An example can be found in the function external_body_handler() in handler.c. To keep things simple (in the C code), I only called printf() once with the union of all parameters and used in those translatable strings which only need a subset of the parameters the syntax %2$s. When doing so, I added a comment for the translators that they are not surprised by this. I case you don't like this, I can change it at the cost of making the C code slightly more complex with a second printf() call (unless there is better solution).

In case you speak another language sufficiently fluent (no expert level needed), you can check the merged translations and remove the fuzzy flag if appropriate.

I think we should keep the sig_list name, but change the message to:

  • Use one message for all OSs
  • Include the signal number

I added the signal number to the string and adapted the translations by a simple search and replace approach: %d becomes %d (%s).

@ghost ghost changed the title [WIP/RFC] sanitize usage of gettext [RFC] sanitize usage of gettext Apr 22, 2018
@flatcap
Copy link
Member

flatcap commented Apr 23, 2018

Ready for review.

git diff master | wc -l
185221

I've spent a few hours looking over this.
I'll have some questions soon (hopefully).

Reis Radomil added 20 commits April 24, 2018 21:13
The corresponding code to the comment was changed in
49a037a.
Document the '%s' parameter for translators.
Document that the choice string (which contains the characters for the
choice) must match up with the translated string of the question.
Use _() directly instead of N_() followed by _() for strings where
this is applicable, i.e.

   s = N_("string");
   printf("...", _(s));

becomes

   s = _("string")
   printf("...", s);
Translatable strings should not be split in the middle but be one
self-contained and complete unit of thought. In particular do not split
them arbitrary at the end of a sentence.

Replace

  s1 = _("sentence 1.");
  printf("%s sentence2.", s1);

with

  s = _("sentence 1. sentence 2.");
  printf("%s", s);
Add a comment for translators that the strings must be padded with
spaces.

Also do actually translate the strings (using '_()') and not only mark
them for translation with 'N_()'.
Avoid the antipattern

   printf(n == 1 ? _("one thing") : _("%d things"), n);

instead use

   printf(ngettext("one thing", "%d things", n), n);

(note: n has to be passed to ngettext and printf)

Although most Germanic languages (including English) use the singular
form for n=1 and a (single) plural form for all other n (including n=0),
this is not the case in general. An example from the GNU gettext manual
[0] for Polish work plik (file):

1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w

When using numbers in translatable strings, do not choose the translated
plural form in the code. Instead let GNU gettext capability pick the
correct translated plural (provided by a translator) depending on the
number.

[0] https://www.gnu.org/software/gettext/manual/html_chapter/gettext.html#Plural-forms
Convert some occurrences of

   printf("%d things", n);

to

   printf(ngettext("%d thing", "%d things", n), n);
Choose the correct plural depending on the number of messages handled.
Some choices are between one and not one as the precise number of
messages is not known. A clarifying comment is added for the translator
in these cases.
Split up the horrible construction of choosing the plural and gluing
translations together. This should make the job of a translator easier.

Gluing translations is

   printf(_("Decode-save%s to mailbox"), h ? _("") : _(" tagged"));

instead of the better

   if (h)
     printf(_("Decode-save to mailbox"));
   else
     printf(_("Decode-save tagged to mailbox"));
Do not use this antipattern:

  printf(_("some %s sentence."), b ? _("big") : _("small"));

as the translation of "big"/"small" and depends on the sentence it is
used in. Instead give the translator the complete sentence:

  if (b)
    printf(_("some big sentence."));
  else
    printf(_("some small sentence."));
Do not use this antipattern:

  printf(_("some %s sentence."), b ? _("big") : _("small"));

as the translation of "big"/"small" and depends on the sentence it is
used in. Instead give the translator the complete sentence:

  if (b)
    printf(_("some big sentence."));
  else
    printf(_("some small sentence."));
Pass the complete string "This %s/%s attachment has been deleted on %s"
to gettext and not single parts gluing them together afterwards.

Although it is tempting to toggle optional sentence parts at the end,
this is a bad idea as this might not work in all languages since these
might have special rules (compare "manner before place before time" rule
for adverbs in the end-position or subject-verb-object order in English).
Pass the complete string to the translator instead of chopping it into
pieces and translate every piece.
Pass the complete string to the translator instead of chopping it into
pieces and translate every piece.
Pass the complete string to the translator instead of chopping it into
pieces and translate every piece.
Reis Radomil and others added 21 commits April 24, 2018 21:31
Also include the signal number in the output.
Do not chop the help string into multiple parts instead pass the
complete string on for translation.

Reverts partly 9b7faee.
Translations for the strings "s1" and "s2" which are now "s1 s2"  were
concatenated except for the cases where a translation of s1 or s2 was
not present.
We merged the two translations of the singular message and plural
messages into one translation. If the language has more than one plural
the old plural translation is used for any plural form of the language.
This might not be accurate but more precise than leaving the plural
completely untranslated.

All the new places were marked as fuzzy.
Combine the separated singular and plural translations into a single
translation. For languages with more than a single plural form, the old
plural form is used in any case. This might not be correct but is better
than nothing (it also is the current behaviour).
Combine the separated singular and plural translations into a single
translation. For languages with more than a single plural form, the old
plural form is used in any case. This might not be correct but is better
than nothing (it also is the current behaviour).
Merge the old translation of

    "Character set changed to %s; %s."

together with the translations of

    "not converting"
or

    "converting"

respectively, by replacing the second '%s' with the translated string.
The resulting translation might not be correct but it is better than
nothing. Although it is the current translation by NeoMutt as this merge
was done previously in the C code.
Join the old translations of

    "[-- Alternative Type #%d: "
    "[-- Type: "
    "%s/%s%s%s, Encoding: %s, Size: %s --]\n"

together to the translations of

    "[-- Alternative Type #%d: %s/%s%s%s, Encoding: %s, Size: %s --]\n"
    "[-- Type: %s/%s%s%s, Encoding: %s, Size: %s --]\n"

Some notes:

* Some translators translated "%s/%... ---]" with "[-- %s/%...  --]"
  (note the leading "[--" in the translation).  Thus, the new
  translation is "[-- .. [-- ... --]" which is kinda odd but still the
  same string as would previously be displayed to the user.

* The new string "[-- Alternative Type #%d: ... --]" has an additional
  parameter "%d". Some translators ignored it in their translation of
  the old "[-- Alternative Type #%d: " (by not translating it), so that
  the index for the '%s' would be off.

  This applies to the majority of translations. However, most of them
  are out of date and do not have enough '%s', to event print the old
  string correctly, i.e. "A %s%s B %s" was translated with "A %s B %s".
  We ignored those languages as a translator must take care of this.

  There were two exceptions, however, Polish (pl) and Turkish (tr).
  We used the untranslated version of "[-- Alternative Type #%d: " and
  "[-- Type: " and appended it to the start of the translation.
The old translation was for "[-- Attachment #%d" and was extended in the
C code by either " --]" or ": %s --]". Now we translate the
complete string.

Update the po files to reflect this. Reuse the old translation and
append " --]" or ": %s --]" to it, respectively.
The old translations where build together in the C code. Now we pass the
complete string to gettext. Update the po files and combine the old
translations into new ones.

In the old version the strings translated where

    "[-- This %s/%s attachment "

    "(size %s bytes) "

    "has been deleted --]\n"

    "[-- on %s --]\n"

The four new strings are

    "[-- This %s/%s attachment (size %s bytes) has been deleted --]\n"
    "[-- on %s --]\n"

    "[-- This %s/%s attachment (size %s bytes) has been deleted --]\n"

    "[-- This %s/%s attachment has been deleted --]\n
    "[-- on %4$s --]\n"

    "[-- This %s/%s attachment has been deleted --]\n"
Combine the old translations of the chopped strings

    s1 =
    "[-- This %s/%s attachment is not included, --]\n"

    s2 =
    "[-- and the indicated external source has --]\n"
    "[-- expired. --]\n"

into one translation for the single string

    s_new =
    "[-- This %s/%s attachment is not included, --]\n"
    "[-- and the indicated external source has --]\n"
    "[-- expired. --]\n"

Four translators (Galician (gl), Lithuanian (lt), Slovak (sk), Taiwanese
Mandarin (zh_WT)) seemed to have translated the old s2 as "s1 s2", so
the translation of s_new would have a doubling at the start. In these
versions we modified the translation to not have this doubling.
Combine the old translations of the chopped strings

    s1 =
    "[-- This %s/%s attachment is not included, --]\n"

    s2 =
    "[-- and the indicated access-type %s is unsupported --]\n"

into one translation for the single string

    s_new =
    "[-- This %s/%s attachment is not included, --]\n"
    "[-- and the indicated access-type %s is unsupported --]\n"

Four translators (Galician (gl), Lithuanian (lt), Slovak (sk), Taiwanese
Mandarin (zh_WT)) seemed to have translated the old s2 as "s1 s2", so
the translation of s_new would have a doubling at the start. In these
versions we modified the translation to not have this doubling.
Combine the old translations of the chopped strings

    "[-- This is an attachment "]

    "[-- %s/%s is unsupported "]

    "(use '%s' to view this part)"]

    "(need 'view-attachments' bound to key!)"]

into a translation for the following six strings. These strings have
also a " --]\n" appended at the end.

    "[-- This is an attachment (use '%3$s' to view this part) --]\n"

    "[-- %s/%s is unsupported (use '%s' to view this part) --]\n"

    "[-- This is an attachment (need 'view-attachments' bound to key!) --]\n"

    "[-- %s/%s is unsupported (need 'view-attachments' bound to key!) --]\n"

    "[-- This is an attachment --]\n"

    "[-- %s/%s is unsupported --]\n"
The original string was changed from "Caught signal %s... Exiting.\n"
to include also the signal number:

   "Caught signal %d (%s) ...  Exiting.\n"

We adapted the translation by inserting " (%s)" after "%d" in the
translation of "Caught signal %d... Exiting.\n".
We merged the translations of the individual strings to a translation of
the new string. Note that for some strings the translation did not
exists. In that case we used the original English version, which yields
a mixture between English and the other language.
Merge the two old separate translations into a single translation of the
new string.
@ghost ghost assigned flatcap Apr 24, 2018
@flatcap
Copy link
Member

flatcap commented Apr 24, 2018

tl;dr
Very impressive. Merged. Thanks.

Discussion

(these [-- .... --] thingies). Please review them

I found one missing state_mark_attach(), but otherwise they are all good.

I, too, looked at removing the [-- and --] from the messages,
but realised that I'd need to printf() twice in some cases.
It would save the translator a little effort, but perhaps we'll leave that for now.

What I'm committing looks quite different to your branch, but it's not.
I've rearranged the commits so that the changes are smaller and simpler to understand.

Code changes

The first block are just the code changes.
I've made a few small tweaks here.

I've removed the *s from the L10N comment blocks.
When using a standard C block comment, the *s turn up in the .po file

You added L10N comments for the menu option strings:

/* L10N:
   These three letters correspond to the choices in the string:
   File exists, (o)verwrite, (a)ppend, or (c)ancel?
 */
switch (mutt_multi_choice(
    _("File exists, (o)verwrite, (a)ppend, or (c)ancel?"), _("oac")))

but unfortunately, they are placed next to the menu string because that comes first.
I've fixed this by inserting a comment into the mutt_multi_choice() arguments:

switch (mutt_multi_choice(
    _("File exists, (o)verwrite, (a)ppend, or (c)ancel?"),
    // L10N: Options for: File exists, (o)verwrite, (a)ppend, or (c)ancel?
    _("oac")))

Now the message will be associated with the options.

Update-po

This one enormous commit only updates the file references.
There are none of your magic changes here.

Magic Changes

Now that all the code changes have been made and all the references updated,
your magic changes to add plurals, etc, are actually quite small commits.
Their meaning is clear.

@flatcap flatcap merged commit ae455b9 into master Apr 24, 2018
@ghost ghost removed the status:in-progress label Apr 24, 2018
@flatcap flatcap deleted the devel/gettext branch April 27, 2018 15:29
@flatcap flatcap added the topic:translation Translation label Aug 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant