Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a tool to check printf style format string in translations #48530

Merged
merged 3 commits into from
May 3, 2021

Conversation

BrettDong
Copy link
Member

@BrettDong BrettDong commented Apr 15, 2021

Summary

Infrastructure "Add a tool to check printf style format string in translations"

Purpose of change

Some printf style format string mismatches exist in translations, for example, %d something %s gets translated to %s XXX %d, and currently we lack an automated tool to check for such kind of errors.

Describe the solution

Write a Python script that uses polib library to read PO files and check for inconsistencies of printf format strings between original string and translated string.

Alternative

My first attempt: Run printf test frompofilter tool from translate-toolkit software package to check for printf format string errors, and filter its output by a Python script to get rid of false positives.

Update: There are too many false positives, so I just went to write my own printf checker.

Testing

Errors are detected, here is a portion of the output:

Checking ja.po => 1 error(s) detected:
original  :<npcname> doesn't have limbs that require splinting.
translated:%1$sに固定が必要な四肢はありません。


Checking ko.po => 2 error(s) detected:
original  :You avoid an incoming projectile!
translated:%s을(를) 피했다!

original  :<npcname> avoids an incoming projectile.
translated:<npcname>이(가) %s을(를) 피했다.


Checking pl.po => 3 error(s) detected:
original  :%1$s butts %2$s with their antlers
translated:%1$s łomocze %s swoim porożem

original  :It should take %d minutes to finish washing items in the %s.
translated:Wypranie rzeczy w %s powinno zająć %d minut.

original  :Insects begin to emerge from <npcname>'s skin!
translated:Owady rozrywają ciało %s i wychodzą na zewnątrz.


Checking pt_BR.po => 1 error(s) detected:
original  :The %s recovers from its skid.
translated:% se recupera de sua derrapagem.


Checking ru.po => 3 error(s) detected:
original  :You tear into %s with your serrated teeth
translated:Вы разрываете противника (%2$s) своими зазубренными зубами

original  :%1$s bashes %2$s with their powerful wings
translated:%1$s бьёт противника (%s) своими могучими крыльями

original  :Do you wish to sell the crop of %d %s for a profit of $%d?
translated:Вы хотите продать урожай (%s - %d), получив прибыль в размере $%d?


Checking zh_CN.po => 3 error(s) detected:
original  :You wrap your scarf tighter.
translated:你裹紧你的 %s。

original  :You wrap your scarf a bit tighter.
translated:你稍稍裹紧你的 %s。

Additional context

@BrettDong BrettDong force-pushed the check-po branch 2 times, most recently from 141375d to 9d18cb5 Compare April 15, 2021 01:30
@BrettDong BrettDong added Code: Tests Measurement, self-control, statistics, balancing. Code: Tooling Tooling that is not part of the main game but is part of the repo. Translation I18n labels Apr 15, 2021
@BrettDong BrettDong force-pushed the check-po branch 4 times, most recently from 1f60bee to 06b3c0e Compare April 15, 2021 06:37
@BrettDong BrettDong marked this pull request as ready for review April 15, 2021 06:40
@BrettDong BrettDong changed the title Add GitHub Action workflow to check printf style format string in translations Add a tool to check printf style format string in translations Apr 15, 2021
@BrettDong BrettDong marked this pull request as draft April 15, 2021 06:58
@BrettDong BrettDong force-pushed the check-po branch 2 times, most recently from 793bc95 to 82fb0c7 Compare April 15, 2021 16:44
@BrettDong BrettDong marked this pull request as ready for review April 16, 2021 02:51
@jbytheway
Copy link
Contributor

Since you're raised the subject of printf format strings and translations: I've been thinking about writing a clang-tidy check to require that every translated string with multiple format specifiers use the numbered form (1$, 2$, etc.). Do you think that would be worthwhile? In the short term it would invalidate a lot of translations, which is why I was thinking it would be good to do shortly after the stable release; in the long term it would save us scrutinizing PRs for that particular i18n issue.

@BrettDong
Copy link
Member Author

The impact may be mitigated by building a tool that automatically migrates non positional format strings to positional format strings in translations.

lang/po/pl.po Outdated
Comment on lines 283030 to 283033
#: src/map.cpp
#, c-format
msgid "It should take %d minutes to finish washing items in the %s."
msgstr "Wypranie rzeczy w %s powinno zająć %d minut."
msgstr ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one appears to need positional parameters?

lang/po/es_ES.po Outdated
@@ -278694,7 +278694,7 @@ msgstr ""

#: src/faction_camp.cpp
msgid "departs to survey land…"
msgstr "parte para inspeccionar tenrenos..."
msgstr "parte para inspeccionar terrenos..."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msgstr "parte para inspeccionar terrenos..."
msgstr "parte para inspeccionar terrenos..."

lang/po/es_ES.po Outdated
@@ -156204,7 +156204,7 @@ msgid ""
msgstr ""
"Al ignorar la ofensiva a favor de la autodefensa, eres mejor en la protección.\n"
"\n"
"Daño bloqueado reducido en un 100%% de Destreza."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, this %% was necessary in order to print a % to the final string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a plain unformatted string, because in the original English string there is only a single %

lang/po/es_ES.po Outdated
@@ -156222,7 +156222,7 @@ msgid ""
msgstr ""
"Un practicante de aikido intermedio puede protegerse contra múltiples oponentes.\n"
"\n"
"Daño bloqueado reducido en un 100%% de Destreza.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This %% was also necessary.

However, now that I'm looking at it, why does "Intermediate Akido" translate to this long phrase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a translation error. lang/po/es_ES.po in master branch reads

#: lang/json/martial_art_from_json.py
msgid "Intermediate Aikido"
msgstr "Aikido intermedio"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'd better revert the *.po changes and only add the checking script in this pull request.

@ZhilkinSerg ZhilkinSerg merged commit 6f2d1ff into CleverRaven:master May 3, 2021
@BrettDong BrettDong deleted the check-po branch May 21, 2021 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code: Tests Measurement, self-control, statistics, balancing. Code: Tooling Tooling that is not part of the main game but is part of the repo. Translation I18n
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants