[FA] First draft of Chapter2/Page2 (#129)
* merged branch0 into main. no toctree yet.

* updated toctree.

* Updated the glossary with terms from chapter0.

* Second draft in collab w/ @schoobani. Added empty chapter1 for preview.

* Glossary typo fix.

* Added missing backticks.

* Removed a couple of bad indefinite articles I forgot.

* First draft of ch2/p2. Adds to glossary. Trans. guidelines moved out.

* Fixed missing diacritics, fixed the py/tf switch placing. Other fixes.

* Changed the equivalent for prediction per @kambizG 's direction.

* Redid ambiguous passage in translation per @lewtun 's direction.
jowharshamshiri authored Apr 20, 2022
1 parent f851ec7 commit dea95d5
Showing 4 changed files with 574 additions and 48 deletions.
67 changes: 67 additions & 0 deletions chapters/fa/TRANSLATING.txt
@@ -0,0 +1,67 @@
1. Use this style guide for Persian. https://apll.ir/wp-content/uploads/2018/10/D-1394.pdf
We deviate from this style guide in the following ways:
- Use 'ء' as a possessive suffix instead of 'ی'

2. Don't translate industry-accepted acronyms, e.g. TPU or GPU.

3. Translate acronyms used for convenience in English to their full Persian form, e.g. ML.

4. Keep the voice active and consistent. Don't overdo it, but try to avoid the passive voice.

5. Refer and contribute to the glossary frequently to stay on top of the latest
choices we make. This minimizes the amount of editing that is required.

6. Keep POV consistent.

7. Smaller sentences are better sentences. Apply with nuance.

8. When translating a technical word, keep the choice of Persian equivalent consistent.
If workspace is mohit-e-kari, don't just switch to faza-ye-kari. This of course
does not apply to non-technical choices, as in those cases variety actually
helps keep the text engaging.

9. This is merely a translation. Don't add any technical/contextual information
not present in the original text. Also don't leave stuff out. The creative
choices in composing this information were the original authors' to make.
Our creative choices are in doing a quality translation.

10. Note on translating indefinite articles (a, an) to Persian. Persian syntax does
not have these. A common problem in translations is using
the equivalent of 'One' ("Yek", "Yeki") as a direct word-for-word substitute,
which makes for ugly Persian. Be creative with Persian word order to avoid this.

For example, the sentence "we use a library" should be translated as
"ما از کتابخانه‌ای استفاده می‌کنیم" and not "ما از یک کتابخانه استفاده می‌کنیم".
"Yek" here implies a number (1) to the Persian reader, but the original text
was emphasizing the indefinite nature of the library, not its count.

11. Be exact when choosing equivalents for technical words. Package is package.
Library is library. Don't mix and match.

12. Library names are not brand names, nor are they concepts, so there is no need to
transliterate them. They are shorter namespaces, so just use the English form.
For example, we transliterate 'Tokenizer' when it is used to indicate
a concept but not when it is the name of a library.

13. Surround the entire doc with a <div dir="rtl">, and do the same separately
for each list. Without this, they render in LTR with the current version of the
Hugging Face docs and the preview will look ugly. Refer to our previously
submitted pages.

Also, function names that include parentheses need to be surrounded with
a <span>, e.g. <span dir="ltr">pipeline()</span>
These are stopgap measures for the preview to render correctly.
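
The wrapping in item 13 could be applied mechanically. The following is only a
minimal sketch, not part of the guidelines: the helper names wrap_ltr and
wrap_rtl and the regular expression for "call-like" names are hypothetical,
chosen for illustration.

```python
import re

# Hypothetical pattern for call-like names such as pipeline() or
# AutoModel.from_pretrained(). The two lookbehinds skip names that are
# already wrapped and prevent matches that start in the middle of a name.
CALL_NAME = re.compile(
    r'(?<!<span dir="ltr">)(?<![\w.])[A-Za-z_][\w.]*\(\)',
    re.ASCII,
)


def wrap_ltr(text: str) -> str:
    """Wrap each call-like name in <span dir="ltr">...</span>."""
    return CALL_NAME.sub(
        lambda m: f'<span dir="ltr">{m.group(0)}</span>', text
    )


def wrap_rtl(body: str) -> str:
    """Surround a whole page, or a single list, with <div dir="rtl">."""
    return f'<div dir="rtl">\n{body}\n</div>'
```

Running wrap_ltr twice on the same text leaves it unchanged, so it should be
safe to re-run on pages that are already partly wrapped.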

14. Put namespaces in backticks. Library names are shorter namespaces.

15. Persian sentences follow the Subject-Object-Verb order. That means
sentences are generally expected to end in a verb. Again, there is a lot of
gray area here and we need to apply nuance, but try to keep that in mind.
For example, maybe don't end sentences in "هستند یا نه". Maybe, because if you
do, that is your creative choice and we respect that.

16. If a Persian equivalent for a word in the glossary has diacritics, they
are important and should be repeated in every occurrence of the word.
While we do want to facilitate our readers' access to the English source
material by introducing loanwords and transliterating where we can, we
also want to avoid confusion about the correct pronunciation of loanwords.
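
A quick way to spot occurrences that dropped their diacritics, as item 16 asks
us to avoid, might look like the sketch below. It is an assumption-laden
illustration: the function names are hypothetical, it only considers the marks
in the range U+064B to U+0652, and it matches substrings, so it can flag longer
words that merely contain the term.

```python
import re

# Arabic-script short-vowel and related marks, fathatan (U+064B) to sukun (U+0652).
MARKS = "\u064B-\u0652"


def strip_marks(word: str) -> str:
    """Return the word with the diacritics in MARKS removed."""
    return re.sub(f"[{MARKS}]", "", word)


def undiacritized_uses(text: str, term: str) -> list[str]:
    """List spellings of a glossary term in text that differ from the
    glossary form only in their diacritics."""
    bare = strip_marks(term)
    # Allow any run of marks after each base letter of the term.
    pattern = "".join(re.escape(ch) + f"[{MARKS}]*" for ch in bare)
    return [m.group(0) for m in re.finditer(pattern, text) if m.group(0) != term]
```

An empty result means every occurrence found carries the same diacritics as
the glossary entry.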
2 changes: 2 additions & 0 deletions chapters/fa/_toctree.yml
@@ -12,6 +12,8 @@
  sections:
  - local: chapter2/1
    title: مقدمه
  - local: chapter2/2
    title: پشت صحنه خط‌تولید

- title: واژه‌نامه
  sections: