Add zh-CN machine translations. #385

Merged: 1 commit, Nov 28, 2022

Conversation

chenglu
Contributor

@chenglu chenglu commented Nov 27, 2022

This PR adds bilingual zh-CN subtitles, machine-translated from the corrected English subtitles (#384).

Why use bilingual subtitles for collaboration?

Since our localized subtitles are produced by machine translation, reviewers may need to refer to the original English ones, and bilingual subtitles can help with that. In this format the localized text sits on top and the English underneath, which makes the subtitles easier to review. When we publish the localized subtitles to YouTube, we could add a new function to generate_subtitles.py to remove the English lines.
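As a rough illustration of the "remove the English lines" step, here is a minimal sketch. It is not the function that was eventually added to the repository: the name `strip_english` is hypothetical, and it assumes that each cue is an index line, a timestamp line, and then text lines, and that a localized line can be recognized by containing at least one CJK character.

```python
import re

# Hypothetical helper, not the repository's actual implementation.
# Assumption: the CJK Unified Ideographs range is enough to spot a zh-CN line.
CJK = re.compile(r"[\u4e00-\u9fff]")


def strip_english(srt_text: str) -> str:
    """Return a monolingual SRT, keeping only text lines that contain Chinese."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            # Cue without text lines: keep it untouched.
            cues.append(block)
            continue
        header, text = lines[:2], lines[2:]  # index + timestamp, then text
        localized = [line for line in text if CJK.search(line)]
        # If no line looks Chinese, keep the text rather than emit an empty cue.
        cues.append("\n".join(header + (localized or text)))
    return "\n\n".join(cues) + "\n"


sample = """1
00:00:05,000 --> 00:00:07,000
欢迎来到 Hugging Face 课程。
Welcome to the Hugging Face Course.

2
00:00:07,500 --> 00:00:09,000
OK
"""

print(strip_english(sample))
```

Filtering by script rather than by line position is a design choice for this sketch: it tolerates cues where the translation spans more than one line, at the cost of being specific to Chinese.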


If we are OK with the bilingual subtitles plan, I could use the tools to generate other languages, too.

The zh-CN translations were also post-processed:

  • Replaced 您 with 你
  • Added a space between English and Chinese text
  • Corrected the translation of "Hugging Face"
  • Corrected some of the Transformers translations
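The first two clean-up steps in the list above are mechanical and could be scripted. The sketch below is illustrative only: the PR does not say which tool was used, `normalize_zh` is a hypothetical name, and the character ranges are an assumption about what counts as "Chinese" and "English" text.

```python
import re

# Illustrative only: the PR does not say which tool performed these steps.
CJK = r"\u4e00-\u9fff"   # assumed range for Chinese characters
LATIN = r"A-Za-z0-9"     # assumed range for English letters and digits


def normalize_zh(text: str) -> str:
    """Apply the pronoun replacement and the Chinese/English spacing rule."""
    text = text.replace("您", "你")
    # Insert a space where a Chinese character directly meets a Latin one.
    text = re.sub(rf"([{CJK}])([{LATIN}])", r"\1 \2", text)
    text = re.sub(rf"([{LATIN}])([{CJK}])", r"\1 \2", text)
    return text


print(normalize_zh("欢迎您使用Transformers库"))  # 欢迎你使用 Transformers 库
```

The terminology fixes ("Hugging Face", Transformers) depend on context and are better left to human review.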

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Nov 27, 2022

The documentation is not available anymore as the PR was closed or merged.

Member

@lewtun lewtun left a comment


Thank you very much for adding the zh-CN machine translations @chenglu 🚀 !

I like your idea of supporting a bilingual format and it would be quite easy to strip out the English parts before adding them to YouTube. Would you like to add that logic to generate_subtitles.py or prefer I do it?

@chenglu
Contributor Author

chenglu commented Nov 28, 2022

Hi Lewis, thanks for your great support!

As for adding the new function to the script: since this isn't urgent, I'd like to invite Steven @tyisme614 to give it a try, buuut... if we haven't finished it by the time all the zh-CN subtitles are reviewed, I'll have to ask for your help with it, WDYT? 🥺

@lewtun
Member

lewtun commented Nov 28, 2022

Sounds great, so shall we merge this PR as-is and iterate on the corrections? (No preference on my side, just pick whatever is easiest for you :))

@chenglu
Contributor Author

chenglu commented Nov 28, 2022

My vote is to merge this PR now: @xianbao and I are planning to organize the other Chinese-language volunteers for further collaboration, and opening PRs directly against the main repo will be easier for us.
Thanks again for the great help, Lewis and Tiezhen!

@lewtun lewtun merged commit a1ed37b into huggingface:main Nov 28, 2022
@lewtun
Member

lewtun commented Nov 28, 2022

Done and thank you for working on making the course accessible to Chinese speakers!

lewtun added a commit that referenced this pull request May 11, 2023
* Refactor tokenization of targets for transformers v4.22 (#316)

* Refactor tokenization of targets for transformers v4.22

* typo fix (#319)

line no.49 Changed _SQuAD_it-text.json_-> _SQuAD_it-test.json_

* [FR] Many corrections (#318)

* Fix URL to the Pile (#324)

* Fix URL to the Pile

* [RU] ch5  (#317)

* fix: book url (#323)

* zh-CN - Chapter 7,8,9finished (#315)

Co-authored-by: Lewis Tunstall <[email protected]>

* Refactor events (#261)

* Fix whole word masking labels (#326)

* Fix question answering indices (#327)

* Add translation checker (#329)

* [FR] Refactor events (#330)

* Translation Chapter 4 (#325)

* update author list (de) (#331)

* Fix Russian ToC (#332)

* Refactor dataset upload in Chapter 5 / section 5 (#334)

* Fix id2label types (#337)

* Fix keywords in de quiz chapter 3 (#338)

Noticed two `undefined` in the new render, because the `text` key was capitalized.

* Tweak course validator (#340)

* [Italian] Added Ch2/3 and Ch2/4 (#322)

* Completes chapter 1 (#341)

* Create 5.mdx and translate it into Japanese.

* Create 6.mdx and translate it into Japanese.

* done chapter1.2,1.3

* Create 4.mdx and translate it into Japanese.

* Slightly modified

* Slightly modified

* Slightly modified

* TF generation fixes (#344)

* Fixes to chapter 7

Co-authored-by: lewtun <[email protected]>

* i18n: ES - translate file chapter2/6.mdx (#346)

* Typo in russian translation (#349)

It should be "Обучающий цикл" not "Обучающий цикла"

* Remove translated string (#350)

* [It] Ch2/5, Ch2/6, Ch2/7 (#353)

* Add FAQ (#354)

* i18n: ES - translate file chapter2/7.mdx (#347)

* [id] Add translation to Bahasa Indonesia for chapter0 & some of chapter1 (#351)

* i18n: ES - chapter2/8.mdx (#352)

* Update 4.mdx based on the advice.

* [de] Translation Chapter 1 (#336)

* Update 1.mdx (#356)

* Update 1.mdx (#357)

* removed original english texts to open pull request

* removed original english texts to open pull request

* removed original english texts to open pull request

* add lines for chap1/4 to 6

* Slightly modified

* modify 2.mdx, 3.mdx

* modify _toctree.yml

* Update pr docs actions (#369)

* Add Python syntax highlighting (#370)

* [FR] Add FAQ and more (#367)


Co-authored-by: Lewis Tunstall <[email protected]>

* [RU] Chapter 6 (1/2) finished (#368)

* Spanish translation of Chapter 5 (#366)



Co-authored-by: Lewis Tunstall <[email protected]>

* Add Japanese translation of chapter 1/ 7 to 10  (#359)

* Adding Portuguese Translation to Chapter3 (#361)

* make style

* Typo in Chapter 2, Section 2 (#364)

Replace "inputs" with "outputs".

* Revert "Update pr docs actions (#369)"

This reverts commit 44f77be.

* Typo (#374)

* Chapter 9 - Italian (#373)

* Fix notebook link (#378)

* docs: feat: chapter2-1 in Korean (#375)

Review by @lewtun 22/11/22
docs: fix: remove commented toc for future contributors

* Migrate Spaces URLs to new domain (#379)

* docs: feat: same links across languages (#380)

Added custom anchors using double square brackets, e.g. [[formatted-anchor]]

* Add video transcripts  (#150)

* docs: fix: Accurate for the origin (English) subtitles (#384)

* docs: i18n: add zh-CN machine translation (#385)

* [FR] Notebooks links (#386)

* Upgrade python version in the workflow (#402)

* Update README.md (#389)

Add that preview does not work with windows

* translated chapter2_1-3 (#392)

* fixes small typos (#397)

* Add Chap2/4.mdx and 5.mdx (#391)

Co-authored-by: 長澤春希 <[email protected]>

* created new script for converting bilingual captions to monolingual caption (#399)

* Add French YouTube videos transcription (#410)

* docs(zh-cn): Reviewed 56_data-processing-for-masked-language-modeling.srt (#400)

* docs(zh-cn): Reviewed 57_what-is-perplexity.srt (#401)

* reviewed ep.58 (#405)

* reviewed ep.59 (#406)

* docs(zh-cn): Reviewed 60_what-is-the-bleu-metric.srt (#407)

* finished review (#408)

* docs(zh-cn): Reviewed 61_data-processing-for-summarization.srt (#409)

* Fix subtitle - translation data processing (#411)

* [FR] Final PR (#412)

* [ko] Add chapter 8 translation (#417)

* docs(zh-cn): Reviewed 62_what-is-the-rouge-metric.srt (#419)

* finished review

* fixed errors in original english subtitle

* fixed errors (#420)

* docs(zh-cn): Reviewed 63_data-processing-for-causal-language-modeling.srt (#421)

* Update 63_data-processing-for-causal-language-modeling.srt

* finished review

* Update 63_data-processing-for-causal-language-modeling.srt

* docs(zh-cn): Reviewed 65_data-processing-for-question-answering.srt (#423)

* finished review

* finished review

* finished review (#422)

* Add Ko chapter2 2.mdx (#418)

* Add Ko chapter2 2.mdx

* [ko] Add chapter 8 translation (#417)

* docs(zh-cn): Reviewed 62_what-is-the-rouge-metric.srt (#419)

* finished review

* fixed errors in original english subtitle

* fixed errors (#420)

* docs(zh-cn): Reviewed 63_data-processing-for-causal-language-modeling.srt (#421)

* Update 63_data-processing-for-causal-language-modeling.srt

* finished review

* Update 63_data-processing-for-causal-language-modeling.srt

* docs(zh-cn): Reviewed 65_data-processing-for-question-answering.srt (#423)

* finished review

* finished review

* finished review (#422)

* Add Ko chapter2 2.mdx

Co-authored-by: IL-GU KIM <[email protected]>
Co-authored-by: Yuan <[email protected]>

* update textbook link (#427)

* Visual fixes (#428)

* finish first round review (#429)

* Fix French subtitles + refactor conversion script (#431)

* Fix subtitles and scripts

* Fix subtitle

* Add tokenizer to MLM Trainer (#432)

* Fix FR video descriptions (#433)

* Fix FR video descriptions

* Rename file

* Fix dead GPT model docs link. (#430)

* Translate into Korean: 2-3 (#434)

Co-authored-by: “Ryan” <“[email protected]”>

* Add korean translation of chapter5 (1,2) (#441)

update toctree for chapter 5 (1, 2)
ensure same title for 5-2
add updates from upstream English with custom anchors

Co-Authored-By: Minho Ryu <[email protected]>

Co-authored-by: Meta Learner응용개발팀 류민호 <[email protected]>
Co-authored-by: Minho Ryu <[email protected]>

* Update 3.mdx (#444)

* docs(zh-cn): Reviewed 67_the-post-processing-step-in-question-answering-(tensorflow).srt (#447)

* Update 67_the-post-processing-step-in-question-answering-(tensorflow).srt

* finished review

* docs(zh-cn): Reviewed 66_the-post-processing-step-in-question-answering-(pytorch).srt (#448)

* Update 66_the-post-processing-step-in-question-answering-(pytorch).srt

* finished review

* refined translation

* docs(zh-cn): Reviewed 01_the-pipeline-function.srt (#452)

* finish review

* Update subtitles/zh-CN/01_the-pipeline-function.srt

Co-authored-by: Luke Cheng <[email protected]>

Co-authored-by: Luke Cheng <[email protected]>

* finish review (#453)

* Revise some unnatural translations (#458)

Some unnatural translations have been revised to use expressions more popular with Chinese readers

* Fix chapter 5 links (#461)

* fix small typo (#460)

* Add Ko chapter2 3~8.mdx & Modify Ko chapter2 2.mdx typo (#446)

* Add captions for tasks videos (#464)

* Add captions for tasks videos

* Fix script

* [FR] Add 🤗  Tasks videos (#468)

* Synchronous Chinese course update

Update the Chinese Course document to
sha:f71cf6c3b4cb235bc75a14416c6e8a57fc3d00a7
sha date: 2023/01/06 00:02:26 UTC+8

* review sync

* Update 3.mdx

* format zh_CN

* format all mdx

* Remove temp folder

* finished review (#449)

* docs(zh-cn): Reviewed 31_navigating-the-model-hub.srt (#451)

* docs(zh-cn): Reviewed No. 08 - What happens inside the pipeline function? (PyTorch) (#454)

* docs(zh-cn): Reviewed 03_what-is-transfer-learning.srt (#457)

* docs(zh-cn): 32_managing-a-repo-on-the-model-hub.srt (#469)

* docs(zh-cn): Reviewed No. 10 - Instantiate a Transformers model (PyTorch) (#472)

* update Chinese translation

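Some English sentences have the opposite word order to Chinese, so I arranged them directly according to the final Chinese word order. Is that OK?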

* finish first round review

* finish second round review

* finish second round review

* branch commit

* Update subtitles/zh-CN/10_instantiate-a-transformers-model-(pytorch).srt

Co-authored-by: Luke Cheng <[email protected]>

* Update subtitles/zh-CN/10_instantiate-a-transformers-model-(pytorch).srt

Co-authored-by: Luke Cheng <[email protected]>

---------

Co-authored-by: Luke Cheng <[email protected]>

* docs(zh-cn): 33_the-push-to-hub-api-(pytorch).srt (#473)

* docs(zh-cn): Reviewed 34_the-push-to-hub-api-(tensorflow).srt (#479)

* running python utils/code_formatter.py

* review 05 cn translations

* review 06 cn translations

* Review No.11

* translate no.24

* review 06 cn translations

* review 07 cn translations

* Update 23_what-is-dynamic-padding.srt

* Update 23_what-is-dynamic-padding.srt

* Update 23_what-is-dynamic-padding.srt

* Update subtitles/zh-CN/23_what-is-dynamic-padding.srt

Co-authored-by: Luke Cheng <[email protected]>

* Update subtitles/zh-CN/23_what-is-dynamic-padding.srt

Co-authored-by: Luke Cheng <[email protected]>

* add blank

* Review No. 11, No. 12

* Review No. 13

* Review No. 12

* Review No. 14

* finished review

* optimized translation

* optimized translation

* docs(zh-cn): Reviewed No. 29 - Write your training loop in PyTorch

* Review 15

* Review 16

* Review 17

* Review 18

* Review ch 72 translation

* Update 72 cn translation

* To be reviewed No.42-No.54

* No.11 check-out

* No.12 check-out

* No. 13 14 check-out

* No. 15 16 check-out

* No. 17 18 check-out

* Add note for "token-*"

* Reviewed No.8, 9, 10

* Reviewed No.42

* Review No.43

* finished review

* optimized translation

* finished review

* optimized translation

* Review 44(need refine)

* Review 45(need refine)

* Review No. 46 (need refine)

* Review No.47

* Review No.46

* Review No.45

* Review No.44

* Review No.48

* Review No.49

* Review No.50

* Modify Ko chapter2 8.mdx (#465)

* Add Ko chapter2 2.mdx

* Add Ko chapter2 2.mdx

* Add Ko chapter2 3.mdx & 4.mdx

* Modify Ko chapter2 3.mdx & 4.mdx

* Modify Ko chapter2 3.mdx & 4.mdx

* Modify Ko chapter2 3.mdx & 4.mdx

* Modify _toctree.yml

* Add Ko chapter2 5.mdx

* Modify Ko chapter2 4.mdx

* Add doc-builder step

* Add Ko chapter2 6~8.mdx & Modify Ko chapter2 2.mdx typo

* Modify Ko _toctree.yml

* Modify Ko chapter2 8.mdx & README.md

* Fixed typo (#471)

* fixed subtitle errors (#474)

timestamp: 00:00:26,640 --> 00:00:28,620
modification: notification --> authentication

timestamp: 00:04:21,113 --> 00:04:22,923
modification: of --> or

* Fixed a typo (#475)

* Update 3.mdx (#526)

Fix typo

* [zh-TW] Added chapters 1-9 (#477)

The translation is based on Simplified Chinese version, converted via OpenCC and fixed some formatting issues.

* finished review

* Explain why there are more tokens, than reviews (#476)

* Explain why there are more tokens, than reviews

* Update chapters/en/chapter5/3.mdx

---------

Co-authored-by: lewtun <[email protected]>

* [RU] Subtitles for Chapter 1 of the video course (#489)

* Created a directory for the russian subtitles.

Created a folder for Russian subtitles for the video course and published a translation of the introductory video from chapter 1.

* Uploaded subtitles for chapter 1

Uploaded subtitles for the remaining videos for chapter 1 of the video course.

* Added subtitles for chapter 2 of the video course

Added SRT subtitle files for the second chapter of the YouTube video course.

* Delete subtitles/ru directory

Removed the old translation. Incorrect timestamping.

* Create 00_welcome-to-the-hugging-face-course.srt

Create a directory and upload a subtitle file for the introductory video of the course.

* Add files via upload

Upload subtitle files for the first chapter of the course.

* Review No.52

* [ru] Added the glossary and translation guide (#490)

* Added the glossary and translation guide

* Fixed casing

* Minor fixes

* Updated glossary

* Glossary update

* Glossary update

* Glossary update

* [ru] Chapters 0 and 1 proofreading, updating and translating missing sections (#491)

* Chapter 0 proofreading

* Chapter 1 Section 1 proofreading
- Added new people from English version;
- Added links to creator's pages;
- Added FAQ translation;

* Chapter 1 Sections 2-5 proofreading

* Chapter 1 Sections 6-9 proofreading

* Final proofreading and added missing quiz section

* Minor spelling corrections

* Review No.51

* Review No.53

* Review No.54

* finished review

* modified translation

* modified translation

* modified subtitle

use the same text appeared in video

* translated

* Fix typo (#532)

* review chapter4/2

* review chapter4/2

* review chapter4/2

* Review 75

* Review No.20, need review some

* docs(zh-cn): Reviewed Chapter 7/1

* Update 1.mdx

* Review No.22

* Review No.21 (need refinement)

* Review No.30, need review: 26 27 28 30 73 74

* Review 30 (good)

* Review 20

* Review 21 (refine)

* Review 21

* Review 22

* Review 26

* Review 27

* Review 28

* Review 30

* Review 73

* Review 74

* Review 26-28, 42-54, 73-75

* Demo link fixes (#562)

* demo link fixes

* minor demo fix

---------

Co-authored-by: Aravind Kumar <[email protected]>
Co-authored-by: lbourdois <[email protected]>
Co-authored-by: Pavel <[email protected]>
Co-authored-by: buti1021 <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: Fabrizio Damicelli <[email protected]>
Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: Acciaro Gennaro Daniele <[email protected]>
Co-authored-by: Caterina Bonan <[email protected]>
Co-authored-by: Haruki Nagasawa <[email protected]>
Co-authored-by: blackdoor571 <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Angel Mendez <[email protected]>
Co-authored-by: Artem Vysotsky <[email protected]>
Co-authored-by: Gusti Adli Anshari <[email protected]>
Co-authored-by: Marcus Fraaß <[email protected]>
Co-authored-by: Christopher Akiki <[email protected]>
Co-authored-by: Mishig <[email protected]>
Co-authored-by: David Gilbertson <[email protected]>
Co-authored-by: Camilo Martínez Burgos <[email protected]>
Co-authored-by: Hiroaki Funayama <[email protected]>
Co-authored-by: Cesar0106 <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Filippo Broggini <[email protected]>
Co-authored-by: Mishig <[email protected]>
Co-authored-by: Nanachi <[email protected]>
Co-authored-by: Edoardo Abati <[email protected]>
Co-authored-by: Wonhyeong Seo <[email protected]>
Co-authored-by: Luke Cheng <[email protected]>
Co-authored-by: xianbaoqian <[email protected]>
Co-authored-by: Thomas Simonini <[email protected]>
Co-authored-by: Subaru Kimura <[email protected]>
Co-authored-by: Carlos Santos Garcia <[email protected]>
Co-authored-by: 長澤春希 <[email protected]>
Co-authored-by: Yuan <[email protected]>
Co-authored-by: IL-GU KIM <[email protected]>
Co-authored-by: Kim Bo Geum <[email protected]>
Co-authored-by: Bartosz Szmelczynski <[email protected]>
Co-authored-by: Shawn Lee <[email protected]>
Co-authored-by: Naveen Reddy D <[email protected]>
Co-authored-by: rainmaker <[email protected]>
Co-authored-by: “Ryan” <“[email protected]”>
Co-authored-by: Meta Learner응용개발팀 류민호 <[email protected]>
Co-authored-by: Minho Ryu <[email protected]>
Co-authored-by: richardachen <[email protected]>
Co-authored-by: beyondguo <[email protected]>
Co-authored-by: bsenst <[email protected]>
Co-authored-by: 1375626371 <[email protected]>
Co-authored-by: yaoqih <[email protected]>
Co-authored-by: 李洋 <[email protected]>
Co-authored-by: PowerChina <[email protected]>
Co-authored-by: chenglu99 <[email protected]>
Co-authored-by: iCell <[email protected]>
Co-authored-by: Qi Zhang <[email protected]>
Co-authored-by: researcher <[email protected]>
Co-authored-by: simpleAI <[email protected]>
Co-authored-by: FYJNEVERFOLLOWS <[email protected]>
Co-authored-by: zhangchaosd <[email protected]>
Co-authored-by: TK Buristrakul <[email protected]>
Co-authored-by: Carlos Aguayo <[email protected]>
Co-authored-by: ateliershen <[email protected]>
Co-authored-by: Pavel Nesterov <[email protected]>
Co-authored-by: Artyom Boyko <[email protected]>
Co-authored-by: Kirill Milintsevich <[email protected]>
Co-authored-by: jybarnes21 <[email protected]>
Co-authored-by: gxy-gxy <[email protected]>
Co-authored-by: iLeGend <[email protected]>
Co-authored-by: Maria Khalusova <[email protected]>

3 participants