# Chinese - Chapter 1 finished #113

Merged 8 commits on Apr 13, 2022. Changes shown from 1 commit.
chapters/zh/_toctree.yml: 173 additions, 0 deletions
@@ -0,0 +1,173 @@
- title: 0. Setup
Review comment (Member): We can remove Chapter 0 from the table of contents since it hasn't been translated yet

  sections:
  - local: chapter0/1
    title: Introduction

- title: 1. Transformer models
  sections:
  - local: chapter1/1
    title: Introduction
  - local: chapter1/2
    title: Natural Language Processing
  - local: chapter1/3
    title: Transformers, what can they do?
  - local: chapter1/4
    title: How do Transformers work?
  - local: chapter1/5
    title: Encoder models
  - local: chapter1/6
    title: Decoder models
  - local: chapter1/7
    title: Sequence-to-sequence models
  - local: chapter1/8
    title: Bias and limitations
  - local: chapter1/9
    title: Summary
  - local: chapter1/10
    title: End-of-chapter quiz
    quiz: 1

- title: 2. Using 🤗 Transformers
Review comment (Member): We can remove all these sections that haven't been translated yet

  sections:
  - local: chapter2/1
    title: Introduction
  - local: chapter2/2
    title: Behind the pipeline
  - local: chapter2/3
    title: Models
  - local: chapter2/4
    title: Tokenizers
  - local: chapter2/5
    title: Handling multiple sequences
  - local: chapter2/6
    title: Putting it all together
  - local: chapter2/7
    title: Basic usage completed!
  - local: chapter2/8
    title: End-of-chapter quiz
    quiz: 2

- title: 3. Fine-tuning a pretrained model
  sections:
  - local: chapter3/1
    title: Introduction
  - local: chapter3/2
    title: Processing the data
  - local: chapter3/3
    title: Fine-tuning a model with the Trainer API or Keras
    local_fw: { pt: chapter3/3, tf: chapter3/3_tf }
  - local: chapter3/4
    title: A full training
  - local: chapter3/5
    title: Fine-tuning, Check!
  - local: chapter3/6
    title: End-of-chapter quiz
    quiz: 3

- title: 4. Sharing models and tokenizers
  sections:
  - local: chapter4/1
    title: The Hugging Face Hub
  - local: chapter4/2
    title: Using pretrained models
  - local: chapter4/3
    title: Sharing pretrained models
  - local: chapter4/4
    title: Building a model card
  - local: chapter4/5
    title: Part 1 completed!
  - local: chapter4/6
    title: End-of-chapter quiz
    quiz: 4

- title: 5. The 🤗 Datasets library
  sections:
  - local: chapter5/1
    title: Introduction
  - local: chapter5/2
    title: What if my dataset isn't on the Hub?
  - local: chapter5/3
    title: Time to slice and dice
  - local: chapter5/4
    title: Big data? 🤗 Datasets to the rescue!
  - local: chapter5/5
    title: Creating your own dataset
  - local: chapter5/6
    title: Semantic search with FAISS
  - local: chapter5/7
    title: 🤗 Datasets, check!
  - local: chapter5/8
    title: End-of-chapter quiz
    quiz: 5

- title: 6. The 🤗 Tokenizers library
  sections:
  - local: chapter6/1
    title: Introduction
  - local: chapter6/2
    title: Training a new tokenizer from an old one
  - local: chapter6/3
    title: Fast tokenizers' special powers
  - local: chapter6/3b
    title: Fast tokenizers in the QA pipeline
  - local: chapter6/4
    title: Normalization and pre-tokenization
  - local: chapter6/5
    title: Byte-Pair Encoding tokenization
  - local: chapter6/6
    title: WordPiece tokenization
  - local: chapter6/7
    title: Unigram tokenization
  - local: chapter6/8
    title: Building a tokenizer, block by block
  - local: chapter6/9
    title: Tokenizers, check!
  - local: chapter6/10
    title: End-of-chapter quiz
    quiz: 6

- title: 7. Main NLP tasks
  sections:
  - local: chapter7/1
    title: Introduction
  - local: chapter7/2
    title: Token classification
  - local: chapter7/3
    title: Fine-tuning a masked language model
  - local: chapter7/4
    title: Translation
  - local: chapter7/5
    title: Summarization
  - local: chapter7/6
    title: Training a causal language model from scratch
  - local: chapter7/7
    title: Question answering
  - local: chapter7/8
    title: Mastering NLP
  - local: chapter7/9
    title: End-of-chapter quiz
    quiz: 7

- title: 8. How to ask for help
  sections:
  - local: chapter8/1
    title: Introduction
  - local: chapter8/2
    title: What to do when you get an error
  - local: chapter8/3
    title: Asking for help on the forums
  - local: chapter8/4
    title: Debugging the training pipeline
    local_fw: { pt: chapter8/4, tf: chapter8/4_tf }
  - local: chapter8/5
    title: How to write a good issue
  - local: chapter8/6
    title: Part 2 completed!
  - local: chapter8/7
    title: End-of-chapter quiz
    quiz: 8

- title: Hugging Face Course Event
  sections:
  - local: event/1
    title: Part 2 Release Event
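Following the review comments, a trimmed `_toctree.yml` that drops the not-yet-translated chapters might look like this. This is only a sketch of the suggested change, not the file as merged; titles are shown in English translation:

```yaml
- title: 1. Transformer models
  sections:
  - local: chapter1/1
    title: Introduction
  # ... remaining chapter 1 sections ...
  - local: chapter1/10
    title: End-of-chapter quiz
    quiz: 1
```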
chapters/zh/chapter1/1.mdx: 52 additions, 0 deletions
@@ -0,0 +1,52 @@
# Introduction

## Welcome to the 🤗 Course!

<Youtube id="00GKzGyWFEs" />

This course will teach you about natural language processing (NLP) using libraries from the Hugging Face ecosystem (🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate) as well as the Hugging Face Hub. It is completely free and without ads.


## What to expect?

Here is a brief overview of the course:

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary.svg" alt="Brief overview of the chapters of the course."/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/summary-dark.svg" alt="Brief overview of the chapters of the course."/>
</div>

- Chapters 1 to 4 provide an introduction to the main concepts of the 🤗 Transformers library. By the end of this part of the course, you will be familiar with how Transformer models work and will know how to use a model from the [Hugging Face Hub](https://huggingface.co/models), fine-tune it on a dataset, and share your results on the Hub.
- Chapters 5 to 8 teach the basics of 🤗 Datasets and 🤗 Tokenizers before diving into classic NLP tasks. By the end of this part, you will be able to tackle the most common NLP problems by yourself.
- Chapters 9 to 12 go deeper, exploring how Transformer models can be used to tackle tasks in speech processing and computer vision. Along the way, you will learn how to build and share models, and optimize them for production environments. By the end of this part, you will be ready to apply 🤗 Transformers to (almost) any machine learning problem!

This course:

* Requires a good knowledge of Python
* Is best taken after an introductory deep learning course, such as [fast.ai's Practical Deep Learning for Coders](https://course.fast.ai/) or one of the programs developed by [DeepLearning.AI](https://www.deeplearning.ai/)
* Does not require prior knowledge of [PyTorch](https://pytorch.org/) or [TensorFlow](https://www.tensorflow.org/), though some familiarity with either will help

After you have completed this course, we recommend checking out [DeepLearning.AI's Natural Language Processing Specialization](https://www.coursera.org/specializations/natural-language-processing?utm_source=deeplearning-ai&utm_medium=institutions&utm_campaign=20211011-nlp-2-hugging_face-page-nlp-refresh), which covers a wide range of traditional NLP models like naive Bayes and LSTMs that are well worth knowing about!

## Who are we?

About the authors:

**Matthew Carrigan** is a Machine Learning Engineer at Hugging Face. He lives in Dublin, Ireland, previously worked as an ML engineer at Parse.ly, and before that was a post-doctoral researcher at Trinity College Dublin. He does not believe we will get to AGI by scaling existing architectures, but has high hopes for robots regardless.

**Lysandre Debut** is a Machine Learning Engineer at Hugging Face and has been working on the 🤗 Transformers library since the very early development stages. His aim is to make NLP accessible to everyone by developing tools with a very simple API.

**Sylvain Gugger** is a Research Engineer at Hugging Face and one of the core maintainers of the 🤗 Transformers library. Previously he was a Research Scientist at fast.ai, and he co-wrote [Deep Learning for Coders with fastai and PyTorch](https://learning.oreilly.com/library/view/deep-learning-for/9781492045519/) with Jeremy Howard. The main focus of his research is making deep learning more accessible by designing and improving techniques that allow models to train quickly on limited resources.

**Merve Noyan** is a developer advocate at Hugging Face, working on developing tools and building content around them to democratize machine learning for everyone.

**Lucile Saulnier** is a machine learning engineer at Hugging Face, developing and supporting the use of open source tools. She is also actively involved in many research projects in the field of Natural Language Processing, such as collaborative training and BigScience.

**Lewis Tunstall** is a machine learning engineer at Hugging Face, focused on developing open-source tools and making them accessible to the wider community. He is also a co-author of the upcoming [O'Reilly book on Transformers](https://www.oreilly.com/library/view/natural-language-processing/9781098103231/).

**Leandro von Werra** is a machine learning engineer on the open-source team at Hugging Face and also a co-author of the upcoming [O'Reilly book on Transformers](https://www.oreilly.com/library/view/natural-language-processing/9781098103231/). He has several years of industry experience bringing NLP projects to production by working across the whole machine learning stack.

Are you ready? In this chapter, you will learn:
* How to use the `pipeline()` function to solve NLP tasks such as text generation and classification
* About the Transformer architecture
* How to distinguish between encoder, decoder, and encoder-decoder architectures and their use cases
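As a small taste of the `pipeline()` function mentioned above, here is a minimal sketch. It assumes the `transformers` package is installed; since no model name is given, the library picks its default checkpoint for the task:

```python
from transformers import pipeline

# Build a sentiment-analysis pipeline; with no model specified,
# 🤗 Transformers downloads a default checkpoint for this task.
classifier = pipeline("sentiment-analysis")

# The pipeline handles tokenization, the model forward pass, and
# post-processing, returning a list with one dict per input text.
result = classifier("I've been waiting for a HuggingFace course my whole life.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

The same one-line interface covers other tasks (for example `"text-generation"` or `"zero-shot-classification"`), which is why the course introduces it first.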