+
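+Note that the first attention layer in a decoder block pays attention to all (past) inputs to the decoder, but the second attention layer uses the output of the encoder. It can thus access the whole input sentence to best predict the current word. This is very useful because different languages can have grammatical rules that put the words in different orders, or some context provided later in the sentence may help determine the best translation of a given word.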
+
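+The *attention mask* can also be used in the encoder/decoder to prevent the model from paying attention to some special words. For instance, when batching sentences together, a special padding word is added so that all the sentences have the same length, and the mask keeps the model from attending to it.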
+
+## Architectures vs. checkpoints
+
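+As we dive into Transformer models in this course, you'll see mentions of
+*architectures* and *checkpoints* as well as *models*.
+These terms all have slightly different meanings: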
+
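+* **Architecture**: This is the skeleton of the model: the definition of each layer and each operation that happens within the model.
+* **Checkpoints**: These are the weights that will be loaded in a given architecture.
+* **Model**: This is an umbrella term that isn't as precise as "architecture" or "checkpoint": it can mean both. To reduce ambiguity, this course will specify *architecture* or *checkpoint* when it matters.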
+
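+For example, BERT is an architecture, while `bert-base-cased`, a set of weights trained by the Google team for the first release of BERT, is a checkpoint. However, one can say "the BERT model" and "the `bert-base-cased` model."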
diff --git a/chapters/zh/chapter1/5.mdx b/chapters/zh/chapter1/5.mdx
new file mode 100644
index 000000000..7aa765ec2
--- /dev/null
+++ b/chapters/zh/chapter1/5.mdx
@@ -0,0 +1,17 @@
+# Encoder models
+
+
+
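+Encoder models use only the encoder of a Transformer model. At each stage, the attention layers can access all the words in the initial sentence. These models are often characterized as having "bi-directional" attention, and are often called *auto-encoding models*.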
+
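+The pretraining of these models usually revolves around somehow corrupting a given sentence (for instance, by masking random words in it) and tasking the model with finding or reconstructing the initial sentence.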
+
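+Encoder models are best suited for tasks requiring an understanding of the full sentence, such as sentence classification, named entity recognition (and more generally word classification), and extractive question answering.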
+
+Representatives of this family of models include:
+
+- [ALBERT](https://huggingface.co/transformers/model_doc/albert.html)
+- [BERT](https://huggingface.co/transformers/model_doc/bert.html)
+- [DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)
+- [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)
+- [RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)
diff --git a/chapters/zh/chapter1/6.mdx b/chapters/zh/chapter1/6.mdx
new file mode 100644
index 000000000..2de4c44a6
--- /dev/null
+++ b/chapters/zh/chapter1/6.mdx
@@ -0,0 +1,17 @@
+# Decoder models
+
+
+
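+Decoder models use only the decoder of a Transformer model. At each stage, for a given word, the attention layers can only access the words positioned before it in the sentence. These models are often called *auto-regressive models*.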
+
+The pretraining of decoder models usually revolves around predicting the next word in the sentence.
+
+These models are best suited for tasks involving text generation.
+
+Representatives of this family of models include:
+
+
+- [CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)
+- [GPT](https://huggingface.co/transformers/model_doc/gpt.html)
+- [GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)
+- [Transformer XL](https://huggingface.co/transformers/model_doc/transformerxl.html)
diff --git a/chapters/zh/chapter1/7.mdx b/chapters/zh/chapter1/7.mdx
new file mode 100644
index 000000000..99dc00eea
--- /dev/null
+++ b/chapters/zh/chapter1/7.mdx
@@ -0,0 +1,16 @@
+# Sequence-to-sequence models
+
+
+
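+Encoder-decoder models (also called *sequence-to-sequence models*) use both parts of the Transformer architecture. At each stage, the attention layers of the encoder can access all the words in the initial sentence, whereas the attention layers of the decoder can only access the words positioned before a given word in the input.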
+
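+The pretraining of these models can be done using the objectives of encoder or decoder models, but usually involves something a bit more complex. For instance, [T5](https://huggingface.co/t5-base) is pretrained by replacing random spans of text (which can contain several words) with a single special mask word, and the objective is then to predict the text that this mask word replaces.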
+
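+Sequence-to-sequence models are best suited for tasks revolving around generating new sentences depending on a given input, such as summarization, translation, or generative question answering.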
+
+Representatives of this family of models include:
+
+- [BART](https://huggingface.co/transformers/model_doc/bart.html)
+- [mBART](https://huggingface.co/transformers/model_doc/mbart.html)
+- [Marian](https://huggingface.co/transformers/model_doc/marian.html)
+- [T5](https://huggingface.co/transformers/model_doc/t5.html)
diff --git a/chapters/zh/chapter1/8.mdx b/chapters/zh/chapter1/8.mdx
new file mode 100644
index 000000000..707731892
--- /dev/null
+++ b/chapters/zh/chapter1/8.mdx
@@ -0,0 +1,31 @@
+# Bias and limitations
+
+
+
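+If you intend to use a pretrained model or a fine-tuned version in production, please be aware that, while these models are powerful tools, they come with limitations. The biggest of these is that, to enable pretraining on large amounts of data, researchers often scrape all the content they can find, and that content may carry ideological or value-laden stereotypes.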
+
+To give a quick illustration, let's go back to the example of a `fill-mask` pipeline with the BERT model:
+
+```python
+from transformers import pipeline
+
+unmasker = pipeline("fill-mask", model="bert-base-uncased")
+result = unmasker("This man works as a [MASK].")
+print([r["token_str"] for r in result])
+
+result = unmasker("This woman works as a [MASK].")
+print([r["token_str"] for r in result])
+```
+
+```python out
+['lawyer', 'carpenter', 'doctor', 'waiter', 'mechanic']
+['nurse', 'waitress', 'teacher', 'maid', 'prostitute']
+```
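+When asked to fill in the missing word in these two sentences, the model gives only one gender-free answer (waiter/waitress). The others are occupations usually associated with one specific gender, and "prostitute" ended up in the top five possibilities the model associates with "woman" and "work". This happens even though BERT is one of the rare Transformer models not built by scraping data from all over the internet, but rather using apparently neutral, curated data (it is trained on the [English Wikipedia](https://huggingface.co/datasets/wikipedia) and [BookCorpus](https://huggingface.co/datasets/bookcorpus) datasets).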
+
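+When you use these tools, you therefore need to keep in mind that the original model you are using could very easily generate sexist, racist, or homophobic content. Fine-tuning the model on your data won't make this intrinsic bias disappear.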
\ No newline at end of file
diff --git a/chapters/zh/chapter1/9.mdx b/chapters/zh/chapter1/9.mdx
new file mode 100644
index 000000000..16c5ab6ad
--- /dev/null
+++ b/chapters/zh/chapter1/9.mdx
@@ -0,0 +1,11 @@
+# Summary
+
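+In this chapter, you saw how to approach different NLP tasks using the `pipeline()` function from 🤗 Transformers. You also saw how to search for and use models in the Model Hub, as well as how to use the Inference API to test the models directly in your browser.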
+
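+We discussed how Transformer models work at a high level, and talked about the importance of transfer learning and fine-tuning. You can use the full architecture or only the encoder or decoder, depending on what kind of task you aim to solve. The following table summarizes this: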
+
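+| Model | Examples | Tasks |
+| ---- | ---- | ---- |
+| Encoder | ALBERT, BERT, DistilBERT, ELECTRA, RoBERTa | Sentence classification, named entity recognition, extractive question answering |
+| Decoder | CTRL, GPT, GPT-2, Transformer XL | Text generation |
+| Encoder-decoder | BART, T5, Marian, mBART | Summarization, translation, generative question answering |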
\ No newline at end of file
From 8ec6fd3680456717dab3a75115c49507c89adfd1 Mon Sep 17 00:00:00 2001
From: 1375626371 <40328311+1375626371@users.noreply.github.com>
Date: Tue, 12 Apr 2022 21:26:13 +0800
Subject: [PATCH 2/6] Add zh to the languages field
Add zh to the languages field in the build_documentation.yml and build_pr_documentation.yml files
---
.github/workflows/build_documentation.yml | 2 +-
.github/workflows/build_pr_documentation.yml | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml
index a03061dc8..8e0f0b7e9 100644
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -14,6 +14,6 @@ jobs:
package: course
path_to_docs: course/chapters/
additional_args: --not_python_module
- languages: ar bn en es fa fr he ko pt ru th tr
+ languages: ar bn en es fa fr he ko pt ru th tr zh
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
\ No newline at end of file
diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml
index 84af57f75..ff5039db9 100644
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
@@ -16,5 +16,5 @@ jobs:
package: course
path_to_docs: course/chapters/
additional_args: --not_python_module
- languages: ar bn en es fa fr he ko pt ru th tr
+ languages: ar bn en es fa fr he ko pt ru th tr zh
hub_base_path: https://moon-ci-docs.huggingface.co/course
From b931926f84a95f24cc21d81efec548f9d5e7957d Mon Sep 17 00:00:00 2001
From: 1375626371 <40328311+1375626371@users.noreply.github.com>
Date: Tue, 12 Apr 2022 21:32:03 +0800
Subject: [PATCH 3/6] Remove untranslated chapters in _toctree.yml
Remove all these sections that haven't been translated yet
Remove Chapter 0 from the table of contents since it hasn't been translated yet
---
chapters/zh/_toctree.yml | 152 +--------------------------------------
1 file changed, 1 insertion(+), 151 deletions(-)
diff --git a/chapters/zh/_toctree.yml b/chapters/zh/_toctree.yml
index 298de22ab..453fc5e99 100644
--- a/chapters/zh/_toctree.yml
+++ b/chapters/zh/_toctree.yml
@@ -1,8 +1,3 @@
-- title: 0. 准备
- sections:
- - local: chapter0/1
- title: 课程简介
-
- title: 1. Transformer 模型
sections:
- local: chapter1/1
@@ -25,149 +20,4 @@
title: 总结
- local: chapter1/10
title: 章末小测验
- quiz: 1
-
-- title: 2. Using 🤗 Transformers
- sections:
- - local: chapter2/1
- title: Introduction
- - local: chapter2/2
- title: Behind the pipeline
- - local: chapter2/3
- title: Models
- - local: chapter2/4
- title: Tokenizers
- - local: chapter2/5
- title: Handling multiple sequences
- - local: chapter2/6
- title: Putting it all together
- - local: chapter2/7
- title: Basic usage completed!
- - local: chapter2/8
- title: End-of-chapter quiz
- quiz: 2
-
-- title: 3. Fine-tuning a pretrained model
- sections:
- - local: chapter3/1
- title: Introduction
- - local: chapter3/2
- title: Processing the data
- - local: chapter3/3
- title: Fine-tuning a model with the Trainer API or Keras
- local_fw: { pt: chapter3/3, tf: chapter3/3_tf }
- - local: chapter3/4
- title: A full training
- - local: chapter3/5
- title: Fine-tuning, Check!
- - local: chapter3/6
- title: End-of-chapter quiz
- quiz: 3
-
-- title: 4. Sharing models and tokenizers
- sections:
- - local: chapter4/1
- title: The Hugging Face Hub
- - local: chapter4/2
- title: Using pretrained models
- - local: chapter4/3
- title: Sharing pretrained models
- - local: chapter4/4
- title: Building a model card
- - local: chapter4/5
- title: Part 1 completed!
- - local: chapter4/6
- title: End-of-chapter quiz
- quiz: 4
-
-- title: 5. The 🤗 Datasets library
- sections:
- - local: chapter5/1
- title: Introduction
- - local: chapter5/2
- title: What if my dataset isn't on the Hub?
- - local: chapter5/3
- title: Time to slice and dice
- - local: chapter5/4
- title: Big data? 🤗 Datasets to the rescue!
- - local: chapter5/5
- title: Creating your own dataset
- - local: chapter5/6
- title: Semantic search with FAISS
- - local: chapter5/7
- title: 🤗 Datasets, check!
- - local: chapter5/8
- title: End-of-chapter quiz
- quiz: 5
-
-- title: 6. The 🤗 Tokenizers library
- sections:
- - local: chapter6/1
- title: Introduction
- - local: chapter6/2
- title: Training a new tokenizer from an old one
- - local: chapter6/3
- title: Fast tokenizers' special powers
- - local: chapter6/3b
- title: Fast tokenizers in the QA pipeline
- - local: chapter6/4
- title: Normalization and pre-tokenization
- - local: chapter6/5
- title: Byte-Pair Encoding tokenization
- - local: chapter6/6
- title: WordPiece tokenization
- - local: chapter6/7
- title: Unigram tokenization
- - local: chapter6/8
- title: Building a tokenizer, block by block
- - local: chapter6/9
- title: Tokenizers, check!
- - local: chapter6/10
- title: End-of-chapter quiz
- quiz: 6
-
-- title: 7. Main NLP tasks
- sections:
- - local: chapter7/1
- title: Introduction
- - local: chapter7/2
- title: Token classification
- - local: chapter7/3
- title: Fine-tuning a masked language model
- - local: chapter7/4
- title: Translation
- - local: chapter7/5
- title: Summarization
- - local: chapter7/6
- title: Training a causal language model from scratch
- - local: chapter7/7
- title: Question answering
- - local: chapter7/8
- title: Mastering NLP
- - local: chapter7/9
- title: End-of-chapter quiz
- quiz: 7
-
-- title: 8. How to ask for help
- sections:
- - local: chapter8/1
- title: Introduction
- - local: chapter8/2
- title: What to do when you get an error
- - local: chapter8/3
- title: Asking for help on the forums
- - local: chapter8/4
- title: Debugging the training pipeline
- local_fw: { pt: chapter8/4, tf: chapter8/4_tf }
- - local: chapter8/5
- title: How to write a good issue
- - local: chapter8/6
- title: Part 2 completed!
- - local: chapter8/7
- title: End-of-chapter quiz
- quiz: 8
-
-- title: Hugging Face Course Event
- sections:
- - local: event/1
- title: Part 2 Release Event
+ quiz: 1
\ No newline at end of file
From 0778f36a829715813a7ce45f552cc5ab844715c3 Mon Sep 17 00:00:00 2001
From: 1375626371 <40328311+1375626371@users.noreply.github.com>
Date: Wed, 13 Apr 2022 01:17:40 +0800
Subject: [PATCH 4/6] Fixed an error in the translation format
Fixed an error in the translation format of Chapter 1, Section 3
---
chapters/zh/chapter1/3.mdx | 607 ++++++++++++++++++++-----------------
1 file changed, 327 insertions(+), 280 deletions(-)
diff --git a/chapters/zh/chapter1/3.mdx b/chapters/zh/chapter1/3.mdx
index 1f067ab6d..cd6aee466 100644
--- a/chapters/zh/chapter1/3.mdx
+++ b/chapters/zh/chapter1/3.mdx
@@ -1,280 +1,327 @@
-# Transformers能做什么?
-
-
-
-在本节中,我们将看看 Transformer 模型可以做什么,并使用 🤗 Transformers 库中的第一个工具:pipeline() 函数。
-
-## Transformer被应用于各个方面!
-Transformer 模型用于解决各种 NLP 任务,就像上一节中提到的那样。以下是一些使用 Hugging Face 和 Transformer 模型的公司和组织,他们也通过分享他们的模型回馈社区:
-
-![使用 Hugging Face 的公司](https://huggingface.co/course/static/chapter1/companies.PNG)
-[🤗 Transformers 库](https://github.com/huggingface/transformers)提供了创建和使用这些共享模型的功能。[模型中心(hub)](https://huggingface.co/models)包含数千个任何人都可以下载和使用的预训练模型。您还可以将自己的模型上传到 Hub!
-
-```python
-⚠️ Hugging Face Hub 不限于 Transformer 模型。任何人都可以分享他们想要的任何类型的模型或数据集!创建一个 Huggingface.co 帐户(https://huggingface.co/join)以使用所有可用功能!
-```
-
-在深入研究 Transformer 模型的底层工作原理之前,让我们先看几个示例,看看它们如何用于解决一些有趣的 NLP 问题。
-
-## 使用pipelines
-
-
-(这里有一个视频,但是国内可能打不开,译者注)
-
-
-🤗 Transformers 库中最基本的对象是 **pipeline()** 函数。它将模型与其必要的预处理和后处理步骤连接起来,使我们能够通过直接输入任何文本并获得最终的答案:
-
-```python
-from transformers import pipeline
-classifier = pipeline("sentiment-analysis")
-classifier("I've been waiting for a HuggingFace course my whole life.")
-```
-```python
-[{'label': 'POSITIVE', 'score': 0.9598047137260437}]
-```
-
-
-我们也可以多传几句!
-```python
-classifier(
- ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
-)
-```
-```python
-[{'label': 'POSITIVE', 'score': 0.9598047137260437},
- {'label': 'NEGATIVE', 'score': 0.9994558095932007}]
-```
-默认情况下,此pipeline选择一个特定的预训练模型,该模型已针对英语情感分析进行了微调。创建**分类器**对象时,将下载并缓存模型。如果您重新运行该命令,则将使用缓存的模型,无需再次下载模型。
-
-将一些文本传递到pipeline时涉及三个主要步骤:
-
-1. 文本被预处理为模型可以理解的格式。
-2. 预处理的输入被传递给模型。
-3. 模型处理后输出最终人类可以理解的结果。
-
-目前[可用的一些pipeline](https://huggingface.co/transformers/main_classes/pipelines.html)是:
-
-* **特征提取**(获取文本的向量表示)
-* **填充空缺**
-* **ner**(命名实体识别)
-* **问答**
-* **情感分析**
-* **文本摘要**
-* **文本生成**
-* **翻译**
-* **零样本分类**
-
-让我们来看看其中的一些吧!
-
-## 零样本分类
-我们将首先处理一项非常具挑战性的任务,我们需要对尚未标记的文本进行分类。这是实际项目中的常见场景,因为注释文本通常很耗时并且需要领域专业知识。对于这项任务**zero-shot-classification**pipeline非常强大:它允许您直接指定用于分类的标签,因此您不必依赖预训练模型的标签。下面的模型展示了如何使用这两个标签将句子分类为正面或负面——但也可以使用您喜欢的任何其他标签集对文本进行分类。
-
-```python
-from transformers import pipeline
-
-classifier = pipeline("zero-shot-classification")
-classifier(
- "This is a course about the Transformers library",
- candidate_labels=["education", "politics", "business"],
-)
-```
-```python
-{'sequence': 'This is a course about the Transformers library',
- 'labels': ['education', 'business', 'politics'],
- 'scores': [0.8445963859558105, 0.111976258456707, 0.043427448719739914]}
-```
-
-此pipeline称为zero-shot,因为您不需要对数据上的模型进行微调即可使用它。它可以直接返回您想要的任何标签列表的概率分数!
-```
-✏️快来试试吧!使用您自己的序列和标签,看看模型的行为。
-```
-
-## 文本生成
-现在让我们看看如何使用pipeline来生成一些文本。这里的主要使用方法是您提供一个提示,模型将通过生成剩余的文本来自动完成整段话。这类似于许多手机上的预测文本功能。文本生成涉及随机性,因此如果您没有得到相同的如下所示的结果,这是正常的。
-
-```python
-from transformers import pipeline
-
-generator = pipeline("text-generation")
-generator("In this course, we will teach you how to")
-```
-```python
-[{'generated_text': 'In this course, we will teach you how to understand and use '
- 'data flow and data interchange when handling user data. We '
- 'will be working with one or more of the most commonly used '
- 'data flows — data flows of various types, as seen by the '
- 'HTTP'}]
-```
-您可以使用参数 **num_return_sequences** 控制生成多少个不同的序列,并使用参数 **max_length** 控制输出文本的总长度。
-
-```
-✏️快来试试吧!使用 num_return_sequences 和 max_length 参数生成两个句子,每个句子 15 个单词。
-```
-
-## 在pipeline中使用 Hub 中的其他模型
-前面的示例使用了默认模型,但您也可以从 Hub 中选择特定模型以在特定任务的pipeline中使用 - 例如,文本生成。转到[模型中心(hub)](https://huggingface.co/models)并单击左侧的相应标签将会只显示该任务支持的模型。[例如这样](https://huggingface.co/models?pipeline_tag=text-generation)。
-
-让我们试试 [**distilgpt2**](https://huggingface.co/distilgpt2) 模型吧!以下是如何在与以前相同的pipeline中加载它:
-
-```python
-from transformers import pipeline
-
-generator = pipeline("text-generation", model="distilgpt2")
-generator(
- "In this course, we will teach you how to",
- max_length=30,
- num_return_sequences=2,
-)
-```
-```python
-[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
- 'move your mental and physical capabilities to your advantage.'},
- {'generated_text': 'In this course, we will teach you how to become an expert and '
- 'practice realtime, and with a hands on experience on both real '
- 'time and real'}]
-```
-您可以通过单击语言标签来筛选搜索结果,然后选择另一种文本生成模型的模型。模型中心(hub)甚至包含支持多种语言的多语言模型。
-
-通过单击选择模型后,您会看到有一个小组件,可让您直接在线试用。通过这种方式,您可以在下载之前快速测试模型的功能。
-```
-✏️快来试试吧!使用标签筛选查找另一种语言的文本生成模型。使用小组件测试并在pipeline中使用它!
-```
-
-## 推理 API
-所有模型都可以使用 Inference API 直接通过浏览器进行测试,该 API 可在 [Hugging Face 网站](https://huggingface.co/)上找到。通过输入自定义文本并观察模型的输出,您可以直接在此页面上使用模型。
-
-小组件形式的推理 API 也可作为付费产品使用,如果您的工作流程需要它,它会派上用场。有关更多详细信息,请参阅[定价页面](https://huggingface.co/pricing)。
-
-## Mask filling
-您将尝试的下一个pipeline是 **fill-mask**。此任务的想法是填充给定文本中的空白:
-```python
-from transformers import pipeline
-
-unmasker = pipeline("fill-mask")
-unmasker("This course will teach you all about models.", top_k=2)
-```
-```python
-[{'sequence': 'This course will teach you all about mathematical models.',
- 'score': 0.19619831442832947,
- 'token': 30412,
- 'token_str': ' mathematical'},
- {'sequence': 'This course will teach you all about computational models.',
- 'score': 0.04052725434303284,
- 'token': 38163,
- 'token_str': ' computational'}]
-```
-**top_k** 参数控制要显示的结果有多少种。请注意,这里模型填充了特殊的< **mask** >词,它通常被称为掩码标记。其他掩码填充模型可能有不同的掩码标记,因此在探索其他模型时要验证正确的掩码字是什么。检查它的一种方法是查看小组件中使用的掩码。
-
-```
-✏️快来试试吧!在 Hub 上搜索基于 bert 的模型并在推理 API 小组件中找到它的掩码。这个模型对上面pipeline示例中的句子预测了什么?
-```
-
-## 命名实体识别
-命名实体识别 (NER) 是一项任务,其中模型必须找到输入文本的哪些部分对应于诸如人员、位置或组织之类的实体。让我们看一个例子:
-```python
-from transformers import pipeline
-
-ner = pipeline("ner", grouped_entities=True)
-ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
-```
-```python
-[{'entity_group': 'PER', 'score': 0.99816, 'word': 'Sylvain', 'start': 11, 'end': 18},
- {'entity_group': 'ORG', 'score': 0.97960, 'word': 'Hugging Face', 'start': 33, 'end': 45},
- {'entity_group': 'LOC', 'score': 0.99321, 'word': 'Brooklyn', 'start': 49, 'end': 57}
-]
-```
-在这里,模型正确地识别出 Sylvain 是一个人 (PER),Hugging Face 是一个组织 (ORG),而布鲁克林是一个位置 (LOC)。
-
-我们在pipeline创建函数中传递选项 **grouped_entities=True** 以告诉pipeline将对应于同一实体的句子部分重新组合在一起:这里模型正确地将“Hugging”和“Face”分组为一个组织,即使名称由多个词组成。事实上,正如我们即将在下一章看到的,预处理甚至会将一些单词分成更小的部分。例如,**Sylvain** 分割为了四部分:**S、##yl、##va** 和 **##in**。在后处理步骤中,pipeline成功地重新组合了这些部分。
-
-```
-✏️快来试试吧!在模型中心(hub)搜索能够用英语进行词性标注(通常缩写为 POS)的模型。这个模型对上面例子中的句子预测了什么?
-```
-
-## 问答系统
-问答pipeline使用来自给定上下文的信息回答问题:
-```python
-from transformers import pipeline
-
-question_answerer = pipeline("question-answering")
-question_answerer(
- question="Where do I work?",
- context="My name is Sylvain and I work at Hugging Face in Brooklyn",
-)
-
-```
-```python
-{'score': 0.6385916471481323, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
-klyn",
-)
-
-```
-请注意,此pipeline通过从提供的上下文中提取信息来工作;它不会凭空生成答案。
-
-## 文本摘要
-文本摘要是将文本缩减为较短文本的任务,同时保留文本中的主要(重要)信息。下面是一个例子:
-
-```python
-from transformers import pipeline
-
-summarizer = pipeline("summarization")
-summarizer(
- """
- America has changed dramatically during recent years. Not only has the number of
- graduates in traditional engineering disciplines such as mechanical, civil,
- electrical, chemical, and aeronautical engineering declined, but in most of
- the premier American universities engineering curricula now concentrate on
- and encourage largely the study of engineering science. As a result, there
- are declining offerings in engineering subjects dealing with infrastructure,
- the environment, and related issues, and greater concentration on high
- technology subjects, largely supporting increasingly complex scientific
- developments. While the latter is important, it should not be at the expense
- of more traditional engineering.
-
- Rapidly developing economies such as China and India, as well as other
- industrial countries in Europe and Asia, continue to encourage and advance
- the teaching of engineering. Both China and India, respectively, graduate
- six and eight times as many traditional engineers as does the United States.
- Other industrial countries at minimum maintain their output, while America
- suffers an increasingly serious decline in the number of engineering graduates
- and a lack of well-educated engineers.
-"""
-)
-```
-```python
-[{'summary_text': ' America has changed dramatically during recent years . The '
- 'number of engineering graduates in the U.S. has declined in '
- 'traditional engineering disciplines such as mechanical, civil '
- ', electrical, chemical, and aeronautical engineering . Rapidly '
- 'developing economies such as China and India, as well as other '
- 'industrial countries in Europe and Asia, continue to encourage '
- 'and advance engineering .'}]
-```
-与文本生成一样,您指定结果的 **max_length** 或 **min_length**。
-
-## 翻译
-对于翻译,如果您在任务名称中提供语言对(例如“**translation_en_to_fr**”),则可以使用默认模型,但最简单的方法是在[模型中心(hub)](https://huggingface.co/models)选择要使用的模型。在这里,我们将尝试从法语翻译成英语:
-
-```python
-from transformers import pipeline
-
-translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
-translator("Ce cours est produit par Hugging Face.")
-```
-```python
-[{'translation_text': 'This course is produced by Hugging Face.'}]
-
-```
-
-与文本生成和摘要一样,您可以指定结果的 **max_length** 或 **min_length**。
-
-```
-✏️快来试试吧!搜索其他语言的翻译模型,尝试将前一句翻译成几种不同的语言。
-```
-
-到目前为止显示的pipeline主要用于演示目的。它们是为特定任务而编程的,不能对他们进行自定义的修改。在下一章中,您将了解 **pipeline()** 函数内部的内容以及如何进行自定义的修改。
\ No newline at end of file
+# Transformers, what can they do?
+
+
+
+In this section, we will look at what Transformer models can do and use our first tool from the 🤗 Transformers library: the `pipeline()` function.
+
+
+👀 See that Open in Colab button on the top right? Click on it to open a Google Colab notebook with all the code samples of this section. This button will be present in any section containing code examples.
+
+If you want to run the examples locally, we recommend taking a look at the setup.
+
+
+## Transformers are everywhere!
+
+Transformer models are used to solve all kinds of NLP tasks, like the ones mentioned in the previous section. Here are some of the companies and organizations using Hugging Face and Transformer models, who also contribute back to the community by sharing their models:
+
+
+
+The [🤗 Transformers library](https://github.com/huggingface/transformers) provides the functionality to create and use those shared models. The [Model Hub](https://huggingface.co/models) contains thousands of pretrained models that anyone can download and use. You can also upload your own models to the Hub!
+
+
+⚠️ The Hugging Face Hub is not limited to Transformer models. Anyone can share any kind of models or datasets they want! Create a huggingface.co account to benefit from all available features!
+
+
+Before diving into how Transformer models work under the hood, let's look at a few examples of how they can be used to solve some interesting NLP problems.
+
+## Working with pipelines
+
+
+
+The most basic object in the 🤗 Transformers library is the `pipeline()` function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer:
+
+```python
+from transformers import pipeline
+
+classifier = pipeline("sentiment-analysis")
+classifier("I've been waiting for a HuggingFace course my whole life.")
+```
+
+```python out
+[{'label': 'POSITIVE', 'score': 0.9598047137260437}]
+```
+
+We can even pass several sentences!
+
+```python
+classifier(
+ ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
+)
+```
+
+```python out
+[{'label': 'POSITIVE', 'score': 0.9598047137260437},
+ {'label': 'NEGATIVE', 'score': 0.9994558095932007}]
+```
+
+By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the `classifier` object. If you rerun the command, the cached model will be used instead and there is no need to download the model again.
+
+There are three main steps involved when you pass some text to a pipeline:
+
+1. The text is preprocessed into a format the model can understand.
+2. The preprocessed inputs are passed to the model.
+3. The predictions of the model are post-processed, so you can make sense of them.
+
+
+Some of the currently [available pipelines](https://huggingface.co/transformers/main_classes/pipelines.html) are:
+
+- `feature-extraction` (get the vector representation of a text)
+- `fill-mask`
+- `ner` (named entity recognition)
+- `question-answering`
+- `sentiment-analysis`
+- `summarization`
+- `text-generation`
+- `translation`
+- `zero-shot-classification`
+
+Let's have a look at a few of these!
+
+## Zero-shot classification
+
+We'll start by tackling a more challenging task where we need to classify texts that haven't been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the `zero-shot-classification` pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don't have to rely on the labels of the pretrained model. You've already seen how the model can classify a sentence as positive or negative using those two labels — but it can also classify the text using any other set of labels you like.
+
+```python
+from transformers import pipeline
+
+classifier = pipeline("zero-shot-classification")
+classifier(
+ "This is a course about the Transformers library",
+ candidate_labels=["education", "politics", "business"],
+)
+```
+
+```python out
+{'sequence': 'This is a course about the Transformers library',
+ 'labels': ['education', 'business', 'politics'],
+ 'scores': [0.8445963859558105, 0.111976258456707, 0.043427448719739914]}
+```
+
+This pipeline is called _zero-shot_ because you don't need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
+
+
+
+✏️ **Try it out!** Play around with your own sequences and labels and see how the model behaves.
+
+
+
+
+## Text generation
+
+Now let's see how to use a pipeline to generate some text. The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones. Text generation involves randomness, so it's normal if you don't get the same results as shown below.
+
+```python
+from transformers import pipeline
+
+generator = pipeline("text-generation")
+generator("In this course, we will teach you how to")
+```
+
+```python out
+[{'generated_text': 'In this course, we will teach you how to understand and use '
+ 'data flow and data interchange when handling user data. We '
+ 'will be working with one or more of the most commonly used '
+ 'data flows — data flows of various types, as seen by the '
+ 'HTTP'}]
+```
+
+You can control how many different sequences are generated with the argument `num_return_sequences` and the total length of the output text with the argument `max_length`.
+
+
+
+✏️ **Try it out!** Use the `num_return_sequences` and `max_length` arguments to generate two sentences of 15 words each.
+
+
+
+
+## Using any model from the Hub in a pipeline
+
+The previous examples used the default model for the task at hand, but you can also choose a particular model from the Hub to use in a pipeline for a specific task — say, text generation. Go to the [Model Hub](https://huggingface.co/models) and click on the corresponding tag on the left to display only the supported models for that task. You should get to a page like [this one](https://huggingface.co/models?pipeline_tag=text-generation).
+
+Let's try the [`distilgpt2`](https://huggingface.co/distilgpt2) model! Here's how to load it in the same pipeline as before:
+
+```python
+from transformers import pipeline
+
+generator = pipeline("text-generation", model="distilgpt2")
+generator(
+ "In this course, we will teach you how to", max_length=30, num_return_sequences=2,
+)
+```
+
+```python out
+[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
+ 'move your mental and physical capabilities to your advantage.'},
+ {'generated_text': 'In this course, we will teach you how to become an expert and '
+ 'practice realtime, and with a hands on experience on both real '
+ 'time and real'}]
+```
+
+You can refine your search for a model by clicking on the language tags, and pick a model that will generate text in another language. The Model Hub even contains checkpoints for multilingual models that support several languages.
+
+Once you select a model by clicking on it, you'll see that there is a widget enabling you to try it directly online. This way you can quickly test the model's capabilities before downloading it.
+
+
+
+✏️ **Try it out!** Use the filters to find a text generation model for another language. Feel free to play with the widget and use it in a pipeline!
+
+
+
+### The Inference API
+
+All the models can be tested directly through your browser using the Inference API, which is available on the Hugging Face [website](https://huggingface.co/). You can play with the model directly on this page by inputting custom text and watching the model process the input data.
+
+The Inference API that powers the widget is also available as a paid product, which comes in handy if you need it for your workflows. See the [pricing page](https://huggingface.co/pricing) for more details.
+
+## Mask filling
+
+The next pipeline you'll try is `fill-mask`. The idea of this task is to fill in the blanks in a given text:
+
+```python
+from transformers import pipeline
+
+unmasker = pipeline("fill-mask")
+unmasker("This course will teach you all about <mask> models.", top_k=2)
+```
+
+```python out
+[{'sequence': 'This course will teach you all about mathematical models.',
+ 'score': 0.19619831442832947,
+ 'token': 30412,
+ 'token_str': ' mathematical'},
+ {'sequence': 'This course will teach you all about computational models.',
+ 'score': 0.04052725434303284,
+ 'token': 38163,
+ 'token_str': ' computational'}]
+```
+
+The `top_k` argument controls how many possibilities you want to be displayed. Note that here the model fills in the special `<mask>` word, which is often referred to as a *mask token*. Other mask-filling models might have different mask tokens, so it's always good to verify the proper mask word when exploring other models. One way to check it is by looking at the mask word used in the widget.
+
+
+
+✏️ **Try it out!** Search for the `bert-base-cased` model on the Hub and identify its mask word in the Inference API widget. What does this model predict for the sentence in our `pipeline` example above?
+
+
+
+## Named entity recognition
+
+Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. Let's look at an example:
+
+```python
+from transformers import pipeline
+
+ner = pipeline("ner", grouped_entities=True)
+ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
+```
+
+```python out
+[{'entity_group': 'PER', 'score': 0.99816, 'word': 'Sylvain', 'start': 11, 'end': 18},
+ {'entity_group': 'ORG', 'score': 0.97960, 'word': 'Hugging Face', 'start': 33, 'end': 45},
+ {'entity_group': 'LOC', 'score': 0.99321, 'word': 'Brooklyn', 'start': 49, 'end': 57}
+]
+```
+
+Here the model correctly identified that Sylvain is a person (PER), Hugging Face an organization (ORG), and Brooklyn a location (LOC).
+
+We pass the option `grouped_entities=True` in the pipeline creation function to tell the pipeline to regroup together the parts of the sentence that correspond to the same entity: here the model correctly grouped "Hugging" and "Face" as a single organization, even though the name consists of multiple words. In fact, as we will see in the next chapter, the preprocessing even splits some words into smaller parts. For instance, `Sylvain` is split into four pieces: `S`, `##yl`, `##va`, and `##in`. In the post-processing step, the pipeline successfully regrouped those pieces.
+
+
+
+✏️ **Try it out!** Search the Model Hub for a model able to do part-of-speech tagging (usually abbreviated as POS) in English. What does this model predict for the sentence in the example above?
+
+
+
+## Question answering
+
+The `question-answering` pipeline answers questions using information from a given context:
+
+```python
+from transformers import pipeline
+
+question_answerer = pipeline("question-answering")
+question_answerer(
+ question="Where do I work?",
+ context="My name is Sylvain and I work at Hugging Face in Brooklyn",
+)
+```
+
+```python out
+{'score': 0.6385916471481323, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
+```
+
+Note that this pipeline works by extracting information from the provided context; it does not generate the answer.
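One way to see that the answer is extracted rather than generated is to slice the context with the returned `start` and `end` offsets. The snippet below reuses the output shown above as a literal dictionary, so it runs without loading any model.

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"

# Result copied from the pipeline output shown above.
result = {
    "score": 0.6385916471481323,
    "start": 33,
    "end": 45,
    "answer": "Hugging Face",
}

# The answer is a character span of the context, not newly generated text.
span = context[result["start"]:result["end"]]
print(span)
```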
+
+## Summarization
+
+Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text. Here's an example:
+
+```python
+from transformers import pipeline
+
+summarizer = pipeline("summarization")
+summarizer(
+ """
+ America has changed dramatically during recent years. Not only has the number of
+ graduates in traditional engineering disciplines such as mechanical, civil,
+ electrical, chemical, and aeronautical engineering declined, but in most of
+ the premier American universities engineering curricula now concentrate on
+ and encourage largely the study of engineering science. As a result, there
+ are declining offerings in engineering subjects dealing with infrastructure,
+ the environment, and related issues, and greater concentration on high
+ technology subjects, largely supporting increasingly complex scientific
+ developments. While the latter is important, it should not be at the expense
+ of more traditional engineering.
+
+ Rapidly developing economies such as China and India, as well as other
+ industrial countries in Europe and Asia, continue to encourage and advance
+ the teaching of engineering. Both China and India, respectively, graduate
+ six and eight times as many traditional engineers as does the United States.
+ Other industrial countries at minimum maintain their output, while America
+ suffers an increasingly serious decline in the number of engineering graduates
+ and a lack of well-educated engineers.
+"""
+)
+```
+
+```python out
+[{'summary_text': ' America has changed dramatically during recent years . The '
+ 'number of engineering graduates in the U.S. has declined in '
+ 'traditional engineering disciplines such as mechanical, civil '
+ ', electrical, chemical, and aeronautical engineering . Rapidly '
+ 'developing economies such as China and India, as well as other '
+ 'industrial countries in Europe and Asia, continue to encourage '
+ 'and advance engineering .'}]
+```
+
+Like with text generation, you can specify a `max_length` or a `min_length` for the result.
+
+
+## Translation
+
+For translation, you can use a default model if you provide a language pair in the task name (such as `"translation_en_to_fr"`), but the easiest way is to pick the model you want to use on the [Model Hub](https://huggingface.co/models). Here we'll try translating from French to English:
+
+```python
+from transformers import pipeline
+
+translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
+translator("Ce cours est produit par Hugging Face.")
+```
+
+```python out
+[{'translation_text': 'This course is produced by Hugging Face.'}]
+```
+
+Like with text generation and summarization, you can specify a `max_length` or a `min_length` for the result.
+
+
+
+✏️ **Try it out!** Search for translation models in other languages and try to translate the previous sentence into a few different languages.
+
+
+
+The pipelines shown so far are mostly for demonstrative purposes. They were programmed for specific tasks and cannot perform variations of them. In the next chapter, you'll learn what's inside a `pipeline()` function and how to customize its behavior.
From e7100c604db051cfcfcd0c27c141fffda9ad4b9e Mon Sep 17 00:00:00 2001
From: 1375626371 <40328311+1375626371@users.noreply.github.com>
Date: Wed, 13 Apr 2022 01:40:00 +0800
Subject: [PATCH 5/6] Added a small part of the missing content
---
chapters/zh/chapter1/3.mdx | 188 ++++++++++++++-----------------------
1 file changed, 73 insertions(+), 115 deletions(-)
diff --git a/chapters/zh/chapter1/3.mdx b/chapters/zh/chapter1/3.mdx
index cd6aee466..1e7e91108 100644
--- a/chapters/zh/chapter1/3.mdx
+++ b/chapters/zh/chapter1/3.mdx
@@ -1,4 +1,4 @@
-# Transformers, what can they do?
+# What can Transformers do?
-In this section, we will look at what Transformer models can do and use our first tool from the 🤗 Transformers library: the `pipeline()` function.
-
+In this section, we will look at what Transformer models can do and use our first tool from the 🤗 Transformers library: the pipeline() function.
-👀 See that Open in Colab button on the top right? Click on it to open a Google Colab notebook with all the code samples of this section. This button will be present in any section containing code examples.
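+👀 See the "Open in Colab" button at the top right? Click it to open a Google Colab notebook containing all the code samples from this section. Every section with code examples has this button.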
-If you want to run the examples locally, we recommend taking a look at the setup.
+If you want to run the examples locally, we recommend taking a look at the setup.
-## Transformers are everywhere!
-
-Transformer models are used to solve all kinds of NLP tasks, like the ones mentioned in the previous section. Here are some of the companies and organizations using Hugging Face and Transformer models, who also contribute back to the community by sharing their models:
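+## Transformers are used everywhere!
+Transformer models are used to solve all kinds of NLP tasks, like those mentioned in the previous section. Here are some of the companies and organizations that use Hugging Face and Transformer models, and that also give back to the community by sharing their models: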
-
-
-The [🤗 Transformers library](https://github.com/huggingface/transformers) provides the functionality to create and use those shared models. The [Model Hub](https://huggingface.co/models) contains thousands of pretrained models that anyone can download and use. You can also upload your own models to the Hub!
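+![Companies using Hugging Face](https://huggingface.co/course/static/chapter1/companies.PNG)
+The [🤗 Transformers library](https://github.com/huggingface/transformers) provides the functionality to create and use those shared models. The [Model Hub](https://huggingface.co/models) contains thousands of pretrained models that anyone can download and use. You can also upload your own models to the Hub!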
-⚠️ The Hugging Face Hub is not limited to Transformer models. Anyone can share any kind of models or datasets they want! Create a huggingface.co account to benefit from all available features!
+⚠️ The Hugging Face Hub is not limited to Transformer models. Anyone can share any kind of model or dataset they want! Create a huggingface.co account (https://huggingface.co/join) to use all available features!
-Before diving into how Transformer models work under the hood, let's look at a few examples of how they can be used to solve some interesting NLP problems.
+Before diving into how Transformer models work under the hood, let's look at a few examples of how they can be used to solve some interesting NLP problems.
+
+## Working with pipelines
-## Working with pipelines
+
+(Translator's note: there is a video here, but it may not be accessible from mainland China.)
-
-The most basic object in the 🤗 Transformers library is the `pipeline()` function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer:
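+The most basic object in the 🤗 Transformers library is the **pipeline()** function. It connects a model with its necessary preprocessing and postprocessing steps, so we can directly input any text and get a final answer: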
```python
from transformers import pipeline
@@ -41,50 +40,45 @@ from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")
```
-
```python out
[{'label': 'POSITIVE', 'score': 0.9598047137260437}]
```
-We can even pass several sentences!
+We can also pass several sentences!
```python
classifier(
["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)
-```
-
+```
```python out
[{'label': 'POSITIVE', 'score': 0.9598047137260437},
{'label': 'NEGATIVE', 'score': 0.9994558095932007}]
```
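+By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the **classifier** object. If you rerun the command, the cached model is used and there is no need to download it again.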
-By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the `classifier` object. If you rerun the command, the cached model will be used instead and there is no need to download the model again.
+There are three main steps involved when you pass some text to a pipeline:
-There are three main steps involved when you pass some text to a pipeline:
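+1. The text is preprocessed into a format the model can understand.
+2. The preprocessed inputs are passed to the model.
+3. The model's predictions are post-processed into a result that humans can understand.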
-1. The text is preprocessed into a format the model can understand.
-2. The preprocessed inputs are passed to the model.
-3. The predictions of the model are post-processed, so you can make sense of them.
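The three steps listed above can be sketched with a toy example. Everything here is hypothetical and chosen only for illustration (the vocabulary, the weights, and the function names); a real pipeline uses a trained tokenizer and a neural network instead.

```python
import math

# Hypothetical toy vocabulary and weights, for illustration only.
VOCAB = {"love": 0, "hate": 1, "<unk>": 2}
# Rows are token ids; columns are scores for NEGATIVE and POSITIVE.
WEIGHTS = [[-2.0, 2.0], [2.0, -2.0], [0.0, 0.0]]
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into token ids the toy model understands."""
    tokens = text.lower().replace(".", " ").replace("!", " ").split()
    return [VOCAB.get(token, VOCAB["<unk>"]) for token in tokens]


def model(token_ids):
    """Step 2: produce raw scores (logits) by summing per-token weights."""
    logits = [0.0, 0.0]
    for token_id in token_ids:
        for i, weight in enumerate(WEIGHTS[token_id]):
            logits[i] += weight
    return logits


def postprocess(logits):
    """Step 3: convert logits into a label and a probability via softmax."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, mirroring what a pipeline does for one input."""
    return postprocess(model(preprocess(text)))


print(toy_pipeline("I love this course!"))
print(toy_pipeline("I hate this so much!"))
```

The point of the sketch is the shape of the flow: text goes in, numbers pass through the model, and a readable label with a score comes out.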
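+Some of the currently [available pipelines](https://huggingface.co/transformers/main_classes/pipelines.html) are:
+* **feature-extraction** (get the vector representation of a text)
+* **fill-mask**
+* **ner** (named entity recognition)
+* **question-answering**
+* **sentiment-analysis**
+* **summarization**
+* **text-generation**
+* **translation**
+* **zero-shot-classification**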
-Some of the currently [available pipelines](https://huggingface.co/transformers/main_classes/pipelines.html) are:
+Let's have a look at a few of these!
-- `feature-extraction` (get the vector representation of a text)
-- `fill-mask`
-- `ner` (named entity recognition)
-- `question-answering`
-- `sentiment-analysis`
-- `summarization`
-- `text-generation`
-- `translation`
-- `zero-shot-classification`
-
-Let's have a look at a few of these!
-
-## Zero-shot classification
-
-We'll start by tackling a more challenging task where we need to classify texts that haven't been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the `zero-shot-classification` pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don't have to rely on the labels of the pretrained model. You've already seen how the model can classify a sentence as positive or negative using those two labels — but it can also classify the text using any other set of labels you like.
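+## Zero-shot classification
+We'll start with a particularly challenging task: classifying texts that haven't been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this task, the **zero-shot-classification** pipeline is very powerful: it lets you specify which labels to use for the classification, so you don't have to rely on the labels of the pretrained model. You've already seen how a model can classify a sentence as positive or negative using those two labels, but it can also classify the text using any other set of labels you like.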
```python
from transformers import pipeline
@@ -95,25 +89,19 @@ classifier(
candidate_labels=["education", "politics", "business"],
)
```
-
```python out
{'sequence': 'This is a course about the Transformers library',
'labels': ['education', 'business', 'politics'],
'scores': [0.8445963859558105, 0.111976258456707, 0.043427448719739914]}
```
-This pipeline is called _zero-shot_ because you don't need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
-
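+This pipeline is called zero-shot because you don't need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!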
-
-✏️ **Try it out!** Play around with your own sequences and labels and see how the model behaves.
-
+✏️ **Try it out!** Play around with your own sequences and labels and see how the model behaves.
-
-## Text generation
-
-Now let's see how to use a pipeline to generate some text. The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones. Text generation involves randomness, so it's normal if you don't get the same results as shown below.
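+## Text generation
+Now let's see how to use a pipeline to generate some text. The main idea is that you provide a prompt and the model auto-completes it by generating the remaining text. This is similar to the predictive text feature found on many phones. Text generation involves randomness, so it's normal if you don't get the same results as shown below.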
```python
from transformers import pipeline
@@ -121,7 +109,6 @@ from transformers import pipeline
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")
```
-
```python out
[{'generated_text': 'In this course, we will teach you how to understand and use '
'data flow and data interchange when handling user data. We '
@@ -129,21 +116,16 @@ generator("In this course, we will teach you how to")
'data flows — data flows of various types, as seen by the '
'HTTP'}]
```
-
-You can control how many different sequences are generated with the argument `num_return_sequences` and the total length of the output text with the argument `max_length`.
+You can control how many different sequences are generated with the argument **num_return_sequences** and the total length of the output text with the argument **max_length**.
-
-✏️ **Try it out!** Use the `num_return_sequences` and `max_length` arguments to generate two sentences of 15 words each.
-
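+✏️ **Try it out!** Use the num_return_sequences and max_length arguments to generate two sentences of 15 words each.
+## Using other models from the Hub in a pipeline
+The previous examples used the default model, but you can also choose a particular model from the Hub to use in a pipeline for a specific task, for example text generation. Go to the [Model Hub](https://huggingface.co/models) and click the corresponding tag on the left to display only the models supported for that task. You should get to a page like [this one](https://huggingface.co/models?pipeline_tag=text-generation).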
-## Using any model from the Hub in a pipeline
-
-The previous examples used the default model for the task at hand, but you can also choose a particular model from the Hub to use in a pipeline for a specific task — say, text generation. Go to the [Model Hub](https://huggingface.co/models) and click on the corresponding tag on the left to display only the supported models for that task. You should get to a page like [this one](https://huggingface.co/models?pipeline_tag=text-generation).
-
-Let's try the [`distilgpt2`](https://huggingface.co/distilgpt2) model! Here's how to load it in the same pipeline as before:
+Let's try the [**distilgpt2**](https://huggingface.co/distilgpt2) model! Here's how to load it in the same pipeline as before:
```python
from transformers import pipeline
@@ -153,7 +135,6 @@ generator(
"In this course, we will teach you how to", max_length=30, num_return_sequences=2,
)
```
-
```python out
[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
'move your mental and physical capabilities to your advantage.'},
@@ -161,34 +142,26 @@ generator(
'practice realtime, and with a hands on experience on both real '
'time and real'}]
```
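+You can refine your search by clicking on the language tags and pick a model that generates text in another language. The Model Hub even contains multilingual models that support several languages.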
-You can refine your search for a model by clicking on the language tags, and pick a model that will generate text in another language. The Model Hub even contains checkpoints for multilingual models that support several languages.
-
-Once you select a model by clicking on it, you'll see that there is a widget enabling you to try it directly online. This way you can quickly test the model's capabilities before downloading it.
-
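+Once you select a model by clicking on it, you'll see a widget that lets you try it directly online. This way you can quickly test the model's capabilities before downloading it.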
-
-✏️ **Try it out!** Use the filters to find a text generation model for another language. Feel free to play with the widget and use it in a pipeline!
-
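+✏️ **Try it out!** Use the filters to find a text generation model for another language. Feel free to play with the widget and use the model in a pipeline!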
-### The Inference API
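+## The Inference API
+All the models can be tested directly through your browser using the Inference API, which is available on the [Hugging Face website](https://huggingface.co/). You can play with a model directly on this page by inputting custom text and watching the model's output.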
-All the models can be tested directly through your browser using the Inference API, which is available on the Hugging Face [website](https://huggingface.co/). You can play with the model directly on this page by inputting custom text and watching the model process the input data.
-
-The Inference API that powers the widget is also available as a paid product, which comes in handy if you need it for your workflows. See the [pricing page](https://huggingface.co/pricing) for more details.
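+The Inference API that powers the widget is also available as a paid product, which comes in handy if you need it for your workflows. See the [pricing page](https://huggingface.co/pricing) for more details.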
## Mask filling
-
-The next pipeline you'll try is `fill-mask`. The idea of this task is to fill in the blanks in a given text:
-
+The next pipeline you'll try is **fill-mask**. The idea of this task is to fill in the blanks in a given text:
```python
from transformers import pipeline
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)
```
-
```python out
[{'sequence': 'This course will teach you all about mathematical models.',
'score': 0.19619831442832947,
@@ -199,47 +172,36 @@ unmasker("This course will teach you all about <mask> models.", top_k=2)
'token': 38163,
'token_str': ' computational'}]
```
-
-The `top_k` argument controls how many possibilities you want to be displayed. Note that here the model fills in the special `<mask>` word, which is often referred to as a *mask token*. Other mask-filling models might have different mask tokens, so it's always good to verify the proper mask word when exploring other models. One way to check it is by looking at the mask word used in the widget.
+The **top_k** argument controls how many possibilities are displayed. Note that here the model fills in the special `<mask>` word, which is often referred to as a mask token. Other mask-filling models might have different mask tokens, so always verify the proper mask word when exploring other models. One way to check is to look at the mask word used in the widget.
-
-✏️ **Try it out!** Search for the `bert-base-cased` model on the Hub and identify its mask word in the Inference API widget. What does this model predict for the sentence in our `pipeline` example above?
-
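+✏️ **Try it out!** Search for the `bert-base-cased` model on the Hub and identify its mask word in the Inference API widget. What does this model predict for the sentence in our pipeline example above?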
-## Named entity recognition
-
-Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. Let's look at an example:
-
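+## Named entity recognition
+Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. Let's look at an example: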
```python
from transformers import pipeline
ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
```
-
```python out
[{'entity_group': 'PER', 'score': 0.99816, 'word': 'Sylvain', 'start': 11, 'end': 18},
{'entity_group': 'ORG', 'score': 0.97960, 'word': 'Hugging Face', 'start': 33, 'end': 45},
{'entity_group': 'LOC', 'score': 0.99321, 'word': 'Brooklyn', 'start': 49, 'end': 57}
]
```
+Here the model correctly identified that Sylvain is a person (PER), Hugging Face an organization (ORG), and Brooklyn a location (LOC).
-Here the model correctly identified that Sylvain is a person (PER), Hugging Face an organization (ORG), and Brooklyn a location (LOC).
-
-We pass the option `grouped_entities=True` in the pipeline creation function to tell the pipeline to regroup together the parts of the sentence that correspond to the same entity: here the model correctly grouped "Hugging" and "Face" as a single organization, even though the name consists of multiple words. In fact, as we will see in the next chapter, the preprocessing even splits some words into smaller parts. For instance, `Sylvain` is split into four pieces: `S`, `##yl`, `##va`, and `##in`. In the post-processing step, the pipeline successfully regrouped those pieces.
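+We pass the option **grouped_entities=True** in the pipeline creation function to tell the pipeline to regroup the parts of the sentence that correspond to the same entity: here the model correctly grouped "Hugging" and "Face" as a single organization, even though the name consists of multiple words. In fact, as we will see in the next chapter, the preprocessing even splits some words into smaller parts. For instance, **Sylvain** is split into four pieces: **S**, **##yl**, **##va**, and **##in**. In the post-processing step, the pipeline successfully regrouped those pieces.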
-
-✏️ **Try it out!** Search the Model Hub for a model able to do part-of-speech tagging (usually abbreviated as POS) in English. What does this model predict for the sentence in the example above?
-
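+✏️ **Try it out!** Search the Model Hub for a model able to do part-of-speech tagging (usually abbreviated as POS) in English. What does this model predict for the sentence in the example above?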
-## Question answering
-
-The `question-answering` pipeline answers questions using information from a given context:
-
+## Question answering
+The question-answering pipeline answers questions using information from a given context:
```python
from transformers import pipeline
@@ -249,16 +211,16 @@ question_answerer(
context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)
```
-
```python out
{'score': 0.6385916471481323, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
-```
-
-Note that this pipeline works by extracting information from the provided context; it does not generate the answer.
-## Summarization
+```
+Note that this pipeline works by extracting information from the provided context; it does not generate the answer from scratch.
-Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text. Here's an example:
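+## Summarization
+Summarization is the task of reducing a text into a shorter text while keeping the main (important) information in the text. Here's an example: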
```python
from transformers import pipeline
@@ -287,7 +249,6 @@ summarizer(
"""
)
```
-
```python out
[{'summary_text': ' America has changed dramatically during recent years . The '
'number of engineering graduates in the U.S. has declined in '
@@ -297,13 +258,10 @@ summarizer(
'industrial countries in Europe and Asia, continue to encourage '
'and advance engineering .'}]
```
+As with text generation, you can specify a **max_length** or a **min_length** for the result.
-Like with text generation, you can specify a `max_length` or a `min_length` for the result.
-
-
-## Translation
-
-For translation, you can use a default model if you provide a language pair in the task name (such as `"translation_en_to_fr"`), but the easiest way is to pick the model you want to use on the [Model Hub](https://huggingface.co/models). Here we'll try translating from French to English:
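+## Translation
+For translation, you can use a default model if you provide a language pair in the task name (such as "**translation_en_to_fr**"), but the easiest way is to pick the model you want to use on the [Model Hub](https://huggingface.co/models). Here we'll try translating from French to English: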
```python
from transformers import pipeline
@@ -311,17 +269,17 @@ from transformers import pipeline
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")
```
-
```python out
[{'translation_text': 'This course is produced by Hugging Face.'}]
+
```
-Like with text generation and summarization, you can specify a `max_length` or a `min_length` for the result.
+As with text generation and summarization, you can specify a **max_length** or a **min_length** for the result.
-✏️ **Try it out!** Search for translation models in other languages and try to translate the previous sentence into a few different languages.
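+✏️ **Try it out!** Search for translation models in other languages and try to translate the previous sentence into a few different languages.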
-The pipelines shown so far are mostly for demonstrative purposes. They were programmed for specific tasks and cannot perform variations of them. In the next chapter, you'll learn what's inside a `pipeline()` function and how to customize its behavior.
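+The pipelines shown so far are mostly for demonstration purposes. They were programmed for specific tasks and cannot be customized. In the next chapter, you'll learn what's inside the **pipeline()** function and how to customize its behavior.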
\ No newline at end of file
From 0990c773d3e0f5f1ff434a5af1d06078072d4ff9 Mon Sep 17 00:00:00 2001
From: Lewis Tunstall
Date: Wed, 13 Apr 2022 11:14:29 +0200
Subject: [PATCH 6/6] Fix style
---
chapters/zh/chapter1/3.mdx | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/chapters/zh/chapter1/3.mdx b/chapters/zh/chapter1/3.mdx
index 1e7e91108..076263ba4 100644
--- a/chapters/zh/chapter1/3.mdx
+++ b/chapters/zh/chapter1/3.mdx
@@ -132,7 +132,9 @@ from transformers import pipeline
generator = pipeline("text-generation", model="distilgpt2")
generator(
- "In this course, we will teach you how to", max_length=30, num_return_sequences=2,
+ "In this course, we will teach you how to",
+ max_length=30,
+ num_return_sequences=2,
)
```
```python out