docs(zh-cn): Reviewed No. 08 - What happens inside the pipeline function? (PyTorch) #454

Merged · 2 commits · Feb 13, 2023

Changes from all commits
@@ -5,7 +5,7 @@

2
00:00:05,340 --> 00:00:07,563
-- 管道函数内部发生了什么
+- pipeline 函数内部发生了什么
- What happens inside the pipeline function?

@@ -25,22 +25,22 @@ of the Transformers library.

6
00:00:15,090 --> 00:00:16,860
-更具体地说,我们将看看
+详细来讲,我们将举例
More specifically, we will look

7
00:00:16,860 --> 00:00:19,200
-在情绪分析管道中
+在情绪分析的 pipeline 中
at the sentiment analysis pipeline,

8
00:00:19,200 --> 00:00:22,020
-以及它是如何从以下两个句子开始的
+它是如何从以下两个句子开始的
and how it went from the two following sentences,

9
00:00:22,020 --> 00:00:23,970
-正负标签
+将正负标签
to the positive and negative labels

@@ -50,12 +50,12 @@ with their respective scores.

11
00:00:26,760 --> 00:00:29,190
-正如我们在管道演示中看到的那样
+正如我们在 pipeline 展示中看到的那样
As we have seen in the pipeline presentation,

12
00:00:29,190 --> 00:00:31,860
-管道分为三个阶段
+pipeline 分为三个阶段
there are three stages in the pipeline.

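For reference while reviewing: the high-level call these subtitles describe. A minimal sketch; the two sentences here are stand-ins, since the exact sentences from the video are not visible in this diff:

```python
from transformers import pipeline

# The sentiment-analysis pipeline bundles all three stages:
# tokenization -> model -> post-processing.
classifier = pipeline("sentiment-analysis")

# One label/score dict is returned per input sentence.
results = classifier([
    "I love reviewing subtitle translations.",
    "I hate mistranslations so much!",
])
print(results)
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}, {'label': 'NEGATIVE', 'score': 0.99...}]
```
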
@@ -65,7 +65,7 @@ First, we convert the raw texts to numbers

14
00:00:34,620 --> 00:00:37,173
-该模型可以理解使用分词器
+该模型可以通过使用分词器理解
the model can make sense of using a tokenizer.

@@ -75,17 +75,17 @@ Then those numbers go through the model,

16
00:00:40,530 --> 00:00:41,943
-输出逻辑
+输出 logits
which outputs logits.

17
00:00:42,780 --> 00:00:45,600
-最后,后处理步骤变换
+最后,后处理步骤转换
Finally, the post-processing steps transforms

18
00:00:45,600 --> 00:00:48,150
-那些登录到标签和分数
+那些 logits 包含标签和分数
those logits into labels and scores.

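The three stages, written out end to end as the subtitles above describe them. A sketch assuming the checkpoint the video presumably uses, distilbert-base-uncased-finetuned-sst-2-english; the input sentences are stand-ins:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

raw_inputs = ["I love this.", "I hate this so much!"]  # stand-in sentences

# Stage 1: raw text -> numbers the model can make sense of
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

# Stage 2: numbers -> logits
with torch.no_grad():
    logits = model(**inputs).logits

# Stage 3: post-processing, logits -> labels and scores
probs = torch.nn.functional.softmax(logits, dim=-1)
print(probs)
```
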
@@ -100,17 +100,17 @@ and how to replicate them using the Transformers library,

21
00:00:53,640 --> 00:00:56,043
-从第一阶段开始,标记化
+从第一阶段开始,token 化
beginning with the first stage, tokenization.

22
00:00:57,915 --> 00:01:00,360
-令牌化过程有几个步骤
+token 化过程有几个步骤
The tokenization process has several steps.

23
00:01:00,360 --> 00:01:04,950
-首先,文本被分成称为标记的小块
+首先,文本被分成称为 token 的小块
First, the text is split into small chunks called tokens.

@@ -120,7 +120,7 @@ They can be words, parts of words or punctuation symbols.

25
00:01:08,550 --> 00:01:11,580
-然后 tokenizer 将有一些特殊的标记
+然后 tokenizer 将有一些特殊的 token
Then the tokenizer will had some special tokens,

@@ -130,17 +130,17 @@ if the model expect them.

27
00:01:13,500 --> 00:01:16,860
-这里的模型在开头使用期望 CLS 令牌
+这里的模型在开头使用期望 CLS token
Here the model uses expects a CLS token at the beginning

28
00:01:16,860 --> 00:01:19,743
-以及用于分类的句子末尾的 SEP 标记
+以及用于分类的句子末尾的 SEP token
and a SEP token at the end of the sentence to classify.

29
00:01:20,580 --> 00:01:24,180
-最后,标记器将每个标记与其唯一 ID 匹配
+最后,tokenizer 将每个 token 与其唯一 ID 匹配
Lastly, the tokenizer matches each token to its unique ID

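The tokenization steps above can be reproduced one at a time. A sketch, again assuming the DistilBERT sentiment checkpoint, whose special tokens are [CLS] and [SEP] as the subtitles say; note the video adds special tokens before converting to IDs, while this API works on IDs, with the same end result:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# Step 1: split the text into small chunks called tokens
tokens = tokenizer.tokenize("I hate this so much!")
print(tokens)  # words, parts of words, or punctuation symbols

# Step 2: match each token to its unique ID in the vocabulary
ids = tokenizer.convert_tokens_to_ids(tokens)

# Step 3: add the special tokens the model expects:
# [CLS] at the beginning, [SEP] at the end
ids_with_special = tokenizer.build_inputs_with_special_tokens(ids)
print(tokenizer.convert_ids_to_tokens(ids_with_special))
```
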
@@ -180,7 +180,7 @@ Here the checkpoint used by default

37
00:01:45,360 --> 00:01:47,280
-用于情绪分析管道
+用于情绪分析的 pipeline
for the sentiment analysis pipeline

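The checkpoint name itself is hidden in the collapsed lines of this diff; the sketches here assume it is distilbert-base-uncased-finetuned-sst-2-english, the sentiment-analysis default at the time of the course. One way to check what a pipeline actually resolved to:

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
# Inspect which checkpoint the pipeline loaded by default
print(classifier.model.name_or_path)
```
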
@@ -250,7 +250,7 @@ Looking at the result, we see we have a dictionary

51
00:02:25,590 --> 00:02:26,670
-用两把钥匙。
+和两个主键
with two keys.

@@ -265,7 +265,7 @@ with zero where the padding is applied.

54
00:02:32,550 --> 00:02:34,260
-第二把钥匙,注意面具
+第二个键值,注意力 mask
The second key, attention mask,

@@ -280,7 +280,7 @@ so the model does not pay attention to it.

57
00:02:38,940 --> 00:02:42,090
-这就是标记化步骤中的全部内容
+这就是 token 化步骤中的全部内容
This is all what is inside the tokenization step.

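A sketch of the dictionary these subtitles describe, with stand-in sentences of different lengths so both keys, the padding, and the zeroed attention mask are visible:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

inputs = tokenizer(
    ["A short sentence.", "A noticeably longer sentence that needs more tokens."],
    padding=True,
    return_tensors="pt",
)

print(inputs["input_ids"])       # unique token IDs, padded (with 0 here)
print(inputs["attention_mask"])  # 1 for real tokens, 0 where padding was applied
```
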
@@ -350,7 +350,7 @@ for our classification problem.

71
00:03:15,030 --> 00:03:19,230
-这里的张量有两个句子,每个句子有 16 个标记
+这里的张量有两个句子,每个句子有 16 个 token
Here the tensor has two sentences, each of 16 tokens,

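This block is describing the hidden-states tensor of the bare model, without a task head. A sketch; with these stand-in sentences the middle dimension will not necessarily be 16, and 768 is DistilBERT's hidden size:

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)  # bare model, no classification head

inputs = tokenizer(["First stand-in sentence.", "Second stand-in sentence!"],
                   padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch size, sequence length, hidden size) -- (2, 16, 768) in the video
print(outputs.last_hidden_state.shape)
```
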
@@ -425,12 +425,12 @@ This is because each model

86
00:03:57,270 --> 00:04:00,810
-每个模型都会返回 logits。
+每个模型都会返回 logits
of the Transformers library returns logits.

87
00:04:00,810 --> 00:04:02,250
-为了理解这些逻辑
+为了理解这些 logits
To make sense of those logits,

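Every model in the library returns raw logits; making sense of them takes a SoftMax, as the next lines describe. A sketch with the same assumed checkpoint and a stand-in sentence:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

inputs = tokenizer("I hate this so much!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print(logits)  # raw, unnormalized scores -- not probabilities yet
print(torch.nn.functional.softmax(logits, dim=-1))  # probabilities summing to 1
```
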
@@ -505,7 +505,7 @@ This is how our classifier built

102
00:04:37,950 --> 00:04:40,230
-使用管道功能选择了那些标签
+使用 pipeline 功能选择了那些标签
with the pipeline function picked those labels

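The label strings the pipeline picks come from the id2label mapping stored in the model config. A sketch, assuming the same checkpoint:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
# The pipeline turns argmax indices into label strings with this mapping
print(model.config.id2label)  # {0: 'NEGATIVE', 1: 'POSITIVE'} for this checkpoint
```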