docs(zh-cn): Reviewed No. 08 - What happens inside the pipeline function? (PyTorch) (#454)
innovation64 authored Feb 13, 2023
1 parent 0d55521 commit f47740c
Showing 1 changed file with 26 additions and 26 deletions.
@@ -5,7 +5,7 @@

2
00:00:05,340 --> 00:00:07,563
- 管道函数内部发生了什么
- pipeline 函数内部发生了什么
- What happens inside the pipeline function?

3
@@ -25,22 +25,22 @@
of the Transformers library.

6
00:00:15,090 --> 00:00:16,860
更具体地说,我们将看看
详细来讲,我们将举例
More specifically, we will look

7
00:00:16,860 --> 00:00:19,200
在情绪分析管道中
在情绪分析的 pipeline 中
at the sentiment analysis pipeline,

8
00:00:19,200 --> 00:00:22,020
以及它是如何从以下两个句子开始的
它是如何从以下两个句子开始的
and how it went from the two following sentences,

9
00:00:22,020 --> 00:00:23,970
正负标签
将正负标签
to the positive and negative labels

10
@@ -50,12 +50,12 @@
with their respective scores.

11
00:00:26,760 --> 00:00:29,190
正如我们在管道演示中看到的那样
正如我们在 pipeline 展示中看到的那样
As we have seen in the pipeline presentation,

12
00:00:29,190 --> 00:00:31,860
管道分为三个阶段
pipeline 分为三个阶段
there are three stages in the pipeline.
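The three stages named here can be sketched end to end in plain Python. The tokenizer, model, and vocabulary below are hypothetical stand-ins for the real Transformers objects; only the tokenize → model → post-process composition is the point:

```python
# Toy sketch of the three pipeline stages: tokenize -> model -> post-process.
# The tokenizer, model, and vocabulary are made-up stand-ins, not the real
# Transformers objects used by the sentiment-analysis pipeline.

def tokenize(text, vocab):
    # Stage 1: raw text -> input IDs the model can make sense of.
    return [vocab[word] for word in text.lower().split()]

def model(input_ids):
    # Stage 2: IDs -> logits (raw, unnormalized scores); here a fake rule.
    return [0.0, float(len(input_ids))]  # [negative, positive] logits

def postprocess(logits, labels=("NEGATIVE", "POSITIVE")):
    # Stage 3: logits -> a label and a score.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return {"label": labels[best], "score": logits[best]}

vocab = {"i": 0, "love": 1, "this": 2, "course": 3}
result = postprocess(model(tokenize("I love this course", vocab)))
print(result)  # {'label': 'POSITIVE', 'score': 4.0}
```

In the real pipeline each stage is backed by a pretrained checkpoint; the video walks through the actual objects next.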

13
@@ -65,7 +65,7 @@
First, we convert the raw texts to numbers

14
00:00:34,620 --> 00:00:37,173
该模型可以理解使用分词器
该模型可以通过使用分词器理解
the model can make sense of using a tokenizer.

15
@@ -75,17 +75,17 @@
Then those numbers go through the model,

16
00:00:40,530 --> 00:00:41,943
输出逻辑
输出 logits
which outputs logits.

17
00:00:42,780 --> 00:00:45,600
最后,后处理步骤变换
最后,后处理步骤转换
Finally, the post-processing steps transforms

18
00:00:45,600 --> 00:00:48,150
那些登录到标签和分数
那些 logits 为标签和分数
those logits into labels and scores.

19
@@ -100,17 +100,17 @@
and how to replicate them using the Transformers library,

21
00:00:53,640 --> 00:00:56,043
从第一阶段开始,标记化
从第一阶段开始,token 化
beginning with the first stage, tokenization.

22
00:00:57,915 --> 00:01:00,360
令牌化过程有几个步骤
token 化过程有几个步骤
The tokenization process has several steps.

23
00:01:00,360 --> 00:01:04,950
首先,文本被分成称为标记的小块
首先,文本被分成称为 token 的小块
First, the text is split into small chunks called tokens.

24
@@ -120,7 +120,7 @@
They can be words, parts of words or punctuation symbols.

25
00:01:08,550 --> 00:01:11,580
然后 tokenizer 将有一些特殊的标记
然后 tokenizer 将有一些特殊的 token
Then the tokenizer will add some special tokens,

26
@@ -130,17 +130,17 @@
if the model expects them.

27
00:01:13,500 --> 00:01:16,860
这里的模型在开头使用期望 CLS 令牌
这里的模型期望在开头有一个 CLS token
Here the model expects a CLS token at the beginning

28
00:01:16,860 --> 00:01:19,743
以及用于分类的句子末尾的 SEP 标记
以及用于分类的句子末尾的 SEP token
and a SEP token at the end of the sentence to classify.

29
00:01:20,580 --> 00:01:24,180
最后,标记器将每个标记与其唯一 ID 匹配
最后,tokenizer 将每个 token 与其唯一 ID 匹配
Lastly, the tokenizer matches each token to its unique ID
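The three tokenization steps just listed (split into tokens, add the special tokens, match each token to an ID) can be sketched with a made-up vocabulary; the IDs and tokens below are illustrative, not a real tokenizer's:

```python
# Toy illustration of the tokenization steps described above.
# The vocabulary and IDs are made up; a real tokenizer has its own
# vocabulary and special-token IDs.

vocab = {"[CLS]": 101, "[SEP]": 102, "i": 1045, "love": 2293, "this": 2023}

def toy_tokenize(text):
    # Step 1: split the text into small chunks called tokens.
    tokens = text.lower().split()
    # Step 2: add the special tokens the model expects:
    # a CLS token at the beginning and a SEP token at the end.
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # Step 3: match each token to its unique ID in the vocabulary.
    return [vocab[t] for t in tokens]

print(toy_tokenize("I love this"))  # [101, 1045, 2293, 2023, 102]
```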

30
@@ -180,7 +180,7 @@
Here the checkpoint used by default

37
00:01:45,360 --> 00:01:47,280
用于情绪分析管道
用于情绪分析的 pipeline
for the sentiment analysis pipeline

38
@@ -250,7 +250,7 @@
Looking at the result, we see we have a dictionary

51
00:02:25,590 --> 00:02:26,670
用两把钥匙。
带有两个键
with two keys.

52
@@ -265,7 +265,7 @@
with zero where the padding is applied.

54
00:02:32,550 --> 00:02:34,260
第二把钥匙,注意面具
第二个键,注意力 mask
The second key, attention mask,

55
@@ -280,7 +280,7 @@
so the model does not pay attention to it.

57
00:02:38,940 --> 00:02:42,090
这就是标记化步骤中的全部内容
这就是 token 化步骤中的全部内容
This is all what is inside the tokenization step.
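The padding and attention-mask behaviour just described can be sketched as follows; the pad ID of 0 follows the video, while the batching function itself is a toy stand-in for what the real tokenizer does:

```python
# Toy sketch of padding and the attention mask: shorter sequences are
# padded with a pad ID (0 here), and the attention mask gets a 0 wherever
# padding was applied, so the model does not pay attention to it.

def pad_batch(sequences, pad_id=0):
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    # A dictionary with two keys, as described in the video.
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 1045, 102], [101, 1045, 2293, 2023, 102]])
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```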

58
@@ -350,7 +350,7 @@
for our classification problem.

71
00:03:15,030 --> 00:03:19,230
这里的张量有两个句子,每个句子有 16 个标记
这里的张量有两个句子,每个句子有 16 个 token
Here the tensor has two sentences, each of 16 tokens,

72
@@ -425,12 +425,12 @@
This is because each model

86
00:03:57,270 --> 00:04:00,810
每个模型都会返回 logits。
每个模型都会返回 logits
of the Transformers library returns logits.

87
00:04:00,810 --> 00:04:02,250
为了理解这些逻辑
为了理解这些 logits
To make sense of those logits,
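What makes sense of the logits is a SoftMax, which turns them into probabilities that sum to 1; each index is then mapped to its label. The logits and the id-to-label mapping below are made-up examples, not a model's actual output:

```python
import math

# SoftMax turns raw logits into scores between 0 and 1 that sum to 1.
def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical label mapping and logits for one sentence.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [-4.0, 4.0]

scores = softmax(logits)
best = max(range(len(scores)), key=lambda i: scores[i])
print(id2label[best], round(scores[best], 4))  # POSITIVE 0.9997
```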

88
@@ -505,7 +505,7 @@
This is how our classifier built

102
00:04:37,950 --> 00:04:40,230
使用管道功能选择了那些标签
使用 pipeline 功能选择了那些标签
with the pipeline function picked those labels

103
