From f47740c6893ddfbe1d9352a9f167224373f29538 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=9D=8E=E6=B4=8B?= <45715979+innovation64@users.noreply.github.com>
Date: Tue, 14 Feb 2023 00:54:07 +0800
Subject: [PATCH] docs(zh-cn): Reviewed No. 08 - What happens inside the
 pipeline function? (PyTorch) (#454)

---
 ...inside-the-pipeline-function-(pytorch).srt | 52 +++++++++----------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/subtitles/zh-CN/08_what-happens-inside-the-pipeline-function-(pytorch).srt b/subtitles/zh-CN/08_what-happens-inside-the-pipeline-function-(pytorch).srt
index ca6c0276f..fffc4139f 100644
--- a/subtitles/zh-CN/08_what-happens-inside-the-pipeline-function-(pytorch).srt
+++ b/subtitles/zh-CN/08_what-happens-inside-the-pipeline-function-(pytorch).srt
@@ -5,7 +5,7 @@
 2
 00:00:05,340 --> 00:00:07,563
-- 管道函数内部发生了什么?
+- pipeline 函数内部发生了什么?
 - What happens inside the pipeline function?

 3
@@ -25,22 +25,22 @@ of the Transformers library.
 6
 00:00:15,090 --> 00:00:16,860
-更具体地说,我们将看看
+更具体地说,我们将了解
 More specifically, we will look

 7
 00:00:16,860 --> 00:00:19,200
-在情绪分析管道中,
+在情绪分析的 pipeline 中,
 at the sentiment analysis pipeline,

 8
 00:00:19,200 --> 00:00:22,020
-以及它是如何从以下两个句子开始的,
+它是如何从以下两个句子开始的,
 and how it went from the two following sentences,

 9
 00:00:22,020 --> 00:00:23,970
-正负标签
+得到正负标签
 to the positive and negative labels

 10
@@ -50,12 +50,12 @@ with their respective scores.
 11
 00:00:26,760 --> 00:00:29,190
-正如我们在管道演示中看到的那样,
+正如我们在 pipeline 展示中看到的那样,
 As we have seen in the pipeline presentation,

 12
 00:00:29,190 --> 00:00:31,860
-管道分为三个阶段。
+pipeline 分为三个阶段。
 there are three stages in the pipeline.

 13
 First, we convert the raw texts to numbers

 14
 00:00:34,620 --> 00:00:37,173
-该模型可以理解使用分词器。
+也就是模型借助分词器能够理解的数字。
 the model can make sense of using a tokenizer.

 15
 Then those numbers go through the model,

 16
 00:00:40,530 --> 00:00:41,943
-输出逻辑。
+输出 logits。
 which outputs logits.
 17
 00:00:42,780 --> 00:00:45,600
-最后,后处理步骤变换
+最后,后处理步骤将
 Finally, the post-processing steps transforms

 18
 00:00:45,600 --> 00:00:48,150
-那些登录到标签和分数。
+那些 logits 转换为标签和分数。
 those logits into labels and scores.

 19
@@ -100,17 +100,17 @@ and how to replicate them using the Transformers library,
 21
 00:00:53,640 --> 00:00:56,043
-从第一阶段开始,标记化。
+从第一阶段开始,token 化。
 beginning with the first stage, tokenization.

 22
 00:00:57,915 --> 00:01:00,360
-令牌化过程有几个步骤。
+token 化过程有几个步骤。
 The tokenization process has several steps.

 23
 00:01:00,360 --> 00:01:04,950
-首先,文本被分成称为标记的小块。
+首先,文本被分成称为 token 的小块。
 First, the text is split into small chunks called tokens.

 24
@@ -120,7 +120,7 @@ They can be words, parts of words or punctuation symbols.
 25
 00:01:08,550 --> 00:01:11,580
-然后 tokenizer 将有一些特殊的标记,
+然后 tokenizer 会添加一些特殊的 token,
 Then the tokenizer will had some special tokens,

 26
@@ -130,17 +130,17 @@ if the model expect them.
 27
 00:01:13,500 --> 00:01:16,860
-这里的模型在开头使用期望 CLS 令牌
+这里的模型期望在开头有一个 CLS token
 Here the model uses expects a CLS token at the beginning

 28
 00:01:16,860 --> 00:01:19,743
-以及用于分类的句子末尾的 SEP 标记。
+以及在待分类句子的末尾有一个 SEP token。
 and a SEP token at the end of the sentence to classify.

 29
 00:01:20,580 --> 00:01:24,180
-最后,标记器将每个标记与其唯一 ID 匹配
+最后,tokenizer 将每个 token 与其唯一 ID 匹配
 Lastly, the tokenizer matches each token to its unique ID

 30
@@ -180,7 +180,7 @@ Here the checkpoint used by default
 37
 00:01:45,360 --> 00:01:47,280
-用于情绪分析管道
+用于情绪分析的 pipeline
 for the sentiment analysis pipeline

 38
@@ -250,7 +250,7 @@ Looking at the result, we see we have a dictionary
 51
 00:02:25,590 --> 00:02:26,670
-用两把钥匙。
+包含两个键。
 with two keys.

 52
@@ -265,7 +265,7 @@ with zero where the padding is applied.
 54
 00:02:32,550 --> 00:02:34,260
-第二把钥匙,注意面具,
+第二个键,attention mask,
 The second key, attention mask,

 55
@@ -280,7 +280,7 @@ so the model does not pay attention to it.
 57
 00:02:38,940 --> 00:02:42,090
-这就是标记化步骤中的全部内容。
+这就是 token 化步骤中的全部内容。
 This is all what is inside the tokenization step.
 58
@@ -350,7 +350,7 @@ for our classification problem.
 71
 00:03:15,030 --> 00:03:19,230
-这里的张量有两个句子,每个句子有 16 个标记,
+这里的张量有两个句子,每个句子有 16 个 token,
 Here the tensor has two sentences, each of 16 tokens,

 72
@@ -425,12 +425,12 @@ This is because each model
 86
 00:03:57,270 --> 00:04:00,810
-每个模型都会返回 logits。
+Transformers 库的每个模型都会返回 logits。
 of the Transformers library returns logits.

 87
 00:04:00,810 --> 00:04:02,250
-为了理解这些逻辑,
+为了理解这些 logits,
 To make sense of those logits,

 88
@@ -505,7 +505,7 @@ This is how our classifier built
 102
 00:04:37,950 --> 00:04:40,230
-使用管道功能选择了那些标签
+使用 pipeline 函数选择了那些标签
 with the pipeline function picked those labels

 103
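Reviewer's note: the subtitles above describe the pipeline's post-processing stage, where the model's raw logits are turned into labels with scores. The step can be sketched in plain Python; the logit values below are made-up placeholders (not real model output), and `id2label` is an assumed stand-in for the mapping the model config provides:

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities, mirroring what the
    post-processing step does with a softmax over the model output."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Placeholder logits for two sentences (illustrative values only).
batch_logits = [[-1.56, 1.61], [4.17, -3.35]]
# Hypothetical label mapping, standing in for the checkpoint's config.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

for logits in batch_logits:
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    print({"label": id2label[best], "score": round(probs[best], 4)})
```

In the course code itself this stage is done with `torch.nn.functional.softmax` on the model's output tensor, with the labels read from the model config; the pure-Python version above only illustrates the arithmetic.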