From 27a5a363eed5e9c744977274612c2d063b738543 Mon Sep 17 00:00:00 2001 From: Yuan Date: Tue, 14 Feb 2023 00:58:22 +0800 Subject: [PATCH] docs(zh-cn): 33_the-push-to-hub-api-(pytorch).srt (#473) --- .../33_the-push-to-hub-api-(pytorch).srt | 182 +++++++++--------- 1 file changed, 91 insertions(+), 91 deletions(-) diff --git a/subtitles/zh-CN/33_the-push-to-hub-api-(pytorch).srt b/subtitles/zh-CN/33_the-push-to-hub-api-(pytorch).srt index e27897b99..475489f28 100644 --- a/subtitles/zh-CN/33_the-push-to-hub-api-(pytorch).srt +++ b/subtitles/zh-CN/33_the-push-to-hub-api-(pytorch).srt @@ -15,13 +15,13 @@ 4 00:00:05,130 --> 00:00:06,830 -- [Instructor] 所以推送到 hub API。 -- [Instructor] So push to hub API. +- [Instructor] push_to_hub API。 +- [Instructor] So push_to_hub API. 5 00:00:08,310 --> 00:00:10,533 -让我们看一下推送到集线器 API。 -Let's have a look at the push to hub API. +让我们看一下 push_to_hub API。 +Let's have a look at the push_to_hub API. 6 00:00:11,730 --> 00:00:14,640 @@ -30,12 +30,12 @@ You will need to be logged in with your Hugging Face account 7 00:00:14,640 --> 00:00:17,400 -你可以通过执行第一个单元格来做到这一点, +你可以通过执行第一个单元格里的操作来登录, which you can do by executing this first cell, 8 00:00:17,400 --> 00:00:21,123 -或者在终端中输入 huggingface-cli login。 +或者在 terminal 中输入 huggingface-cli login。 or by typing huggingface-cli login in a terminal. 9 @@ -45,37 +45,37 @@ Just enter you username and password, then click login, 10 00:00:26,640 --> 00:00:28,620 -这将存储一个通知令牌 -this will store a notification token +登录后将存储一个授权 token +this will store a authentication token 11 00:00:28,620 --> 00:00:30,670 -在你正在使用的机器的缓存中。 +到你正在使用的机器的缓存中。 in the cache of the machine you're using. 12 00:00:31,890 --> 00:00:35,790 -现在,让我们对 BERT 模型进行微调 +现在,让我们基于 GLUE COLA 数据集 Now, let's launch a fine tuning of a BERT model 13 00:00:35,790 --> 00:00:37,920 -在 GLUE COLA 数据集上。 +对 BERT 模型进行微调。 on the GLUE COLA dataset. 14 00:00:37,920 --> 00:00:39,600 -我们不会讨论微调代码 +我们不会深入探讨微调代码 We won't go over the fine tuning code 15 00:00:39,600 --> 00:00:42,270 -因为你可以在任何变压器教程中找到它, +你可以在任何 transformer 教程 because you can find it in any transformer tutorial, 16 00:00:42,270 --> 00:00:44,670 -或通过查看下面的视频链接。 +或查看下面的视频链接找到相关参考。 or by looking at the videos link below. 17 @@ -85,7 +85,7 @@ What interests us here is 18 00:00:46,470 --> 00:00:48,970 -我们如何在训练期间利用模型中心。 +如何在训练期间利用 model hub。 how we can leverage the model hub during training. 19 @@ -95,72 +95,72 @@ This is done with the "push_to_hub=true" argument 20 00:00:52,980 --> 00:00:55,530 -传入你的 TrainingArguments。 +将该参数添加到你的 TrainingArguments。 passed in your TrainingArguments. 21 00:00:55,530 --> 00:00:57,240 -这将自动上传你的模型 +每次保存时将自动上传 This will automatically upload your model 22 00:00:57,240 --> 00:00:59,400 -每次保存到集线器, +你的模型到 Hub,在我们的示例中, to the Hub each time it is saved, 23 00:00:59,400 --> 00:01:01,323 -所以我们案例中的每个时代。 +每个 epoch 都会如此操作。 so every epoch in our case. 24 00:01:02,280 --> 00:01:04,860 -这允许你从不同的机器恢复训练 +如果当前的被打断 This allows you to resume training from a different machine 25 00:01:04,860 --> 00:01:06,873 -如果当前的被打断。 +这允许你从不同的机器恢复训练之前的训练。 if the current one gets interrupted. 26 00:01:08,220 --> 00:01:10,440 -该模型将在你的名称空间中更新 -The model will be updated in your name space +该模型将使用 +The model will be updated in your namespace 27 00:01:10,440 --> 00:01:14,640 -使用你默认选择的输出目录的名称。 +你默认选择的输出目录的名称在你的 namespace 中更新。 with the name of the output directory you picked by default. 
28 00:01:14,640 --> 00:01:16,020 -你可以选择其他名称 +你可以通过将其 You can choose another name 29 00:01:16,020 --> 00:01:19,113 -通过将其传递给 hub_model_id 参数。 +传递给 hub_model_id 参数选择其他名称。 by passing it to the hub_model_id argument. 30 00:01:20,070 --> 00:01:23,370 -你还可以推动你所属的组织内部 +你还可以通过传递完整的仓库名称 You can also push inside an organization you are a member of 31 00:01:23,370 --> 00:01:25,740 -通过传递完整的存储库名称, +将模型 push 到你所属的组织内部, by passing a full repository name, 32 00:01:25,740 --> 00:01:28,933 -以组织名称 /, +使用 organization/ 的形式, with the name of the organization/, 33 00:01:28,933 --> 00:01:30,433 -你要选择的型号 ID。 +再加上你所选的 model ID。 the model ID you want to pick. 34 @@ -175,38 +175,38 @@ and wait a little bit. 36 00:01:36,960 --> 00:01:39,033 -我会从视频中缩短等待时间。 +视频中会跳过等待的过程。 I'll cut the waiting time from the video. 37 00:01:43,260 --> 00:01:46,350 -请注意,模型是异步推送的, +请注意,模型是异步 push 的, Note that the model is pushed asynchronously, 38 00:01:46,350 --> 00:01:47,730 -意味着训练继续 +意味着当你的模型上传到 hub 时, meaning that the training continues 39 00:01:47,730 --> 00:01:49,730 -当你的模型上传到集线器时。 +训练将继续进行。 while your model is uploaded to the hub. 40 00:01:51,060 --> 00:01:52,950 -当你的第一次提交完成时, +当你的第一次 commit 完成时, When your first commit is finished, 41 00:01:52,950 --> 00:01:55,650 -你可以去中心检查你的模型 +你可以通过查看你的 namespace you can go inspect your model on the Hub 42 00:01:55,650 --> 00:01:57,960 -通过查看你的名称空间, -by looking inside your name space, +去 Hub 检查你的模型, +by looking inside your namespace, 43 00:01:57,960 --> 00:01:59,943 @@ -215,22 +215,22 @@ and you'll find it at the very top. 44 00:02:01,980 --> 00:02:04,200 -你甚至可以开始使用它的推理小部件 +你甚至可以在继续训练的同时 You can even start playing with its inference widget 45 00:02:04,200 --> 00:02:06,630 -在继续训练的同时。 +开始使用它的 inference 小部件。 while it's continuing the training. 46 00:02:06,630 --> 00:02:09,270 Cola 数据集让模型确定 -The Cola data set tasks the model with determining +The Cola dataset tasks the model with determining 47 00:02:09,270 --> 00:02:11,970 -如果句子在语法上是正确的。 +句子在语法上是否是正确的。 if the sentence is grammatically correct on that. 48 @@ -240,102 +240,102 @@ So we pick an example of incorrect sentence to test it. 49 00:02:15,510 --> 00:02:16,950 -请注意,这需要一些时间 +请注意,第一次尝试使用它时, Note that it'll take a bit of time 50 00:02:16,950 --> 00:02:18,750 -在推理 API 中加载模型, +这需要一些时间才能在 inference API 中 to load your model inside the inference APIs, 51 00:02:18,750 --> 00:02:20,880 -所以第一次尝试使用它。 +完成模型加载。 so first time you try to use it. 52 00:02:20,880 --> 00:02:23,280 -我们将按时间从视频中删减。 +我们将根据时间从视频中删掉。 We'll cut by time from the video. 53 00:02:23,280 --> 00:02:24,870 -标签有问题, +标签有点问题, There is something wrong with the labels, 54 00:02:24,870 --> 00:02:27,360 -但我们稍后会在本视频中修复它。 +我们稍后会在本视频中修复它。 but we'll fix it later in this video. 55 00:02:27,360 --> 00:02:29,520 -一旦你的训练结束, +一旦你的训练完成, Once your training is finished, 56 00:02:29,520 --> 00:02:31,770 -你应该和教练一起做最后一击 +你应该使用 trainer.push_to_hub 方法 you should do one last push with the trainer 57 00:02:31,770 --> 00:02:33,840 -推到一个方法。 +最后再提交一次。 that pushed to a method. 58 00:02:33,840 --> 00:02:35,430 -这是有两个原因的。 +这其中有两个原因。 This is for two reason. 59 00:02:35,430 --> 00:02:36,750 -首先,这将确保 +首先,若你尚未完成 First, this will make sure 60 00:02:36,750 --> 00:02:39,180 -你正在预测模型的最终版本 +这将确保你正在预测模型的 you are predicting the final version of your model 61 00:02:39,180 --> 00:02:40,680 -如果你还没有。 +最终版本。 if you didn't already. 
62 00:02:40,680 --> 00:02:42,480 -例如,如果你曾经保存 +例如,如果你曾经是在每一步保存 For instance, if you used to save 63 00:02:42,480 --> 00:02:46,980 -每一步策略而不是每秒, +而不是每秒保存, every in step strategy instead of every second, 64 00:02:46,980 --> 00:02:48,180 -这将起草一张模型卡 +这将创建一个 model card this will draft a model card 65 00:02:48,180 --> 00:02:51,120 -那将是你的模型回购的登陆页面。 +那将是你的 model repo 的最初始页面。 that will be the landing page of your model repo. 66 00:02:51,120 --> 00:02:52,260 -提交完成后, +commit 完成后, Once the commit is done, 67 00:02:52,260 --> 00:02:54,810 -让我们回到我们的模型页面并刷新。 +让我们回到我们的 model 页面并刷新。 let's go back on our model page and refresh. 68 00:02:54,810 --> 00:02:56,820 -我们可以看到制图者模型卡 +我们可以看到 model card 的草稿 We can see the drafters model card 69 @@ -345,17 +345,17 @@ which includes information, 70 00:02:58,080 --> 00:03:00,381 -我们发现调整了哪一种模型。 +以及哪个模型被调整过。 and which one model we find tuned. 71 00:03:00,381 --> 00:03:03,570 -所以最终评估损失和指标, +接下来是最终的评估 loss 和 metric, So final evaluation loss and metric, 72 00:03:03,570 --> 00:03:06,300 -使用的训练超参数, +使用过的训练超参数, the training hyperparameter used, 73 @@ -380,42 +380,42 @@ On top of all that information, 77 00:03:16,860 --> 00:03:19,740 -培训师还包括一些解释的元数据 +trainer 还包括一些 metadata,它可以 the trainer also included some metadata that is interpreted 78 00:03:19,740 --> 00:03:22,650 -通过模型云中的 Hugging Face 网站。 +通过 model cloud 上的 HuggingFace 网站解析。 by the Hugging Face website in the model cloud. 79 00:03:22,650 --> 00:03:26,010 -你获得了一个漂亮的小部件中报告的指标的价值 +你将会得到一个漂亮的 widget 所返回的相关指标数值 You get the value of the metrics reported in a nice widget 80 00:03:26,010 --> 00:03:29,640 -以及带有代码的论文排行榜的链接。 +以及一个链接指向 leaderboard(Paper with Code)。 as well as a link to a leaderboard with paper with code. 81 00:03:29,640 --> 00:03:32,550 -所以 Tensorboard runs 也被推送了 +并且 Tensorboard 的运行结果也包含 So the Tensorboard runs have also been pushed 82 00:03:32,550 --> 00:03:34,560 -到这份报告,我们可以看看他们 +在这份报告中,我们可以在 Model Hub 中 to this report, and we can look at them 83 00:03:34,560 --> 00:03:36,000 -直接从模型中心 +通过点击子菜单中的 directly from the model hub 84 00:03:36,000 --> 00:03:38,850 -通过单击训练指标子菜单。 +Training metrics 查看报告。 by clicking on the training metrics sub menu. 85 @@ -430,52 +430,52 @@ to fine-tune your model, 87 00:03:42,510 --> 00:03:43,770 -你可以使用 push_to_hub 方法 +你可以在模型上直接 you can use a push_to_hub method 88 00:03:43,770 --> 00:03:46,427 -在模型上,并直接标记器。 +使用 push_to_hub 方法和分词器。 on the model, and tokenizer directly. 89 00:03:46,427 --> 00:03:50,160 -让我们测试一下以修复推理小部件中的所有标签。 +让我们测试一下以修复 inference widget 中的所有标签。 Let's test this to fix all labels in the inference widget. 90 00:03:50,160 --> 00:03:52,740 -推理小部件使用不同的标签名称 +inference widget 使用不同的标签名称 The inference widget was using different names for labels 91 00:03:52,740 --> 00:03:54,810 -因为我们没有注明对应 +因为我们没有在整数和标签名称之间 because we did not indicate the correspondence 92 00:03:54,810 --> 00:03:57,030 -在整数和标签名称之间。 +注明关联性。 between integer and label names. 93 00:03:57,030 --> 00:03:58,740 -我们可以在配置中解决这个问题 +当推送模型配置到 hub 时, We can fix this in the configuration 94 00:03:58,740 --> 00:04:01,350 -通过坐在 label2id, -by sitting the label2id, +我们可以通过将 label2id +by setting the label2id, 95 00:04:01,350 --> 00:04:04,170 -和 id2label 字段通过适当的值 +和 id2label 字段设置为合适的值 and id2label fields through the proper values 96 00:04:04,170 --> 00:04:06,933 -将模型配置推送到集线器时。 +在配置中解决这个问题。 when pushing the model config to the hub. 97 @@ -490,7 +490,7 @@ and the model is now showing the proper label. 
99 00:04:13,380 --> 00:04:15,240 -现在模型在集线器上, +现在模型在 hub 上, Now that the model is on the hub, 100 @@ -500,7 +500,7 @@ we can use it from anywhere 101 00:04:17,370 --> 00:04:19,920 -就像我们对任何其他 Transformer 模型一样 +就像我们对任何其他 Transformer 模型 as we would any other Transformer model 102 @@ -510,12 +510,12 @@ with the from_pretrained method 103 00:04:21,113 --> 00:04:22,923 -具有管道功能。 -of with the pipeline function. +或者使用 pipeline 函数。 +or with the pipeline function. 104 00:04:34,350 --> 00:04:36,780 -我们只需要使用集线器的标识符, +我们只需要使用 hub 的标识符, We just have to use the identifier from the hub, 105 @@ -525,12 +525,12 @@ and we can see that the model configuration and weights 106 00:04:39,450 --> 00:04:42,483 -以及标记化的文件会自动下载。 +以及分词处理后的文件会自动下载。 as well as the tokenized files are automatically downloaded. 107 00:04:53,880 --> 00:04:55,950 -在下一次培训中尝试 push_to_hub API +在下一次训练中尝试 push_to_hub API Try the push_to_hub API in the next training 108
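For reference, the workflow these subtitles walk through can be sketched roughly as follows. This is a minimal illustration only: the checkpoint, output directory / repo names, organization, and label names below are placeholder assumptions, not values taken from the patch.

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    pipeline,
)

# Assumes `huggingface-cli login` (or the notebook login cell) has already
# stored an authentication token in this machine's cache.
raw_datasets = load_dataset("glue", "cola")
checkpoint = "bert-base-cased"  # placeholder BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_fn(examples):
    return tokenizer(examples["sentence"], truncation=True)

tokenized = raw_datasets.map(tokenize_fn, batched=True)

# Setting id2label/label2id in the config is what fixes the label names
# shown by the inference widget later in the video (names are placeholders).
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=2,
    id2label={0: "unacceptable", 1: "acceptable"},
    label2id={"unacceptable": 0, "acceptable": 1},
)

args = TrainingArguments(
    "bert-fine-tuned-cola",        # default repo name under your namespace
    evaluation_strategy="epoch",
    save_strategy="epoch",         # each save triggers an asynchronous push
    push_to_hub=True,              # upload the model to the Hub during training
    # Optional: push inside an organization you belong to instead
    # (hypothetical organization name).
    # hub_model_id="my-org/bert-fine-tuned-cola",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,
)
trainer.train()

# One last push after training: uploads the final version of the model
# and drafts the model card that becomes the repo's landing page.
trainer.push_to_hub()

# Outside the Trainer API, the model and tokenizer can also be pushed directly.
model.push_to_hub("bert-fine-tuned-cola")
tokenizer.push_to_hub("bert-fine-tuned-cola")

# Once on the Hub, the model can be used from anywhere via its identifier
# ("your-username" is a placeholder), with from_pretrained or pipeline.
classifier = pipeline("text-classification", model="your-username/bert-fine-tuned-cola")
print(classifier("This sentence are grammatically wrong."))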