1
00:00:04,200 --> 00:00:06,210
- [Instructor] 在这段视频中,我们将了解如何
- [Instructor] In this video, we're going to understand how
2
00:00:06,210 --> 00:00:08,280
管理模型仓库
to manage a model repository
3
00:00:08,280 --> 00:00:10,053
在 Hugging Face Hub 模型中心。
on the Hugging Face Model Hub.
4
00:00:10,920 --> 00:00:13,020
为了处理仓库
In order to handle a repository
5
00:00:13,020 --> 00:00:15,450
你应该首先拥有一个 Hugging Face 帐户。
you should first have a Hugging Face account.
6
00:00:15,450 --> 00:00:17,610
在描述中有创建新帐户
A link to create a new account is available
7
00:00:17,610 --> 00:00:18,573
的链接。
in the description.
8
00:00:20,130 --> 00:00:22,980
登录后,你可以创建一个新的仓库
Once you are logged in, you can create a new repository
9
00:00:22,980 --> 00:00:25,890
通过单击 New Model 选项。
by clicking on the New Model option.
10
00:00:25,890 --> 00:00:29,400
你会看到类似下面的模型。
You should be facing a modal similar to the following.
11
00:00:29,400 --> 00:00:33,240
在 Owner 输入框中,你可以放置自己的命名空间
In the owner input, you can put either your own namespace
12
00:00:33,240 --> 00:00:35,703
或你组织的任何命名空间。
or any of your organization's namespaces.
13
00:00:36,660 --> 00:00:39,330
Model name 是模型标识符
The Model name is the model identifier
14
00:00:39,330 --> 00:00:40,320
它将被用于
that will then be used
15
00:00:40,320 --> 00:00:43,143
在所选命名空间上识别你的模型。
to identify your model on the chosen namespace.
16
00:00:44,130 --> 00:00:47,700
最后可以在 Public(公共) 和 Private(私有) 之间选择。
The final choice is between public and private.
17
00:00:47,700 --> 00:00:49,950
任何人都可以访问公共模型。
Public models are accessible by anyone.
18
00:00:49,950 --> 00:00:51,840
这是推荐的免费选项,
This is the recommended free option,
19
00:00:51,840 --> 00:00:54,960
因为这使你的模型易于访问和共享。
as this makes your model easily accessible and shareable.
20
00:00:54,960 --> 00:00:57,630
你的命名空间的所有者
The owners of your namespace are the only ones
21
00:00:57,630 --> 00:00:59,523
是唯一可以更新和更改你的模型。
who can update and change your model.
22
00:01:00,450 --> 00:01:03,660
一个更高级的选项是私有选项。
A more advanced option is the private option.
23
00:01:03,660 --> 00:01:04,560
在这种情况下,
In this case,
24
00:01:04,560 --> 00:01:06,000
只有你的命名空间的所有者
only the owners of your namespace
25
00:01:06,000 --> 00:01:08,280
对你的模型有可见性。
will have visibility over your model.
26
00:01:08,280 --> 00:01:10,260
其他用户不会知道它的存在
Other users won't know it exists
27
00:01:10,260 --> 00:01:11,810
并且将无法使用它。
and will not be able to use it.
28
00:01:15,030 --> 00:01:17,030
让我们创建一个虚拟模型来试试看。
Let's create a dummy model to play with.
29
00:01:18,180 --> 00:01:19,710
创建模型后,
Once your model is created,
30
00:01:19,710 --> 00:01:22,230
来自该模型的管理。
the next step is managing that model.
31
00:01:22,230 --> 00:01:24,360
你可以使用三个选项卡。
Three tabs are available to you.
32
00:01:24,360 --> 00:01:27,960
你面对的是第一个,这是 Model card 页面。
You're facing the first one, which is the Model card page.
33
00:01:27,960 --> 00:01:29,970
这是你用来向全世界展示模型
This is the page you use to showcase your model
34
00:01:29,970 --> 00:01:31,110
的页面。
to the world.
35
00:01:31,110 --> 00:01:33,260
我们稍后会看到它是如何完成的。
We'll see how it can be completed in a bit.
36
00:01:34,500 --> 00:01:37,503
第二个是 Files and Versions 选项卡。
The second one is the Files and Versions tab.
37
00:01:38,340 --> 00:01:40,920
你的模型本身就是一个 Git 仓库。
Your model itself is a Git repository.
38
00:01:40,920 --> 00:01:43,230
如果你不知道什么是 Git 仓库,
If you're unaware of what a Git repository is,
39
00:01:43,230 --> 00:01:46,320
你可以将其视为包含文件的文件夹。
you can think of it as a folder containing files.
40
00:01:46,320 --> 00:01:48,120
如果你以前从未使用过 Git,
If you have never used Git before,
41
00:01:48,120 --> 00:01:50,100
我们建议观看视频描述中
we recommend looking at an introduction
42
00:01:50,100 --> 00:01:52,600
提供的介绍内容。
like the one provided in this video's description.
43
00:01:53,850 --> 00:01:56,910
Git 仓库支持按照时间推移
The Git repository allows you to see the changes happening
44
00:01:56,910 --> 00:02:00,900
查看本文件夹中的变化,也就是版本。
over time in this folder, hence the term versions.
45
00:02:00,900 --> 00:02:03,453
我们稍后会看到如何添加文件和版本。
We'll see how to add files and versions in a bit.
46
00:02:07,020 --> 00:02:09,570
最后一个选项卡是 Settings 选项卡,
The final tab is the settings tab,
47
00:02:09,570 --> 00:02:12,120
可以管理模型的可见性
which allows you to manage your model's visibility
48
00:02:12,120 --> 00:02:13,203
和可用性。
and availability.
49
00:02:14,790 --> 00:02:17,673
让我们首先从将文件添加到仓库开始。
Let's first start by adding files to the repository.
50
00:02:18,540 --> 00:02:19,560
还好有 add file 按钮
Files can be added
51
00:02:19,560 --> 00:02:23,340
通过网页操作即可添加文件。
through the web interface thanks to the add file button.
52
00:02:23,340 --> 00:02:27,060
添加的文件可以是任何类型,python,JSON,纯文本,
The added files can be of any type, Python, JSON, text,
53
00:02:27,060 --> 00:02:27,893
任君选择。
you name it.
54
00:02:28,740 --> 00:02:31,170
除了你添加的文件及其内容,
Alongside your added file and its content,
55
00:02:31,170 --> 00:02:33,363
你还应该命名你的 change 或 commit。
you should name your change or commit.
56
00:02:36,330 --> 00:02:38,400
通常,使用 Hugging Face Hub Python 库
Generally, adding files is simpler
57
00:02:38,400 --> 00:02:40,770
或使用命令行添加文件
by using the Hugging Face Hub Python library
58
00:02:40,770 --> 00:02:43,050
比较简单。
or by using the command line.
59
00:02:43,050 --> 00:02:44,310
我们将展示如何使用
We'll showcase how to do this
60
00:02:44,310 --> 00:02:46,290
Hugging Face Hub Python 库做到这一点
using the Hugging Face Hub Python library,
61
00:02:46,290 --> 00:02:48,060
并且在描述中有一个链接
and there is a link in the description
62
00:02:48,060 --> 00:02:49,800
可以指向这个视频的前一个版本,
to the previous version of this video,
63
00:02:49,800 --> 00:02:52,743
展示如何使用 Git 和命令行执行此操作。
showcasing how to do this using Git and the command line.
64
00:02:53,610 --> 00:02:54,840
首先,确保你已登录
First, make sure you're logged
65
00:02:54,840 --> 00:02:56,460
进入你的 Hugging Face 帐户,
into your Hugging Face account,
66
00:02:56,460 --> 00:02:59,523
可以通过命令行或者 Python 运行时中操作。
either through the command line or in a Python runtime.
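Logging in from a Python runtime can be sketched as follows. This is a minimal sketch, assuming the `huggingface_hub` package is installed; `login` and `notebook_login` are helpers from that library, while `log_in` and `in_notebook` are hypothetical names introduced here. From a terminal, the command contemporary with this video is `huggingface-cli login`.

```python
def log_in(in_notebook: bool = False) -> None:
    """Store a Hugging Face access token in the local cache."""
    # Imported lazily so this snippet can be loaded without the library installed.
    from huggingface_hub import login, notebook_login

    if in_notebook:
        notebook_login()  # shows a token widget in Jupyter or Colab
    else:
        login()  # prompts for the token in the terminal


# log_in()  # uncomment to authenticate; requires an access token from your account
```

The token saved by this step is the one later calls fall back to when no explicit `token` argument is given.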
67
00:03:04,634 --> 00:03:06,390
我们要看的第一种方法
The first approach we'll take a look at
68
00:03:06,390 --> 00:03:08,880
正在使用 upload_file 方法。
is using the upload_file method.
69
00:03:08,880 --> 00:03:10,770
这提供了一个极其简单的 API
This offers an extremely simple API
70
00:03:10,770 --> 00:03:12,630
通过 hub 上传文件。
to upload files through the hub.
71
00:03:12,630 --> 00:03:14,190
其中三个必需的参数
The three required parameters
72
00:03:14,190 --> 00:03:16,083
是文件的当前位置,
are the current location of the file,
73
00:03:18,570 --> 00:03:21,300
该文件在仓库中的路径,
the path of that file in the repository,
74
00:03:21,300 --> 00:03:24,050
以及你要推送到的仓库的标识符。
and the id of the repository to which you're pushing.
75
00:03:25,650 --> 00:03:27,930
还有一些额外的参数。
There are a few additional parameters.
76
00:03:27,930 --> 00:03:29,100
token 参数,
The token parameter,
77
00:03:29,100 --> 00:03:31,200
如果你想指定一个和登录时
if you would like to specify a different token
78
00:03:31,200 --> 00:03:33,650
所保存的不同的 token,
than the one saved in your cache with your login,
79
00:03:34,830 --> 00:03:36,750
repo_type 参数,
the repo_type parameter,
80
00:03:36,750 --> 00:03:40,503
如果你想推送到 dataset 或 space。
if you would like to push to a dataset or a space.
81
00:03:42,300 --> 00:03:45,690
我们将使用这种方法上传一个名为 readme.md 的文件
We'll upload a file called readme.md to the repository
82
00:03:45,690 --> 00:03:47,190
到仓库。
using this method.
83
00:03:47,190 --> 00:03:49,710
我们首先用那个名字保存一个文件,
We first start by saving a file with that name,
84
00:03:49,710 --> 00:03:51,210
其中包含一些关于
which contains some information
85
00:03:51,210 --> 00:03:52,920
仓库本身的信息。
about the repository itself.
86
00:03:52,920 --> 00:03:54,243
在这里,一个标题。
Here, a title.
87
00:03:55,950 --> 00:03:57,420
现在文件已保存,
Now that the file is saved,
88
00:03:57,420 --> 00:04:00,513
让我们使用 upload_file 方法将其上传到 hub。
let's use the upload_file method to upload it to the hub.
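The save-then-upload step described above can be sketched like this. It is a sketch under stated assumptions, not the exact code shown in the video: `write_readme`, `upload_readme`, the title text, and the repository id `your-namespace/dummy-model` are placeholders introduced for illustration, and the file is named `README.md` here. `upload_file` and its `path_or_fileobj`, `path_in_repo`, `repo_id`, `token`, and `repo_type` parameters come from `huggingface_hub`.

```python
import tempfile
from pathlib import Path


def write_readme(folder: Path, title: str = "# Dummy model") -> Path:
    """Save a README.md that contains only a title."""
    path = folder / "README.md"
    path.write_text(title + "\n", encoding="utf-8")
    return path


def upload_readme(local_path: Path, repo_id: str) -> None:
    """Upload the local README to the root of the given Hub repository."""
    # Imported lazily so the file-writing step runs without the library installed.
    from huggingface_hub import upload_file

    upload_file(
        path_or_fileobj=str(local_path),  # current location of the file
        path_in_repo="README.md",         # path of the file in the repository
        repo_id=repo_id,                  # id of the repository to push to
        # token="...",                    # optional: overrides the cached login token
        # repo_type="dataset",            # optional: "dataset" or "space" instead of a model
    )


readme_path = write_readme(Path(tempfile.mkdtemp()))
# upload_readme(readme_path, "your-namespace/dummy-model")  # needs a login and an existing repo
```

The upload call is left commented out because it requires network access, a saved token, and a repository you own.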
89
00:04:01,560 --> 00:04:03,540
如果我们切换到 Web 界面一秒钟
If we switch to the web interface for a second
90
00:04:03,540 --> 00:04:07,080
并刷新页面,我们会看到显示了 README。
and refresh the page, we'll see that the README is shown.
91
00:04:07,080 --> 00:04:08,883
文件上传成功。
The file upload was a success.
92
00:04:10,170 --> 00:04:13,500
除了这个方法之外还有一个 delete_file 方法
Alongside this method exists a delete_file method
93
00:04:13,500 --> 00:04:16,170
这样你就可以完全管理你的仓库。
so that you may manage your repository fully.
94
00:04:16,170 --> 00:04:18,820
我们将使用它来删除我们刚刚创建的文件。
We'll use it to delete the file we have just created.
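Deleting the uploaded file might look like the sketch below. It assumes `huggingface_hub` is installed and that the file sits at `README.md` in the repository root; `delete_readme` and the repository id in the comment are hypothetical names, while `delete_file` is the library function.

```python
def delete_readme(repo_id: str) -> None:
    """Remove README.md from the root of the given Hub repository."""
    # Imported lazily so this snippet can be loaded without the library installed.
    from huggingface_hub import delete_file

    delete_file(
        path_in_repo="README.md",  # path of the file inside the repository
        repo_id=repo_id,           # id of the repository to delete from
    )


# delete_readme("your-namespace/dummy-model")  # needs a login and write access
```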
95
00:04:22,860 --> 00:04:25,320
如果我们再次刷新页面,很好,
If we refresh the page once again, good,
96
00:04:25,320 --> 00:04:26,973
该文件确实被删除了。
the file was indeed deleted.
97
00:04:29,070 --> 00:04:32,730
这两种方法操作起来非常简单。
This approach using only these two methods is super simple.
98
00:04:32,730 --> 00:04:35,400
它不需要安装 Git 或 Git LFS,
It doesn't need Git or Git LFS installed,
99
00:04:35,400 --> 00:04:37,650
但它确实有一个限制。
but it does come with a limitation.
100
00:04:37,650 --> 00:04:39,600
一个人可以上传的最大文件大小
The maximum file size one can upload
101
00:04:39,600 --> 00:04:41,313
限制为 5 GB。
is limited to five gigabytes.
102
00:04:42,360 --> 00:04:43,890
为了克服这个限制,
To overcome this limit,
103
00:04:43,890 --> 00:04:45,540
我们来看看第二种方法
let's take a look at the second method
104
00:04:45,540 --> 00:04:47,643
这是仓库实用程序。
which is the Repository utility.
105
00:04:48,600 --> 00:04:51,840
该类封装了 Git 和 Git LFS 方法,
This class is a wrapper over Git and Git LFS methods,
106
00:04:51,840 --> 00:04:53,850
它抽象了大部分的复杂性
which abstracts most of the complexity
107
00:04:53,850 --> 00:04:55,500
并提供灵活的 API
and offers a flexible API
108
00:04:55,500 --> 00:04:57,990
管理你的在线仓库。
to manage your online repositories.
109
00:04:57,990 --> 00:04:59,690
让我们来看看它是如何工作的。
Let's take a look at how it works.
110
00:05:03,870 --> 00:05:08,369
我们首先从实例化仓库实用程序开始。
We first start by instantiating the Repository utility.
111
00:05:08,369 --> 00:05:10,380
为了克隆我们刚刚创建的仓库
We provide the clone_from parameter,
112
00:05:10,380 --> 00:05:13,383
我们可以通过传递参数进行克隆。
in order to clone the repository we just created.
113
00:05:14,400 --> 00:05:18,750
仓库现已克隆到本地文件夹中。
The repository is now cloned in the local folder.
114
00:05:18,750 --> 00:05:22,200
我们刚刚初始化的 repo 对象
The repo object that we have just initialized
115
00:05:22,200 --> 00:05:24,873
提供了很多对我们有用的方法。
offers quite a few methods which are useful for us.
116
00:05:25,920 --> 00:05:28,800
我们有兴趣将模型推送到 hub。
We're interested in pushing a model to the hub.
117
00:05:28,800 --> 00:05:31,170
我将从加载模型和分词器开始
I'll start by loading a model and tokenizer
118
00:05:31,170 --> 00:05:32,643
这是几个小时前训练过的。
I trained a few hours ago.
119
00:05:34,380 --> 00:05:36,810
我们现在将遵循传统的 Git 方法
We'll now follow the traditional Git approach
120
00:05:36,810 --> 00:05:38,670
首先 pull 最新的更改内容
by first pulling the latest changes
121
00:05:38,670 --> 00:05:40,053
使用 git_pull 方法。
using the git_pull method.
122
00:05:40,980 --> 00:05:43,170
我们刚刚克隆了仓库,
We just cloned the repository,
123
00:05:43,170 --> 00:05:45,780
所以除非这是一个超级活跃的仓库,
so unless this is a super active repository,
124
00:05:45,780 --> 00:05:48,660
否则不太可能内容的变化。
it's unlikely that new changes are available.
125
00:05:48,660 --> 00:05:51,000
但在做任何新的事情之前养成 pull 最新内容
But it's always a good idea to pull the latest changes
126
00:05:51,000 --> 00:05:52,300
的好习惯也是不错的。
before doing anything new.
127
00:05:53,220 --> 00:05:55,200
现在我们已经 pull 了仓库,
Now that we have pulled the repository,
128
00:05:55,200 --> 00:05:58,500
我会将模型和分词器保存在该文件夹中。
I'll save the model and tokenizer inside that folder.
129
00:05:58,500 --> 00:06:01,200
这包括模型权重、配置文件、
This includes the model weights, configuration file,
130
00:06:01,200 --> 00:06:02,673
和分词器文件。
and tokenizer files.
131
00:06:04,440 --> 00:06:05,820
现在模型已保存,
Now that the model is saved,
132
00:06:05,820 --> 00:06:07,890
我们将继续使用传统的 Git 方法
we'll continue with the traditional Git approach
133
00:06:07,890 --> 00:06:10,620
并将其推送到远程仓库。
and push it to the remote repository.
134
00:06:10,620 --> 00:06:12,150
如果我们使用命令行,
If we were using the command line,
135
00:06:12,150 --> 00:06:14,250
我们将不得不调用一些
there are a few Git LFS specific commands
136
00:06:14,250 --> 00:06:15,600
特定的 Git LFS 命令。
we would have to invoke.
137
00:06:15,600 --> 00:06:17,940
但是在这里,huggingface_hub 包
But here, the huggingface_hub package
138
00:06:17,940 --> 00:06:20,070
会处理所有这些。
takes care of all of that.
139
00:06:20,070 --> 00:06:24,420
我们将从使用 git_add 方法暂存文件开始。
We'll start by staging the files using the git_add method.
140
00:06:24,420 --> 00:06:27,600
然后我们将使用 git_commit 方法提交这些更改,
We'll then commit these changes using the git_commit method,
141
00:06:27,600 --> 00:06:30,690
并提供有用的 commit 信息。
and provide a helpful commit message.
142
00:06:30,690 --> 00:06:33,210
最后,我们将更改推送到远端,
Finally, we'll push the changes to the remote,
143
00:06:33,210 --> 00:06:34,953
使用 git_push 方法。
using the git_push method.
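The clone, pull, save, add, commit, and push sequence described above can be sketched as one function. This assumes a `huggingface_hub` release in which the `Repository` class is still available (later releases may have deprecated or removed it) and that `transformers` is installed. The function name, the `local_dir` default, the commit message, and the use of `AutoModel` are assumptions made for illustration; the video does not say which model class is loaded.

```python
def push_model_with_repository(
    repo_id: str, checkpoint: str, local_dir: str = "dummy-model"
) -> None:
    """Clone a Hub repo, save a model and tokenizer into it, and push."""
    # Imported lazily so this snippet can be loaded without the libraries installed.
    from huggingface_hub import Repository
    from transformers import AutoModel, AutoTokenizer

    # Clone the Hub repository into a local folder.
    repo = Repository(local_dir, clone_from=repo_id)

    # Load a previously trained model and tokenizer.
    model = AutoModel.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Pull the latest changes before adding anything new.
    repo.git_pull()

    # Save weights, configuration, and tokenizer files inside the clone.
    model.save_pretrained(local_dir)
    tokenizer.save_pretrained(local_dir)

    # Stage, commit, and push.
    repo.git_add()
    repo.git_commit("Add model and tokenizer files")
    repo.git_push()


# push_model_with_repository("your-namespace/dummy-model", "path/to/checkpoint")
```

Running it requires Git and Git LFS on the machine, since `Repository` wraps both.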
144
00:06:45,090 --> 00:06:47,430
如果我们回到 Files and Versions 选项卡,
If we go back to the Files and Versions tab,
145
00:06:47,430 --> 00:06:49,950
我们现在可以看到新提交的文件。
we can now see the newly committed files.
146
00:06:49,950 --> 00:06:52,600
我们甚至可以在 inference API 中使用模型。
We can even play with the model in the inference API.
147
00:06:53,790 --> 00:06:55,770
不幸的是,我们模型的首页
Unfortunately, the front page of our model
148
00:06:55,770 --> 00:06:57,540
还是显得非常空。
is still very empty.
149
00:06:57,540 --> 00:06:59,280
让我们添加一个 README markdown 文件
Let's add a README markdown file
150
00:06:59,280 --> 00:07:00,753
让整体显得完整一点点。
to complete it a little bit.
151
00:07:01,710 --> 00:07:04,200
这个 README 被称为 Model card
This README is known as the Model card
152
00:07:04,200 --> 00:07:06,030
可以说它同样重要
and it's arguably as important
153
00:07:06,030 --> 00:07:09,330
作为模型仓库中的模型和分词器文件。
as the model and tokenizer files in the model repository.
154
00:07:09,330 --> 00:07:11,280
这是你的模型的综合定义
It is the central definition
155
00:07:11,280 --> 00:07:13,200
和模型文档,
and documentation of your model,
156
00:07:13,200 --> 00:07:16,440
确保社区成员的可重用性
ensuring reusability by fellow community members
157
00:07:16,440 --> 00:07:18,480
和结果的可重复性。
and reproducibility of results.
158
00:07:18,480 --> 00:07:20,760
提供一个平台,让其他成员
It provides a platform on which other members
159
00:07:20,760 --> 00:07:22,293
可以构建他们的工件。
may build their artifacts.
160
00:07:23,220 --> 00:07:25,590
为了简单起见我们只会在此处添加标题
We'll only add a title and a small description here
161
00:07:25,590 --> 00:07:27,060
和简短描述,
for simplicity's sake,
162
00:07:27,060 --> 00:07:29,370
但我们鼓励你添加相关信息
but we encourage you to add information relevant
163
00:07:29,370 --> 00:07:30,990
说明模型是如何训练的,
to how the model was trained,
164
00:07:30,990 --> 00:07:33,120
它的预期用途和限制,
its intended use and limitations,
165
00:07:33,120 --> 00:07:36,180
以及目前一直的潜在偏差,
as well as its identified potential biases,
166
00:07:36,180 --> 00:07:37,440
评估结果,
evaluation results,
167
00:07:37,440 --> 00:07:39,843
以及有关如何使用你的模型的代码示例。
and code samples on how to use your model.
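A model card skeleton covering the points listed above could be written like this. The title, the one-line description, and the section headings are placeholders chosen for illustration, not a format shown in the video; the sections are left empty for you to fill in.

```python
import tempfile
from pathlib import Path

# Placeholder model card: a title, a short description, and empty sections
# for the information the video recommends adding.
MODEL_CARD = """\
# Dummy model

A small model uploaded to demonstrate repository management on the Hub.

## Training procedure

## Intended uses and limitations

## Known biases

## Evaluation results

## How to use
"""

card_path = Path(tempfile.mkdtemp()) / "README.md"
card_path.write_text(MODEL_CARD, encoding="utf-8")
```

Saving this file as `README.md` in the repository root and pushing it is what populates the Model card page.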
168
00:07:41,460 --> 00:07:44,130
为 Model Hub 贡献出色的模型。
Great work contributing a model to the Model Hub.
169
00:07:44,130 --> 00:07:46,440
该模型现在可以在下游库中使用
This model can now be used in downstream libraries
170
00:07:46,440 --> 00:07:48,783
只需指定你的模型标识符。
simply by specifying your model identifier.
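Loading the pushed model by its identifier might look like the sketch below. It assumes the `transformers` library is installed; `load_from_hub` and the identifier in the comment are hypothetical, and `AutoModel` stands in for whichever model class matches your checkpoint.

```python
def load_from_hub(model_id: str):
    """Download a model and tokenizer from the Hub by identifier."""
    # Imported lazily so this snippet can be loaded without the library installed.
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


# model, tokenizer = load_from_hub("your-namespace/dummy-model")
```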