58_what-is-domain-adaptation.srt
1
00:00:00,000 --> 00:00:01,402
(air whooshing)
2
00:00:01,402 --> 00:00:02,720
(smiley snapping)
3
00:00:02,720 --> 00:00:05,910
(air whooshing)
4
00:00:05,910 --> 00:00:07,923
- What is domain adaptation?
5
00:00:09,540 --> 00:00:12,540
When fine-tuning a pre-trained model on a new dataset,
6
00:00:12,540 --> 00:00:15,480
the fine-tuned model we obtain will make predictions
7
00:00:15,480 --> 00:00:17,433
that are attuned to this new dataset.
8
00:00:18,840 --> 00:00:21,840
When the two models are trained with the same task,
9
00:00:21,840 --> 00:00:25,320
we can then compare their predictions on the same input.
10
00:00:25,320 --> 00:00:27,870
The predictions of the two models will be different
11
00:00:27,870 --> 00:00:29,790
in a way that reflects the differences
12
00:00:29,790 --> 00:00:31,680
between the two datasets,
13
00:00:31,680 --> 00:00:34,053
a phenomenon we call domain adaptation.
14
00:00:35,310 --> 00:00:38,640
Let's look at an example with masked language modeling
15
00:00:38,640 --> 00:00:41,910
by comparing the outputs of the pre-trained DistilBERT model
16
00:00:41,910 --> 00:00:43,080
with the version fine-tuned
17
00:00:43,080 --> 00:00:45,273
in chapter 7 of the course, linked below.
18
00:00:46,500 --> 00:00:49,140
The pre-trained model makes generic predictions,
19
00:00:49,140 --> 00:00:50,580
whereas the fine-tuned model
20
00:00:50,580 --> 00:00:53,253
has its first two predictions linked to cinema.
21
00:00:54,390 --> 00:00:57,210
Since it was fine-tuned on a movie reviews dataset,
22
00:00:57,210 --> 00:00:58,680
it's perfectly normal to see
23
00:00:58,680 --> 00:01:01,440
that it adapted its suggestions like this.
24
00:01:01,440 --> 00:01:03,090
Notice how it keeps the same prediction
25
00:01:03,090 --> 00:01:05,220
as the pre-trained model afterward.
26
00:01:05,220 --> 00:01:08,100
Even if the fine-tuned model adapts to the new dataset,
27
00:01:08,100 --> 00:01:10,450
it's not forgetting what it was pre-trained on.
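A minimal sketch of this fill-mask comparison, assuming the transformers library; the fine-tuned checkpoint name huggingface-course/distilbert-base-uncased-finetuned-imdb is an assumption based on chapter 7 of the course, so substitute your own checkpoint if it differs.

from transformers import pipeline

# Pre-trained DistilBERT vs. a movie-review fine-tune (checkpoint id assumed).
pretrained = pipeline("fill-mask", model="distilbert-base-uncased")
fine_tuned = pipeline(
    "fill-mask",
    model="huggingface-course/distilbert-base-uncased-finetuned-imdb",
)

text = "This is a great [MASK]."
for name, fill in [("pre-trained", pretrained), ("fine-tuned", fine_tuned)]:
    # Each pipeline returns its top candidates for the masked token.
    print(name, [pred["token_str"] for pred in fill(text)])

The fine-tuned model's top candidates should lean toward cinema vocabulary, while later candidates often agree with the pre-trained model, matching the behavior described above.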
28
00:01:11,490 --> 00:01:14,220
This is another example on a translation task.
29
00:01:14,220 --> 00:01:17,310
On top, we use a pre-trained French/English model,
30
00:01:17,310 --> 00:01:21,330
and at the bottom, the version we fine-tuned in chapter 7.
31
00:01:21,330 --> 00:01:23,610
The top model is pre-trained on lots of texts,
32
00:01:23,610 --> 00:01:25,170
and leaves technical English terms,
33
00:01:25,170 --> 00:01:28,350
like plugin and email, unchanged in the translation.
34
00:01:28,350 --> 00:01:31,350
Both are perfectly understood by French people.
35
00:01:31,350 --> 00:01:33,780
The dataset picked for the fine-tuning is a dataset
36
00:01:33,780 --> 00:01:36,660
of technical texts where special attention was paid
37
00:01:36,660 --> 00:01:39,150
to translating everything into French.
38
00:01:39,150 --> 00:01:42,090
As a result, the fine-tuned model picked up that habit
39
00:01:42,090 --> 00:01:44,193
and translated both plugin and email.
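A similar sketch for the translation comparison, assuming Helsinki-NLP/opus-mt-en-fr as the pre-trained English-to-French model and huggingface-course/marian-finetuned-kde4-en-to-fr as the chapter 7 fine-tune; both checkpoint ids are assumptions, so adjust them to your own setup.

from transformers import pipeline

# Pre-trained English-to-French model vs. a fine-tune on technical texts
# (both checkpoint ids assumed, see the lead-in above).
pretrained = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fine_tuned = pipeline(
    "translation",
    model="huggingface-course/marian-finetuned-kde4-en-to-fr",
)

text = "Unable to load the plugin, check your email for details."
print("pre-trained:", pretrained(text)[0]["translation_text"])
print("fine-tuned: ", fine_tuned(text)[0]["translation_text"])

With the fine-tuned model, words like plugin and email should come out translated into French instead of being left unchanged.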
40
00:01:45,942 --> 00:01:50,592
(air whooshing)