04_the-transformer-architecture.srt

1
00:00:00,000 --> 00:00:02,750
(logo whooshing)

2
00:00:05,010 --> 00:00:07,323
- Let's study the transformer architecture.

3
00:00:09,150 --> 00:00:12,030
This video is the introductory video to the encoders,

4
00:00:12,030 --> 00:00:15,510
decoders, and encoder-decoder series of videos.

5
00:00:15,510 --> 00:00:16,343
In this series,

6
00:00:16,343 --> 00:00:18,900
we'll try to understand what makes a transformer network,

7
00:00:18,900 --> 00:00:22,770
and we'll try to explain it in simple, high-level terms.

8
00:00:22,770 --> 00:00:25,800
No advanced understanding of neural networks is necessary,

9
00:00:25,800 --> 00:00:29,343
but an understanding of basic vectors and tensors may help.

10
00:00:32,250 --> 00:00:33,270
To get started,

11
00:00:33,270 --> 00:00:34,530
we'll take up this diagram

12
00:00:34,530 --> 00:00:36,630
from the original transformer paper,

13
00:00:36,630 --> 00:00:40,140
entitled "Attention Is All You Need", by Vaswani et al.

14
00:00:40,140 --> 00:00:41,010
As we'll see here,

15
00:00:41,010 --> 00:00:42,780
we can leverage only some parts of it,

16
00:00:42,780 --> 00:00:44,630
depending on what we're trying to do.

17
00:00:45,480 --> 00:00:47,610
We won't dive into the specific layers

18
00:00:47,610 --> 00:00:48,990
building up that architecture,

19
00:00:48,990 --> 00:00:51,390
but we'll try to understand the different ways

20
00:00:51,390 --> 00:00:52,893
this architecture can be used.

21
00:00:55,170 --> 00:00:56,003
Let's first start

22
00:00:56,003 --> 00:00:58,260
by splitting that architecture into two parts.

23
00:00:58,260 --> 00:00:59,910
On the left we have the encoder,

24
00:00:59,910 --> 00:01:01,980
and on the right, the decoder.

25
00:01:01,980 --> 00:01:03,330
These two can be used together,

26
00:01:03,330 --> 00:01:05,330
but they can also be used independently.

27
00:01:06,180 --> 00:01:08,610
Let's understand how these work.

28
00:01:08,610 --> 00:01:11,460
The encoder accepts inputs that represent text.

29
00:01:11,460 --> 00:01:13,620
It converts this text, these words,

30
00:01:13,620 --> 00:01:15,675
into numerical representations.

31
00:01:15,675 --> 00:01:17,400
These numerical representations

32
00:01:17,400 --> 00:01:20,460
can also be called embeddings, or features.

33
00:01:20,460 --> 00:01:23,100
We'll see that it uses the self-attention mechanism

34
00:01:23,100 --> 00:01:24,483
as its main component.

35
00:01:25,500 --> 00:01:27,120
We recommend you check out the video

36
00:01:27,120 --> 00:01:29,700
on encoders specifically to understand

37
00:01:29,700 --> 00:01:31,680
what this numerical representation is,

38
00:01:31,680 --> 00:01:33,690
as well as how it works.

39
00:01:33,690 --> 00:01:36,660
We'll study the self-attention mechanism in more detail,

40
00:01:36,660 --> 00:01:38,913
as well as its bi-directional properties.
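
[Editor's note, not part of the video: as a minimal sketch of the encoder side described above, here is how an encoder-only model turns text into numerical representations using the Hugging Face transformers library. The "bert-base-uncased" checkpoint is purely an illustrative choice.]

```python
# Minimal sketch: an encoder-only model converting text into
# numerical representations (embeddings / features).
# "bert-base-uncased" is just an illustrative checkpoint.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Let's study the transformer architecture.", return_tensors="pt")
outputs = model(**inputs)

# One vector per input token; bi-directional self-attention means each
# vector is computed with access to the whole sentence.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 11, 768])
```
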
41
00:01:40,650 --> 00:01:42,780
The decoder is similar to the encoder.

42
00:01:42,780 --> 00:01:45,630
It can also accept text inputs.

43
00:01:45,630 --> 00:01:48,210
It uses a mechanism similar to the encoder's,

44
00:01:48,210 --> 00:01:51,150
which is masked self-attention.

45
00:01:51,150 --> 00:01:52,590
It differs from the encoder

46
00:01:52,590 --> 00:01:54,990
due to its uni-directional property,

47
00:01:54,990 --> 00:01:58,590
and is traditionally used in an auto-regressive manner.

48
00:01:58,590 --> 00:02:01,650
Here too, we recommend you check out the video on decoders

49
00:02:01,650 --> 00:02:04,000
especially to understand how all of this works.
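
[Editor's note, not part of the video: here is a minimal sketch of a decoder-only model used auto-regressively, as described above. The "gpt2" checkpoint is purely an illustrative choice.]

```python
# Minimal sketch: a decoder-only model used auto-regressively.
# Masked self-attention means each position only attends to the
# tokens before it; generate() runs the loop that feeds each
# predicted token back in as input. "gpt2" is just an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Transformers are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0]))
```
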
50
00:02:06,810 --> 00:02:07,890
Combining the two parts

51
00:02:07,890 --> 00:02:10,200
results in what is known as an encoder-decoder,

52
00:02:10,200 --> 00:02:12,720
or a sequence-to-sequence transformer.

53
00:02:12,720 --> 00:02:14,280
The encoder accepts inputs

54
00:02:14,280 --> 00:02:17,850
and computes a high-level representation of those inputs.

55
00:02:17,850 --> 00:02:20,252
These outputs are then passed to the decoder.

56
00:02:20,252 --> 00:02:22,860
The decoder uses the encoder's output,

57
00:02:22,860 --> 00:02:26,370
alongside other inputs, to generate a prediction.

58
00:02:26,370 --> 00:02:27,900
It then predicts an output,

59
00:02:27,900 --> 00:02:30,248
which it will re-use in future iterations,

60
00:02:30,248 --> 00:02:32,662
hence the term auto-regressive.
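
[Editor's note, not part of the video: a minimal sketch of an encoder-decoder (sequence-to-sequence) model as described above, using "t5-small" as a purely illustrative checkpoint along with its translation prompt format.]

```python
# Minimal sketch: an encoder-decoder (sequence-to-sequence) model.
# The encoder builds a representation of the input text, and the
# decoder generates the output from it token by token.
# "t5-small" is just an example checkpoint, prompted for translation.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

text = "translate English to French: Welcome to this course."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
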
61
00:02:32,662 --> 00:02:34,740
Finally, to get an understanding

62
00:02:34,740 --> 00:02:36,690
of the encoder-decoder as a whole,

63
00:02:36,690 --> 00:02:39,670
we recommend you check out the video on encoder-decoders.

64
00:02:39,670 --> 00:02:42,420
(logo whooshing)