1
00:00:05,580 --> 00:00:08,820
- So let's talk about the carbon footprint of transformers.

2
00:00:08,820 --> 00:00:10,530
Maybe you've seen headlines such as this one

3
00:00:10,530 --> 00:00:13,530
that training a single AI model can emit as much carbon

4
00:00:13,530 --> 00:00:16,020
as five cars in their lifetimes.

5
00:00:16,020 --> 00:00:19,440
So when is this true, and is it always true?

6
00:00:19,440 --> 00:00:21,803
Well, it actually depends on several things.

7
00:00:21,803 --> 00:00:23,430
Most importantly, it depends

8
00:00:23,430 --> 00:00:24,960
on the type of energy you're using.

9
00:00:24,960 --> 00:00:26,267
If you're using renewable energy such as

10
00:00:26,267 --> 00:00:30,670
solar, wind, or hydroelectricity, you're really

11
00:00:30,670 --> 00:00:33,810
not emitting any carbon at all, or very, very little.

12
00:00:33,810 --> 00:00:36,769
If you're using non-renewable energy sources such as coal

13
00:00:36,769 --> 00:00:39,570
then their carbon footprint is a lot higher

14
00:00:39,570 --> 00:00:43,260
because essentially you are emitting a lot of greenhouse gases.

15
00:00:43,260 --> 00:00:44,670
Another aspect is training time.

16
00:00:44,670 --> 00:00:47,232
So the longer you train, the more energy you use

17
00:00:47,232 --> 00:00:50,250
and the more energy you use, the more carbon you emit, right?

18
00:00:50,250 --> 00:00:51,270
So this really adds up

19
00:00:51,270 --> 00:00:53,520
especially if you're training large models

20
00:00:53,520 --> 00:00:56,460
for hours and days and weeks.

21
00:00:56,460 --> 00:00:58,380
The hardware you use also matters

22
00:00:58,380 --> 00:01:00,930
because some GPUs, for example, are more efficient

23
00:01:00,930 --> 00:01:05,460
than others, and utilizing them efficiently and properly,

24
00:01:05,460 --> 00:01:07,500
using them at a hundred percent all the time,

25
00:01:07,500 --> 00:01:10,650
can really reduce the energy consumption that you have

26
00:01:10,650 --> 00:01:13,290
and then, once again, reduce your carbon footprint.

27
00:01:13,290 --> 00:01:15,870
There are also other aspects such as I/O,

28
00:01:15,870 --> 00:01:17,730
such as data, et cetera, et cetera.

29
00:01:17,730 --> 00:01:20,940
But these are the main three that you should focus on.

30
00:01:20,940 --> 00:01:23,340
So when I talk about energy sources and carbon intensity,

31
00:01:23,340 --> 00:01:24,420
what does that really mean?

32
00:01:24,420 --> 00:01:27,480
So if you look at the top of the screen,

33
00:01:27,480 --> 00:01:30,480
you have the carbon footprint

34
00:01:30,480 --> 00:01:33,860
of a cloud computing instance in Mumbai, India,

35
00:01:33,860 --> 00:01:38,700
which emits 920 grams of CO2 per kilowatt hour.

36
00:01:38,700 --> 00:01:40,110
This is almost one kilogram

37
00:01:40,110 --> 00:01:43,680
of CO2 per kilowatt hour of electricity used.

38
00:01:43,680 --> 00:01:45,150
If you compare that with Montreal, Canada,

39
00:01:45,150 --> 00:01:48,720
where I am right now, it's 20 grams of CO2 per kilowatt hour.

40
00:01:48,720 --> 00:01:50,040
So that's a really, really big difference.

41
00:01:50,040 --> 00:01:54,240
More than 40 times more carbon is emitted

42
00:01:54,240 --> 00:01:55,950
in Mumbai versus Montreal.

43
00:01:55,950 --> 00:01:57,720
And so this can really, really add up.

44
00:01:57,720 --> 00:01:59,820
If you're training a model for weeks, for example,

45
00:01:59,820 --> 00:02:01,920
you're multiplying by 40

46
00:02:01,920 --> 00:02:03,450
the carbon that you're emitting.
47
00:02:03,450 --> 00:02:05,070
So choosing the right instance,

48
00:02:05,070 --> 00:02:07,080
choosing a low-carbon compute instance,

49
00:02:07,080 --> 00:02:09,690
is really the most impactful thing that you can do.

50
00:02:09,690 --> 00:02:13,020
And this is where it can really add up,

51
00:02:13,020 --> 00:02:15,930
if you're training

52
00:02:15,930 --> 00:02:17,580
in a very carbon-intensive region.

53
00:02:19,170 --> 00:02:21,750
Other elements to consider are, for example,

54
00:02:21,750 --> 00:02:22,770
using pre-trained models:

55
00:02:22,770 --> 00:02:25,590
that's the machine learning equivalent of recycling.

56
00:02:25,590 --> 00:02:28,292
When you have pre-trained models available, using them means

57
00:02:28,292 --> 00:02:30,120
you're not emitting any carbon at all, right?

58
00:02:30,120 --> 00:02:31,230
You're not retraining anything.
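
For instance, reusing an off-the-shelf checkpoint with the transformers library takes a few lines and no training at all. A minimal sketch; the task and input text are just examples:

    from transformers import pipeline

    # Downloads an existing pre-trained checkpoint instead of training a new one.
    classifier = pipeline("sentiment-analysis")
    print(classifier("This movie was great!"))
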
59
00:02:31,230 --> 00:02:33,450
So that's also doing your homework

60
00:02:33,450 --> 00:02:35,574
and kind of looking around at what already exists.

61
00:02:35,574 --> 00:02:37,890
Fine-tuning instead of training from scratch:

62
00:02:37,890 --> 00:02:38,723
so once again,

63
00:02:38,723 --> 00:02:40,590
if you find a model that is almost what you need

64
00:02:40,590 --> 00:02:43,530
but not quite, fine-tuning the last couple of layers,

65
00:02:43,530 --> 00:02:45,210
making it really fit your purpose instead

66
00:02:45,210 --> 00:02:46,500
of training a large transformer

67
00:02:46,500 --> 00:02:48,810
from scratch, can really help.
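
A hedged sketch of that idea with the transformers library: load a pre-trained checkpoint, freeze its encoder, and train only the small classification head. The checkpoint and label count are illustrative choices, not something prescribed in the video.

    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2  # example checkpoint and task size
    )

    # Freeze the pre-trained encoder so only the classifier head is trained,
    # which makes each training step far cheaper than training from scratch.
    for param in model.bert.parameters():
        param.requires_grad = False

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters: {trainable:,}")
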
68
00:02:48,810 --> 00:02:51,270
Starting with smaller experiments

69
00:02:51,270 --> 00:02:52,800
and debugging as you go also helps.

70
00:02:52,800 --> 00:02:54,630
So that means, for example,

71
00:02:54,630 --> 00:02:58,770
figuring out data encoding, and, you know,

72
00:02:58,770 --> 00:03:01,170
making sure that there are no small bugs that

73
00:03:01,170 --> 00:03:03,840
you'll only realize, you know, 16 hours into training;

74
00:03:03,840 --> 00:03:05,820
starting small and really making sure

75
00:03:05,820 --> 00:03:08,760
that what you're doing, that your code, is stable.

76
00:03:08,760 --> 00:03:11,430
And then finally, doing a literature review to

77
00:03:11,430 --> 00:03:13,740
choose hyperparameter ranges, and then following

78
00:03:13,740 --> 00:03:15,900
up with a random search instead of a grid search.

79
00:03:15,900 --> 00:03:18,420
Random searches over hyperparameter

80
00:03:18,420 --> 00:03:21,300
combinations have actually been shown to be as efficient

81
00:03:21,300 --> 00:03:24,000
at finding the optimal configuration as grid search,

82
00:03:24,000 --> 00:03:27,510
but obviously you're not trying all possible combinations,

83
00:03:27,510 --> 00:03:29,520
you're only trying a subset of them.

84
00:03:29,520 --> 00:03:31,800
So this can really help as well.
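
As a rough illustration of the difference, here is a sketch in Python; the ranges, the trial count, and the train_and_evaluate function are all hypothetical:

    import random

    # A grid search would try every combination: 4 x 4 x 3 = 48 runs.
    learning_rates = [1e-5, 3e-5, 1e-4, 3e-4]
    batch_sizes = [8, 16, 32, 64]
    warmup_ratios = [0.0, 0.06, 0.1]

    # A random search samples a small subset of combinations instead.
    random.seed(0)
    for trial in range(8):  # 8 runs instead of 48
        config = {
            "learning_rate": random.choice(learning_rates),
            "batch_size": random.choice(batch_sizes),
            "warmup_ratio": random.choice(warmup_ratios),
        }
        print(f"trial {trial}: {config}")
        # train_and_evaluate(config)  # hypothetical training call
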
85
00:03:31,800 --> 00:03:32,760
So now if we go back

86
00:03:32,760 --> 00:03:36,300
to the original paper by Strubell et al. in 2019,

87
00:03:36,300 --> 00:03:39,180
the infamous "five cars in their lifetimes" paper,

88
00:03:39,180 --> 00:03:40,013
if you just look

89
00:03:40,013 --> 00:03:43,606
at a 200-million-parameter transformer,

90
00:03:43,606 --> 00:03:46,950
its carbon footprint is around 200 pounds of CO2,

91
00:03:46,950 --> 00:03:47,940
which is significant,

92
00:03:47,940 --> 00:03:49,980
but it's nowhere near five cars, right?

93
00:03:49,980 --> 00:03:52,893
It's not even a transatlantic flight.

94
00:03:52,893 --> 00:03:55,020
How it really adds up is when you're doing

95
00:03:55,020 --> 00:03:56,190
neural architecture search,

96
00:03:56,190 --> 00:03:58,560
when you're doing hyperparameter tuning, and

97
00:03:58,560 --> 00:04:00,930
this is trying all possible combinations,

98
00:04:00,930 --> 00:04:01,763
et cetera, et cetera.

99
00:04:01,763 --> 00:04:02,596
And this is where

100
00:04:02,596 --> 00:04:05,400
the 600,000 pounds of CO2 came from.

101
00:04:05,400 --> 00:04:08,490
So this is really where things add up.

102
00:04:08,490 --> 00:04:11,880
But if you're doing things mindfully and conscientiously,

103
00:04:11,880 --> 00:04:16,410
then your carbon footprint won't be as big

104
00:04:16,410 --> 00:04:20,040
as the paper implied. Now, some tools to figure

105
00:04:20,040 --> 00:04:22,111
out exactly how much CO2 you're emitting:

106
00:04:22,111 --> 00:04:24,270
there's a web-based tool called the Machine

107
00:04:24,270 --> 00:04:26,430
Learning Emissions Calculator, which allows you

108
00:04:26,430 --> 00:04:29,010
to manually input, for example, which hardware you used,

109
00:04:29,010 --> 00:04:30,488
how many hours you used it for,

110
00:04:30,488 --> 00:04:34,260
and where it was located, locally or in the cloud.

111
00:04:34,260 --> 00:04:35,640
And then it's gonna give you an estimate

112
00:04:35,640 --> 00:04:37,560
of how much CO2 you emitted.

113
00:04:37,560 --> 00:04:40,200
Another tool that does this programmatically

114
00:04:40,200 --> 00:04:41,190
is called CodeCarbon.

115
00:04:41,190 --> 00:04:45,112
So you can pip install it, you can go to the GitHub repo,

116
00:04:45,112 --> 00:04:48,120
and essentially it runs in parallel to your code.

117
00:04:48,120 --> 00:04:49,085
So essentially you call it

118
00:04:49,085 --> 00:04:51,060
and then you do all your training.

119
00:04:51,060 --> 00:04:53,760
And then at the end it's gonna give you

120
00:04:53,760 --> 00:04:57,210
a CSV file with an estimate of your emissions.
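
A minimal sketch of that workflow with the codecarbon package; the training step is a placeholder:

    # pip install codecarbon
    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker()  # samples hardware energy use in the background
    tracker.start()
    # ... your training code goes here (placeholder) ...
    emissions = tracker.stop()  # estimated emissions in kg of CO2-equivalent
    print(f"Estimated emissions: {emissions} kg CO2eq")
    # It also writes an emissions.csv file with the run's details.
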
121
00:04:57,210 --> 00:04:59,250
And it's gonna give you some comparisons.

122
00:04:59,250 --> 00:05:01,230
It's got a visual UI where you can really look

123
00:05:01,230 --> 00:05:04,680
at how this compares to driving a car or watching TV,

124
00:05:04,680 --> 00:05:06,060
so it can give you an idea

125
00:05:06,060 --> 00:05:07,740
of the scope of your emissions as well.

126
00:05:07,740 --> 00:05:09,930
And actually, CodeCarbon is already integrated into AutoNLP,

127
00:05:09,930 --> 00:05:12,270
and hopefully people will be using it

128
00:05:12,270 --> 00:05:15,240
out of the box, easily tracking their emissions all

129
00:05:15,240 --> 00:05:17,523
through training and deploying transformers.