Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
docs(reactant): simplify the enzyme call (#987)
* docs(reactant): simplify the enzyme call
* docs: fix enzyme call
817ce1a
Lux Benchmarks

| Benchmark suite | Current | Previous | Ratio |
|---|---|---|---|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 70542 ns | 398708.5 ns | 0.18 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 73167 ns | 72333.5 ns | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 74208 ns | 74041.5 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 71541 ns | 71167 ns | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43995 ns | 44998 ns | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 270729 ns | 1310521 ns | 0.21 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 325667 ns | 272334 ns | 1.20 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 270604 ns | 260208 ns | 1.04 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 313917 ns | 286645.5 ns | 1.10 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 194622 ns | 192473 ns | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 403875 ns | 1286000 ns | 0.31 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 406167 ns | 408333 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 403500 ns | 425666 ns | 0.95 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 320625 ns | 336750 ns | 0.95 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1536229.5 ns | 1782417 ns | 0.86 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1202875 ns | 1203812.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1389000.5 ns | 1389854 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2432625 ns | 2353000.5 ns | 1.03 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 212925 ns | 213060 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12265687.5 ns | 12139583.5 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9563916.5 ns | 9558042 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9302437.5 ns | 9325479.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18003541 ns | 18029333 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1895448 ns | 1903843 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17325000 ns | 17304896 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14341125 ns | 14365229 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14345958 ns | 14311583.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21071084 ns | 21182146 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 122021416.5 ns | 250844916 ns | 0.49 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174130042 ns | 174043167 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 149172395.5 ns | 147706708.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 107349875.5 ns | 104215334 ns | 1.03 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5475745 ns | 5509992 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 584274792 ns | 1208153416 ns | 0.48 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 533708583 ns | 535649167 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 443275083.5 ns | 438878250 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 630283625 ns | 631915667 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 38140342 ns | 38034017.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 704421479 ns | 1069304583 ns | 0.66 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 675174833 ns | 667503333 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 596452895.5 ns | 616964458.5 ns | 0.97 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 745204646 ns | 744392396 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 869333 ns | 1118938 ns | 0.78 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 817958 ns | 826541.5 ns | 0.99 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 1226229 ns | 1218813 ns | 1.01 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 963333 ns | 944417 ns | 1.02 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 268571 ns | 275412.5 ns | 0.98 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2688583 ns | 3230584 ns | 0.83 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 2413000 ns | 2410042 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 3293125 ns | 3297917 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3266709 ns | 3279687.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1066059 ns | 1070834 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 6707687 ns | 6940479 ns | 0.97 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 6421541 ns | 6345229.5 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 6561458.5 ns | 6498146 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 7617542 ns | 7619292 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210412 ns | 210830 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 24351959 ns | 24348020.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 21777750.5 ns | 21802625 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 21667208 ns | 21625979 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 29689167 ns | 29737250 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1975433.5 ns | 1970602 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 48587208 ns | 37161750 ns | 1.31 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 45703354.5 ns | 45515166 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 45666250 ns | 45712979.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 49344271 ns | 49484646 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 13366687 ns | 13755125 ns | 0.97 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 12397708.5 ns | 12438333 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 12505375 ns | 12501666.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 15199833 ns | 15168208 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 513574.5 ns | 513180 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 47354416 ns | 47815084 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 41793209 ns | 41719750 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 41201583.5 ns | 41210729.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 58251333 ns | 58345583.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3066964 ns | 3217941 ns | 0.95 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 97142208.5 ns | 95291875 ns | 1.02 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 91465292 ns | 91202166 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 91248709 ns | 91466833.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 98939750 ns | 98989375 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 287147312 ns | 416233916 ns | 0.69 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 339525208 ns | 339464334 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 316913333 ns | 313767938 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 268449250 ns | 271058375 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7090723.5 ns | 7061082.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 974481667 ns | 1545657458 ns | 0.63 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 870526958 ns | 898188583 ns | 0.97 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 828389833.5 ns | 826159521 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1109051750 ns | 1107530292 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33710535 ns | 33786267 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1767149875 ns | 1790658917 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1687610417 ns | 1722150625 ns | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1595678667 ns | 1650904250 ns | 0.97 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1666392333 ns | 1671180334 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1551125 ns | 2097417 ns | 0.74 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 1261353.5 ns | 1264709 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 1653208 ns | 1649958 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2149875 ns | 2137875 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 263573.5 ns | 267573.5 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 7881604 ns | 9675541.5 ns | 0.81 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 6520750 ns | 6563291 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 7254500 ns | 7221208.5 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 10440833.5 ns | 10480041.5 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1084797 ns | 1100631 ns | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 191678542 ns | 377462270.5 ns | 0.51 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 141721416 ns | 141558375 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 139923854 ns | 127416125 ns | 1.10 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 177014209 ns | 176760875 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4838270 ns | 4873783.5 ns | 0.99 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 624362333 ns | 1122686458 ns | 0.56 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 618016750 ns | 507815583 ns | 1.22 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 592371875 ns | 593816333 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 805157583 ns | 502835542 ns | 1.60 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 16284095 ns | 16273151 ns | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1085854.5 ns | 1054520.5 ns | 1.03 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 975729 ns | 957083 ns | 1.02 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 1357208 ns | 1358375 ns | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1349083 ns | 1298728.5 ns | 1.04 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 265378.5 ns | 273330.5 ns | 0.97 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 4474791.5 ns | 4965250 ns | 0.90 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 3739187.5 ns | 3766375 ns | 0.99 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 4560417 ns | 4597458 ns | 0.99 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 5712666.5 ns | 5551125 ns | 1.03 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1122289 ns | 1169825 ns | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23588062.5 ns | 70642000 ns | 0.33 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 35175438 ns | 33503041.5 ns | 1.05 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37473250 ns | 37118042 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 35351792 ns | 35169250 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1831536.5 ns | 1857688 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 184620749.5 ns | 354366208 ns | 0.52 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 159156729 ns | 158337812.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 184643833.5 ns | 184451937.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 383154875 ns | 383352917 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 16517121 ns | 16518743 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 293382125 ns | 390000249.5 ns | 0.75 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 245326959 ns | 243640875 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 291862916.5 ns | 294140479 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 434596291 ns | 434507583 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 763937917 ns | 1277952958 ns | 0.60 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 482397583 ns | 485919333.5 ns | 0.99 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 442223645.5 ns | 433213020.5 ns | 1.02 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 863476083 ns | 864952167 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12467223 ns | 12478846 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 1869815583 ns | 3528826062 ns | 0.53 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 1631627208 ns | 1558207834 ns | 1.05 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 1585055312.5 ns | 1473262062.5 ns | 1.08 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 2117325187.5 ns | 2071178020.5 ns | 1.02 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49578956 ns | 49689763 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3059354 ns | 3411625 ns | 0.90 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2093791.5 ns | 2088417 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2289083 ns | 2285125 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4739583.5 ns | 4873000 ns | 0.97 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 579124 ns | 585949 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25412708 ns | 25945083 ns | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 19875521 ns | 19714270.5 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18774500 ns | 18949479 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 36713334 ns | 36845500 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3000437 ns | 3206157 ns | 0.94 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 34769000 ns | 54184583.5 ns | 0.64 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 29459354 ns | 29555770.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 29088291.5 ns | 29848896 ns | 0.97 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 42657458 ns | 43608062.5 ns | 0.98 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1650417 ns | 1785500 ns | 0.92 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1199791.5 ns | 1173062.5 ns | 1.02 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1400458 ns | 1396875 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2479437.5 ns | 2459000 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 217327 ns | 217473 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12731874.5 ns | 12550750 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9939375 ns | 9958167 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9706875 ns | 9729166 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18298958 ns | 18404208 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1954694 ns | 1959796 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17720541 ns | 17667854 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14711375 ns | 14631583 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14650125 ns | 14658583 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21313917 ns | 21427250 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23731791.5 ns | 70575125 ns | 0.34 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 34171063 ns | 33678083.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37737000 ns | 37450417 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 35038125 ns | 35578375 ns | 0.98 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1844121 ns | 1839026 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 304074583 ns | 471605125 ns | 0.64 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 228800917 ns | 227950667 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 191127750 ns | 191225709 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 390163042 ns | 390127875 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13910975 ns | 13948109 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 298073062.5 ns | 413570792 ns | 0.72 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 251492333 ns | 249784875 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 297366500.5 ns | 300285375 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 439290375 ns | 439800375 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 2412875 ns | 4206625 ns | 0.57 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 2369417 ns | 2269125 ns | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 2319187 ns | 2408833 ns | 0.96 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2416791.5 ns | 2413208 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 591605 ns | 588604 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 6527458 ns | 11056542 ns | 0.59 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 6513833 ns | 6181729 ns | 1.05 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 6566958 ns | 6557458.5 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 6523625 ns | 6527958 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1406564 ns | 1411902 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 17550208 ns | 17068000 ns | 1.03 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 17517771 ns | 17515792 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 17546062 ns | 17552604 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 14106333 ns | 14105729 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 67500 ns | 820083 ns | 0.08230874192002517 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 69166.5 ns | 73166.5 ns | 0.95 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 70917 ns | 70625 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 69083 ns | 68708 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 48376 ns | 48967 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 326250 ns | 1511625 ns | 0.22 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 320145.5 ns | 316459 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 294645.5 ns | 325979 ns | 0.90 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 329020.5 ns | 325312 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 216051 ns | 216972.5 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 443604.5 ns | 1538396 ns | 0.29 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 445166 ns | 403375 ns | 1.10 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 433041 ns | 444000.5 ns | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 335625 ns | 374167 ns | 0.90 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3039374.5 ns | 3392562.5 ns | 0.90 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2078959 ns | 2049750 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2285042 ns | 2295500 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4855854 ns | 4870250 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 585428 ns | 577280 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 23586375 ns | 24079791 ns | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18045479 ns | 17988417 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18421541.5 ns | 18387542 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 36084354 ns | 36117729.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2895805 ns | 3098822 ns | 0.93 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 34493896 ns | 53510938 ns | 0.64 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 27617667 ns | 27579604 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 29463292 ns | 29114500 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 41630042 ns | 41915417 ns | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 121571333 ns | 250358250 ns | 0.49 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174174208 ns | 173849209 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 149053875 ns | 147986979 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 103999666 ns | 104290083 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5461247 ns | 5468752 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 471938937.5 ns | 1095527645.5 ns | 0.43 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 534939750 ns | 535207166 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 436707291.5 ns | 432356541.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 722476709 ns | 724055375 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 35165698 ns | 35153495 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 642315333 ns | 1027502854.5 ns | 0.63 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 658599833 ns | 659696479 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 583130687.5 ns | 602230000 ns | 0.97 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 735672250 ns | 733800000 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 402041.5 ns | 2044334 ns | 0.20 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 443417 ns | 367604.5 ns | 1.21 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 335916.5 ns | 319083.5 ns | 1.05 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 315708 ns | 401958 ns | 0.79 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 574217 ns | 582402.5 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 2022125 ns | 6397416 ns | 0.32 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 2006458 ns | 2005000 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 1845834 ns | 1821770.5 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 1993334 ns | 2025333 ns | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1315953 ns | 1327577 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 5776604 ns | 9976292 ns | 0.58 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 5777125 ns | 5768729 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 5805750 ns | 5775813 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 2873000 ns | 2876833 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 103792 ns | 547584 ns | 0.19 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 104167 ns | 103209 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 105666 ns | 105375 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 103625 ns | 104020.5 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28031 ns | 27680 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 209334 ns | 526083 ns | 0.40 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 219479 ns | 212584 ns | 1.03 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 209959 ns | 209792 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 209250 ns | 209333 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 218695 ns | 219101 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 706687.5 ns | 1037667 ns | 0.68 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 715687.5 ns | 716416 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 707166.5 ns | 707708 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 691750 ns | 686166 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 13667 ns | 461458.5 ns | 0.029616964472428182 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 13375 ns | 13500 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 14750 ns | 14083.5 ns | 1.05 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 12709 ns | 13562.5 ns | 0.94 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 27661 ns | 27872 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 25791 ns | 339792 ns | 0.07590231671139991 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 25667 ns | 25875 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 26083 ns | 26334 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 25917 ns | 25958 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 207395 ns | 208652 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 45375 ns | 352917 ns | 0.13 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 45958 ns | 51667 ns | 0.89 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 46166 ns | 46667 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 30459 ns | 28208 ns | 1.08 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 320027084 ns | 596075833.5 ns | 0.54 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 290608500 ns | 261796375 ns | 1.11 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 291095958 ns | 274968937.5 ns | 1.06 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 319727583 ns | 319947500 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7667238 ns | 7667816.5 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1234222187.5 ns | 2039634875 ns | 0.61 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 994781770.5 ns | 996403792 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 921801250 ns | 881761792 ns | 1.05 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 1564104917 ns | 1561629292 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 27046210 ns | 27305261 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 413917 ns | 773791.5 ns | 0.53 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 414208 ns | 417208 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 417333 ns | 418834 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 415125 ns | 420375 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47319.5 ns | 48116 ns | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1092167 ns | 2076250 ns | 0.53 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1065333 ns | 1095687.5 ns | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 1072584 ns | 1073708 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 1079166 ns | 1082062 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 227192 ns | 229623.5 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 3114917 ns | 4054312.5 ns | 0.77 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 3108625 ns | 2996124.5 ns | 1.04 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 3108959 ns | 3071666.5 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3008708.5 ns | 3030459 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 533375 ns | 1439791 ns | 0.37 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 495958 ns | 435083.5 ns | 1.14 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 453271 ns | 527417 ns | 0.86 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 445520.5 ns | 514834 ns | 0.87 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 585889.5 ns | 583408 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 2134229.5 ns | 6197020.5 ns | 0.34 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 2114834 ns | 2129000 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 2150208 ns | 2140812 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 2125500 ns | 2135000 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1371879.5 ns | 1357853 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 7941062.5 ns | 11881396 ns | 0.67 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 7913562 ns | 7914188 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 7966875 ns | 7944458 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 4886708.5 ns | 4861000 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 6646 ns | 4542 ns | 1.46 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 7417 ns | 7750 ns | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 7875 ns | 7791.5 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 7312 ns | 6709 ns | 1.09 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 25255 ns | 25133 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7625 ns | 9250 ns | 0.82 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7667 ns | 7542 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7625 ns | 7667 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7708 ns | 7250 ns | 1.06 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 191623.5 ns | 192989.5 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 9042 ns | 9375 ns | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 9042 ns | 9209 ns | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 9084 ns | 9167 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 5917 ns | 5958 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 19541 ns | 15625 ns | 1.25 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 20250 ns | 20625 ns | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 21292 ns | 21166 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 20250 ns | 20000 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25135 ns | 25087 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 33917 ns | 30875 ns | 1.10 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 33167 ns | 33937.5 ns | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 33791 ns | 33459 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 33583 ns | 33916 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 202824.5 ns | 203029.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 94646 ns | 93000 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 94334 ns | 95042 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 94958 ns | 95250 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 92208 ns | 92458 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 13000 ns | 380084 ns | 0.034202965660222476 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 14083 ns | 12875 ns | 1.09 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 15709 ns | 15125 ns | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 13708 ns | 13166 ns | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26307.5 ns | 26429 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 23562.5 ns | 290416.5 ns | 0.0811334755428841 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 24459 ns | 23875 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 23500 ns | 23042 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 24209 ns | 24000 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 172990 ns | 171766 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 56958 ns | 310458 ns | 0.18 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 58750 ns | 57208.5 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 57291.5 ns | 57292 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 37708 ns | 34625 ns | 1.09 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 5958 ns | 3292 ns | 1.81 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 6917 ns | 6875 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 7708 ns | 7875 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 7125 ns | 6875 ns | 1.04 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23646.5 ns | 23455 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5291.5 ns | 6792 ns | 0.78 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5125 ns | 5416 ns | 0.95 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5417 ns | 5541 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5208 ns | 4958 ns | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 177367 ns | 175743.5 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 9292 ns | 8541 ns | 1.09 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 9083 ns | 9084 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 9208 ns | 9334 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 6125 ns | 6083 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 107224959 ns | 153171271 ns | 0.70 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 116148083.5 ns | 117466187.5 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 120301520.5 ns | 119681583 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 117718792 ns | 117629167 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2639626.5 ns | 2629890 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 399020250 ns | 560880125 ns | 0.71 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 369329750 ns | 370900291.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 396359583 ns | 399068916 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 634997083 ns | 632760125 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 15142582 ns | 15150542 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 806322750 ns | 762982250 ns | 1.06 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 758735875 ns | 758524167 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 811941167 ns | 810307458 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 910428709 ns | 908363459 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
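The Ratio column in the benchmark table can be reproduced from the two timing columns: dividing the current time by the previous time matches the reported values (for example 70542 / 398708.5 ≈ 0.18 for the first Dense row). Below 1.00 the current commit ran faster; above 1.00 it ran slower. The sketch below is a hypothetical helper, not part of Lux or github-action-benchmark. It assumes only that Ratio = current / previous, rounded to two decimals for display.

```python
# Hypothetical helper: recompute the "Ratio" column from the two timings.
# Assumption: Ratio = current / previous (lower time is better).

def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current/previous; a value below 1.0 means the current run was faster."""
    if previous_ns <= 0:
        raise ValueError("previous timing must be positive")
    return current_ns / previous_ns


def describe(name: str, current_ns: float, previous_ns: float) -> str:
    """Format one table row as 'name: ratio (trend)'."""
    r = ratio(current_ns, previous_ns)
    trend = "faster" if r < 1.0 else "slower" if r > 1.0 else "unchanged"
    return f"{name}: {r:.2f} ({trend})"


# Two rows taken from the benchmark table (times in nanoseconds).
rows = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 70542, 398708.5),
    ("vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s)", 805157583, 502835542),
]

for name, cur, prev in rows:
    print(describe(name, cur, prev))
```

Rows whose ratio displays as 1.00 differ by less than half a percent, so they are within normal run-to-run noise rather than real changes.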