Reduce cudf library size #7583

Merged

robertmaynard
Contributor

This PR combines two major changes that improve cudf's final binary size.

The optimizations are:

  • Explicitly telling fatbin to always compress device code
  • Not generating asserts with __FILE__, __LINE__, etc. when in release mode.
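
To illustrate the second optimization, here is a minimal sketch of an assert macro that only embeds `__FILE__`, `__LINE__`, and the expression text in debug builds. The macro name `SKETCH_ASSERT` and the helper `element_at` are hypothetical and are not taken from this PR's diff. The sketch is host-side C++ for simplicity; device code would need a device-compatible reporting path.

```cpp
#include <cstdio>
#include <cstdlib>

// SKETCH_ASSERT is a hypothetical name used for illustration only.
#ifdef NDEBUG
// Release build: expands to nothing, so no file name, line number, or
// expression text is stored for the call site.
#define SKETCH_ASSERT(cond) ((void)0)
#else
// Debug build: report where the check failed, then abort.
#define SKETCH_ASSERT(cond)                                        \
  do {                                                             \
    if (!(cond)) {                                                 \
      std::fprintf(stderr, "%s:%d: assertion failed: %s\n",        \
                   __FILE__, __LINE__, #cond);                     \
      std::abort();                                                \
    }                                                              \
  } while (0)
#endif

// Example call site (hypothetical): bounds-checked read from an array.
inline int element_at(const int* data, int size, int index) {
  SKETCH_ASSERT(index >= 0 && index < size);
  return data[index];
}
```

Because every call site would otherwise carry its own strings, and device code is compiled once per architecture, dropping them in release mode plausibly scales with both the number of asserts and the number of target architectures.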

With this PR we take cudf when built for ALL archs from 1.3GB to 329MB.

Since this is such a dramatic change, I did some analysis on library sizes, compile times, runtime startup times, and runtime performance.

Library sizes

| variant | release 1 arch | release all archs |
|---|---|---|
| branch-0.19 | 335MB | 1.3GB |
| compression | 133MB | 476MB |
| compression + no_assert | 110MB | 329MB |
| rdc + comp + no_assert | 150MB | Not done |

We see that compression plus the removal of the asserts, while staying with whole-program compilation (no -rdc), gives the smallest library. RDC taking more space makes sense to me, as it has a less aggressive optimizer and therefore generates more SASS/PTX.
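
The relative savings in the size table can be checked with a small helper. This is only a sketch of the arithmetic; the function name `reduction_percent` is an assumption made for illustration.

```cpp
// Percentage by which `after` is smaller than `before`.
// Both arguments must use the same unit (for example MB).
double reduction_percent(double before, double after) {
  return (before - after) / before * 100.0;
}
```

For the single-arch build, `reduction_percent(335.0, 110.0)` is about 67.2, and compression alone (`335.0` to `133.0`) is about 60.3. For all archs, treating the rounded 1.3GB figure as roughly 1300MB gives a reduction of about 75%.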

Compile times

| variant | release 1 arch | release all archs |
|---|---|---|
| branch-0.19 | 15m50.411s | 44m12.849s |
| compression | 15m14.021s | 44m11.658s |
| compression + no_assert | 12m53.049s | 33m59.388s |
| rdc + comp + no_assert | 12m58.550s | Not done |

Compile times are interesting as we see that compression has no negative effect. I expect that the reduced IO ( from smaller binaries ) offsets the time spent compressing.
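
To compare the compile times numerically, the `<minutes>m<seconds>s` strings can be converted to seconds. The helper below is a small sketch for that conversion; the name `to_seconds` is hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Convert a string such as "15m50.411s" into seconds.
// Throws std::invalid_argument if the text does not match
// the <minutes>m<seconds>s shape.
double to_seconds(const std::string& text) {
  const auto m = text.find('m');
  if (m == std::string::npos || text.back() != 's') {
    throw std::invalid_argument("expected <minutes>m<seconds>s: " + text);
  }
  const double minutes = std::stod(text.substr(0, m));
  // Skip the 'm' and drop the trailing 's'.
  const double seconds = std::stod(text.substr(m + 1, text.size() - m - 2));
  return minutes * 60.0 + seconds;
}
```

With this, the single-arch build goes from 950.411s on branch-0.19 to 773.049s with compression + no_assert, a saving of roughly 19%. The all-archs build goes from 2652.849s to 2039.388s, roughly 23%.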

Runtime startup times for all archs:

| variant | branch-0.19 | compression + no_assert |
|---|---|---|
| ctest -j1 | 4m0.326s | 3m57.500s |
| DISPATCHER_TEST * 12 | 0m3.441s | 0m3.520s |

The goal here was to time the tests to see if we had any measurable differences.
DISPATCHER_TEST was selected because its ~0.3s runtime would hopefully highlight any performance differences; running it 12 times should magnify the difference and wash out any system load impacts.

I think from these rather crude measurements we are safe to assume no massive runtime loading time costs.
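
The repeated-run measurement can be sketched as a small timing harness. This is not the script used for the numbers above (those look like shell `time` output); `time_repeated` is a hypothetical helper that shows the same idea of timing many back-to-back runs as one total.

```cpp
#include <chrono>
#include <functional>

// Run `run_once` the given number of times and return the total
// wall-clock time in seconds. Timing the whole batch amortizes
// per-run noise over all repetitions.
double time_repeated(const std::function<void()>& run_once, int repetitions) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repetitions; ++i) {
    run_once();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Dividing the reported totals by 12 gives about 0.287s per run on branch-0.19 and about 0.293s with compression + no_assert, a difference of a few milliseconds per process start.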

Runtime performance for all archs:

I used the REDUCTION_BENCH as my baseline for detecting any performance changes. The benchmarks were executed on a V100 on a shared lab machine.
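
When reading the comparison output, the first two numeric columns (Time and CPU) are relative changes, not absolute times. The sketch below assumes the change is computed as (new - old) / |old|, which matches the rows shown; the function name `relative_change` is chosen for illustration.

```cpp
#include <cmath>

// Relative change from an old measurement to a new one.
// Positive values mean the new run was slower.
// Assumes old_value is non-zero.
double relative_change(double old_value, double new_value) {
  return (new_value - old_value) / std::fabs(old_value);
}
```

For example, the first row reports Time Old 16621 and Time New 18346, and `relative_change(16621.0, 18346.0)` is about 0.1038, matching the +0.1038 in its Time column.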

REDUCTION_BENCH ./compare benchmarks branch_0.19 this_compression_branch
Comparing ./reduction_0.19 to ./reduction_compressed
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
Reduction/bool_all/10000/manual_time                                  +0.1038         +0.0785         16621         18346         37366         40299
Reduction/bool_all/100000/manual_time                                 +0.0969         +0.0779         17057         18710         37738         40676
Reduction/bool_all/1000000/manual_time                                +0.0709         +0.0694         18175         19463         38256         40913
Reduction/bool_all/10000000/manual_time                               +0.0472         +0.0479         35489         37165         53792         56368
Reduction/bool_all/100000000/manual_time                              +0.0026         +0.0120        145630        146014        164349        166325
Reduction/int8_t_all/10000/manual_time                                +0.1415         +0.1098         16286         18591         36729         40762
Reduction/int8_t_all/100000/manual_time                               +0.0457         +0.0500         17180         17966         37798         39688
Reduction/int8_t_all/1000000/manual_time                              +0.0656         +0.0565         18297         19498         38522         40700
Reduction/int8_t_all/10000000/manual_time                             +0.0322         +0.0345         36117         37279         54645         56528
Reduction/int8_t_all/100000000/manual_time                            +0.0041         +0.0132        146222        146819        165053        167232
Reduction/int32_t_all/10000/manual_time                               +0.0931         +0.0781         16849         18417         37471         40397
Reduction/int32_t_all/100000/manual_time                              +0.0841         +0.0775         17371         18831         37772         40701
Reduction/int32_t_all/1000000/manual_time                             +0.0660         +0.0709         22323         23797         41633         44584
Reduction/int32_t_all/10000000/manual_time                            +0.0171         +0.0324         68303         69470         87006         89822
Reduction/int32_t_all/100000000/manual_time                           +0.0028         +0.0065        472041        473370        490693        493869
Reduction/float_all/10000/manual_time                                 +0.0847         +0.0734         16951         18387         37708         40474
Reduction/float_all/100000/manual_time                                +0.0962         +0.0880         16939         18569         37229         40504
Reduction/float_all/1000000/manual_time                               +0.1879         +0.1679         22348         26547         41549         48526
Reduction/float_all/10000000/manual_time                              +0.0465         +0.0652         68851         72054         87438         93139
Reduction/float_all/100000000/manual_time                             +0.0081         +0.0126        472210        476037        491030        497241
Reduction/bool_any/10000/manual_time                                  +0.0764         +0.0743         16292         17537         36830         39568
Reduction/bool_any/100000/manual_time                                 +0.0568         +0.0591         16633         17578         37197         39394
Reduction/bool_any/1000000/manual_time                                +0.0771         +0.0593         18024         19414         38439         40720
Reduction/bool_any/10000000/manual_time                               +0.0388         +0.0452         34990         36347         53210         55614
Reduction/bool_any/100000000/manual_time                              +0.0012         +0.0096        145729        145903        164726        166315
Reduction/int8_t_any/10000/manual_time                                +0.1207         +0.0954         15686         17579         36213         39669
Reduction/int8_t_any/100000/manual_time                               +0.0546         +0.0580         16597         17503         37103         39254
Reduction/int8_t_any/1000000/manual_time                              +0.0801         +0.0669         17938         19374         38027         40572
Reduction/int8_t_any/10000000/manual_time                             +0.0492         +0.0541         34726         36434         52812         55671
Reduction/int8_t_any/100000000/manual_time                            +0.0028         +0.0119        145924        146340        164711        166670
Reduction/int32_t_any/10000/manual_time                               +0.0489         +0.0539         16805         17626         37587         39614
Reduction/int32_t_any/100000/manual_time                              +0.0824         +0.0717         16985         18384         37432         40114
Reduction/int32_t_any/1000000/manual_time                             +0.0588         +0.0590         22338         23652         41842         44311
Reduction/int32_t_any/10000000/manual_time                            +0.0203         +0.0339         67672         69045         86353         89280
Reduction/int32_t_any/100000000/manual_time                           +0.0023         +0.0054        471708        472770        490461        493130
Reduction/float_any/10000/manual_time                                 +0.0454         +0.0494         16730         17490         37526         39382
Reduction/float_any/100000/manual_time                                +0.0895         +0.0739         16872         18381         37349         40110
Reduction/float_any/1000000/manual_time                               +0.0554         +0.0527         22402         23643         42014         44228
Reduction/float_any/10000000/manual_time                              +0.0026         +0.0081         68877         69060         88443         89159
Reduction/float_any/100000000/manual_time                             +0.0022         +0.0049        471781        472801        490753        493170
ReductionDictionary/int32_t_all/10000/manual_time                     +0.0503         +0.0535         30648         32190         51412         54160
ReductionDictionary/int32_t_all/100000/manual_time                    +0.0696         +0.0642         31076         33240         51545         54855
ReductionDictionary/int32_t_all/1000000/manual_time                   +0.9359         +0.4786         38709         74935         58597         86644
ReductionDictionary/int32_t_all/10000000/manual_time                  +0.0289         +0.0625         95351         98111        113516        120611
ReductionDictionary/int32_t_all/100000000/manual_time                 -0.0963         -0.0910        645050        582915        663356        602971
ReductionDictionary/float_all/10000/manual_time                       +0.0476         +0.0504         30177         31615         50814         53374
ReductionDictionary/float_all/100000/manual_time                      -0.0305         -0.0166         34208         33166         55546         54626
ReductionDictionary/float_all/1000000/manual_time                     -0.0258         -0.0137         40372         39329         61080         60242
ReductionDictionary/float_all/10000000/manual_time                    -0.0366         -0.0210         95681         92175        113937        111545
ReductionDictionary/float_all/100000000/manual_time                   -0.0784         -0.0733        644233        593744        662793        614231
ReductionDictionary/int32_t_any/10000/manual_time                     +0.0718         +0.0680         29882         32026         50723         54172
ReductionDictionary/int32_t_any/100000/manual_time                    +0.0776         +0.0704         30316         32670         50640         54203
ReductionDictionary/int32_t_any/1000000/manual_time                   +0.0291         +0.0372         37311         38397         57039         59159
ReductionDictionary/int32_t_any/10000000/manual_time                  -0.0497         -0.0285         95928         91163        113924        110675
ReductionDictionary/int32_t_any/100000000/manual_time                 -0.1160         -0.1109        655803        579715        673920        599181
ReductionDictionary/float_any/10000/manual_time                       +0.0561         +0.0563         29905         31583         50566         53413
ReductionDictionary/float_any/100000/manual_time                      +0.0622         +0.0581         30431         32322         50788         53740
ReductionDictionary/float_any/1000000/manual_time                     +0.0403         +0.0473         36955         38444         56562         59237
ReductionDictionary/float_any/10000000/manual_time                    -0.0626         -0.0421         96949         90881        115099        110252
ReductionDictionary/float_any/100000000/manual_time                   -0.1203         -0.1154        661222        581658        679363        600965
ReductionDictionary/int32_t_min/10000/manual_time                     +0.0539         +0.0525         45796         48265         66802         70306
ReductionDictionary/int32_t_min/100000/manual_time                    +0.0955         +0.0868         46170         50579         66882         72687
ReductionDictionary/int32_t_min/1000000/manual_time                   +0.0683         +0.0647         47495         50738         67291         71646
ReductionDictionary/int32_t_min/10000000/manual_time                  +0.0065         +0.0160         97469         98105        115997        117850
ReductionDictionary/int32_t_min/100000000/manual_time                 +0.0045         +0.0070        501086        503357        519638        523253
ReductionDictionary/float_min/10000/manual_time                       +0.0721         +0.0653         45551         48834         66485         70829
ReductionDictionary/float_min/100000/manual_time                      +0.0966         +0.0865         44953         49295         65457         71120
ReductionDictionary/float_min/1000000/manual_time                     +0.0618         +0.0582         47227         50145         67004         70900
ReductionDictionary/float_min/10000000/manual_time                    +0.0052         +0.0167         97166         97669        115559        117490
ReductionDictionary/float_min/100000000/manual_time                   +0.0060         +0.0083        501044        504029        519674        523988
ReductionDictionary/int32_t_max/10000/manual_time                     +0.0596         +0.0558         44987         47667         65991         69670
ReductionDictionary/int32_t_max/100000/manual_time                    +0.0735         +0.0625         44640         47921         65423         69515
ReductionDictionary/int32_t_max/1000000/manual_time                   +0.0610         +0.0569         46224         49042         65957         69707
ReductionDictionary/int32_t_max/10000000/manual_time                  -0.0100         -0.0009         97078         96105        115787        115685
ReductionDictionary/int32_t_max/100000000/manual_time                 +0.0021         +0.0041        500466        501504        518959        521082
ReductionDictionary/float_max/10000/manual_time                       +0.0815         +0.0716         44141         47738         65134         69798
ReductionDictionary/float_max/100000/manual_time                      +0.0481         +0.0466         46378         48609         67284         70422
ReductionDictionary/float_max/1000000/manual_time                     +0.0091         +0.0180         48140         48579         67837         69055
ReductionDictionary/float_max/10000000/manual_time                    -0.0000         +0.0063         96823         96820        115498        116228
ReductionDictionary/float_max/100000000/manual_time                   +0.0044         +0.0061        501050        503259        519938        523126
ReductionDictionary/int32_t_mean/10000/manual_time                    -0.0259         -0.0018         45053         43885         66162         66041
ReductionDictionary/int32_t_mean/100000/manual_time                   +0.0448         +0.0484         44776         46782         65662         68843
ReductionDictionary/int32_t_mean/1000000/manual_time                  +0.0134         +0.0263         52069         52766         72163         74061
ReductionDictionary/int32_t_mean/10000000/manual_time                 -0.1592         -0.1321        135248        113712        153825        133504
ReductionDictionary/int32_t_mean/100000000/manual_time                -0.2259         -0.2180        856793        663232        875288        684472
ReductionDictionary/float_mean/10000/manual_time                      +0.1263         +0.1185         44256         49847         65389         73139
ReductionDictionary/float_mean/100000/manual_time                     +0.1304         +0.1231         44038         49780         64743         72713
ReductionDictionary/float_mean/1000000/manual_time                    +0.0792         +0.0887         51678         55771         71756         78124
ReductionDictionary/float_mean/10000000/manual_time                   -0.1397         -0.1058        136928        117802        155376        138931
ReductionDictionary/float_mean/100000000/manual_time                  -0.2322         -0.2256        861014        661106        879508        681082
Reduction/bool_sum/10000/manual_time                                  +0.1048         +0.0881         16256         17960         36958         40216
Reduction/bool_sum/100000/manual_time                                 +0.0424         +0.0473         17106         17832         37986         39781
Reduction/bool_sum/1000000/manual_time                                +0.0837         +0.0620         17928         19428         38311         40688
Reduction/bool_sum/10000000/manual_time                               +0.0408         +0.0395         34927         36352         53394         55502
Reduction/bool_sum/100000000/manual_time                              +0.0030         +0.0102        145074        145516        164126        165798
Reduction/int8_t_sum/10000/manual_time                                +0.1105         +0.0832         15704         17438         36410         39440
Reduction/int8_t_sum/100000/manual_time                               +0.0694         +0.0629         16539         17688         37233         39575
Reduction/int8_t_sum/1000000/manual_time                              +0.0783         +0.0574         17880         19279         38316         40517
Reduction/int8_t_sum/10000000/manual_time                             +0.0416         +0.0406         34809         36256         53305         55467
Reduction/int8_t_sum/100000000/manual_time                            +0.0010         +0.0098        144214        144358        163105        164700
Reduction/int32_t_sum/10000/manual_time                               +0.0074         +0.0264         17459         17588         38551         39570
Reduction/int32_t_sum/100000/manual_time                              +0.0791         +0.0684         17086         18438         37705         40286
Reduction/int32_t_sum/1000000/manual_time                             -0.0064         -0.0027         23751         23599         44384         44263
Reduction/int32_t_sum/10000000/manual_time                            +0.0238         +0.0358         67606         69212         86499         89593
Reduction/int32_t_sum/100000000/manual_time                           +0.0002         -0.0043        472981        473093        495882        493743
Reduction/int64_t_sum/10000/manual_time                               +0.0340         +0.0383         17246         17833         38306         39772
Reduction/int64_t_sum/100000/manual_time                              +0.0755         +0.0719         17841         19188         38242         40990
Reduction/int64_t_sum/1000000/manual_time                             +0.0576         +0.0635         28702         30354         47263         50266
Reduction/int64_t_sum/10000000/manual_time                            +0.0119         +0.0141        112911        114253        132783        134658
Reduction/int64_t_sum/100000000/manual_time                           +0.0013         +0.0021        920797        921986        940549        942485
Reduction/float_sum/10000/manual_time                                 +0.0882         +0.0707         16479         17932         37421         40067
Reduction/float_sum/100000/manual_time                                +0.0903         +0.0774         16677         18183         37163         40041
Reduction/float_sum/1000000/manual_time                               +0.0571         +0.0622         22393         23673         41960         44569
Reduction/float_sum/10000000/manual_time                              +0.0179         +0.0345         67968         69184         86691         89678
Reduction/float_sum/100000000/manual_time                             +0.0015         -0.0015        472247        472973        494236        493474
Reduction/double_sum/10000/manual_time                                +0.0982         +0.0794         16036         17611         36657         39568
Reduction/double_sum/100000/manual_time                               +0.0153         +0.0340         18453         18735         38955         40281
Reduction/double_sum/1000000/manual_time                              +0.0596         +0.0636         28623         30328         47153         50154
Reduction/double_sum/10000000/manual_time                             +0.0089         +0.0130        113426        114441        132746        134476
Reduction/double_sum/100000000/manual_time                            +0.0026         +0.0044        920131        922540        939374        943549
Reduction/int32_t_product/10000/manual_time                           +0.1183         +0.0902         16570         18529         37443         40819
Reduction/int32_t_product/100000/manual_time                          +0.1097         +0.0905         17076         18948         37741         41158
Reduction/int32_t_product/1000000/manual_time                         +0.0655         +0.0622         22023         23465         41604         44194
Reduction/int32_t_product/10000000/manual_time                        +0.0237         +0.0374         67629         69233         86354         89583
Reduction/int32_t_product/100000000/manual_time                       +0.0003         +0.0024        473110        473265        492461        493666
Reduction/float_product/10000/manual_time                             +0.0839         +0.0615         16972         18395         38034         40374
Reduction/float_product/100000/manual_time                            +0.0787         +0.0718         17091         18436         37602         40302
Reduction/float_product/1000000/manual_time                           +0.0686         +0.0714         22282         23810         41856         44844
Reduction/float_product/10000000/manual_time                          +0.0135         +0.0191         68313         69233         87798         89474
Reduction/float_product/100000000/manual_time                         +0.0016         -0.0034        472681        473450        495621        493925
Reduction/int64_t_min/10000/manual_time                               +0.0885         +0.0697         16886         18380         37763         40393
Reduction/int64_t_min/100000/manual_time                              +0.0490         +0.0534         17988         18869         38269         40311
Reduction/int64_t_min/1000000/manual_time                             +0.0087         +0.0218         30355         30618         49407         50485
Reduction/int64_t_min/10000000/manual_time                            +0.0092         +0.0179        113906        114949        132977        135352
Reduction/int64_t_min/100000000/manual_time                           +0.0011         +0.0020        921089        922086        940543        942460
Reduction/double_min/10000/manual_time                                +0.0818         +0.0662         16848         18226         37674         40169
Reduction/double_min/100000/manual_time                               +0.0839         +0.0761         17470         18936         37537         40394
Reduction/double_min/1000000/manual_time                              +0.0582         +0.0629         28967         30653         47503         50490
Reduction/double_min/10000000/manual_time                             +0.0135         +0.0246        113413        114940        132250        135504
Reduction/double_min/100000000/manual_time                            +0.0017         +0.0027        920812        922344        940237        942769
Reduction/timestamp_ms_min/10000/manual_time                          +0.0910         +0.0673         16926         18466         37673         40210
Reduction/timestamp_ms_min/100000/manual_time                         +0.0830         +0.0738         17894         19379         38112         40925
Reduction/timestamp_ms_min/1000000/manual_time                        +0.0339         +0.0429         30113         31133         48915         51012
Reduction/timestamp_ms_min/10000000/manual_time                       +0.0109         +0.0189        114185        115433        133132        135650
Reduction/timestamp_ms_min/100000000/manual_time                      +0.0053         +0.0076        920924        925845        940096        947215
Reduction/int8_t_mean/10000/manual_time                               +0.0489         +0.0517         29025         30445         50078         52670
Reduction/int8_t_mean/100000/manual_time                              +0.0401         +0.0493         29218         30389         50050         52515
Reduction/int8_t_mean/1000000/manual_time                             -0.0215         -0.0036         33316         32600         54162         53965
Reduction/int8_t_mean/10000000/manual_time                            +0.0387         +0.0407         46726         48536         64936         67581
Reduction/int8_t_mean/100000000/manual_time                           +0.0007         +0.0058        161956        162077        180866        181918
Reduction/float_mean/10000/manual_time                                +0.0916         +0.0827         28959         31611         49748         53860
Reduction/float_mean/100000/manual_time                               +0.0369         +0.0485         30720         31854         51489         53988
Reduction/float_mean/1000000/manual_time                              +0.0218         +0.0310         36650         37448         56777         58539
Reduction/float_mean/10000000/manual_time                             +0.0266         +0.0402         81361         83528         99648        103653
Reduction/float_mean/100000000/manual_time                            -0.0036         -0.0003        489644        487869        508151        507999
Reduction/int32_t_variance/10000/manual_time                          +0.0337         +0.0343         29718         30718         50967         52713
Reduction/int32_t_variance/100000/manual_time                         -0.0506         -0.0189         33774         32065         55094         54054
Reduction/int32_t_variance/1000000/manual_time                        +0.0364         +0.0316         36062         37374         56511         58297
Reduction/int32_t_variance/10000000/manual_time                       +0.0107         +0.0197         83950         84849        102507        104529
Reduction/int32_t_variance/100000000/manual_time                      -0.0003         +0.0018        491601        491463        510405        511323
Reduction/double_variance/10000/manual_time                           +0.0536         +0.0457         29157         30721         50266         52565
Reduction/double_variance/100000/manual_time                          +0.0367         +0.0415         30552         31674         51014         53130
Reduction/double_variance/1000000/manual_time                         +0.0238         +0.0334         43737         44780         62726         64821
Reduction/double_variance/10000000/manual_time                        +0.0046         +0.0084        129382        129980        148232        149476
Reduction/double_variance/100000000/manual_time                       +0.0027         +0.0042        934085        936595        952680        956708
Reduction/int64_t_std/10000/manual_time                               +0.0638         +0.0592         28956         30803         49847         52798
Reduction/int64_t_std/100000/manual_time                              -0.0742         -0.0356         34976         32380         56179         54180
Reduction/int64_t_std/1000000/manual_time                             -0.1048         -0.0836         49651         44446         70034         64177
Reduction/int64_t_std/10000000/manual_time                            +0.0077         +0.0118        129373        130373        148033        149784
Reduction/int64_t_std/100000000/manual_time                           +0.0025         +0.0037        934055        936408        952668        956168
Reduction/float_std/10000/manual_time                                 +0.0754         +0.0699         28573         30728         49316         52764
Reduction/float_std/100000/manual_time                                +0.0945         +0.0846         28682         31394         49008         53155
Reduction/float_std/1000000/manual_time                               +0.0356         +0.0437         36195         37485         56074         58526
Reduction/float_std/10000000/manual_time                              +0.0175         +0.0292         83490         84951        101651        104623
Reduction/float_std/100000000/manual_time                             +0.0018         +0.0049        490773        491670        509017        511496
ReductionScan/int8_no_nulls/10000/manual_time                         +0.0523         +0.0511         15603         16420         36157         38005
ReductionScan/int8_no_nulls/100000/manual_time                        +0.0181         +0.0295         17358         17672         37950         39070
ReductionScan/int8_no_nulls/1000000/manual_time                       +0.0251         +0.0390         26876         27550         45671         47451
ReductionScan/int8_no_nulls/10000000/manual_time                      +0.0166         +0.0278         89319         90800        109601        112649
ReductionScan/int8_no_nulls/100000000/manual_time                     +0.0007         +0.0019        735210        735689        756198        757634
ReductionScan/int32_no_nulls/10000/manual_time                        +0.0515         +0.0559         15234         16019         35716         37712
ReductionScan/int32_no_nulls/100000/manual_time                       +0.0593         +0.0583         16808         17805         36980         39134
ReductionScan/int32_no_nulls/1000000/manual_time                      +0.0390         +0.0469         26405         27436         44665         46759
ReductionScan/int32_no_nulls/10000000/manual_time                     +0.0103         +0.0196        114948        116137        135580        138232
ReductionScan/int32_no_nulls/100000000/manual_time                    +0.0032         +0.0056       1016197       1019401       1037281       1043090
ReductionScan/uint64_no_nulls/10000/manual_time                       +0.0162         +0.0308         16222         16485         36939         38077
ReductionScan/uint64_no_nulls/100000/manual_time                      +0.0588         +0.0685         18751         19852         38174         40789
ReductionScan/uint64_no_nulls/1000000/manual_time                     +0.0444         +0.0604         36339         37953         56819         60253
ReductionScan/uint64_no_nulls/10000000/manual_time                    -0.0072         +0.0016        217943        216376        238710        239081
ReductionScan/uint64_no_nulls/100000000/manual_time                   -0.0009         -0.0007       2001645       1999848       2023397       2021947
ReductionScan/float_no_nulls/10000/manual_time                        +0.0298         +0.0399         15775         16246         36567         38028
ReductionScan/float_no_nulls/100000/manual_time                       +0.0534         +0.0505         16888         17789         37162         39037
ReductionScan/float_no_nulls/1000000/manual_time                      +0.0339         +0.0416         26655         27558         45083         46960
ReductionScan/float_no_nulls/10000000/manual_time                     +0.0071         +0.0155        115565        116383        136602        138716
ReductionScan/float_no_nulls/100000000/manual_time                    -0.0019         -0.0006       1019440       1017475       1040591       1040016
ReductionScan/int16_nulls/10000/manual_time                           +0.0510         +0.0442         35588         37402         57805         60363
ReductionScan/int16_nulls/100000/manual_time                          +0.0442         +0.0450         37789         39458         59383         62055
ReductionScan/int16_nulls/1000000/manual_time                         -0.0270         -0.0174         48836         47519         69731         68521
ReductionScan/int16_nulls/10000000/manual_time                        -0.0046         +0.0034        122849        122281        145028        145520
ReductionScan/int16_nulls/100000000/manual_time                       -0.0025         -0.0107        864450        862291        892772        883246
ReductionScan/uint32_nulls/10000/manual_time                          -0.0178         -0.0017         37685         37016         59988         59883
ReductionScan/uint32_nulls/100000/manual_time                         +0.0478         +0.0494         37349         39134         58805         61708
ReductionScan/uint32_nulls/1000000/manual_time                        +0.0085         +0.0130         49350         49771         69758         70663
ReductionScan/uint32_nulls/10000000/manual_time                       +0.0013         +0.0086        146704        146896        169127        170574
ReductionScan/uint32_nulls/100000000/manual_time                      +0.0085         +0.0111       1112967       1122427       1133374       1145988
ReductionScan/double_nulls/10000/manual_time                          +0.0623         +0.0611         35538         37753         57332         60833
ReductionScan/double_nulls/100000/manual_time                         +0.0602         +0.0635         38508         40827         59025         62775
ReductionScan/double_nulls/1000000/manual_time                        +0.0202         +0.0275         55522         56645         77436         79567
ReductionScan/double_nulls/10000000/manual_time                       +0.0084         +0.0118        243996        246039        265992        269142
ReductionScan/double_nulls/100000000/manual_time                      -0.0013         -0.0008       2120668       2117888       2140996       2139376
Reduction/bool_minmax/10000/manual_time                               -0.1377         -0.0650         22410         19325         43937         41081
Reduction/bool_minmax/100000/manual_time                              +0.0523         +0.0537         18947         19937         39665         41795
Reduction/bool_minmax/1000000/manual_time                             +0.0734         +0.0617         19146         20551         39413         41846
Reduction/bool_minmax/10000000/manual_time                            +0.0358         +0.0438         37836         39189         56031         58483
Reduction/bool_minmax/100000000/manual_time                           +0.0000         -0.0063        149637        149641        170798        169728
Reduction/int8_t_minmax/10000/manual_time                             +0.0361         +0.0438         18831         19512         39671         41408
Reduction/int8_t_minmax/100000/manual_time                            +0.0629         +0.0571         18429         19588         39093         41327
Reduction/int8_t_minmax/1000000/manual_time                           +0.0874         +0.0696         18956         20613         39054         41772
Reduction/int8_t_minmax/10000000/manual_time                          +0.0369         +0.0413         37781         39175         55998         58312
Reduction/int8_t_minmax/100000000/manual_time                         -0.0002         +0.0059        149816        149783        168898        169894
Reduction/int32_t_minmax/10000/manual_time                            +0.0813         +0.0613         18423         19921         39271         41676
Reduction/int32_t_minmax/100000/manual_time                           +0.0848         +0.0668         18560         20133         39161         41779
Reduction/int32_t_minmax/1000000/manual_time                          +0.0630         +0.0610         23265         24730         42732         45337
Reduction/int32_t_minmax/10000000/manual_time                         +0.0107         +0.0196         69505         70248         88260         89989
Reduction/int32_t_minmax/100000000/manual_time                        -0.0004         +0.0009        474594        474410        493944        494395
Reduction/timestamp_ms_minmax/10000/manual_time                       +0.0716         +0.0605         19241         20618         40084         42509
Reduction/timestamp_ms_minmax/100000/manual_time                      +0.0418         +0.0513         19638         20459         39825         41867
Reduction/timestamp_ms_minmax/1000000/manual_time                     +0.0559         +0.0635         30346         32042         48851         51954
Reduction/timestamp_ms_minmax/10000000/manual_time                    +0.0030         +0.0025        117085        117437        136688        137033
Reduction/timestamp_ms_minmax/100000000/manual_time                   +0.0010         -0.0037        921557        922453        945924        942449
Reduction/float_minmax/10000/manual_time                              +0.0650         +0.0521         18507         19710         39467         41526
Reduction/float_minmax/100000/manual_time                             +0.0740         +0.0619         18504         19874         39105         41525
Reduction/float_minmax/1000000/manual_time                            +0.0499         +0.0373         23385         24552         43561         45185
Reduction/float_minmax/10000000/manual_time                           +0.0163         +0.0314         69400         70529         88043         90807
Reduction/float_minmax/100000000/manual_time                          +0.0000         +0.0025        474346        474361        493215        494461

It looks like compression has a minor runtime cost? I would have expected removing the release_assert to improve performance, but maybe the reduction code doesn't use it.
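
For anyone reading the tables above: as far as I understand the comparison output, the `Time` and `CPU` columns are relative changes, `(new - old) / old`, so a positive value means the compressed build was slower. Below is a minimal sketch that re-derives the `Time` column for two rows. The helper name `relative_change` is made up for illustration. The printed old/new values are rounded, so the last digit may not always match the table.

```python
def relative_change(old: float, new: float) -> float:
    """Return (new - old) / old; positive means the new build was slower."""
    if old == 0:
        raise ValueError("old time must be non-zero")
    return (new - old) / old


# "Time Old" / "Time New" values copied from two rows of the table above.
rows = {
    "ReductionScan/int32_no_nulls/100000": (16808, 17805),
    "Reduction/bool_minmax/10000": (22410, 19325),
}

for name, (old, new) in rows.items():
    # Format the same way the table does: signed, four decimal places.
    print(f"{name:40s} {relative_change(old, new):+.4f}")
```

By this reading, most of the small-input reduction cases are about 4-8% slower, while the largest inputs are within noise.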

@robertmaynard robertmaynard requested review from a team as code owners March 12, 2021 19:53
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Mar 12, 2021
Collaborator

@kkraus14 kkraus14 left a comment

cmake lgtm

@jrhemstad
Contributor

The runtime difference in benchmarks is a little interesting. I'd be curious to see results from a few other benchmarks.

@robertmaynard
Contributor Author

> The runtime difference in benchmarks is a little interesting. I'd be curious to see results from a few other benchmarks.

Given the background in #3044 I can run the Dispatcher benchmark, anything else you are interested in?

@jrhemstad
Contributor

> > The runtime difference in benchmarks is a little interesting. I'd be curious to see results from a few other benchmarks.
>
> Given the background in #3044 I can run the Dispatcher benchmark, anything else you are interested in?

Present day Jake is less convinced than past Jake that the dispatcher benchmark will be revelatory. There's no actual work being done in any of the device code in the benchmark that would be impacted.

I'd maybe try some big ticket ones like join or sort.

@robertmaynard
Contributor Author

JOIN_BENCH
Comparing /host_pwd/Work/compose/results/join_0.19 to /host_pwd/Work/compose/results/join_compressed
Benchmark                                                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Join<int32_t, int32_t>/join_32bit/100000/100000/manual_time                            -0.0486         -0.0482             0             0             0             0
Join<int32_t, int32_t>/join_32bit/100000/400000/manual_time                            -0.0806         -0.0781             0             0             0             0
Join<int32_t, int32_t>/join_32bit/100000/1000000/manual_time                           -0.0936         -0.0907             1             0             1             1
Join<int32_t, int32_t>/join_32bit/10000000/10000000/manual_time                        +0.0101         +0.0100            14            14            14            14
Join<int32_t, int32_t>/join_32bit/10000000/40000000/manual_time                        +0.0164         +0.0165            34            35            34            35
Join<int32_t, int32_t>/join_32bit/10000000/100000000/manual_time                       +0.0124         +0.0123            82            83            82            83
Join<int32_t, int32_t>/join_32bit/100000000/100000000/manual_time                      +0.0158         +0.0158           140           142           140           142
Join<int32_t, int32_t>/join_32bit/80000000/240000000/manual_time                       +0.0242         +0.0242           226           232           226           232
Join<int64_t, int64_t>/join_64bit/50000000/50000000/manual_time                        +0.0154         +0.0154            71            72            71            72
Join<int64_t, int64_t>/join_64bit/40000000/120000000/manual_time                       +0.0148         +0.0148           117           119           117           119
Join<int32_t, int32_t>/join_32bit_nulls/100000/100000/manual_time                      -0.0264         -0.0254             0             0             0             0
Join<int32_t, int32_t>/join_32bit_nulls/100000/400000/manual_time                      -0.0272         -0.0260             0             0             0             0
Join<int32_t, int32_t>/join_32bit_nulls/100000/1000000/manual_time                     -0.0477         -0.0460             0             0             0             0
Join<int32_t, int32_t>/join_32bit_nulls/10000000/10000000/manual_time                  -0.0619         -0.0616             3             3             3             3
Join<int32_t, int32_t>/join_32bit_nulls/10000000/40000000/manual_time                  -0.1081         -0.1080             8             7             8             7
Join<int32_t, int32_t>/join_32bit_nulls/10000000/100000000/manual_time                 -0.1490         -0.1488            18            15            18            15
Join<int32_t, int32_t>/join_32bit_nulls/100000000/100000000/manual_time                -0.0572         -0.0571            31            29            31            29
Join<int32_t, int32_t>/join_32bit_nulls/80000000/240000000/manual_time                 -0.0882         -0.0881            49            44            49            44
Join<int64_t, int64_t>/join_64bit_nulls/50000000/50000000/manual_time                  -0.0579         -0.0578            16            15            16            15
Join<int64_t, int64_t>/join_64bit_nulls/40000000/120000000/manual_time                 -0.0908         -0.0909            25            23            25            23

SORT_BENCH
Comparing /host_pwd/Work/compose/results/sort_0.19 to /host_pwd/Work/compose/results/sort_compressed
Benchmark                                                              Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------
Sort<false>/unstable_no_nulls/1024/1/manual_time                    -0.0042         -0.0011             0             0             0             0
Sort<false>/unstable_no_nulls/4096/1/manual_time                    -0.0043         -0.0050             0             0             0             0
Sort<false>/unstable_no_nulls/32768/1/manual_time                   -0.0030         -0.0014             0             0             0             0
Sort<false>/unstable_no_nulls/262144/1/manual_time                  -0.0136         -0.0113             0             0             0             0
Sort<false>/unstable_no_nulls/2097152/1/manual_time                 -0.0093         -0.0090             1             1             1             1
Sort<false>/unstable_no_nulls/16777216/1/manual_time                -0.0017         -0.0017             3             3             3             3
Sort<false>/unstable_no_nulls/67108864/1/manual_time                -0.0007         -0.0007            13            13            13            13
Sort<false>/unstable_no_nulls/1024/8/manual_time                    +0.0168         +0.0164             0             0             0             0
Sort<false>/unstable_no_nulls/4096/8/manual_time                    +0.0191         +0.0184             1             1             1             1
Sort<false>/unstable_no_nulls/32768/8/manual_time                   +0.0166         +0.0158             1             1             1             1
Sort<false>/unstable_no_nulls/262144/8/manual_time                  +0.0157         +0.0153             1             1             1             1
Sort<false>/unstable_no_nulls/2097152/8/manual_time                 +0.0354         +0.0354            19            19            19            19
Sort<false>/unstable_no_nulls/16777216/8/manual_time                +0.0202         +0.0203           243           248           243           248
Sort<false>/unstable_no_nulls/67108864/8/manual_time                +0.0175         +0.0176          1270          1293          1270          1293
Sort<true>/stable_no_nulls/1024/1/manual_time                       +0.0050         +0.0039             0             0             0             0
Sort<true>/stable_no_nulls/4096/1/manual_time                       -0.0796         -0.0691             0             0             0             0
Sort<true>/stable_no_nulls/32768/1/manual_time                      -0.0404         -0.0423             0             0             0             0
Sort<true>/stable_no_nulls/262144/1/manual_time                     -0.0353         -0.0354             0             0             0             0
Sort<true>/stable_no_nulls/2097152/1/manual_time                    -0.0182         -0.0199             1             1             1             1
Sort<true>/stable_no_nulls/16777216/1/manual_time                   -0.0022         -0.0021             3             3             3             3
Sort<true>/stable_no_nulls/67108864/1/manual_time                   -0.0010         -0.0009            13            13            13            13
Sort<true>/stable_no_nulls/1024/8/manual_time                       +0.0160         +0.0149             0             0             0             0
Sort<true>/stable_no_nulls/4096/8/manual_time                       +0.0190         +0.0183             1             1             1             1
Sort<true>/stable_no_nulls/32768/8/manual_time                      +0.0174         +0.0168             1             1             1             1
Sort<true>/stable_no_nulls/262144/8/manual_time                     +0.0157         +0.0155             1             1             1             1
Sort<true>/stable_no_nulls/2097152/8/manual_time                    +0.0382         +0.0382            19            19            19            19
Sort<true>/stable_no_nulls/16777216/8/manual_time                   +0.0197         +0.0197           243           248           243           248
Sort<true>/stable_no_nulls/67108864/8/manual_time                   +0.0175         +0.0175          1270          1292          1270          1292
Sort<false>/unstable/1024/1/manual_time                             -0.0622         -0.0547             0             0             0             0
Sort<false>/unstable/4096/1/manual_time                             -0.0386         -0.0355             0             0             0             0
Sort<false>/unstable/32768/1/manual_time                            -0.0012         -0.0019             0             0             0             0
Sort<false>/unstable/262144/1/manual_time                           -0.0059         -0.0059             0             0             0             0
Sort<false>/unstable/2097152/1/manual_time                          +0.0195         +0.0194             5             5             5             6
Sort<false>/unstable/16777216/1/manual_time                         +0.0171         +0.0172            43            44            43            44
Sort<false>/unstable/67108864/1/manual_time                         +0.0197         +0.0197           176           179           176           179
Sort<false>/unstable/1024/8/manual_time                             -0.0036         -0.0030             1             1             1             1
Sort<false>/unstable/4096/8/manual_time                             -0.0297         -0.0285             1             1             1             1
Sort<false>/unstable/32768/8/manual_time                            +0.0033         +0.0035             1             1             1             1
Sort<false>/unstable/262144/8/manual_time                           -0.0235         -0.0230             2             2             2             2
Sort<false>/unstable/2097152/8/manual_time                          -0.1053         -0.1052            26            23            26            23
Sort<false>/unstable/16777216/8/manual_time                         -0.0583         -0.0583           323           304           323           304
Sort<false>/unstable/67108864/8/manual_time                         -0.0452         -0.0452          1810          1728          1810          1728
Sort<true>/stable/1024/1/manual_time                                -0.0588         -0.0498             0             0             0             0
Sort<true>/stable/4096/1/manual_time                                -0.0367         -0.0322             0             0             0             0
Sort<true>/stable/32768/1/manual_time                               -0.0001         +0.0003             0             0             0             0
Sort<true>/stable/262144/1/manual_time                              -0.0053         -0.0045             0             0             0             0
Sort<true>/stable/2097152/1/manual_time                             +0.0188         +0.0187             5             5             5             6
Sort<true>/stable/16777216/1/manual_time                            +0.0168         +0.0168            43            44            43            44
Sort<true>/stable/67108864/1/manual_time                            +0.0221         +0.0221           175           179           175           179
Sort<true>/stable/1024/8/manual_time                                -0.0050         -0.0049             1             1             1             1
Sort<true>/stable/4096/8/manual_time                                -0.0337         -0.0329             1             1             1             1
Sort<true>/stable/32768/8/manual_time                               +0.0029         +0.0029             1             1             1             1
Sort<true>/stable/262144/8/manual_time                              -0.0236         -0.0231             2             2             2             2
Sort<true>/stable/2097152/8/manual_time                             -0.1014         -0.1014            26            23            26            23
Sort<true>/stable/16777216/8/manual_time                            -0.0584         -0.0583           323           304           323           304
Sort<true>/stable/67108864/8/manual_time                            -0.0452         -0.0452          1810          1728          1810          1728
Sort/strings/1024/manual_time                                       -0.0045         -0.0043             1             1             1             1
Sort/strings/4096/manual_time                                       +0.0006         +0.0006             2             2             2             2
Sort/strings/32768/manual_time                                      -0.0002         -0.0001             3             3             3             3
Sort/strings/262144/manual_time                                     -0.0024         -0.0023             4             4             4             4
Sort/strings/2097152/manual_time                                    -0.0062         -0.0063            55            55            55            55
Sort/strings/16777216/manual_time                                   -0.0002         -0.0002           486           486           486           486
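
To make tables like the ones above easier to scan, here is a rough sketch that parses comparison rows and keeps only those whose `Time` change exceeds a threshold. The names `Row`, `parse_row`, and `outside_threshold`, and the 5% cutoff, are assumptions for illustration and not part of any benchmark tooling. Note that the old/new columns are printed rounded, which is presumably why the small join cases show 0 for both while still reporting a non-zero relative change.

```python
from typing import Iterable, List, NamedTuple


class Row(NamedTuple):
    name: str
    time_change: float
    cpu_change: float


def parse_row(line: str) -> Row:
    # Benchmark names can contain spaces (e.g. "Join<int32_t, int32_t>/..."),
    # so split the six numeric columns off from the right-hand side.
    name, time_change, cpu_change, *_ = line.rsplit(None, 6)
    return Row(name, float(time_change), float(cpu_change))


def outside_threshold(lines: Iterable[str], threshold: float = 0.05) -> List[Row]:
    # Keep rows whose relative Time change is larger than the threshold
    # in either direction; blank lines are skipped.
    rows = (parse_row(line) for line in lines if line.strip())
    return [row for row in rows if abs(row.time_change) > threshold]


# Three rows copied from the join and sort tables above (whitespace condensed).
sample = [
    "Join<int32_t, int32_t>/join_32bit/100000/100000/manual_time  -0.0486  -0.0482  0  0  0  0",
    "Join<int32_t, int32_t>/join_32bit_nulls/10000000/100000000/manual_time  -0.1490  -0.1488  18  15  18  15",
    "Sort/strings/1024/manual_time  -0.0045  -0.0043  1  1  1  1",
]

for row in outside_threshold(sample):
    print(f"{row.name}: {row.time_change:+.4f}")
```

With the 5% cutoff only the `join_32bit_nulls` row is reported. A different threshold may be more appropriate depending on run-to-run noise.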

TYPE_DISPATCHER_BENCH
Comparing /host_pwd/Work/compose/results/dispatcher_0.19 to /host_pwd/Work/compose/results/dispatcher_compressed
Benchmark                                                                       Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------------------------------
TypeDispatcher/fp64_bandwidth_host/1/1024/1/manual_time                      +0.0063         +0.0333          4832          4862         26684         27571
TypeDispatcher/fp64_bandwidth_host/2/1024/1/manual_time                      -0.0170         +0.0225          7425          7299         29032         29685
TypeDispatcher/fp64_bandwidth_host/4/1024/1/manual_time                      -0.0793         -0.0050         12377         11395         34006         33837
TypeDispatcher/fp64_bandwidth_host/8/1024/1/manual_time                      -0.1078         -0.0373         21465         19152         43202         41592
TypeDispatcher/fp64_bandwidth_host/1/2048/1/manual_time                      -0.0028         +0.0411          5208          5193         26816         27918
TypeDispatcher/fp64_bandwidth_host/2/2048/1/manual_time                      -0.0505         +0.0274          7664          7277         29253         30055
TypeDispatcher/fp64_bandwidth_host/4/2048/1/manual_time                      -0.0704         +0.0063         12510         11629         34045         34260
TypeDispatcher/fp64_bandwidth_host/8/2048/1/manual_time                      +0.0128         +0.0325         21506         21781         43116         44517
TypeDispatcher/fp64_bandwidth_host/1/4096/1/manual_time                      -0.0247         +0.0254          5320          5188         27088         27775
TypeDispatcher/fp64_bandwidth_host/2/4096/1/manual_time                      -0.0611         +0.0149          7942          7457         29524         29963
TypeDispatcher/fp64_bandwidth_host/4/4096/1/manual_time                      +0.0157         +0.0362         12726         12927         34138         35375
TypeDispatcher/fp64_bandwidth_host/8/4096/1/manual_time                      +0.0137         +0.0253         21414         21708         42910         43997
TypeDispatcher/fp64_bandwidth_host/1/8192/1/manual_time                      -0.0515         +0.0263          5291          5019         27084         27797
TypeDispatcher/fp64_bandwidth_host/2/8192/1/manual_time                      -0.0004         +0.0210          7970          7967         29723         30346
TypeDispatcher/fp64_bandwidth_host/4/8192/1/manual_time                      -0.0060         +0.0283         12791         12714         34330         35302
TypeDispatcher/fp64_bandwidth_host/8/8192/1/manual_time                      +0.0438         +0.0508         21812         22767         43289         45488
TypeDispatcher/fp64_bandwidth_host/1/16384/1/manual_time                     +0.0196         +0.0465          5380          5485         27222         28487
TypeDispatcher/fp64_bandwidth_host/2/16384/1/manual_time                     -0.0099         +0.0419          8090          8010         29583         30822
TypeDispatcher/fp64_bandwidth_host/4/16384/1/manual_time                     +0.0277         +0.0439         13214         13581         34663         36187
TypeDispatcher/fp64_bandwidth_host/8/16384/1/manual_time                     +0.0196         +0.0320         23450         23909         45410         46861
TypeDispatcher/fp64_bandwidth_host/1/32768/1/manual_time                     +0.0138         +0.0279          5529          5605         27362         28127
TypeDispatcher/fp64_bandwidth_host/2/32768/1/manual_time                     +0.0111         +0.0178          8382          8475         30113         30648
TypeDispatcher/fp64_bandwidth_host/4/32768/1/manual_time                     +0.0141         +0.0239         13845         14041         35591         36443
TypeDispatcher/fp64_bandwidth_host/8/32768/1/manual_time                     +0.0021         +0.0074         24977         25029         47108         47455
TypeDispatcher/fp64_bandwidth_host/1/65536/1/manual_time                     -0.0524         +0.0258          6449          6111         28005         28728
TypeDispatcher/fp64_bandwidth_host/2/65536/1/manual_time                     +0.0134         +0.0253          9400          9527         30987         31771
TypeDispatcher/fp64_bandwidth_host/4/65536/1/manual_time                     -0.0018         +0.0184         16137         16107         37927         38624
TypeDispatcher/fp64_bandwidth_host/8/65536/1/manual_time                     +0.0010         +0.0129         29434         29464         51320         51980
TypeDispatcher/fp64_bandwidth_host/1/131072/1/manual_time                    +0.0060         +0.0218          7213          7256         28974         29605
TypeDispatcher/fp64_bandwidth_host/2/131072/1/manual_time                    -0.0108         +0.0216         11575         11449         33595         34320
TypeDispatcher/fp64_bandwidth_host/4/131072/1/manual_time                    -0.0069         +0.0173         19833         19696         41830         42554
TypeDispatcher/fp64_bandwidth_host/8/131072/1/manual_time                    -0.0007         +0.0035         36455         36428         59150         59355
TypeDispatcher/fp64_bandwidth_host/1/262144/1/manual_time                    +0.0067         +0.0304          9689          9754         31425         32379
TypeDispatcher/fp64_bandwidth_host/2/262144/1/manual_time                    +0.0091         +0.0155         16551         16701         38405         38999
TypeDispatcher/fp64_bandwidth_host/4/262144/1/manual_time                    +0.0020         +0.0137         30391         30452         52393         53110
TypeDispatcher/fp64_bandwidth_host/8/262144/1/manual_time                    -0.0003         +0.0045         57965         57949         79968         80326
TypeDispatcher/fp64_bandwidth_host/1/524288/1/manual_time                    -0.0092         +0.0223         14871         14734         36594         37411
TypeDispatcher/fp64_bandwidth_host/2/524288/1/manual_time                    -0.0080         +0.0132         26928         26711         48802         49444
TypeDispatcher/fp64_bandwidth_host/4/524288/1/manual_time                    -0.0123         +0.0027         51158         50526         73129         73328
TypeDispatcher/fp64_bandwidth_host/8/524288/1/manual_time                    -0.0089         -0.0001         99246         98363        121190        121184
TypeDispatcher/fp64_bandwidth_host/1/1048576/1/manual_time                   +0.0058         +0.0178         25265         25411         47146         47986
TypeDispatcher/fp64_bandwidth_host/2/1048576/1/manual_time                   -0.0003         +0.0149         47903         47890         69597         70634
TypeDispatcher/fp64_bandwidth_host/4/1048576/1/manual_time                   -0.0007         +0.0071         93159         93094        115145        115966
TypeDispatcher/fp64_bandwidth_host/8/1048576/1/manual_time                   -0.0019         -0.0007        183569        183219        206170        206035
TypeDispatcher/fp64_bandwidth_host/1/2097152/1/manual_time                   -0.0013         +0.0104         45937         45879         67697         68402
TypeDispatcher/fp64_bandwidth_host/2/2097152/1/manual_time                   -0.0006         +0.0069         89336         89284        111194        111958
TypeDispatcher/fp64_bandwidth_host/4/2097152/1/manual_time                   -0.0015         +0.0032        176157        175892        198054        198681
TypeDispatcher/fp64_bandwidth_host/8/2097152/1/manual_time                   -0.0013         +0.0009        349794        349351        371842        372186
TypeDispatcher/fp64_bandwidth_host/1/4194304/1/manual_time                   -0.0006         +0.0070         87215         87160        109211        109971
TypeDispatcher/fp64_bandwidth_host/2/4194304/1/manual_time                   -0.0010         +0.0026        171872        171706        193842        194338
TypeDispatcher/fp64_bandwidth_host/4/4194304/1/manual_time                   -0.0007         +0.0012        341211        340986        363309        363758
TypeDispatcher/fp64_bandwidth_host/8/4194304/1/manual_time                   -0.0008         +0.0002        679859        679346        701970        702128
TypeDispatcher/fp64_bandwidth_host/1/8388608/1/manual_time                   -0.0007         +0.0029        169722        169610        191657        192220
TypeDispatcher/fp64_bandwidth_host/2/8388608/1/manual_time                   -0.0006         -0.0012        337052        336836        360018        359594
TypeDispatcher/fp64_bandwidth_host/4/8388608/1/manual_time                   -0.0005         +0.0008        671327        671019        693241        693765
TypeDispatcher/fp64_bandwidth_host/8/8388608/1/manual_time                   -0.0004         -0.0005       1340128       1339611       1363243       1362543
TypeDispatcher/fp64_bandwidth_host/1/16777216/1/manual_time                  -0.0002         -0.0003        334789        334727        357674        357563
TypeDispatcher/fp64_bandwidth_host/2/16777216/1/manual_time                  -0.0000         +0.0002        667047        667034        690012        690159
TypeDispatcher/fp64_bandwidth_host/4/16777216/1/manual_time                  -0.0002         +0.0002       1331543       1331317       1354724       1354953
TypeDispatcher/fp64_bandwidth_host/8/16777216/1/manual_time                  -0.0001         -0.0001       2660484       2660181       2683566       2683306
TypeDispatcher/fp64_bandwidth_host/1/33554432/1/manual_time                  -0.0002         +0.0011        664954        664833        687197        687922
TypeDispatcher/fp64_bandwidth_host/2/33554432/1/manual_time                  -0.0001         +0.0004       1327320       1327129       1349489       1350034
TypeDispatcher/fp64_bandwidth_host/4/33554432/1/manual_time                  -0.0000         -0.0023       2652047       2651935       2674440       2668420
TypeDispatcher/fp64_bandwidth_host/8/33554432/1/manual_time                  -0.0001         +0.0000       5301597       5301139       5324511       5324513
TypeDispatcher/fp64_bandwidth_host/1/67108864/1/manual_time                  -0.0001         +0.0006       1325174       1325056       1347420       1348189
TypeDispatcher/fp64_bandwidth_host/2/67108864/1/manual_time                  -0.0001         +0.0002       2647792       2647630       2670070       2670600
TypeDispatcher/fp64_bandwidth_host/4/67108864/1/manual_time                  -0.0000         +0.0001       5292955       5292839       5315559       5316068
TypeDispatcher/fp64_bandwidth_host/8/67108864/1/manual_time                  -0.0000         +0.0001      10583381      10583257      10605597      10606145
TypeDispatcher/fp64_bandwidth_device/1/1024/1/manual_time                    -0.0227         -0.0036         15030         14689         37320         37186
TypeDispatcher/fp64_bandwidth_device/2/1024/1/manual_time                    -0.0271         -0.0003         15956         15524         38401         38391
TypeDispatcher/fp64_bandwidth_device/4/1024/1/manual_time                    -0.0635         -0.0238         16856         15785         39194         38260
TypeDispatcher/fp64_bandwidth_device/8/1024/1/manual_time                    -0.1106         -0.0496         20109         17886         42493         40385
TypeDispatcher/fp64_bandwidth_device/1/2048/1/manual_time                    -0.0405         -0.0148         15347         14726         37717         37158
TypeDispatcher/fp64_bandwidth_device/2/2048/1/manual_time                    -0.0627         -0.0329         15946         14946         38439         37175
TypeDispatcher/fp64_bandwidth_device/4/2048/1/manual_time                    -0.0784         -0.0304         17130         15787         39555         38351
TypeDispatcher/fp64_bandwidth_device/8/2048/1/manual_time                    -0.0351         -0.0149         19876         19179         42205         41577
TypeDispatcher/fp64_bandwidth_device/1/4096/1/manual_time                    -0.0387         -0.0159         15342         14749         37637         37038
TypeDispatcher/fp64_bandwidth_device/2/4096/1/manual_time                    -0.0669         -0.0281         16244         15157         38647         37560
TypeDispatcher/fp64_bandwidth_device/4/4096/1/manual_time                    -0.0309         -0.0123         17313         16778         39663         39175
TypeDispatcher/fp64_bandwidth_device/8/4096/1/manual_time                    -0.0331         -0.0119         20051         19387         42271         41767
TypeDispatcher/fp64_bandwidth_device/1/8192/1/manual_time                    -0.0376         -0.0086         15341         14764         37582         37257
TypeDispatcher/fp64_bandwidth_device/2/8192/1/manual_time                    -0.0546         -0.0229         16347         15455         38786         37898
TypeDispatcher/fp64_bandwidth_device/4/8192/1/manual_time                    -0.0569         -0.0260         17474         16480         40012         38973
TypeDispatcher/fp64_bandwidth_device/8/8192/1/manual_time                    -0.0185         -0.0083         20261         19887         42660         42304
TypeDispatcher/fp64_bandwidth_device/1/16384/1/manual_time                   -0.0578         -0.0293         15568         14668         38070         36953
TypeDispatcher/fp64_bandwidth_device/2/16384/1/manual_time                   -0.0607         -0.0222         16342         15350         38715         37857
TypeDispatcher/fp64_bandwidth_device/4/16384/1/manual_time                   -0.0339         -0.0165         17862         17257         40254         39591
TypeDispatcher/fp64_bandwidth_device/8/16384/1/manual_time                   -0.0501         -0.0265         21673         20587         44275         43101
TypeDispatcher/fp64_bandwidth_device/1/32768/1/manual_time                   -0.0302         -0.0091         15514         15045         37864         37518
TypeDispatcher/fp64_bandwidth_device/2/32768/1/manual_time                   -0.0485         -0.0316         16698         15888         39370         38127
TypeDispatcher/fp64_bandwidth_device/4/32768/1/manual_time                   -0.0323         -0.0134         18528         17930         41086         40536
TypeDispatcher/fp64_bandwidth_device/8/32768/1/manual_time                   -0.0298         -0.0161         22267         21603         45168         44440
TypeDispatcher/fp64_bandwidth_device/1/65536/1/manual_time                   -0.0520         -0.0189         16205         15363         38522         37792
TypeDispatcher/fp64_bandwidth_device/2/65536/1/manual_time                   -0.0335         -0.0157         17573         16984         39929         39301
TypeDispatcher/fp64_bandwidth_device/4/65536/1/manual_time                   -0.0230         -0.0117         19934         19475         42595         42098
TypeDispatcher/fp64_bandwidth_device/8/65536/1/manual_time                   -0.0236         -0.0102         25906         25294         48459         47963
TypeDispatcher/fp64_bandwidth_device/1/131072/1/manual_time                  -0.0214         -0.0045         17072         16707         39243         39067
TypeDispatcher/fp64_bandwidth_device/2/131072/1/manual_time                  -0.0297         -0.0039         19621         19039         41924         41759
TypeDispatcher/fp64_bandwidth_device/4/131072/1/manual_time                  -0.0283         -0.0133         25226         24512         47790         47155
TypeDispatcher/fp64_bandwidth_device/8/131072/1/manual_time                  -0.0120         -0.0060         36129         35695         58890         58534
TypeDispatcher/fp64_bandwidth_device/1/262144/1/manual_time                  -0.0163         +0.0020         19474         19158         41790         41873
TypeDispatcher/fp64_bandwidth_device/2/262144/1/manual_time                  -0.0178         -0.0076         25271         24821         47801         47440
TypeDispatcher/fp64_bandwidth_device/4/262144/1/manual_time                  -0.0072         -0.0059         35966         35708         59034         58683
TypeDispatcher/fp64_bandwidth_device/8/262144/1/manual_time                  -0.0017         -0.0026         56837         56743         79751         79546
TypeDispatcher/fp64_bandwidth_device/1/524288/1/manual_time                  -0.0215         -0.0048         24765         24233         47134         46908
TypeDispatcher/fp64_bandwidth_device/2/524288/1/manual_time                  -0.0193         -0.0065         36241         35542         58880         58495
TypeDispatcher/fp64_bandwidth_device/4/524288/1/manual_time                  -0.0079         -0.0026         56918         56466         79674         79470
TypeDispatcher/fp64_bandwidth_device/8/524288/1/manual_time                  -0.0068         -0.0038         99283         98611        122101        121636
TypeDispatcher/fp64_bandwidth_device/1/1048576/1/manual_time                 -0.0057         -0.0019         35056         34855         57649         57538
TypeDispatcher/fp64_bandwidth_device/2/1048576/1/manual_time                 -0.0094         -0.0026         56585         56051         79038         78832
TypeDispatcher/fp64_bandwidth_device/4/1048576/1/manual_time                 -0.0105         -0.0058         98401         97371        121080        120378
TypeDispatcher/fp64_bandwidth_device/8/1048576/1/manual_time                 -0.0000         +0.0028        181070        181063        203713        204292
TypeDispatcher/fp64_bandwidth_device/1/2097152/1/manual_time                 -0.0060         +0.0018         55865         55528         78232         78369
TypeDispatcher/fp64_bandwidth_device/2/2097152/1/manual_time                 -0.0072         -0.0038         97875         97174        120512        120057
TypeDispatcher/fp64_bandwidth_device/4/2097152/1/manual_time                 -0.0030         -0.0012        179958        179414        202564        202330
TypeDispatcher/fp64_bandwidth_device/8/2097152/1/manual_time                 -0.0023         -0.0019        345144        344361        367968        367286
TypeDispatcher/fp64_bandwidth_device/1/4194304/1/manual_time                 -0.0044         -0.0013         97141         96718        119757        119598
TypeDispatcher/fp64_bandwidth_device/2/4194304/1/manual_time                 -0.0028         -0.0008        179997        179489        202538        202375
TypeDispatcher/fp64_bandwidth_device/4/4194304/1/manual_time                 -0.0020         -0.0010        345137        344445        367839        367474
TypeDispatcher/fp64_bandwidth_device/8/4194304/1/manual_time                 -0.0015         -0.0007        672824        671839        695465        694979
TypeDispatcher/fp64_bandwidth_device/1/8388608/1/manual_time                 -0.0037         -0.0020        179995        179328        202441        202041
TypeDispatcher/fp64_bandwidth_device/2/8388608/1/manual_time                 -0.0019         -0.0011        345160        344511        367869        367453
TypeDispatcher/fp64_bandwidth_device/4/8388608/1/manual_time                 -0.0015         -0.0011        672221        671195        694924        694179
TypeDispatcher/fp64_bandwidth_device/8/8388608/1/manual_time                 -0.0006         +0.0000       1325029       1324195       1347819       1347820
TypeDispatcher/fp64_bandwidth_device/1/16777216/1/manual_time                -0.0021         -0.0016        345519        344787        368126        367536
TypeDispatcher/fp64_bandwidth_device/2/16777216/1/manual_time                -0.0017         -0.0017        674002        672860        696826        695630
TypeDispatcher/fp64_bandwidth_device/4/16777216/1/manual_time                -0.0006         -0.0001       1334483       1333724       1357320       1357181
TypeDispatcher/fp64_bandwidth_device/8/16777216/1/manual_time                -0.0003         -0.0001       2678195       2677434       2701321       2701021
TypeDispatcher/fp64_bandwidth_device/1/33554432/1/manual_time                -0.0013         -0.0012        676753        675861        699729        698901
TypeDispatcher/fp64_bandwidth_device/2/33554432/1/manual_time                -0.0006         +0.0009       1333702       1332946       1356578       1357802
TypeDispatcher/fp64_bandwidth_device/4/33554432/1/manual_time                -0.0005         -0.0004       2651307       2649965       2674205       2673239
TypeDispatcher/fp64_bandwidth_device/8/33554432/1/manual_time                -0.0009         -0.0008       5283848       5279213       5307439       5303327
TypeDispatcher/fp64_bandwidth_device/1/67108864/1/manual_time                -0.0010         -0.0009       1338922       1337622       1361892       1360709
TypeDispatcher/fp64_bandwidth_device/2/67108864/1/manual_time                -0.0006         -0.0005       2651512       2649844       2674735       2673483
TypeDispatcher/fp64_bandwidth_device/4/67108864/1/manual_time                -0.0005         -0.0004       5262783       5260257       5286135       5284088
TypeDispatcher/fp64_bandwidth_device/8/67108864/1/manual_time                -0.0006         -0.0030      10510856      10504589      10559912      10527946
TypeDispatcher/fp64_bandwidth_no/1/1024/1/manual_time                        -0.0001         +0.0220          5031          5031         26887         27479
TypeDispatcher/fp64_bandwidth_no/2/1024/1/manual_time                        -0.0583         +0.0094          5552          5228         27450         27708
TypeDispatcher/fp64_bandwidth_no/4/1024/1/manual_time                        -0.1229         -0.0118          6606          5793         28632         28293
TypeDispatcher/fp64_bandwidth_no/8/1024/1/manual_time                        -0.1470         -0.0208          8335          7110         30220         29592
TypeDispatcher/fp64_bandwidth_no/1/2048/1/manual_time                        -0.0580         +0.0154          5311          5003         27018         27435
TypeDispatcher/fp64_bandwidth_no/2/2048/1/manual_time                        -0.0784         +0.0036          5744          5294         27626         27726
TypeDispatcher/fp64_bandwidth_no/4/2048/1/manual_time                        -0.0656         +0.0029          6638          6202         28452         28533
TypeDispatcher/fp64_bandwidth_no/8/2048/1/manual_time                        -0.0035         +0.0201          8394          8365         30142         30748
TypeDispatcher/fp64_bandwidth_no/1/4096/1/manual_time                        +0.0025         +0.0265          5077          5089         26882         27595
TypeDispatcher/fp64_bandwidth_no/2/4096/1/manual_time                        +0.0269         +0.0435          5788          5944         27629         28832
TypeDispatcher/fp64_bandwidth_no/4/4096/1/manual_time                        -0.0060         +0.0303          6951          6909         28743         29615
TypeDispatcher/fp64_bandwidth_no/8/4096/1/manual_time                        +0.0036         +0.0239          8732          8763         30344         31069
TypeDispatcher/fp64_bandwidth_no/1/8192/1/manual_time                        -0.0064         +0.0321          5544          5508         27286         28162
TypeDispatcher/fp64_bandwidth_no/2/8192/1/manual_time                        -0.0626         +0.0210          6209          5820         27784         28367
TypeDispatcher/fp64_bandwidth_no/4/8192/1/manual_time                        -0.0328         +0.0203          7231          6994         28917         29504
TypeDispatcher/fp64_bandwidth_no/8/8192/1/manual_time                        +0.0021         +0.0266          9281          9300         31103         31932
TypeDispatcher/fp64_bandwidth_no/1/16384/1/manual_time                       +0.0021         +0.0263          5723          5735         27569         28295
TypeDispatcher/fp64_bandwidth_no/2/16384/1/manual_time                       -0.0148         +0.0565          6380          6285         28155         29745
TypeDispatcher/fp64_bandwidth_no/4/16384/1/manual_time                       +0.0073         +0.0455          7626          7682         29411         30748
TypeDispatcher/fp64_bandwidth_no/8/16384/1/manual_time                       +0.0212         +0.0551          9965         10175         31885         33641
TypeDispatcher/fp64_bandwidth_no/1/32768/1/manual_time                       +0.0308         +0.0575          5830          6009         27534         29119
TypeDispatcher/fp64_bandwidth_no/2/32768/1/manual_time                       +0.0237         +0.0418          6712          6872         28591         29787
TypeDispatcher/fp64_bandwidth_no/4/32768/1/manual_time                       +0.0191         +0.0457          8481          8643         30196         31574
TypeDispatcher/fp64_bandwidth_no/8/32768/1/manual_time                       +0.0167         +0.0239         11344         11533         33562         34363
TypeDispatcher/fp64_bandwidth_no/1/65536/1/manual_time                       +0.0347         +0.0492          6382          6604         28089         29472
TypeDispatcher/fp64_bandwidth_no/2/65536/1/manual_time                       +0.0205         +0.0384          7791          7951         29442         30572
TypeDispatcher/fp64_bandwidth_no/4/65536/1/manual_time                       +0.0325         +0.0313         10589         10933         32494         33510
TypeDispatcher/fp64_bandwidth_no/8/65536/1/manual_time                       -0.0017         +0.0101         16148         16121         38096         38481
TypeDispatcher/fp64_bandwidth_no/1/131072/1/manual_time                      +0.0000         +0.0264          7545          7545         29253         30027
TypeDispatcher/fp64_bandwidth_no/2/131072/1/manual_time                      -0.0188         +0.0224         10184          9993         31973         32688
TypeDispatcher/fp64_bandwidth_no/4/131072/1/manual_time                      +0.0060         +0.0162         15793         15888         37731         38343
TypeDispatcher/fp64_bandwidth_no/8/131072/1/manual_time                      +0.0056         +0.0171         26458         26605         48448         49275
TypeDispatcher/fp64_bandwidth_no/1/262144/1/manual_time                      +0.0085         +0.0218         10012         10097         31933         32628
TypeDispatcher/fp64_bandwidth_no/2/262144/1/manual_time                      +0.0042         +0.0128         15625         15691         37571         38052
TypeDispatcher/fp64_bandwidth_no/4/262144/1/manual_time                      +0.0079         +0.0157         26379         26586         48493         49253
TypeDispatcher/fp64_bandwidth_no/8/262144/1/manual_time                      +0.0007         +0.0040         47232         47266         69411         69691
TypeDispatcher/fp64_bandwidth_no/1/524288/1/manual_time                      +0.0020         +0.0210         15047         15078         36878         37653
TypeDispatcher/fp64_bandwidth_no/2/524288/1/manual_time                      -0.0053         +0.0141         26523         26384         48535         49218
TypeDispatcher/fp64_bandwidth_no/4/524288/1/manual_time                      -0.0070         +0.0097         47324         46991         69295         69965
TypeDispatcher/fp64_bandwidth_no/8/524288/1/manual_time                      -0.0045         +0.0053         89028         88628        110962        111550
TypeDispatcher/fp64_bandwidth_no/1/1048576/1/manual_time                     +0.0018         +0.0131         25664         25710         47568         48189
TypeDispatcher/fp64_bandwidth_no/2/1048576/1/manual_time                     +0.0005         +0.0131         46649         46672         68450         69344
TypeDispatcher/fp64_bandwidth_no/4/1048576/1/manual_time                     +0.0011         +0.0085         88471         88565        110480        111415
TypeDispatcher/fp64_bandwidth_no/8/1048576/1/manual_time                     +0.0012         +0.0053        170928        171136        192894        193926
TypeDispatcher/fp64_bandwidth_no/1/2097152/1/manual_time                     -0.0001         +0.0083         46272         46267         68183         68752
TypeDispatcher/fp64_bandwidth_no/2/2097152/1/manual_time                     -0.0012         +0.0048         88171         88062        110209        110741
TypeDispatcher/fp64_bandwidth_no/4/2097152/1/manual_time                     +0.0002         +0.0043        170050        170081        191959        192789
TypeDispatcher/fp64_bandwidth_no/8/2097152/1/manual_time                     -0.0007         +0.0036        334560        334341        356690        357971
TypeDispatcher/fp64_bandwidth_no/1/4194304/1/manual_time                     -0.0009         +0.0056         87610         87531        109583        110197
TypeDispatcher/fp64_bandwidth_no/2/4194304/1/manual_time                     -0.0017         +0.0022        170342        170054        192375        192800
TypeDispatcher/fp64_bandwidth_no/4/4194304/1/manual_time                     -0.0005         +0.0014        335906        335731        358098        358599
TypeDispatcher/fp64_bandwidth_no/8/4194304/1/manual_time                     -0.0007         -0.0001        662787        662355        685235        685162
TypeDispatcher/fp64_bandwidth_no/1/8388608/1/manual_time                     -0.0002         +0.0028        170179        170141        192205        192738
TypeDispatcher/fp64_bandwidth_no/2/8388608/1/manual_time                     -0.0010         +0.0010        335180        334840        357327        357671
TypeDispatcher/fp64_bandwidth_no/4/8388608/1/manual_time                     -0.0001         +0.0014        661869        661791        683735        684685
TypeDispatcher/fp64_bandwidth_no/8/8388608/1/manual_time                     -0.0002         +0.0004       1314612       1314411       1336873       1337397
TypeDispatcher/fp64_bandwidth_no/1/16777216/1/manual_time                    -0.0004         -0.0011        335536        335407        358509        358127
TypeDispatcher/fp64_bandwidth_no/2/16777216/1/manual_time                    +0.0000         +0.0032        663289        663293        685233        687441
TypeDispatcher/fp64_bandwidth_no/4/16777216/1/manual_time                    -0.0001         -0.0004       1324856       1324662       1348192       1347606
TypeDispatcher/fp64_bandwidth_no/8/16777216/1/manual_time                    +0.0007         +0.0009       2645425       2647265       2667906       2670433
TypeDispatcher/fp64_bandwidth_no/1/33554432/1/manual_time                    -0.0001         +0.0010        666001        665912        688033        688725
TypeDispatcher/fp64_bandwidth_no/2/33554432/1/manual_time                    -0.0000         +0.0005       1321870       1321849       1344036       1344765
TypeDispatcher/fp64_bandwidth_no/4/33554432/1/manual_time                    +0.0007         +0.0009       2639903       2641650       2662294       2664641
TypeDispatcher/fp64_bandwidth_no/8/33554432/1/manual_time                    -0.0000         +0.0001       5264307       5264276       5286980       5287452
TypeDispatcher/fp64_bandwidth_no/1/67108864/1/manual_time                    -0.0001         +0.0005       1327104       1326925       1349374       1350031
TypeDispatcher/fp64_bandwidth_no/2/67108864/1/manual_time                    -0.0000         +0.0002       2638083       2638033       2660432       2660923
TypeDispatcher/fp64_bandwidth_no/4/67108864/1/manual_time                    +0.0001         +0.0021       5255053       5255322       5277566       5288594
TypeDispatcher/fp64_bandwidth_no/8/67108864/1/manual_time                    +0.0001         +0.0002      10517137      10518543      10539960      10541702
TypeDispatcher/fp64_compute_host/1/1024/1/manual_time                        +0.0018         +0.0172         20648         20685         42523         43252
TypeDispatcher/fp64_compute_host/2/1024/1/manual_time                        -0.0065         +0.0102         38842         38589         60616         61237
TypeDispatcher/fp64_compute_host/4/1024/1/manual_time                        -0.0155         -0.0035         74985         73820         96789         96453
TypeDispatcher/fp64_compute_host/8/1024/1/manual_time                        -0.0164         -0.0144        146830        144423        169246        166816
TypeDispatcher/fp64_compute_host/1/2048/1/manual_time                        -0.0118         +0.0077         21022         20774         42903         43235
TypeDispatcher/fp64_compute_host/2/2048/1/manual_time                        -0.0185         -0.0031         39206         38479         61087         60900
TypeDispatcher/fp64_compute_host/4/2048/1/manual_time                        -0.0188         -0.0076         75301         73883         97087         96351
TypeDispatcher/fp64_compute_host/8/2048/1/manual_time                        -0.0026         +0.0013        147303        146923        169069        169288
TypeDispatcher/fp64_compute_host/1/4096/1/manual_time                        -0.0178         -0.0018         21156         20778         43073         42996
TypeDispatcher/fp64_compute_host/2/4096/1/manual_time                        -0.0166         -0.0005         39453         38800         61106         61073
TypeDispatcher/fp64_compute_host/4/4096/1/manual_time                        +0.0001         +0.0049         75565         75570         97308         97786
TypeDispatcher/fp64_compute_host/8/4096/1/manual_time                        +0.0006         +0.0038        147600        147688        169402        170039
TypeDispatcher/fp64_compute_host/1/8192/1/manual_time                        -0.0197         +0.0080         21271         20851         43192         43537
TypeDispatcher/fp64_compute_host/2/8192/1/manual_time                        +0.0001         +0.0085         39499         39502         61175         61696
TypeDispatcher/fp64_compute_host/4/8192/1/manual_time                        -0.0054         +0.0024         75747         75336         97445         97681
TypeDispatcher/fp64_compute_host/8/8192/1/manual_time                        +0.0028         +0.0055        148194        148614        170066        170996
TypeDispatcher/fp64_compute_host/1/16384/1/manual_time                       +0.0004         +0.0129         21333         21342         43389         43951
TypeDispatcher/fp64_compute_host/2/16384/1/manual_time                       -0.0087         +0.0031         39770         39423         61637         61825
TypeDispatcher/fp64_compute_host/4/16384/1/manual_time                       -0.0005         +0.0047         76373         76334         98184         98642
TypeDispatcher/fp64_compute_host/8/16384/1/manual_time                       -0.0022         +0.0009        149675        149338        171623        171771
TypeDispatcher/fp64_compute_host/1/32768/1/manual_time                       +0.0013         +0.0160         37129         37176         59060         60003
TypeDispatcher/fp64_compute_host/2/32768/1/manual_time                       +0.0044         +0.0116         71342         71657         93401         94481
TypeDispatcher/fp64_compute_host/4/32768/1/manual_time                       +0.0005         +0.0089        139512        139582        161299        162730
TypeDispatcher/fp64_compute_host/8/32768/1/manual_time                       -0.0001         +0.0012        276168        276131        298765        299134
TypeDispatcher/fp64_compute_host/1/65536/1/manual_time                       -0.0004         +0.0092         68585         68555         90593         91428
TypeDispatcher/fp64_compute_host/2/65536/1/manual_time                       -0.0002         -0.0016        134211        134181        156579        156333
TypeDispatcher/fp64_compute_host/4/65536/1/manual_time                       +0.0002         +0.0013        265375        265430        287620        287990
TypeDispatcher/fp64_compute_host/8/65536/1/manual_time                       -0.0005         -0.0000        528324        528080        550614        550604
TypeDispatcher/fp64_compute_host/1/131072/1/manual_time                      +0.0007         +0.0023        116050        116134        138173        138489
TypeDispatcher/fp64_compute_host/2/131072/1/manual_time                      -0.0010         -0.0005        229346        229117        251912        251785
TypeDispatcher/fp64_compute_host/4/131072/1/manual_time                      +0.0002         +0.0010        455251        455324        478292        478764
TypeDispatcher/fp64_compute_host/8/131072/1/manual_time                      -0.0000         +0.0004        907801        907783        930292        930672
TypeDispatcher/fp64_compute_host/1/262144/1/manual_time                      -0.0421         -0.0348        222797        213408        245070        236549
TypeDispatcher/fp64_compute_host/2/262144/1/manual_time                      -0.0373         -0.0360        444059        427501        466745        449959
TypeDispatcher/fp64_compute_host/4/262144/1/manual_time                      -0.0349         -0.0354        881074        850313        905842        873735
TypeDispatcher/fp64_compute_host/8/262144/1/manual_time                      -0.0274         -0.0257       1747668       1699813       1770316       1724826
TypeDispatcher/fp64_compute_host/1/524288/1/manual_time                      +0.0085         +0.0103        413328        416825        435611        440087
TypeDispatcher/fp64_compute_host/2/524288/1/manual_time                      +0.0074         +0.0079        823726        829853        846194        852878
TypeDispatcher/fp64_compute_host/4/524288/1/manual_time                      +0.0083         +0.0090       1644679       1658406       1667246       1682211
TypeDispatcher/fp64_compute_host/8/524288/1/manual_time                      +0.0050         +0.0052       3286593       3302965       3309239       3326413
TypeDispatcher/fp64_compute_host/1/1048576/1/manual_time                     -0.0001         +0.0018        819928        819845        842223        843725
TypeDispatcher/fp64_compute_host/2/1048576/1/manual_time                     -0.0004         +0.0010       1637386       1636661       1659821       1661488
TypeDispatcher/fp64_compute_host/4/1048576/1/manual_time                     -0.0003         +0.0006       3271690       3270647       3294216       3296153
TypeDispatcher/fp64_compute_host/8/1048576/1/manual_time                     -0.0005         -0.0001       6540891       6537886       6562946       6562101
TypeDispatcher/fp64_compute_host/1/2097152/1/manual_time                     -0.0017         -0.0005       1623990       1621310       1646275       1645396
TypeDispatcher/fp64_compute_host/2/2097152/1/manual_time                     -0.0024         -0.0015       3246995       3239249       3269492       3264425
TypeDispatcher/fp64_compute_host/4/2097152/1/manual_time                     -0.0019         -0.0011       6487675       6475662       6510321       6503217
TypeDispatcher/fp64_compute_host/8/2097152/1/manual_time                     -0.0020         -0.0018      12974914      12949523      12997542      12974132
TypeDispatcher/fp64_compute_host/1/4194304/1/manual_time                     -0.0004         -0.0000       3224373       3223203       3247229       3247226
TypeDispatcher/fp64_compute_host/2/4194304/1/manual_time                     +0.0031         +0.0033       6446409       6466332       6470273       6491406
TypeDispatcher/fp64_compute_host/4/4194304/1/manual_time                     +0.0030         +0.0032      12890093      12928929      12913001      12953746
TypeDispatcher/fp64_compute_host/8/4194304/1/manual_time                     -0.0002         -0.0001      25777835      25773549      25799619      25796604
TypeDispatcher/fp64_compute_host/1/8388608/1/manual_time                     +0.0002         +0.0003       6425065       6426305       6447829       6449836
TypeDispatcher/fp64_compute_host/2/8388608/1/manual_time                     +0.0003         +0.0003      12846611      12850347      12869138      12873141
TypeDispatcher/fp64_compute_host/4/8388608/1/manual_time                     +0.0001         +0.0004      25693033      25696787      25714700      25723916
TypeDispatcher/fp64_compute_host/8/8388608/1/manual_time                     +0.0002         +0.0003      51376640      51386295      51397340      51413446
TypeDispatcher/fp64_compute_host/1/16777216/1/manual_time                    +0.0003         +0.0004      12816943      12821187      12839373      12843969
TypeDispatcher/fp64_compute_host/2/16777216/1/manual_time                    +0.0003         +0.0005      25630910      25639747      25652565      25666199
TypeDispatcher/fp64_compute_host/4/16777216/1/manual_time                    +0.0002         +0.0002      51258880      51268974      51279078      51289343
TypeDispatcher/fp64_compute_host/8/16777216/1/manual_time                    +0.0002         +0.0003     102514544     102538678     102532942     102559722
TypeDispatcher/fp64_compute_host/1/33554432/1/manual_time                    +0.0004         +0.0005      25612477      25623931      25633604      25645850
TypeDispatcher/fp64_compute_host/2/33554432/1/manual_time                    +0.0003         +0.0003      51227720      51242715      51248236      51263541
TypeDispatcher/fp64_compute_host/4/33554432/1/manual_time                    +0.0003         +0.0004     102450759     102486016     102467446     102503466
TypeDispatcher/fp64_compute_host/8/33554432/1/manual_time                    +0.0004         +0.0003     204890788     204971358     204901784     204962122
TypeDispatcher/fp64_compute_host/1/67108864/1/manual_time                    +0.0002         +0.0003      51209875      51222455      51230999      51248696
TypeDispatcher/fp64_compute_host/2/67108864/1/manual_time                    +0.0004         +0.0010     102415946     102453249     102433845     102536852
TypeDispatcher/fp64_compute_host/4/67108864/1/manual_time                    +0.0003         +0.0005     204834471     204893524     204844742     204943317
TypeDispatcher/fp64_compute_host/8/67108864/1/manual_time                    -0.0032         -0.0034     411097586     409777164     411089549     409695643
TypeDispatcher/fp64_compute_device/1/1024/1/manual_time                      +0.2162         +0.2784         30348         36910         52755         67439
TypeDispatcher/fp64_compute_device/2/1024/1/manual_time                      +0.1031         +0.1542         47163         52024         69951         80739
TypeDispatcher/fp64_compute_device/4/1024/1/manual_time                      -0.0031         +0.0107         80333         80082        102989        104087
TypeDispatcher/fp64_compute_device/8/1024/1/manual_time                      -0.0078         +0.0034        147022        145883        169796        170374
TypeDispatcher/fp64_compute_device/1/2048/1/manual_time                      +0.0159         +0.0395         30593         31079         52952         55041
TypeDispatcher/fp64_compute_device/2/2048/1/manual_time                      +0.0016         +0.0238         47358         47435         69897         71564
TypeDispatcher/fp64_compute_device/4/2048/1/manual_time                      -0.0021         +0.0140         80218         80047        102649        104090
TypeDispatcher/fp64_compute_device/8/2048/1/manual_time                      +0.0053         +0.0141        146519        147295        169008        171392
TypeDispatcher/fp64_compute_device/1/4096/1/manual_time                      +0.0008         +0.0312         30869         30892         53069         54723
TypeDispatcher/fp64_compute_device/2/4096/1/manual_time                      +0.0020         +0.0236         47501         47597         69964         71615
TypeDispatcher/fp64_compute_device/4/4096/1/manual_time                      +0.0098         +0.0236         80405         81193        102895        105320
TypeDispatcher/fp64_compute_device/8/4096/1/manual_time                      +0.0060         +0.0159        146358        147231        168654        171328
TypeDispatcher/fp64_compute_device/1/8192/1/manual_time                      +0.0114         +0.0431         30766         31116         52956         55241
TypeDispatcher/fp64_compute_device/2/8192/1/manual_time                      +0.0132         +0.0323         47356         47983         69464         71707
TypeDispatcher/fp64_compute_device/4/8192/1/manual_time                      +0.0063         +0.0202         80480         80985        103014        105091
TypeDispatcher/fp64_compute_device/8/8192/1/manual_time                      +0.0053         +0.0117        146784        147559        169257        171237
TypeDispatcher/fp64_compute_device/1/16384/1/manual_time                     +0.0179         +0.0349         30891         31445         53182         55038
TypeDispatcher/fp64_compute_device/2/16384/1/manual_time                     +0.0013         +0.0223         47852         47916         70130         71696
TypeDispatcher/fp64_compute_device/4/16384/1/manual_time                     +0.0073         +0.0188         81057         81652        103455        105396
TypeDispatcher/fp64_compute_device/8/16384/1/manual_time                     +0.0019         +0.0075        147566        147851        170103        171372
TypeDispatcher/fp64_compute_device/1/32768/1/manual_time                     +0.0062         +0.0195         46720         47010         68984         70330
TypeDispatcher/fp64_compute_device/2/32768/1/manual_time                     +0.0007         +0.0018         78835         78889        101541        101725
TypeDispatcher/fp64_compute_device/4/32768/1/manual_time                     -0.0010         +0.0033        141758        141621        164242        164783
TypeDispatcher/fp64_compute_device/8/32768/1/manual_time                     +0.0004         +0.0025        267648        267749        290606        291336
TypeDispatcher/fp64_compute_device/1/65536/1/manual_time                     +0.0021         +0.0120         78358         78521        100523        101729
TypeDispatcher/fp64_compute_device/2/65536/1/manual_time                     +0.0007         +0.0050        141798        141895        164275        165098
TypeDispatcher/fp64_compute_device/4/65536/1/manual_time                     +0.0009         +0.0035        266899        267145        289638        290651
TypeDispatcher/fp64_compute_device/8/65536/1/manual_time                     -0.0005         +0.0011        518500        518247        541303        541894
TypeDispatcher/fp64_compute_device/1/131072/1/manual_time                    -0.0011         +0.0024        126088        125946        148614        148969
TypeDispatcher/fp64_compute_device/2/131072/1/manual_time                    +0.0010         +0.0058        235951        236179        258604        260116
TypeDispatcher/fp64_compute_device/4/131072/1/manual_time                    +0.0006         -0.0018        455472        455729        481837        480991
TypeDispatcher/fp64_compute_device/8/131072/1/manual_time                    +0.0006         +0.0033        894900        895472        917794        920848
TypeDispatcher/fp64_compute_device/1/262144/1/manual_time                    -0.0211         -0.0176        233929        228994        256825        252303
TypeDispatcher/fp64_compute_device/2/262144/1/manual_time                    -0.0256         -0.0225        447789        436337        470607        460024
TypeDispatcher/fp64_compute_device/4/262144/1/manual_time                    +0.0017         +0.0036        831230        832647        854222        857270
TypeDispatcher/fp64_compute_device/8/262144/1/manual_time                    -0.0016         -0.0013       1770924       1768073       1793978       1791600
TypeDispatcher/fp64_compute_device/1/524288/1/manual_time                    +0.0055         +0.0075        424025        426343        446826        450190
TypeDispatcher/fp64_compute_device/2/524288/1/manual_time                    +0.0002         +0.0014        831033        831228        853743        854916
TypeDispatcher/fp64_compute_device/4/524288/1/manual_time                    -0.0127         -0.0117       1761473       1739172       1784641       1763761
TypeDispatcher/fp64_compute_device/8/524288/1/manual_time                    -0.0010         -0.0003       3315475       3312007       3338868       3337791
TypeDispatcher/fp64_compute_device/1/1048576/1/manual_time                   -0.0012         -0.0002        831570        830593        854660        854474
TypeDispatcher/fp64_compute_device/2/1048576/1/manual_time                   +0.0002         +0.0007       1645474       1645857       1668290       1669527
TypeDispatcher/fp64_compute_device/4/1048576/1/manual_time                   -0.0012         -0.0009       3334778       3330677       3357985       3355019
TypeDispatcher/fp64_compute_device/8/1048576/1/manual_time                   -0.0011         -0.0009       6530909       6523933       6554104       6547889
TypeDispatcher/fp64_compute_device/1/2097152/1/manual_time                   -0.0003         +0.0024       1634878       1634409       1657614       1661551
TypeDispatcher/fp64_compute_device/2/2097152/1/manual_time                   +0.0010         +0.0017       3244572       3247834       3267611       3273116
TypeDispatcher/fp64_compute_device/4/2097152/1/manual_time                   +0.0002         +0.0009       6523655       6524832       6546783       6552535
TypeDispatcher/fp64_compute_device/8/2097152/1/manual_time                   -0.0005         -0.0005      13028807      13022189      13052659      13046306
TypeDispatcher/fp64_compute_device/1/4194304/1/manual_time                   -0.0003         +0.0001       3238898       3237803       3262059       3262320
TypeDispatcher/fp64_compute_device/2/4194304/1/manual_time                   +0.0000         +0.0003       6461848       6462018       6484713       6486897
TypeDispatcher/fp64_compute_device/4/4194304/1/manual_time                   -0.0001         +0.0001      12955705      12954264      12978310      12979068
TypeDispatcher/fp64_compute_device/8/4194304/1/manual_time                   -0.0012         -0.0011      25882662      25850501      25903941      25874698
TypeDispatcher/fp64_compute_device/1/8388608/1/manual_time                   -0.0003         -0.0002       6442275       6440237       6465627       6464575
TypeDispatcher/fp64_compute_device/2/8388608/1/manual_time                   -0.0001         -0.0001      12868855      12867224      12892088      12891180
TypeDispatcher/fp64_compute_device/4/8388608/1/manual_time                   -0.0004         -0.0002      25767519      25756331      25789573      25783343
TypeDispatcher/fp64_compute_device/8/8388608/1/manual_time                   -0.0005         -0.0004      51493523      51465362      51514081      51491460
TypeDispatcher/fp64_compute_device/1/16777216/1/manual_time                  -0.0002         +0.0001      12842572      12840505      12863556      12865119
TypeDispatcher/fp64_compute_device/2/16777216/1/manual_time                  +0.0002         +0.0002      25672628      25676687      25694911      25700970
TypeDispatcher/fp64_compute_device/4/16777216/1/manual_time                  -0.0002         -0.0002      51373933      51362816      51394947      51386313
TypeDispatcher/fp64_compute_device/8/16777216/1/manual_time                  +0.0001         +0.0004     102700764     102715538     102720105     102760669
TypeDispatcher/fp64_compute_device/1/33554432/1/manual_time                  -0.0001         -0.0001      25659240      25656055      25681385      25678611
TypeDispatcher/fp64_compute_device/2/33554432/1/manual_time                  -0.0001         -0.0000      51295305      51292380      51316008      51313982
TypeDispatcher/fp64_compute_device/4/33554432/1/manual_time                  -0.0002         -0.0002     102629961     102606409     102648185     102625708
TypeDispatcher/fp64_compute_device/8/33554432/1/manual_time                  -0.0003         -0.0003     205226665     205167949     205235882     205178668
TypeDispatcher/fp64_compute_device/1/67108864/1/manual_time                  -0.0002         -0.0002      51283894      51274459      51304209      51294798
TypeDispatcher/fp64_compute_device/2/67108864/1/manual_time                  -0.0001         -0.0002     102537363     102522293     102555289     102539356
TypeDispatcher/fp64_compute_device/4/67108864/1/manual_time                  -0.0002         -0.0002     205104813     205054293     205115127     205066247
TypeDispatcher/fp64_compute_device/8/67108864/1/manual_time                  -0.0004         -0.0004     411511809     411342338     411503935     411331577
TypeDispatcher/fp64_compute_no/1/1024/1/manual_time                          +0.0025         +0.0335         20813         20864         42617         44045
TypeDispatcher/fp64_compute_no/2/1024/1/manual_time                          -0.0107         +0.0034         37303         36904         59465         59667
TypeDispatcher/fp64_compute_no/4/1024/1/manual_time                          -0.0150         -0.0071         70176         69123         92570         91913
TypeDispatcher/fp64_compute_no/8/1024/1/manual_time                          -0.0085         -0.0032        134998        133848        156899        156400
TypeDispatcher/fp64_compute_no/1/2048/1/manual_time                          -0.0156         +0.0117         21158         20828         43056         43561
TypeDispatcher/fp64_compute_no/2/2048/1/manual_time                          -0.0108         +0.0065         37372         36970         59366         59751
TypeDispatcher/fp64_compute_no/4/2048/1/manual_time                          -0.0063         +0.0026         69998         69557         92241         92476
TypeDispatcher/fp64_compute_no/8/2048/1/manual_time                          -0.0005         +0.0048        135322        135259        157267        158021
TypeDispatcher/fp64_compute_no/1/4096/1/manual_time                          -0.0003         +0.0018         21034         21028         43078         43154
TypeDispatcher/fp64_compute_no/2/4096/1/manual_time                          +0.0037         +0.0056         37490         37628         59579         59910
TypeDispatcher/fp64_compute_no/4/4096/1/manual_time                          -0.0020         +0.0036         70426         70286         92428         92764
TypeDispatcher/fp64_compute_no/8/4096/1/manual_time                          +0.0003         +0.0028        135496        135534        157487        157935
TypeDispatcher/fp64_compute_no/1/8192/1/manual_time                          -0.0038         +0.0021         21526         21443         43742         43834
TypeDispatcher/fp64_compute_no/2/8192/1/manual_time                          -0.0092         +0.0036         37923         37574         59826         60042
TypeDispatcher/fp64_compute_no/4/8192/1/manual_time                          -0.0037         +0.0002         70648         70386         92653         92668
TypeDispatcher/fp64_compute_no/8/8192/1/manual_time                          +0.0002         +0.0053        135941        135963        157725        158568
TypeDispatcher/fp64_compute_no/1/16384/1/manual_time                         -0.0134         +0.0106         21617         21328         43540         44004
TypeDispatcher/fp64_compute_no/2/16384/1/manual_time                         -0.0032         +0.0042         38065         37943         60224         60475
TypeDispatcher/fp64_compute_no/4/16384/1/manual_time                         +0.0025         +0.0074         70879         71059         92828         93517
TypeDispatcher/fp64_compute_no/8/16384/1/manual_time                         +0.0036         +0.0052        136088        136578        158256        159080
TypeDispatcher/fp64_compute_no/1/32768/1/manual_time                         +0.0008         +0.0109         37361         37392         59275         59922
TypeDispatcher/fp64_compute_no/2/32768/1/manual_time                         +0.0020         +0.0025         69341         69480         91552         91784
TypeDispatcher/fp64_compute_no/4/32768/1/manual_time                         +0.0020         +0.0037        131856        132118        153944        154512
TypeDispatcher/fp64_compute_no/8/32768/1/manual_time                         +0.0014         +0.0017        257641        258002        280177        280654
TypeDispatcher/fp64_compute_no/1/65536/1/manual_time                         +0.0021         +0.0078         68839         68986         90696         91401
TypeDispatcher/fp64_compute_no/2/65536/1/manual_time                         +0.0016         +0.0025        131928        132143        153915        154304
TypeDispatcher/fp64_compute_no/4/65536/1/manual_time                         +0.0015         +0.0017        256941        257313        279198        279684
TypeDispatcher/fp64_compute_no/8/65536/1/manual_time                         +0.0019         +0.0020        507822        508772        530173        531211
TypeDispatcher/fp64_compute_no/1/131072/1/manual_time                        +0.0013         +0.0030        116349        116502        138392        138803
TypeDispatcher/fp64_compute_no/2/131072/1/manual_time                        +0.0016         +0.0061        226155        226512        248272        249776
TypeDispatcher/fp64_compute_no/4/131072/1/manual_time                        +0.0020         +0.0022        445121        446014        467608        468653
TypeDispatcher/fp64_compute_no/8/131072/1/manual_time                        +0.0022         +0.0026        883319        885278        905698        908045
TypeDispatcher/fp64_compute_no/1/262144/1/manual_time                        -0.0273         -0.0243        222689        216619        245137        239175
TypeDispatcher/fp64_compute_no/2/262144/1/manual_time                        +0.0003         +0.0006        439049        439164        461515        461815
TypeDispatcher/fp64_compute_no/4/262144/1/manual_time                        +0.0022         +0.0023        820044        821855        842641        844588
TypeDispatcher/fp64_compute_no/8/262144/1/manual_time                        -0.0162         -0.0160       1757935       1729390       1780386       1751955
TypeDispatcher/fp64_compute_no/1/524288/1/manual_time                        +0.0056         +0.0081        414039        416351        436355        439886
TypeDispatcher/fp64_compute_no/2/524288/1/manual_time                        +0.0018         +0.0032        820050        821503        842322        845038
TypeDispatcher/fp64_compute_no/4/524288/1/manual_time                        -0.0207         -0.0200       1743491       1707321       1765533       1730229
TypeDispatcher/fp64_compute_no/8/524288/1/manual_time                        +0.0031         +0.0033       3278963       3288982       3301331       3312177
TypeDispatcher/fp64_compute_no/1/1048576/1/manual_time                       +0.0012         +0.0017        820803        821776        843258        844690
TypeDispatcher/fp64_compute_no/2/1048576/1/manual_time                       +0.0012         +0.0015       1633185       1635129       1655780       1658210
TypeDispatcher/fp64_compute_no/4/1048576/1/manual_time                       +0.0004         +0.0008       3320745       3321963       3343070       3345830
TypeDispatcher/fp64_compute_no/8/1048576/1/manual_time                       +0.0010         +0.0012       6511351       6517578       6533983       6541546
TypeDispatcher/fp64_compute_no/1/2097152/1/manual_time                       +0.0007         -0.0027       1622663       1623736       1652769       1648255
TypeDispatcher/fp64_compute_no/2/2097152/1/manual_time                       +0.0024         +0.0031       3230758       3238353       3253123       3263129
TypeDispatcher/fp64_compute_no/4/2097152/1/manual_time                       +0.0008         +0.0013       6510393       6515683       6532646       6540826
TypeDispatcher/fp64_compute_no/8/2097152/1/manual_time                       +0.0006         +0.0014      13010792      13019022      13032944      13051753
TypeDispatcher/fp64_compute_no/1/4194304/1/manual_time                       +0.0001         +0.0009       3227124       3227464       3249683       3252477
TypeDispatcher/fp64_compute_no/2/4194304/1/manual_time                       +0.0005         +0.0010       6447151       6450326       6469331       6476030
TypeDispatcher/fp64_compute_no/4/4194304/1/manual_time                       +0.0001         +0.0005      12943265      12944175      12964959      12970822
TypeDispatcher/fp64_compute_no/8/4194304/1/manual_time                       +0.0011         +0.0012      25839009      25866657      25859862      25891977
TypeDispatcher/fp64_compute_no/1/8388608/1/manual_time                       -0.0034         -0.0032       6452186       6430504       6474476       6453747
TypeDispatcher/fp64_compute_no/2/8388608/1/manual_time                       -0.0033         -0.0033      12898039      12855903      12921538      12878766
TypeDispatcher/fp64_compute_no/4/8388608/1/manual_time                       -0.0028         -0.0027      25827138      25754093      25849298      25780153
TypeDispatcher/fp64_compute_no/8/8388608/1/manual_time                       -0.0034         -0.0030      51639077      51464558      51661305      51505166
TypeDispatcher/fp64_compute_no/1/16777216/1/manual_time                      -0.0035         -0.0033      12877445      12832489      12900665      12858249
TypeDispatcher/fp64_compute_no/2/16777216/1/manual_time                      -0.0032         -0.0030      25744270      25662085      25766094      25689762
TypeDispatcher/fp64_compute_no/4/16777216/1/manual_time                      -0.0032         -0.0029      51530605      51365230      51547766      51397018
TypeDispatcher/fp64_compute_no/8/16777216/1/manual_time                      -0.0033         -0.0033     103007233     102667704     103023234     102685430
TypeDispatcher/fp64_compute_no/1/33554432/1/manual_time                      -0.0033         -0.0031      25730427      25644866      25752334      25672400
TypeDispatcher/fp64_compute_no/2/33554432/1/manual_time                      -0.0032         -0.0022      51449856      51285358      51470315      51359123
TypeDispatcher/fp64_compute_no/4/33554432/1/manual_time                      -0.0030         -0.0027     102920337     102606848     102937265     102656315
TypeDispatcher/fp64_compute_no/8/33554432/1/manual_time                      -0.0033         -0.0032     205810000     205123926     205823395     205169216
TypeDispatcher/fp64_compute_no/1/67108864/1/manual_time                      -0.0032         -0.0027      51432229      51267950      51454378      51317672
TypeDispatcher/fp64_compute_no/2/67108864/1/manual_time                      -0.0032         -0.0031     102844269     102515715     102860046     102538480
TypeDispatcher/fp64_compute_no/4/67108864/1/manual_time                      -0.0031         -0.0029     205709313     205074092     205718995     205118446
TypeDispatcher/fp64_compute_no/8/67108864/1/manual_time                      -0.0032         -0.0031     411375105     410072058     411362193     410103647

@jrhemstad
Contributor

jrhemstad commented Mar 15, 2021

So those results just look like noise to me. Roughly +/- 5%.

I think it's safe to conclude there is no runtime performance impact.
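For reference, the first two numeric columns of the compare output above appear to be the relative changes, (new - old) / old, of the Time and CPU pairs that follow them. A minimal Python sketch that recomputes them from one pasted row; the helper names `relative_change` and `parse_row` and the seven-column layout are assumptions made for illustration, not part of compare.py:

```python
def relative_change(old: float, new: float) -> float:
    """Return (new - old) / old, the fractional change from old to new."""
    return (new - old) / old


def parse_row(row: str):
    """Split one pasted row into (name, time_old, time_new, cpu_old, cpu_new).

    Assumes the seven-column layout seen in the table above:
    name, time delta, cpu delta, time old, time new, cpu old, cpu new.
    """
    fields = row.split()
    name = fields[0]
    time_old, time_new, cpu_old, cpu_new = (float(x) for x in fields[3:7])
    return name, time_old, time_new, cpu_old, cpu_new


# One row copied from the table above.
row = ("TypeDispatcher/fp64_compute_device/1/1024/1/manual_time "
       "+0.2162 +0.2784 30348 36910 52755 67439")
name, t_old, t_new, c_old, c_new = parse_row(row)
time_delta = relative_change(t_old, t_new)
cpu_delta = relative_change(c_old, c_new)
print(f"{name}: time {time_delta:+.4f}, cpu {cpu_delta:+.4f}")
```

The recomputed values can differ from the printed ones in the last digit, presumably because the table shows rounded timings. Note that the largest relative swings sit on the smallest kernels, where a change of a few thousand time units is a large fraction of the total.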

@robertmaynard
Contributor Author

How can I verify that fatbin.ld is still required with the reduced binary?

@jrhemstad
Contributor

> How can I verify that fatbin.ld is still required with the reduced binary?

Try a debug build without it?

@harrism
Member

harrism commented Mar 16, 2021

The reduction bench results are a bit concerning -- they look much less like noise than the other benchmarks.

@jrhemstad
Contributor

To say more definitively: if you collect 10 iterations of each benchmark, you can use compare.py to do a t-test for statistically significant differences.
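As an illustration of the kind of test being suggested, here is a minimal sketch of Welch's unequal-variance t statistic over two sets of repeated timings. It is independent of compare.py (which appears to report a Mann-Whitney U test rather than a t-test when given repeated runs), the function name `welch_t` is hypothetical, and the sample timings are made up for the example rather than taken from this PR:

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, dof) for Welch's unequal-variance t-test on two samples."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se_a, se_b = var_a / len(a), var_b / len(b)
    t = (mean_b - mean_a) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, dof


# Hypothetical timings, 10 repetitions per build (illustrative values only).
baseline = [30348, 30410, 30295, 30388, 30360, 30321, 30402, 30377, 30339, 30366]
candidate = [30352, 30401, 30310, 30379, 30371, 30330, 30395, 30369, 30344, 30358]
t, dof = welch_t(baseline, candidate)
print(f"t = {t:.2f}, dof = {dof:.1f}")
```

The resulting t and degrees of freedom would then be checked against a t-distribution (for example with a statistics package) to get a p-value; a |t| well below 2 at around 18 degrees of freedom would not indicate a significant difference.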

@robertmaynard force-pushed the reduce_cudf_binary_size branch from 0ae9aa8 to 83248f3 on March 16, 2021 20:06
@robertmaynard
Contributor Author

robertmaynard commented Mar 17, 2021

Here is the comparison of running the reduction benchmark 10 times.

https://gist.github.com/robertmaynard/2cfb4ba27afcd7e52b2deccce3b78c29#file-results-txt

Given the size, I have also uploaded the benchmark data so people can use compare.py to filter the results / get nicer output, which can be found at:

@harrism
Member

harrism commented Mar 18, 2021

@robertmaynard compare.py is capable of producing human-readable (human-pastable) output, not just JSON, so you don't have to make us go run it ourselves. :)

@robertmaynard
Contributor Author

> @robertmaynard compare.py is capable of producing human-readable (human-pastable) output, not just JSON, so you don't have to make us go run it ourselves. :)

Here are the raw compare results:
https://gist.githubusercontent.com/robertmaynard/2cfb4ba27afcd7e52b2deccce3b78c29/raw/0a0d3b48e82ab26c2dac5af00e428333e852b8a5/results.txt

@davidwendt
Contributor

What units are the time values in for the benchmarks? milliseconds, microseconds, nanoseconds?

@robertmaynard
Contributor Author

> What units are the time values in for the benchmarks? milliseconds, microseconds, nanoseconds?

Interesting, the compare.py output doesn't list the units. The Time and CPU values are both in nanoseconds.
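Since the raw values are in nanoseconds, a small helper makes the table easier to read. This is only a convenience sketch; `format_ns` is a hypothetical name and not something compare.py provides:

```python
def format_ns(ns: float) -> str:
    """Format a duration given in nanoseconds using the largest unit >= 1."""
    for unit, scale in (("s", 1e9), ("ms", 1e6), ("us", 1e3)):
        if ns >= scale:
            return f"{ns / scale:.3f} {unit}"
    return f"{ns:.0f} ns"


# Time (old) values from the first and last fp64_compute_no rows above.
print(format_ns(20813))      # 20.813 us
print(format_ns(411375105))  # 411.375 ms
```

Read this way, the benchmarks in the table span from roughly 20 microseconds to roughly 0.4 seconds per iteration.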

@kkraus14
Collaborator

rerun tests

@harrism added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Mar 22, 2021
@harrism
Member

harrism commented Mar 22, 2021

I think this is non-breaking even with the change to release_assert -- breaks should only be internal.

@kkraus14
Collaborator

../tests/error/error_handling_test.cu:107
Death test: call_kernel()
    Result: failed to die.
 Error msg:
[  DEATH   ] 

@jrhemstad
Contributor

> ../tests/error/error_handling_test.cu:107
> Death test: call_kernel()
>     Result: failed to die.
>  Error msg:
> [  DEATH   ]

Oh yeah, this test is going to fail since release_assert is no longer present in release builds. The test can either selectively disable NDEBUG, or we can just remove the test altogether.

@robertmaynard force-pushed the reduce_cudf_binary_size branch from 83248f3 to 1943e4e on March 22, 2021 23:50
@kkraus14
Collaborator

@gpucibot merge

@codecov

codecov bot commented Mar 23, 2021

Codecov Report

Merging #7583 (1943e4e) into branch-0.19 (7871e7a) will increase coverage by 0.60%.
The diff coverage is 88.20%.


@@               Coverage Diff               @@
##           branch-0.19    #7583      +/-   ##
===============================================
+ Coverage        81.86%   82.47%   +0.60%     
===============================================
  Files              101      101              
  Lines            16884    17397     +513     
===============================================
+ Hits             13822    14348     +526     
+ Misses            3062     3049      -13     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/core/index.py | 93.45% <ø> (+0.59%) ⬆️ |
| python/cudf/cudf/core/column/column.py | 87.86% <60.00%> (+0.10%) ⬆️ |
| python/cudf/cudf/core/series.py | 91.69% <85.33%> (+0.90%) ⬆️ |
| python/cudf/cudf/core/column/numerical.py | 94.83% <85.71%> (-0.20%) ⬇️ |
| python/cudf/cudf/core/dataframe.py | 90.90% <85.71%> (+0.44%) ⬆️ |
| python/cudf/cudf/core/frame.py | 89.23% <88.57%> (+0.21%) ⬆️ |
| python/cudf/cudf/core/column/decimal.py | 92.75% <90.32%> (-2.12%) ⬇️ |
| python/cudf/cudf/core/column/string.py | 86.79% <100.00%> (+0.30%) ⬆️ |
| python/cudf/cudf/core/column_accessor.py | 95.45% <100.00%> (+0.14%) ⬆️ |
| python/cudf/cudf/core/dtypes.py | 91.13% <100.00%> (+1.40%) ⬆️ |
| ... and 56 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 325d5b8...1943e4e.

@rapids-bot merged commit 0f8035d into rapidsai:branch-0.19 Mar 23, 2021
@robertmaynard deleted the reduce_cudf_binary_size branch March 23, 2021 16:08
robertmaynard added a commit to robertmaynard/cuml that referenced this pull request Apr 1, 2021
By explicitly telling nvcc's fatbin pass to always compress device code
we can ensure that our binaries are the smallest possible size.

See rapidsai/cudf#7583 for additional context.
robertmaynard added a commit to robertmaynard/cugraph that referenced this pull request Apr 1, 2021
By explicitly telling nvcc's fatbin pass to always compress device code
we can ensure that our binaries are the smallest possible size.

See rapidsai/cudf#7583 for additional context.
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Apr 2, 2021
By explicitly telling nvcc's fatbin pass to always compress device code
we can ensure that our binaries are the smallest possible size.

See rapidsai/cudf#7583 for additional context.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Brad Rees (https://github.com/BradReesWork)

URL: #1503
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Apr 5, 2021
By explicitly telling nvcc's fatbin pass to always compress device code we can ensure that our binaries are the smallest possible size.

See rapidsai/cudf#7583 for additional context.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3702
rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this pull request Apr 8, 2021
Decrease library size by using fatbin compression.

cuDF PR for context: rapidsai/cudf#7583

Authors:
  - Dillon Cullinan (https://github.com/dillon-cullinan)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - Keith Kraus (https://github.com/kkraus14)

URL: #373
Labels:
  - CMake (CMake build issue)
  - improvement (Improvement / enhancement to an existing function)
  - libcudf (Affects libcudf (C++/CUDA) code.)
  - non-breaking (Non-breaking change)
5 participants