From 0f8035d68cfe45fb8da7cc1bca9b71e35aa05fe4 Mon Sep 17 00:00:00 2001 From: Robert Maynard Date: Mon, 22 Mar 2021 23:12:48 -0400 Subject: [PATCH] Reduce cudf library size (#7583) This PR combines two major changes that improve cudf's final binary size. The optimizations are: - Explicitly telling fatbin to always compress device code - Not generating asserts with `__FILE__`, `__LINE__`, etc when in release mode. With this PR we take cudf when built for `ALL` archs from `1.3GB` to `329MB`. Since this has such a dramatic change, I did some analysis on library sizes, compile times, runtime startup times, and runtime performance. ### Library sizes | variant | release 1 arch | release all archs | | -- | -- | -- | | branch-0.19 | 335MB | 1.3GB | | compression | 133MB | 476MB | | compression + no_assert | 110MB | 329MB | | rdc + comp + no_assert | 150MB | Not done | We see that compression and the removal of the asserts while staying with whole compilation ( no `-rdc` ) is best for performance. RDC taking more space makes sense to me as it has a less agressive optimizer and therefore generates more sass/ptx. ### Compile times | variant | release 1 arch | release all archs | | -- | -- | -- | | branch-0.19 | 15m50.411s | 44m12.849s | | compression | 15m14.021s | 44m11.658s | | compression + no_assert | 12m53.049s | 33m59.388s | | rdc + comp + no_assert | 12m58.550s | Not done | Compile times are interesting as we see that compression has no negative effect. I expect that the reduced IO ( from smaller binaries ) offsets the time spent compressing. ## Runtime starup times for `all archs`: | variant | branch-0.19 | compression + no_assert | | -- | -- | -- | | ctest -j1 | 4m0.326s | 3m57.500s | | DISPATCHER_TEST * 12 | 0m3.441s | 0m3.520s | The goal here was to time the tests to see if we had any measurable differences. The `DISPATCHER_TEST * 12` was selected as the runtime `~0.3s` would hopefully highligh any performance differences, and therefore running it 12 times would magnify the difference and wash out any system load impacts. I think from these rather crude measurements we are safe to assume no massive runtime loading time costs. ## Runtime performance for `all archs`: I used the REDUCTION_BENCH as my baseline for detecting any performance changes. The benchmarks executed on a V100 on a shared lab machine.
REDUCTION_BENCH ./compare benchmarks branch_0.19 this_compression_branch ``` Comparing ./reduction_0.19 to ./reduction_compressed Benchmark Time CPU Time Old Time New CPU Old CPU New ----------------------------------------------------------------------------------------------------------------------------------------------------- Reduction/bool_all/10000/manual_time +0.1038 +0.0785 16621 18346 37366 40299 Reduction/bool_all/100000/manual_time +0.0969 +0.0779 17057 18710 37738 40676 Reduction/bool_all/1000000/manual_time +0.0709 +0.0694 18175 19463 38256 40913 Reduction/bool_all/10000000/manual_time +0.0472 +0.0479 35489 37165 53792 56368 Reduction/bool_all/100000000/manual_time +0.0026 +0.0120 145630 146014 164349 166325 Reduction/int8_t_all/10000/manual_time +0.1415 +0.1098 16286 18591 36729 40762 Reduction/int8_t_all/100000/manual_time +0.0457 +0.0500 17180 17966 37798 39688 Reduction/int8_t_all/1000000/manual_time +0.0656 +0.0565 18297 19498 38522 40700 Reduction/int8_t_all/10000000/manual_time +0.0322 +0.0345 36117 37279 54645 56528 Reduction/int8_t_all/100000000/manual_time +0.0041 +0.0132 146222 146819 165053 167232 Reduction/int32_t_all/10000/manual_time +0.0931 +0.0781 16849 18417 37471 40397 Reduction/int32_t_all/100000/manual_time +0.0841 +0.0775 17371 18831 37772 40701 Reduction/int32_t_all/1000000/manual_time +0.0660 +0.0709 22323 23797 41633 44584 Reduction/int32_t_all/10000000/manual_time +0.0171 +0.0324 68303 69470 87006 89822 Reduction/int32_t_all/100000000/manual_time +0.0028 +0.0065 472041 473370 490693 493869 Reduction/float_all/10000/manual_time +0.0847 +0.0734 16951 18387 37708 40474 Reduction/float_all/100000/manual_time +0.0962 +0.0880 16939 18569 37229 40504 Reduction/float_all/1000000/manual_time +0.1879 +0.1679 22348 26547 41549 48526 Reduction/float_all/10000000/manual_time +0.0465 +0.0652 68851 72054 87438 93139 Reduction/float_all/100000000/manual_time +0.0081 +0.0126 472210 476037 491030 497241 Reduction/bool_any/10000/manual_time +0.0764 +0.0743 16292 17537 36830 39568 Reduction/bool_any/100000/manual_time +0.0568 +0.0591 16633 17578 37197 39394 Reduction/bool_any/1000000/manual_time +0.0771 +0.0593 18024 19414 38439 40720 Reduction/bool_any/10000000/manual_time +0.0388 +0.0452 34990 36347 53210 55614 Reduction/bool_any/100000000/manual_time +0.0012 +0.0096 145729 145903 164726 166315 Reduction/int8_t_any/10000/manual_time +0.1207 +0.0954 15686 17579 36213 39669 Reduction/int8_t_any/100000/manual_time +0.0546 +0.0580 16597 17503 37103 39254 Reduction/int8_t_any/1000000/manual_time +0.0801 +0.0669 17938 19374 38027 40572 Reduction/int8_t_any/10000000/manual_time +0.0492 +0.0541 34726 36434 52812 55671 Reduction/int8_t_any/100000000/manual_time +0.0028 +0.0119 145924 146340 164711 166670 Reduction/int32_t_any/10000/manual_time +0.0489 +0.0539 16805 17626 37587 39614 Reduction/int32_t_any/100000/manual_time +0.0824 +0.0717 16985 18384 37432 40114 Reduction/int32_t_any/1000000/manual_time +0.0588 +0.0590 22338 23652 41842 44311 Reduction/int32_t_any/10000000/manual_time +0.0203 +0.0339 67672 69045 86353 89280 Reduction/int32_t_any/100000000/manual_time +0.0023 +0.0054 471708 472770 490461 493130 Reduction/float_any/10000/manual_time +0.0454 +0.0494 16730 17490 37526 39382 Reduction/float_any/100000/manual_time +0.0895 +0.0739 16872 18381 37349 40110 Reduction/float_any/1000000/manual_time +0.0554 +0.0527 22402 23643 42014 44228 Reduction/float_any/10000000/manual_time +0.0026 +0.0081 68877 69060 88443 89159 Reduction/float_any/100000000/manual_time +0.0022 +0.0049 471781 472801 490753 493170 ReductionDictionary/int32_t_all/10000/manual_time +0.0503 +0.0535 30648 32190 51412 54160 ReductionDictionary/int32_t_all/100000/manual_time +0.0696 +0.0642 31076 33240 51545 54855 ReductionDictionary/int32_t_all/1000000/manual_time +0.9359 +0.4786 38709 74935 58597 86644 ReductionDictionary/int32_t_all/10000000/manual_time +0.0289 +0.0625 95351 98111 113516 120611 ReductionDictionary/int32_t_all/100000000/manual_time -0.0963 -0.0910 645050 582915 663356 602971 ReductionDictionary/float_all/10000/manual_time +0.0476 +0.0504 30177 31615 50814 53374 ReductionDictionary/float_all/100000/manual_time -0.0305 -0.0166 34208 33166 55546 54626 ReductionDictionary/float_all/1000000/manual_time -0.0258 -0.0137 40372 39329 61080 60242 ReductionDictionary/float_all/10000000/manual_time -0.0366 -0.0210 95681 92175 113937 111545 ReductionDictionary/float_all/100000000/manual_time -0.0784 -0.0733 644233 593744 662793 614231 ReductionDictionary/int32_t_any/10000/manual_time +0.0718 +0.0680 29882 32026 50723 54172 ReductionDictionary/int32_t_any/100000/manual_time +0.0776 +0.0704 30316 32670 50640 54203 ReductionDictionary/int32_t_any/1000000/manual_time +0.0291 +0.0372 37311 38397 57039 59159 ReductionDictionary/int32_t_any/10000000/manual_time -0.0497 -0.0285 95928 91163 113924 110675 ReductionDictionary/int32_t_any/100000000/manual_time -0.1160 -0.1109 655803 579715 673920 599181 ReductionDictionary/float_any/10000/manual_time +0.0561 +0.0563 29905 31583 50566 53413 ReductionDictionary/float_any/100000/manual_time +0.0622 +0.0581 30431 32322 50788 53740 ReductionDictionary/float_any/1000000/manual_time +0.0403 +0.0473 36955 38444 56562 59237 ReductionDictionary/float_any/10000000/manual_time -0.0626 -0.0421 96949 90881 115099 110252 ReductionDictionary/float_any/100000000/manual_time -0.1203 -0.1154 661222 581658 679363 600965 ReductionDictionary/int32_t_min/10000/manual_time +0.0539 +0.0525 45796 48265 66802 70306 ReductionDictionary/int32_t_min/100000/manual_time +0.0955 +0.0868 46170 50579 66882 72687 ReductionDictionary/int32_t_min/1000000/manual_time +0.0683 +0.0647 47495 50738 67291 71646 ReductionDictionary/int32_t_min/10000000/manual_time +0.0065 +0.0160 97469 98105 115997 117850 ReductionDictionary/int32_t_min/100000000/manual_time +0.0045 +0.0070 501086 503357 519638 523253 ReductionDictionary/float_min/10000/manual_time +0.0721 +0.0653 45551 48834 66485 70829 ReductionDictionary/float_min/100000/manual_time +0.0966 +0.0865 44953 49295 65457 71120 ReductionDictionary/float_min/1000000/manual_time +0.0618 +0.0582 47227 50145 67004 70900 ReductionDictionary/float_min/10000000/manual_time +0.0052 +0.0167 97166 97669 115559 117490 ReductionDictionary/float_min/100000000/manual_time +0.0060 +0.0083 501044 504029 519674 523988 ReductionDictionary/int32_t_max/10000/manual_time +0.0596 +0.0558 44987 47667 65991 69670 ReductionDictionary/int32_t_max/100000/manual_time +0.0735 +0.0625 44640 47921 65423 69515 ReductionDictionary/int32_t_max/1000000/manual_time +0.0610 +0.0569 46224 49042 65957 69707 ReductionDictionary/int32_t_max/10000000/manual_time -0.0100 -0.0009 97078 96105 115787 115685 ReductionDictionary/int32_t_max/100000000/manual_time +0.0021 +0.0041 500466 501504 518959 521082 ReductionDictionary/float_max/10000/manual_time +0.0815 +0.0716 44141 47738 65134 69798 ReductionDictionary/float_max/100000/manual_time +0.0481 +0.0466 46378 48609 67284 70422 ReductionDictionary/float_max/1000000/manual_time +0.0091 +0.0180 48140 48579 67837 69055 ReductionDictionary/float_max/10000000/manual_time -0.0000 +0.0063 96823 96820 115498 116228 ReductionDictionary/float_max/100000000/manual_time +0.0044 +0.0061 501050 503259 519938 523126 ReductionDictionary/int32_t_mean/10000/manual_time -0.0259 -0.0018 45053 43885 66162 66041 ReductionDictionary/int32_t_mean/100000/manual_time +0.0448 +0.0484 44776 46782 65662 68843 ReductionDictionary/int32_t_mean/1000000/manual_time +0.0134 +0.0263 52069 52766 72163 74061 ReductionDictionary/int32_t_mean/10000000/manual_time -0.1592 -0.1321 135248 113712 153825 133504 ReductionDictionary/int32_t_mean/100000000/manual_time -0.2259 -0.2180 856793 663232 875288 684472 ReductionDictionary/float_mean/10000/manual_time +0.1263 +0.1185 44256 49847 65389 73139 ReductionDictionary/float_mean/100000/manual_time +0.1304 +0.1231 44038 49780 64743 72713 ReductionDictionary/float_mean/1000000/manual_time +0.0792 +0.0887 51678 55771 71756 78124 ReductionDictionary/float_mean/10000000/manual_time -0.1397 -0.1058 136928 117802 155376 138931 ReductionDictionary/float_mean/100000000/manual_time -0.2322 -0.2256 861014 661106 879508 681082 Reduction/bool_sum/10000/manual_time +0.1048 +0.0881 16256 17960 36958 40216 Reduction/bool_sum/100000/manual_time +0.0424 +0.0473 17106 17832 37986 39781 Reduction/bool_sum/1000000/manual_time +0.0837 +0.0620 17928 19428 38311 40688 Reduction/bool_sum/10000000/manual_time +0.0408 +0.0395 34927 36352 53394 55502 Reduction/bool_sum/100000000/manual_time +0.0030 +0.0102 145074 145516 164126 165798 Reduction/int8_t_sum/10000/manual_time +0.1105 +0.0832 15704 17438 36410 39440 Reduction/int8_t_sum/100000/manual_time +0.0694 +0.0629 16539 17688 37233 39575 Reduction/int8_t_sum/1000000/manual_time +0.0783 +0.0574 17880 19279 38316 40517 Reduction/int8_t_sum/10000000/manual_time +0.0416 +0.0406 34809 36256 53305 55467 Reduction/int8_t_sum/100000000/manual_time +0.0010 +0.0098 144214 144358 163105 164700 Reduction/int32_t_sum/10000/manual_time +0.0074 +0.0264 17459 17588 38551 39570 Reduction/int32_t_sum/100000/manual_time +0.0791 +0.0684 17086 18438 37705 40286 Reduction/int32_t_sum/1000000/manual_time -0.0064 -0.0027 23751 23599 44384 44263 Reduction/int32_t_sum/10000000/manual_time +0.0238 +0.0358 67606 69212 86499 89593 Reduction/int32_t_sum/100000000/manual_time +0.0002 -0.0043 472981 473093 495882 493743 Reduction/int64_t_sum/10000/manual_time +0.0340 +0.0383 17246 17833 38306 39772 Reduction/int64_t_sum/100000/manual_time +0.0755 +0.0719 17841 19188 38242 40990 Reduction/int64_t_sum/1000000/manual_time +0.0576 +0.0635 28702 30354 47263 50266 Reduction/int64_t_sum/10000000/manual_time +0.0119 +0.0141 112911 114253 132783 134658 Reduction/int64_t_sum/100000000/manual_time +0.0013 +0.0021 920797 921986 940549 942485 Reduction/float_sum/10000/manual_time +0.0882 +0.0707 16479 17932 37421 40067 Reduction/float_sum/100000/manual_time +0.0903 +0.0774 16677 18183 37163 40041 Reduction/float_sum/1000000/manual_time +0.0571 +0.0622 22393 23673 41960 44569 Reduction/float_sum/10000000/manual_time +0.0179 +0.0345 67968 69184 86691 89678 Reduction/float_sum/100000000/manual_time +0.0015 -0.0015 472247 472973 494236 493474 Reduction/double_sum/10000/manual_time +0.0982 +0.0794 16036 17611 36657 39568 Reduction/double_sum/100000/manual_time +0.0153 +0.0340 18453 18735 38955 40281 Reduction/double_sum/1000000/manual_time +0.0596 +0.0636 28623 30328 47153 50154 Reduction/double_sum/10000000/manual_time +0.0089 +0.0130 113426 114441 132746 134476 Reduction/double_sum/100000000/manual_time +0.0026 +0.0044 920131 922540 939374 943549 Reduction/int32_t_product/10000/manual_time +0.1183 +0.0902 16570 18529 37443 40819 Reduction/int32_t_product/100000/manual_time +0.1097 +0.0905 17076 18948 37741 41158 Reduction/int32_t_product/1000000/manual_time +0.0655 +0.0622 22023 23465 41604 44194 Reduction/int32_t_product/10000000/manual_time +0.0237 +0.0374 67629 69233 86354 89583 Reduction/int32_t_product/100000000/manual_time +0.0003 +0.0024 473110 473265 492461 493666 Reduction/float_product/10000/manual_time +0.0839 +0.0615 16972 18395 38034 40374 Reduction/float_product/100000/manual_time +0.0787 +0.0718 17091 18436 37602 40302 Reduction/float_product/1000000/manual_time +0.0686 +0.0714 22282 23810 41856 44844 Reduction/float_product/10000000/manual_time +0.0135 +0.0191 68313 69233 87798 89474 Reduction/float_product/100000000/manual_time +0.0016 -0.0034 472681 473450 495621 493925 Reduction/int64_t_min/10000/manual_time +0.0885 +0.0697 16886 18380 37763 40393 Reduction/int64_t_min/100000/manual_time +0.0490 +0.0534 17988 18869 38269 40311 Reduction/int64_t_min/1000000/manual_time +0.0087 +0.0218 30355 30618 49407 50485 Reduction/int64_t_min/10000000/manual_time +0.0092 +0.0179 113906 114949 132977 135352 Reduction/int64_t_min/100000000/manual_time +0.0011 +0.0020 921089 922086 940543 942460 Reduction/double_min/10000/manual_time +0.0818 +0.0662 16848 18226 37674 40169 Reduction/double_min/100000/manual_time +0.0839 +0.0761 17470 18936 37537 40394 Reduction/double_min/1000000/manual_time +0.0582 +0.0629 28967 30653 47503 50490 Reduction/double_min/10000000/manual_time +0.0135 +0.0246 113413 114940 132250 135504 Reduction/double_min/100000000/manual_time +0.0017 +0.0027 920812 922344 940237 942769 Reduction/timestamp_ms_min/10000/manual_time +0.0910 +0.0673 16926 18466 37673 40210 Reduction/timestamp_ms_min/100000/manual_time +0.0830 +0.0738 17894 19379 38112 40925 Reduction/timestamp_ms_min/1000000/manual_time +0.0339 +0.0429 30113 31133 48915 51012 Reduction/timestamp_ms_min/10000000/manual_time +0.0109 +0.0189 114185 115433 133132 135650 Reduction/timestamp_ms_min/100000000/manual_time +0.0053 +0.0076 920924 925845 940096 947215 Reduction/int8_t_mean/10000/manual_time +0.0489 +0.0517 29025 30445 50078 52670 Reduction/int8_t_mean/100000/manual_time +0.0401 +0.0493 29218 30389 50050 52515 Reduction/int8_t_mean/1000000/manual_time -0.0215 -0.0036 33316 32600 54162 53965 Reduction/int8_t_mean/10000000/manual_time +0.0387 +0.0407 46726 48536 64936 67581 Reduction/int8_t_mean/100000000/manual_time +0.0007 +0.0058 161956 162077 180866 181918 Reduction/float_mean/10000/manual_time +0.0916 +0.0827 28959 31611 49748 53860 Reduction/float_mean/100000/manual_time +0.0369 +0.0485 30720 31854 51489 53988 Reduction/float_mean/1000000/manual_time +0.0218 +0.0310 36650 37448 56777 58539 Reduction/float_mean/10000000/manual_time +0.0266 +0.0402 81361 83528 99648 103653 Reduction/float_mean/100000000/manual_time -0.0036 -0.0003 489644 487869 508151 507999 Reduction/int32_t_variance/10000/manual_time +0.0337 +0.0343 29718 30718 50967 52713 Reduction/int32_t_variance/100000/manual_time -0.0506 -0.0189 33774 32065 55094 54054 Reduction/int32_t_variance/1000000/manual_time +0.0364 +0.0316 36062 37374 56511 58297 Reduction/int32_t_variance/10000000/manual_time +0.0107 +0.0197 83950 84849 102507 104529 Reduction/int32_t_variance/100000000/manual_time -0.0003 +0.0018 491601 491463 510405 511323 Reduction/double_variance/10000/manual_time +0.0536 +0.0457 29157 30721 50266 52565 Reduction/double_variance/100000/manual_time +0.0367 +0.0415 30552 31674 51014 53130 Reduction/double_variance/1000000/manual_time +0.0238 +0.0334 43737 44780 62726 64821 Reduction/double_variance/10000000/manual_time +0.0046 +0.0084 129382 129980 148232 149476 Reduction/double_variance/100000000/manual_time +0.0027 +0.0042 934085 936595 952680 956708 Reduction/int64_t_std/10000/manual_time +0.0638 +0.0592 28956 30803 49847 52798 Reduction/int64_t_std/100000/manual_time -0.0742 -0.0356 34976 32380 56179 54180 Reduction/int64_t_std/1000000/manual_time -0.1048 -0.0836 49651 44446 70034 64177 Reduction/int64_t_std/10000000/manual_time +0.0077 +0.0118 129373 130373 148033 149784 Reduction/int64_t_std/100000000/manual_time +0.0025 +0.0037 934055 936408 952668 956168 Reduction/float_std/10000/manual_time +0.0754 +0.0699 28573 30728 49316 52764 Reduction/float_std/100000/manual_time +0.0945 +0.0846 28682 31394 49008 53155 Reduction/float_std/1000000/manual_time +0.0356 +0.0437 36195 37485 56074 58526 Reduction/float_std/10000000/manual_time +0.0175 +0.0292 83490 84951 101651 104623 Reduction/float_std/100000000/manual_time +0.0018 +0.0049 490773 491670 509017 511496 ReductionScan/int8_no_nulls/10000/manual_time +0.0523 +0.0511 15603 16420 36157 38005 ReductionScan/int8_no_nulls/100000/manual_time +0.0181 +0.0295 17358 17672 37950 39070 ReductionScan/int8_no_nulls/1000000/manual_time +0.0251 +0.0390 26876 27550 45671 47451 ReductionScan/int8_no_nulls/10000000/manual_time +0.0166 +0.0278 89319 90800 109601 112649 ReductionScan/int8_no_nulls/100000000/manual_time +0.0007 +0.0019 735210 735689 756198 757634 ReductionScan/int32_no_nulls/10000/manual_time +0.0515 +0.0559 15234 16019 35716 37712 ReductionScan/int32_no_nulls/100000/manual_time +0.0593 +0.0583 16808 17805 36980 39134 ReductionScan/int32_no_nulls/1000000/manual_time +0.0390 +0.0469 26405 27436 44665 46759 ReductionScan/int32_no_nulls/10000000/manual_time +0.0103 +0.0196 114948 116137 135580 138232 ReductionScan/int32_no_nulls/100000000/manual_time +0.0032 +0.0056 1016197 1019401 1037281 1043090 ReductionScan/uint64_no_nulls/10000/manual_time +0.0162 +0.0308 16222 16485 36939 38077 ReductionScan/uint64_no_nulls/100000/manual_time +0.0588 +0.0685 18751 19852 38174 40789 ReductionScan/uint64_no_nulls/1000000/manual_time +0.0444 +0.0604 36339 37953 56819 60253 ReductionScan/uint64_no_nulls/10000000/manual_time -0.0072 +0.0016 217943 216376 238710 239081 ReductionScan/uint64_no_nulls/100000000/manual_time -0.0009 -0.0007 2001645 1999848 2023397 2021947 ReductionScan/float_no_nulls/10000/manual_time +0.0298 +0.0399 15775 16246 36567 38028 ReductionScan/float_no_nulls/100000/manual_time +0.0534 +0.0505 16888 17789 37162 39037 ReductionScan/float_no_nulls/1000000/manual_time +0.0339 +0.0416 26655 27558 45083 46960 ReductionScan/float_no_nulls/10000000/manual_time +0.0071 +0.0155 115565 116383 136602 138716 ReductionScan/float_no_nulls/100000000/manual_time -0.0019 -0.0006 1019440 1017475 1040591 1040016 ReductionScan/int16_nulls/10000/manual_time +0.0510 +0.0442 35588 37402 57805 60363 ReductionScan/int16_nulls/100000/manual_time +0.0442 +0.0450 37789 39458 59383 62055 ReductionScan/int16_nulls/1000000/manual_time -0.0270 -0.0174 48836 47519 69731 68521 ReductionScan/int16_nulls/10000000/manual_time -0.0046 +0.0034 122849 122281 145028 145520 ReductionScan/int16_nulls/100000000/manual_time -0.0025 -0.0107 864450 862291 892772 883246 ReductionScan/uint32_nulls/10000/manual_time -0.0178 -0.0017 37685 37016 59988 59883 ReductionScan/uint32_nulls/100000/manual_time +0.0478 +0.0494 37349 39134 58805 61708 ReductionScan/uint32_nulls/1000000/manual_time +0.0085 +0.0130 49350 49771 69758 70663 ReductionScan/uint32_nulls/10000000/manual_time +0.0013 +0.0086 146704 146896 169127 170574 ReductionScan/uint32_nulls/100000000/manual_time +0.0085 +0.0111 1112967 1122427 1133374 1145988 ReductionScan/double_nulls/10000/manual_time +0.0623 +0.0611 35538 37753 57332 60833 ReductionScan/double_nulls/100000/manual_time +0.0602 +0.0635 38508 40827 59025 62775 ReductionScan/double_nulls/1000000/manual_time +0.0202 +0.0275 55522 56645 77436 79567 ReductionScan/double_nulls/10000000/manual_time +0.0084 +0.0118 243996 246039 265992 269142 ReductionScan/double_nulls/100000000/manual_time -0.0013 -0.0008 2120668 2117888 2140996 2139376 Reduction/bool_minmax/10000/manual_time -0.1377 -0.0650 22410 19325 43937 41081 Reduction/bool_minmax/100000/manual_time +0.0523 +0.0537 18947 19937 39665 41795 Reduction/bool_minmax/1000000/manual_time +0.0734 +0.0617 19146 20551 39413 41846 Reduction/bool_minmax/10000000/manual_time +0.0358 +0.0438 37836 39189 56031 58483 Reduction/bool_minmax/100000000/manual_time +0.0000 -0.0063 149637 149641 170798 169728 Reduction/int8_t_minmax/10000/manual_time +0.0361 +0.0438 18831 19512 39671 41408 Reduction/int8_t_minmax/100000/manual_time +0.0629 +0.0571 18429 19588 39093 41327 Reduction/int8_t_minmax/1000000/manual_time +0.0874 +0.0696 18956 20613 39054 41772 Reduction/int8_t_minmax/10000000/manual_time +0.0369 +0.0413 37781 39175 55998 58312 Reduction/int8_t_minmax/100000000/manual_time -0.0002 +0.0059 149816 149783 168898 169894 Reduction/int32_t_minmax/10000/manual_time +0.0813 +0.0613 18423 19921 39271 41676 Reduction/int32_t_minmax/100000/manual_time +0.0848 +0.0668 18560 20133 39161 41779 Reduction/int32_t_minmax/1000000/manual_time +0.0630 +0.0610 23265 24730 42732 45337 Reduction/int32_t_minmax/10000000/manual_time +0.0107 +0.0196 69505 70248 88260 89989 Reduction/int32_t_minmax/100000000/manual_time -0.0004 +0.0009 474594 474410 493944 494395 Reduction/timestamp_ms_minmax/10000/manual_time +0.0716 +0.0605 19241 20618 40084 42509 Reduction/timestamp_ms_minmax/100000/manual_time +0.0418 +0.0513 19638 20459 39825 41867 Reduction/timestamp_ms_minmax/1000000/manual_time +0.0559 +0.0635 30346 32042 48851 51954 Reduction/timestamp_ms_minmax/10000000/manual_time +0.0030 +0.0025 117085 117437 136688 137033 Reduction/timestamp_ms_minmax/100000000/manual_time +0.0010 -0.0037 921557 922453 945924 942449 Reduction/float_minmax/10000/manual_time +0.0650 +0.0521 18507 19710 39467 41526 Reduction/float_minmax/100000/manual_time +0.0740 +0.0619 18504 19874 39105 41525 Reduction/float_minmax/1000000/manual_time +0.0499 +0.0373 23385 24552 43561 45185 Reduction/float_minmax/10000000/manual_time +0.0163 +0.0314 69400 70529 88043 90807 Reduction/float_minmax/100000000/manual_time +0.0000 +0.0025 474346 474361 493215 494461 ```
It looks like the compression has a minor cost? I would have thought the removal of the `release_assert` would have improved performance. But maybe `reduce` isn't using it? Authors: - Robert Maynard (@robertmaynard) Approvers: - Keith Kraus (@kkraus14) - Mark Harris (@harrism) - Jake Hemstad (@jrhemstad) URL: https://github.com/rapidsai/cudf/pull/7583 --- cpp/cmake/Modules/ConfigureCUDA.cmake | 3 +++ cpp/include/cudf/ast/detail/operators.hpp | 12 ++++++------ cpp/include/cudf/ast/detail/transform.cuh | 8 ++++---- cpp/include/cudf/column/column_device_view.cuh | 2 +- .../cudf/detail/aggregation/aggregation.cuh | 4 ++-- .../cudf/detail/aggregation/aggregation.hpp | 4 ++-- cpp/include/cudf/detail/gather.cuh | 2 +- cpp/include/cudf/detail/indexalator.cuh | 4 ++-- .../utilities/{release_assert.cuh => assert.cuh} | 6 +++--- .../cudf/detail/utilities/hash_functions.cuh | 14 +++++++------- cpp/include/cudf/fixed_point/fixed_point.hpp | 4 ++-- cpp/include/cudf/lists/list_device_view.cuh | 14 +++++++------- cpp/include/cudf/table/row_operators.cuh | 6 +++--- cpp/include/cudf/utilities/type_dispatcher.hpp | 4 ++-- cpp/src/groupby/hash/multi_pass_kernels.cuh | 4 ++-- cpp/src/io/parquet/page_data.cu | 2 +- cpp/src/quantiles/quantiles_util.hpp | 8 ++++---- cpp/tests/error/error_handling_test.cu | 14 +++++++++----- 18 files changed, 61 insertions(+), 54 deletions(-) rename cpp/include/cudf/detail/utilities/{release_assert.cuh => assert.cuh} (87%) diff --git a/cpp/cmake/Modules/ConfigureCUDA.cmake b/cpp/cmake/Modules/ConfigureCUDA.cmake index d4be6e65021..142bb84f413 100644 --- a/cpp/cmake/Modules/ConfigureCUDA.cmake +++ b/cpp/cmake/Modules/ConfigureCUDA.cmake @@ -46,6 +46,9 @@ if(DISABLE_DEPRECATION_WARNING) list(APPEND CUDF_CUDA_FLAGS -Xcompiler=-Wno-deprecated-declarations) endif() +# make sure we produce smallest binary size +list(APPEND CUDF_CUDA_FLAGS -Xfatbin=-compress-all) + # Option to enable line info in CUDA device compilation to allow introspection when profiling / memchecking if(CUDA_ENABLE_LINEINFO) list(APPEND CUDF_CUDA_FLAGS -lineinfo) diff --git a/cpp/include/cudf/ast/detail/operators.hpp b/cpp/include/cudf/ast/detail/operators.hpp index 8ec26cf5eb7..27bcb0d320b 100644 --- a/cpp/include/cudf/ast/detail/operators.hpp +++ b/cpp/include/cudf/ast/detail/operators.hpp @@ -187,7 +187,7 @@ CUDA_HOST_DEVICE_CALLABLE constexpr void ast_operator_dispatcher(ast_operator op #ifndef __CUDA_ARCH__ CUDF_FAIL("Invalid operator."); #else - release_assert(false && "Invalid operator."); + cudf_assert(false && "Invalid operator."); #endif break; } @@ -784,7 +784,7 @@ struct double_dispatch_binary_operator_types { #ifndef __CUDA_ARCH__ CUDF_FAIL("Invalid binary operation."); #else - release_assert(false && "Invalid binary operation."); + cudf_assert(false && "Invalid binary operation."); #endif } }; @@ -819,7 +819,7 @@ struct single_dispatch_binary_operator_types { #ifndef __CUDA_ARCH__ CUDF_FAIL("Invalid binary operation."); #else - release_assert(false && "Invalid binary operation."); + cudf_assert(false && "Invalid binary operation."); #endif } }; @@ -924,7 +924,7 @@ struct dispatch_unary_operator_types { #ifndef __CUDA_ARCH__ CUDF_FAIL("Invalid unary operation."); #else - release_assert(false && "Invalid unary operation."); + cudf_assert(false && "Invalid unary operation."); #endif } }; @@ -996,7 +996,7 @@ struct return_type_functor { #ifndef __CUDA_ARCH__ CUDF_FAIL("Invalid binary operation. Return type cannot be determined."); #else - release_assert(false && "Invalid binary operation. Return type cannot be determined."); + cudf_assert(false && "Invalid binary operation. Return type cannot be determined."); #endif } @@ -1024,7 +1024,7 @@ struct return_type_functor { #ifndef __CUDA_ARCH__ CUDF_FAIL("Invalid unary operation. Return type cannot be determined."); #else - release_assert(false && "Invalid unary operation. Return type cannot be determined."); + cudf_assert(false && "Invalid unary operation. Return type cannot be determined."); #endif } }; diff --git a/cpp/include/cudf/ast/detail/transform.cuh b/cpp/include/cudf/ast/detail/transform.cuh index ee08742d871..2719a8b5077 100644 --- a/cpp/include/cudf/ast/detail/transform.cuh +++ b/cpp/include/cudf/ast/detail/transform.cuh @@ -87,7 +87,7 @@ struct unary_row_output : public row_output { Input input, detail::device_data_reference output) const { - release_assert(false && "Invalid unary dispatch operator for the provided input."); + cudf_assert(false && "Invalid unary dispatch operator for the provided input."); } }; @@ -116,7 +116,7 @@ struct binary_row_output : public row_output { RHS rhs, detail::device_data_reference output) const { - release_assert(false && "Invalid binary dispatch operator for the provided input."); + cudf_assert(false && "Invalid binary dispatch operator for the provided input."); } }; @@ -239,7 +239,7 @@ struct row_evaluator { detail::device_data_reference rhs, detail::device_data_reference output) const { - release_assert(false && "Invalid binary dispatch operator for the provided input."); + cudf_assert(false && "Invalid binary dispatch operator for the provided input."); } private: @@ -311,7 +311,7 @@ __device__ void evaluate_row_expression(detail::row_evaluator const& evaluator, output, op); } else { - release_assert(false && "Invalid operator arity."); + cudf_assert(false && "Invalid operator arity."); } } } diff --git a/cpp/include/cudf/column/column_device_view.cuh b/cpp/include/cudf/column/column_device_view.cuh index b2f152180b0..5a02f5bbe55 100644 --- a/cpp/include/cudf/column/column_device_view.cuh +++ b/cpp/include/cudf/column/column_device_view.cuh @@ -774,7 +774,7 @@ struct index_element_fn { std::is_unsigned::value)>* = nullptr> __device__ size_type operator()(Args&&... args) { - release_assert(false and "dictionary indices must be an unsigned integral type"); + cudf_assert(false and "dictionary indices must be an unsigned integral type"); return 0; } }; diff --git a/cpp/include/cudf/detail/aggregation/aggregation.cuh b/cpp/include/cudf/detail/aggregation/aggregation.cuh index 3d006449044..3f5f5a91632 100644 --- a/cpp/include/cudf/detail/aggregation/aggregation.cuh +++ b/cpp/include/cudf/detail/aggregation/aggregation.cuh @@ -19,8 +19,8 @@ #include #include #include +#include #include -#include #include #include @@ -103,7 +103,7 @@ struct update_target_element { column_device_view source, size_type source_index) const noexcept { - release_assert(false and "Invalid source type and aggregation combination."); + cudf_assert(false and "Invalid source type and aggregation combination."); } }; diff --git a/cpp/include/cudf/detail/aggregation/aggregation.hpp b/cpp/include/cudf/detail/aggregation/aggregation.hpp index 1cafad25c9c..1a4847dad12 100644 --- a/cpp/include/cudf/detail/aggregation/aggregation.hpp +++ b/cpp/include/cudf/detail/aggregation/aggregation.hpp @@ -17,7 +17,7 @@ #pragma once #include -#include +#include #include #include #include @@ -627,7 +627,7 @@ CUDA_HOST_DEVICE_CALLABLE decltype(auto) aggregation_dispatcher(aggregation::Kin #ifndef __CUDA_ARCH__ CUDF_FAIL("Unsupported aggregation."); #else - release_assert(false && "Unsupported aggregation."); + cudf_assert(false && "Unsupported aggregation."); // The following code will never be reached, but the compiler generates a // warning if there isn't a return value. diff --git a/cpp/include/cudf/detail/gather.cuh b/cpp/include/cudf/detail/gather.cuh index 87f5c9251c7..73647ac2292 100644 --- a/cpp/include/cudf/detail/gather.cuh +++ b/cpp/include/cudf/detail/gather.cuh @@ -18,8 +18,8 @@ #include #include #include +#include #include -#include #include #include #include diff --git a/cpp/include/cudf/detail/indexalator.cuh b/cpp/include/cudf/detail/indexalator.cuh index 8568bd68bfd..8bbd0d1aada 100644 --- a/cpp/include/cudf/detail/indexalator.cuh +++ b/cpp/include/cudf/detail/indexalator.cuh @@ -268,7 +268,7 @@ struct input_indexalator : base_indexalator { template ()>* = nullptr> __device__ size_type operator()(void const* tp) { - release_assert(false and "only index types are supported"); + cudf_assert(false and "only index types are supported"); return 0; } }; @@ -366,7 +366,7 @@ struct output_indexalator : base_indexalator { template ()>* = nullptr> __device__ void operator()(void* tp, size_type const value) { - release_assert(false and "only index types are supported"); + cudf_assert(false and "only index types are supported"); } }; diff --git a/cpp/include/cudf/detail/utilities/release_assert.cuh b/cpp/include/cudf/detail/utilities/assert.cuh similarity index 87% rename from cpp/include/cudf/detail/utilities/release_assert.cuh rename to cpp/include/cudf/detail/utilities/assert.cuh index e0db88d8fcb..69f9e2d3791 100644 --- a/cpp/include/cudf/detail/utilities/release_assert.cuh +++ b/cpp/include/cudf/detail/utilities/assert.cuh @@ -27,11 +27,11 @@ * * Relies on the `__PRETTY_FUNCTION__` macro which is specific to GCC and Clang. */ -#if defined(__CUDA_ARCH__) && (defined(__clang__) || defined(__GNUC__)) +#if !defined(NDEBUG) && defined(__CUDA_ARCH__) && (defined(__clang__) || defined(__GNUC__)) #define __ASSERT_STR_HELPER(x) #x -#define release_assert(e) \ +#define cudf_assert(e) \ ((e) ? static_cast(0) \ : __assert_fail(__ASSERT_STR_HELPER(e), __FILE__, __LINE__, __PRETTY_FUNCTION__)) #else -#define release_assert(e) (static_cast(0)) +#define cudf_assert(e) (static_cast(0)) #endif diff --git a/cpp/include/cudf/detail/utilities/hash_functions.cuh b/cpp/include/cudf/detail/utilities/hash_functions.cuh index e9d66d125dd..a2e7d6d4aae 100644 --- a/cpp/include/cudf/detail/utilities/hash_functions.cuh +++ b/cpp/include/cudf/detail/utilities/hash_functions.cuh @@ -17,7 +17,7 @@ #pragma once #include -#include +#include #include #include @@ -154,7 +154,7 @@ struct MD5ListHasher { size_type offset_end, md5_intermediate_data* hash_state) const { - release_assert(false && "MD5 Unsupported chrono type column"); + cudf_assert(false && "MD5 Unsupported chrono type column"); } template ()>* = nullptr> @@ -163,7 +163,7 @@ struct MD5ListHasher { size_type offset_end, md5_intermediate_data* hash_state) const { - release_assert(false && "MD5 Unsupported non-fixed-width type column"); + cudf_assert(false && "MD5 Unsupported non-fixed-width type column"); } template ()>* = nullptr> @@ -273,7 +273,7 @@ struct MD5Hash { size_type row_index, md5_intermediate_data* hash_state) const { - release_assert(false && "MD5 Unsupported chrono type column"); + cudf_assert(false && "MD5 Unsupported chrono type column"); } template ()>* = nullptr> @@ -281,7 +281,7 @@ struct MD5Hash { size_type row_index, md5_intermediate_data* hash_state) const { - release_assert(false && "MD5 Unsupported non-fixed-width type column"); + cudf_assert(false && "MD5 Unsupported non-fixed-width type column"); } template ()>* = nullptr> @@ -344,7 +344,7 @@ void CUDA_DEVICE_CALLABLE MD5Hash::operator()(column_device_view col, column_device_view offsets = col.child(offsets_column_index); column_device_view data = col.child(data_column_index); - if (data.type().id() == type_id::LIST) release_assert(false && "Nested list unsupported"); + if (data.type().id() == type_id::LIST) cudf_assert(false && "Nested list unsupported"); cudf::type_dispatcher(data.type(), MD5ListHasher{}, @@ -724,7 +724,7 @@ struct IdentityHash { CUDA_HOST_DEVICE_CALLABLE std::enable_if_t::value, return_type> operator()(Key const& key) const { - release_assert(false && "IdentityHash does not support this data type"); + cudf_assert(false && "IdentityHash does not support this data type"); return 0; } diff --git a/cpp/include/cudf/fixed_point/fixed_point.hpp b/cpp/include/cudf/fixed_point/fixed_point.hpp index 8f8e2b7394c..eb752a8a0ea 100644 --- a/cpp/include/cudf/fixed_point/fixed_point.hpp +++ b/cpp/include/cudf/fixed_point/fixed_point.hpp @@ -16,7 +16,7 @@ #pragma once -#include +#include #include // Note: The versions are used in order for Jitify to work with our fixed_point type. @@ -91,7 +91,7 @@ template ())>* = nullptr> CUDA_HOST_DEVICE_CALLABLE Rep ipow(T exponent) { - release_assert(exponent >= 0 && "integer exponentiation with negative exponent is not possible."); + cudf_assert(exponent >= 0 && "integer exponentiation with negative exponent is not possible."); if (exponent == 0) return static_cast(1); auto extra = static_cast(1); auto square = static_cast(Base); diff --git a/cpp/include/cudf/lists/list_device_view.cuh b/cpp/include/cudf/lists/list_device_view.cuh index 3afafe9d1fa..4f207474526 100644 --- a/cpp/include/cudf/lists/list_device_view.cuh +++ b/cpp/include/cudf/lists/list_device_view.cuh @@ -37,12 +37,12 @@ class list_device_view { : lists_column(lists_column), _row_index(row_index) { column_device_view const& offsets = lists_column.offsets(); - release_assert(row_index >= 0 && row_index < lists_column.size() && - row_index < offsets.size() && "row_index out of bounds"); + cudf_assert(row_index >= 0 && row_index < lists_column.size() && row_index < offsets.size() && + "row_index out of bounds"); begin_offset = offsets.element(row_index); - release_assert(begin_offset >= 0 && begin_offset <= lists_column.child().size() && - "begin_offset out of bounds."); + cudf_assert(begin_offset >= 0 && begin_offset <= lists_column.child().size() && + "begin_offset out of bounds."); _size = offsets.element(row_index + 1) - begin_offset; } @@ -71,7 +71,7 @@ class list_device_view { */ CUDA_DEVICE_CALLABLE size_type element_offset(size_type idx) const { - release_assert(idx >= 0 && idx < size() && "idx out of bounds"); + cudf_assert(idx >= 0 && idx < size() && "idx out of bounds"); return begin_offset + idx; } @@ -93,7 +93,7 @@ class list_device_view { */ CUDA_DEVICE_CALLABLE bool is_null(size_type idx) const { - release_assert(idx >= 0 && idx < size() && "Index out of bounds."); + cudf_assert(idx >= 0 && idx < size() && "Index out of bounds."); auto element_offset = begin_offset + idx; return lists_column.child().is_null(element_offset); } @@ -294,7 +294,7 @@ struct list_size_functor { CUDA_HOST_DEVICE_CALLABLE list_size_functor(column_device_view const& d_col) : d_column(d_col) { #if defined(__CUDA_ARCH__) - release_assert(d_col.type().id() == type_id::LIST && "Only list type column is supported"); + cudf_assert(d_col.type().id() == type_id::LIST && "Only list type column is supported"); #else CUDF_EXPECTS(d_col.type().id() == type_id::LIST, "Only list type column is supported"); #endif diff --git a/cpp/include/cudf/table/row_operators.cuh b/cpp/include/cudf/table/row_operators.cuh index d9840e78be2..04d215ff7cb 100644 --- a/cpp/include/cudf/table/row_operators.cuh +++ b/cpp/include/cudf/table/row_operators.cuh @@ -17,8 +17,8 @@ #pragma once #include +#include #include -#include #include #include #include @@ -190,7 +190,7 @@ class element_equality_comparator { std::enable_if_t()>* = nullptr> __device__ bool operator()(size_type lhs_element_index, size_type rhs_element_index) { - release_assert(false && "Attempted to compare elements of uncomparable types."); + cudf_assert(false && "Attempted to compare elements of uncomparable types."); return false; } @@ -291,7 +291,7 @@ class element_relational_comparator { std::enable_if_t()>* = nullptr> __device__ weak_ordering operator()(size_type lhs_element_index, size_type rhs_element_index) { - release_assert(false && "Attempted to compare elements of uncomparable types."); + cudf_assert(false && "Attempted to compare elements of uncomparable types."); return weak_ordering::LESS; } diff --git a/cpp/include/cudf/utilities/type_dispatcher.hpp b/cpp/include/cudf/utilities/type_dispatcher.hpp index 26c51d0435a..bd9ea015a32 100644 --- a/cpp/include/cudf/utilities/type_dispatcher.hpp +++ b/cpp/include/cudf/utilities/type_dispatcher.hpp @@ -16,7 +16,7 @@ #pragma once -#include +#include #include #include #include @@ -501,7 +501,7 @@ CUDA_HOST_DEVICE_CALLABLE constexpr decltype(auto) type_dispatcher(cudf::data_ty #ifndef __CUDA_ARCH__ CUDF_FAIL("Unsupported type_id."); #else - release_assert(false && "Unsupported type_id."); + cudf_assert(false && "Unsupported type_id."); // The following code will never be reached, but the compiler generates a // warning if there isn't a return value. diff --git a/cpp/src/groupby/hash/multi_pass_kernels.cuh b/cpp/src/groupby/hash/multi_pass_kernels.cuh index a491b50478a..24de22705a9 100644 --- a/cpp/src/groupby/hash/multi_pass_kernels.cuh +++ b/cpp/src/groupby/hash/multi_pass_kernels.cuh @@ -20,8 +20,8 @@ #include #include #include +#include #include -#include #include #include @@ -65,7 +65,7 @@ struct var_hash_functor { size_type source_index, size_type target_index) noexcept { - release_assert(false and "Invalid source type for std, var aggregation combination."); + cudf_assert(false and "Invalid source type for std, var aggregation combination."); } template diff --git a/cpp/src/io/parquet/page_data.cu b/cpp/src/io/parquet/page_data.cu index 6e8937607b9..437d4b56e22 100644 --- a/cpp/src/io/parquet/page_data.cu +++ b/cpp/src/io/parquet/page_data.cu @@ -18,7 +18,7 @@ #include #include -#include +#include #include #include diff --git a/cpp/src/quantiles/quantiles_util.hpp b/cpp/src/quantiles/quantiles_util.hpp index 67323126751..1df0a4ab41a 100644 --- a/cpp/src/quantiles/quantiles_util.hpp +++ b/cpp/src/quantiles/quantiles_util.hpp @@ -15,7 +15,7 @@ */ #include -#include +#include #include #include @@ -144,7 +144,7 @@ select_quantile(ValueAccessor get_value, size_type size, double q, interpolation default: #if defined(__CUDA_ARCH__) - release_assert(false && "Invalid interpolation operation for quantiles"); + cudf_assert(false && "Invalid interpolation operation for quantiles"); return Result(); #else CUDF_FAIL("Invalid interpolation operation for quantiles."); @@ -173,7 +173,7 @@ select_quantile_data(Iterator begin, size_type size, double q, interpolation int } #if defined(__CUDA_ARCH__) - release_assert(false && "Invalid interpolation operation for quantiles"); + cudf_assert(false && "Invalid interpolation operation for quantiles"); return Result(); #else CUDF_FAIL("Invalid interpolation operation for quantiles."); @@ -200,7 +200,7 @@ CUDA_HOST_DEVICE_CALLABLE bool select_quantile_validity(Iterator begin, } #if defined(__CUDA_ARCH__) - release_assert(false && "Invalid interpolation operation for quantiles"); + cudf_assert(false && "Invalid interpolation operation for quantiles"); return false; #else CUDF_FAIL("Invalid interpolation operation for quantiles."); diff --git a/cpp/tests/error/error_handling_test.cu b/cpp/tests/error/error_handling_test.cu index debf540ea8e..da9509e94a6 100644 --- a/cpp/tests/error/error_handling_test.cu +++ b/cpp/tests/error/error_handling_test.cu @@ -83,11 +83,13 @@ TEST(StreamCheck, CatchFailedKernel) "invalid configuration argument"); } -__global__ void assert_false_kernel() { release_assert(false && "this kernel should die"); } +#ifndef NDEBUG + +__global__ void assert_false_kernel() { cudf_assert(false && "this kernel should die"); } -__global__ void assert_true_kernel() { release_assert(true && "this kernel should live"); } +__global__ void assert_true_kernel() { cudf_assert(true && "this kernel should live"); } -TEST(ReleaseAssertDeathTest, release_assert_false) +TEST(DebugAssertDeathTest, cudf_assert_false) { testing::FLAGS_gtest_death_test_style = "threadsafe"; @@ -100,19 +102,21 @@ TEST(ReleaseAssertDeathTest, release_assert_false) // each attempted kernel launch if (cudaErrorAssert == cudaDeviceSynchronize()) { std::abort(); } - // If we reach this point, the release_assert didn't work so we exit normally, which will cause + // If we reach this point, the cudf_assert didn't work so we exit normally, which will cause // EXPECT_DEATH to fail. }; EXPECT_DEATH(call_kernel(), "this kernel should die"); } -TEST(ReleaseAssert, release_assert_true) +TEST(DebugAssert, cudf_assert_true) { assert_true_kernel<<<1, 1>>>(); ASSERT_EQ(cudaSuccess, cudaDeviceSynchronize()); } +#endif + // These tests don't use CUDF_TEST_PROGRAM_MAIN because : // 1.) They don't need the RMM Pool // 2.) The RMM Pool interferes with the death test