Reduce compile time/size for scan.cu #7516
Codecov Report
@@            Coverage Diff             @@
##         branch-0.19    #7516    +/-  ##
===============================================
+ Coverage    81.86%    82.29%   +0.43%
===============================================
  Files          101       101
  Lines        16884     17264    +380
===============================================
+ Hits         13822     14208    +386
+ Misses        3062      3056      -6
|
This improvement is great and much needed. Do you know how this impacts the run-time performance of scan? |
For nullable columns the performance does not change; the code is almost identical, so that is expected. The main change is for non-nullable columns, which previously did not require a null-replacement iterator. We are now adding an extra non-divergent runtime if-check when retrieving each element. I reran my benchmarks and found that this approach is slightly slower for smaller row counts (10K), but never by more than 1-2 microseconds. For larger row counts (1M or more) the slowdown is in nanoseconds and less than 1.25%. Here are some benchmark results for some non-null fixed-width columns. I used the gbenchmark included in this PR on my desktop, which runs Ubuntu 18.04 with CUDA 11.0 on a GV100.
The negative values mean that the null-replace implementation in this PR was technically faster, though within reasonable measurement fluctuation. All of the columns with smaller row counts run in about 32us. Only the 100M row benchmarks are above 1ms. Honestly, I was hoping it would be faster since we are generating half the code. But thrust's scan is very fast with fixed-width types and trivial predicates like the ones in scan.cu. |
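For readers following along, the runtime if-check described above can be sketched roughly as below. This is a host-only C++ illustration, not the code in this PR: `null_replacing_accessor` and `inclusive_sum` are hypothetical names, the validity flags are plain bytes rather than a cudf bitmask, and a simple loop stands in for thrust's scan. The idea it tries to show is that one accessor type serves both nullable and non-nullable columns, so the scan only needs to be instantiated once per element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for a null-replacing accessor.
// A null `validity` pointer models a non-nullable column.
template <typename T>
struct null_replacing_accessor {
  T const* data;
  std::uint8_t const* validity;  // one flag per row (1 = valid); nullptr if no nulls
  T null_replacement;            // value returned in place of a null row
  bool has_nulls;                // decided once per column, not per element

  T operator()(std::size_t i) const
  {
    // For a non-nullable column `has_nulls` is false for every element,
    // so this check goes the same way for all rows (non-divergent on a GPU).
    if (has_nulls && validity[i] == 0) { return null_replacement; }
    return data[i];
  }
};

// Inclusive sum over a column; pass an empty `validity` for a non-nullable column.
// Null rows contribute the additive identity (0) in this sketch.
template <typename T>
std::vector<T> inclusive_sum(std::vector<T> const& values,
                             std::vector<std::uint8_t> const& validity = {})
{
  assert(validity.empty() || validity.size() == values.size());
  bool const has_nulls = !validity.empty();
  null_replacing_accessor<T> const get{
    values.data(), has_nulls ? validity.data() : nullptr, T{0}, has_nulls};

  std::vector<T> out(values.size());
  T running{0};
  for (std::size_t i = 0; i < values.size(); ++i) {
    running += get(i);  // the same call path serves both kinds of column
    out[i] = running;
  }
  return out;
}
```

Calling `inclusive_sum<int>({1, 2, 3, 4})` exercises the non-nullable path, while `inclusive_sum<int>({1, 2, 3, 4}, {1, 0, 1, 1})` treats the second row as null. The trade-off is one extra branch per element in exchange for a single template instantiation instead of two.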
@gpucibot merge |
This PR reduces the number of calls to `inclusive_scan` and `exclusive_scan` by using a `null_replace_accessor` that allows non-nullable columns. This reduces the compile time and size of `scan.cu` by half. This PR also includes a scan gbenchmark that shows no change in performance from the original implementation.

Authors:
- David (@davidwendt)

Approvers:
- Paul Taylor (@trxcllnt)
- Jake Hemstad (@jrhemstad)

URL: rapidsai#7516