Reduce compile time/size for scan.cu #7516

davidwendt · 2021-03-04T20:56:55Z

This PR reduces the number of calls to inclusive_scan and exclusive_scan by using a null_replace_accessor that allows non-nullable columns. This reduces the compile time and size of scan.cu by half. This PR also includes a scan gbenchmark that shows no change in performance from the original implementation.

davidwendt · 2021-03-04T21:25:22Z

	baseline	refactored
compile of scan.cu	194 s	113 s
scan.cu.o size	6.42MB	3.54MB
libcudf.so size	353MB	351MB
debug compile scan.cu	27.3 min	13.6 min
debug scan.cu.o size	361MB	188MB

cpp/src/reductions/scan.cu

…r<dispatch_storage_type>

codecov · 2021-03-06T00:25:04Z

Codecov Report

Merging #7516 (8eabdc1) into branch-0.19 (7871e7a) will increase coverage by 0.43%.
The diff coverage is 93.75%.

@@               Coverage Diff               @@
##           branch-0.19    #7516      +/-   ##
===============================================
+ Coverage        81.86%   82.29%   +0.43%     
===============================================
  Files              101      101              
  Lines            16884    17264     +380     
===============================================
+ Hits             13822    14208     +386     
+ Misses            3062     3056       -6

Impacted Files	Coverage Δ
python/cudf/cudf/core/column/column.py	`87.80% <75.00%> (+0.04%)`	⬆️
python/cudf/cudf/core/column/decimal.py	`95.83% <100.00%> (+0.96%)`	⬆️
python/cudf/cudf/core/column/string.py	`86.76% <100.00%> (+0.26%)`	⬆️
python/cudf/cudf/io/feather.py	`100.00% <0.00%> (ø)`
python/cudf/cudf/comm/serialize.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_fuzz_testing/io.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/column/struct.py	`100.00% <0.00%> (ø)`
python/dask_cudf/dask_cudf/_version.py	`0.00% <0.00%> (ø)`
python/dask_cudf/dask_cudf/io/tests/test_csv.py	`100.00% <0.00%> (ø)`
python/dask_cudf/dask_cudf/io/tests/test_orc.py	`100.00% <0.00%> (ø)`
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc104ef...6bf0175. Read the comment docs.

hyperbolic2346 · 2021-03-06T03:04:21Z

This improvement is great and much needed. Do you know how this impacts the run-time performance of scan?

davidwendt · 2021-03-08T19:30:21Z

For nullable columns the performance does not change -- the code is almost identical so that is expected. The main change is for non-nullable columns which did not require a null-replacement-iterator. We are adding an extra non-divergent runtime if-check when retrieving the element. I reran my benchmarks and found this approach does slow a bit with smaller rows (10K) but never more than 1-2 microseconds. For larger rows (1M or more) the slow down is in nanoseconds and less than 1.25%.

Here are some benchmark results for some non-null fixed-width columns. I used the gbenchmark included in this PR on my desktop which is Ubuntu 18.04 with CUDA 11.0 on a GV100.

type	size	us diff	% change
int32	10K	1.12	3.76%
int32	100K	0.743	2.36%
int32	1M	0.498	1.21%
int32	10M	0.692	0.42%
int32	100M	-0.482	-0.03%
uint64	10K	0.825	2.72%
uint64	100K	0.470	1.44%
uint64	1M	0.362	0.62%
uint64	10M	0.359	0.12%
uint64	100M	-0.208	-0.01%
float	10K	1.061	3.55%
float	100K	0.533	1.68%
float	1M	0.269	0.65%
float	10M	0.701	0.43%
float	100M	-0.102	-0.01%

The negative values means that the null-replace implementation in this PR was technically faster though within a reasonable measurement fluctuation error. All of the smaller row columns run in about 32us. Only the 100M row benchmarks are above 1ms.

Honestly, I was hoping it would be faster since we are generating half the code. But thrust's scan is very fast with fixed-width types and trivial predicates like the ones in scan.cu.

kkraus14 · 2021-03-09T02:35:09Z

@gpucibot merge

@davidwendt

This PR reduces the number of calls to `inclusive_scan` and `exclusive_scan` by using a `null_replace_accessor` that allows non-nullable columns. This reduces the compile time and size of `scan.cu` by half. This PR also includes a scan gbenchmark that shows no change in performance from the original implementation. Authors: - David (@davidwendt) Approvers: - Paul Taylor (@trxcllnt) - Jake Hemstad (@jrhemstad) URL: rapidsai#7516

Reduce compile time/size for scan.cu

846075a

davidwendt added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 4, 2021

davidwendt self-assigned this Mar 4, 2021

davidwendt requested review from a team as code owners March 4, 2021 20:56

davidwendt requested review from trxcllnt and jrhemstad March 4, 2021 20:56

github-actions bot added the CMake CMake build issue label Mar 4, 2021

jrhemstad reviewed Mar 4, 2021

View reviewed changes

cpp/src/reductions/scan.cu Outdated Show resolved Hide resolved

davidwendt added 3 commits March 5, 2021 08:34

null_replace_accessor column type check conflicts with type_dispatche…

90b4330

…r<dispatch_storage_type>

merge null-replace-iterators logic

25c6cf1

Merge branch 'branch-0.19' into compile-scan

6bf0175

trxcllnt approved these changes Mar 9, 2021

View reviewed changes

jrhemstad approved these changes Mar 9, 2021

View reviewed changes

rapids-bot bot merged commit 444d9f2 into rapidsai:branch-0.19 Mar 9, 2021

davidwendt deleted the compile-scan branch March 9, 2021 14:53

karthikeyann mentioned this pull request Mar 11, 2021

remove has_nulls template param in pair_begin, make_pair_iterator #7573

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce compile time/size for scan.cu #7516

Reduce compile time/size for scan.cu #7516

davidwendt commented Mar 4, 2021

davidwendt commented Mar 4, 2021

codecov bot commented Mar 6, 2021 •

edited

Loading

hyperbolic2346 commented Mar 6, 2021

davidwendt commented Mar 8, 2021

kkraus14 commented Mar 9, 2021

Reduce compile time/size for scan.cu #7516

Reduce compile time/size for scan.cu #7516

Conversation

davidwendt commented Mar 4, 2021

davidwendt commented Mar 4, 2021

codecov bot commented Mar 6, 2021 • edited Loading

Codecov Report

hyperbolic2346 commented Mar 6, 2021

davidwendt commented Mar 8, 2021

kkraus14 commented Mar 9, 2021

codecov bot commented Mar 6, 2021 •

edited

Loading