[FEA] Zero-copy slice should compute the null counts for the slices #3600
Comments
Would it make sense to make the computation an optional flag (maybe defaulting to true)? Right now slice()/split() are nice and lightweight; being able to turn off this additional computation for specific cases might be handy.
I have a useful extension of this that might be worth incorporating. Currently, when a string column is sliced, the size of the chars child column is not updated on the CPU. This makes sense, because you have to reach into GPU memory and read the offsets column to get that info. For the strings implementation of contiguous_split(), I need to be able to compute a buffer size that includes the chars child subcolumn in a fast way, at scale. Currently you can get this info on the CPU by calling strings_column_view.chars(); however, that function ends up reading device memory directly. That will not be fast enough for the scale contiguous_split() wants to run at (potentially hundreds of thousands of columns). So, if we're writing a kernel that computes null counts on the validity vectors, it might also make sense to compute the size field of the chars child column for string columns. This way, strings_column_view.chars() would become "free".
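For reference, the chars size of a slice can be derived from the offsets child column: in the Arrow-style strings layout, rows [begin, end) cover offsets[end] - offsets[begin] bytes of chars. The host sketch below (the helper name is invented for illustration) shows the computation; in cuDF the offsets live in device memory, which is why doing this lookup per slice is expensive:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side sketch: compute the chars size of a strings slice
// from the offsets child column. In the Arrow-style layout, the character
// bytes covered by rows [begin, end) are offsets[end] - offsets[begin].
// In cuDF the offsets are in device memory, so each such lookup is a
// device read unless the sizes are computed in bulk by a kernel.
int32_t sliced_chars_size(std::vector<int32_t> const& offsets,
                          std::size_t begin,
                          std::size_t end)
{
  return offsets[end] - offsets[begin];
}
```

For example, with offsets {0, 3, 5, 9, 10} (four strings of lengths 3, 2, 4, 1), slicing rows [1, 3) covers 9 - 3 = 6 chars bytes.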
I think that makes sense.
To expand on Jake's comment: "cudf::slice should be updated to compute the null count of each output slice in a single kernel." This should hold true for tables as well. A major use case for slice() at scale is contiguous_split(). My standard test case has been:
This yields a total of 128k output columns. The performance of contiguous_split() at that scale becomes dominated by the number of kernel calls. So a solution that computes the null counts at the split level (whole output tables) in a single kernel, instead of at the column level, is highly desirable. It's the difference between 256 additional kernel calls and 128k additional kernel calls.
Yes, providing an optional flag is one option, but I think computing null counts only when the original column has a valid null count is another; this applies to copy_range and other functions as well. Null counts may not be necessary in all applications, and when they are unnecessary, seemingly cheap operations can become expensive due to computing null counts, which can be surprising to users. Let me know what you think.
AFAIK the table split just calls the column split on each column, so this should already be taken care of.
That would require some mechanism to detect if the original column has a "valid" null count, which I am hoping to avoid.
I generally dislike flags like this in public APIs. They expose too much implementation detail. I'd want to see how expensive computing the null counts for all splits is before making a final call.
I agree that we don't need to decide on this now, but libcudf++ already provides a mechanism to invalidate a null count (cudf/cpp/include/cudf/column/column_view.hpp, line 132 in 072785e),
so I'm not sure adding a function to peek at whether a column_view object has a valid null count further compromises the abstraction level.
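As a sketch of the "peek" being debated (the struct and accessor name here are invented for illustration and are not the cuDF API; UNKNOWN_NULL_COUNT mirrors libcudf's sentinel for an invalidated count):

```cpp
#include <cassert>

// Illustration only: a column_view-like type that can report whether its
// cached null count is known, without forcing the lazy (device-side)
// computation. UNKNOWN_NULL_COUNT mirrors libcudf's sentinel value; the
// has_known_null_count() accessor is hypothetical.
constexpr int UNKNOWN_NULL_COUNT = -1;

struct column_view_like {
  int cached_null_count = UNKNOWN_NULL_COUNT;

  bool has_known_null_count() const
  {
    return cached_null_count != UNKNOWN_NULL_COUNT;
  }
};
```

Such a peek would let callers (e.g. slice) skip the null-count kernel entirely when the parent column's count was never computed.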
See #3579 for conversation on this topic. |
Fixed by #3698 |
Is your feature request related to a problem? Please describe.
The zero-copy slice API returns a `vector<column_view>` of sub-views into another column. Currently, the returned `column_view`s have an internal null count of `UNKNOWN_NULL_COUNT`, which means each sub-view's null count will be computed lazily upon invocation of `column_view::null_count()`.

If the number of partitions is large, this could mean a significant amount of overhead to compute the null count for each sub-view: each sub-view would require its own kernel launch to count a relatively small number of bits. It would be much faster to compute the null count for each returned sub-view in a single kernel within the `slice` function.

Describe the solution you'd like
`cudf::slice` should be updated to compute the null count of each output slice in a single kernel. Basically, the problem is: given a bitmask and a list of pairs of bit indices like

[ {begin0, end0}, {begin1, end1}, ... {beginN, endN} ]

we want to produce the count of unset bits between each `begin(i), end(i)`, so the output is

[ count0, count1, ... countN ]

The output `column_view`s' null counts can then be updated accordingly.

@harrism suggested a good implementation idea:
Additional context
Related: #3579 (comment)
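The segmented count described under "Describe the solution you'd like" can be sketched as follows. This is a serial host-side version with an invented name (segmented_null_counts), not the cuDF implementation; the actual fix would compute all ranges in a single device kernel, e.g. using per-word popcounts:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Host sketch of the segmented null-count problem: given a validity bitmask
// (bit set = valid, bit clear = null) and a list of [begin, end) bit ranges,
// produce the number of unset bits in each range. A single cuDF kernel would
// compute all of these counts in one launch instead of one launch per slice.
std::vector<std::size_t> segmented_null_counts(
    std::vector<uint32_t> const& bitmask,
    std::vector<std::pair<std::size_t, std::size_t>> const& ranges)
{
  std::vector<std::size_t> counts;
  counts.reserve(ranges.size());
  for (auto const& range : ranges) {
    std::size_t nulls = 0;
    for (std::size_t bit = range.first; bit < range.second; ++bit) {
      bool const valid = (bitmask[bit / 32] >> (bit % 32)) & 1u;
      if (!valid) { ++nulls; }
    }
    counts.push_back(nulls);
  }
  return counts;
}
```

For example, with the 8-row bitmask 0b10110101, the ranges {0, 4} and {4, 8} yield null counts 2 and 1, respectively.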