[BUG] cudaErrorIllegalAddress on distinct_count and nunique for reduce #13576
This looks like a crash due to dereferencing a null pointer.
What does the documentation for `bitmask_or` say?
So, if only some of the input columns are nullable (as is the case here), this returns a null pointer and a null count of zero (everything is valid). This is then used by cuco's `insert_if`:

```cpp
auto const iter = thrust::counting_iterator<cudf::size_type>(0);
// when nulls are equal, insert non-null rows only to improve efficiency
if (nulls_equal == null_equality::EQUAL and has_nulls) {
  thrust::counting_iterator<size_type> stencil(0);
  auto const [row_bitmask, null_count] =
    cudf::detail::bitmask_or(keys, stream, rmm::mr::get_current_device_resource());
  row_validity pred{static_cast<bitmask_type const*>(row_bitmask.data())};
  return key_set.insert_if(iter, iter + num_rows, stencil, pred, stream.value()) +
         static_cast<cudf::size_type>(null_count > 0);
}
// otherwise, insert all
return key_set.insert(iter, iter + num_rows, stream.value());
```

AIUI, this check is designed to skip comparing rows when all entries in the row are null and nulls should compare equal. So I think we need to check that all (rather than any) of the columns have nulls before dispatching to `insert_if`:

```diff
diff --git a/cpp/src/stream_compaction/distinct_count.cu b/cpp/src/stream_compaction/distinct_count.cu
index 7185dae77b..25f2a67cba 100644
--- a/cpp/src/stream_compaction/distinct_count.cu
+++ b/cpp/src/stream_compaction/distinct_count.cu
@@ -151,10 +151,12 @@ cudf::size_type distinct_count(table_view const& keys,
   auto const iter = thrust::counting_iterator<cudf::size_type>(0);
   // when nulls are equal, insert non-null rows only to improve efficiency
-  if (nulls_equal == null_equality::EQUAL and has_nulls) {
+  if (nulls_equal == null_equality::EQUAL and
+      std::all_of(keys.begin(), keys.end(), [](auto const& col) { return col.has_nulls(); })) {
     thrust::counting_iterator<size_type> stencil(0);
     auto const [row_bitmask, null_count] =
       cudf::detail::bitmask_or(keys, stream, rmm::mr::get_current_device_resource());
+    CUDF_EXPECTS(null_count > 0, "Unpossible!");
     row_validity pred{static_cast<bitmask_type const*>(row_bitmask.data())};
     return key_set.insert_if(iter, iter + num_rows, stencil, pred, stream.value()) +
```
Ah, no, this is deliberate (as per #7406 (comment)). If any mask is null, then the OR of the masks is also null (i.e. everything is valid). So I think the first patch is correct.
I think the logic here was wrong from the beginning. We want to avoid inserting fully null rows.
@wence- While looking for similar code, I only see one more instance of this pattern, in `cudf/python/cudf/cudf/core/groupby/groupby.py` (line 869 at 67efaf6). I'm not familiar with the Python code here, so can you have a look and check if it is also correct, please?
I think I disagree. We want to count distinct rows: if nulls should compare equal, then we still need to hash a row if it contains only some nulls. Do you agree?
This is safe afaict.
Hmm, sorry, it seems that I was wrong: we want to insert a row if it still contains at least one non-null element. I had been thinking of the similar code in `cudf/cpp/src/search/contains_table.cu` (line 127 at 67efaf6).
@PointKernel The comment in the code ("insert non-null rows only") misled me: I thought that we were inserting only rows without any nulls. It should instead state that we insert rows that are not entirely null.
Yeah, I continue to be confused all the time by the interpretation of the null mask as "bit set => no null at this position". So the way we detect that a row is fully null is to OR the bitmasks together (because setness [validity] in any column means that the row is not fully null).
In terms of efficiency, I think the approach used in `cudf/cpp/src/search/contains_table.cu` (line 202 at 67efaf6) is relevant here. I really don't have a better solution for this. I (and maybe @wence-) continue to be confused 😄
I think this patch is a correctness fix, and only does the filtered insertion when it is actually necessary:

```diff
diff --git a/cpp/src/stream_compaction/distinct_count.cu b/cpp/src/stream_compaction/distinct_count.cu
index 7185dae77b..341648d548 100644
--- a/cpp/src/stream_compaction/distinct_count.cu
+++ b/cpp/src/stream_compaction/distinct_count.cu
@@ -150,15 +150,21 @@ cudf::size_type distinct_count(table_view const& keys,
                      stream.value()};
   auto const iter = thrust::counting_iterator<cudf::size_type>(0);
-  // when nulls are equal, insert non-null rows only to improve efficiency
+  // when nulls are equal, only insert those rows that are not all null to improve efficiency
   if (nulls_equal == null_equality::EQUAL and has_nulls) {
     thrust::counting_iterator<size_type> stencil(0);
     auto const [row_bitmask, null_count] =
       cudf::detail::bitmask_or(keys, stream, rmm::mr::get_current_device_resource());
     row_validity pred{static_cast<bitmask_type const*>(row_bitmask.data())};
-    return key_set.insert_if(iter, iter + num_rows, stencil, pred, stream.value()) +
-           static_cast<cudf::size_type>(null_count > 0);
+    // Unless all columns have a null mask, row_bitmask will be
+    // null, and null_count will be zero. Equally, unless there is
+    // some row which is null in all columns, null_count will be
+    // zero. So, it is only when null_count is not zero that we need
+    // to do a filtered insertion.
+    if (null_count > 0) {
+      return key_set.insert_if(iter, iter + num_rows, stencil, pred, stream.value()) + 1;
+    }
   }
   // otherwise, insert all
   return key_set.insert(iter, iter + num_rows, stream.value());
```
But this is no worse than the case without any nulls (which is presumably the more common one). Do we have any benchmarks here?
I don't think we have any benchmarks for such extreme cases, because such benchmarks would never be able to complete.
If nulls should compare equal then we want to filter out rows for which all column entries are invalid (bitmask not set). If any column is not nullable, then `bitmask_or` returns an empty bitmask buffer (and a null count of zero) indicating that the returned bitmask is fully valid. When passed as a predicate to `insert_if` we get a null pointer dereference. To avoid this, only run predicated insertion if the null count returned from `bitmask_or` is positive (which guarantees that the validity bitmask exists). This also avoids running predicated insertion when the predicate is always true. Closes #13576.

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Yunsong Wang (https://github.com/PointKernel)

URL: #13590
Describe the bug
I recently needed to get an approximate distinct count for a dynamic optimization I was doing in Spark. As part of this I tried to use `nunique` as a reduce aggregation and ran into `cudaErrorIllegalAddress` on anything more complicated than some very basic types. I then tried to use `distinct_count` for the same thing, because I wanted the result on the CPU and this made it very simple to do, but I hit the same problem.
Steps/Code to reproduce bug
I was doing a test with TPC-DS data and the keys to various hash aggregates, but I also saw it in a few unit tests. Generally, the rule I found (but don't fully trust) is that it happens for anything with more than one column, or any value larger than 64 bits. So strings cause this, structs cause this, and tables with more than one column show it; `decimal128` causes this too. Lots of things, but here is one example test that I wrote that also causes it to crash.
I am happy to provide more data that causes crashes if people want me to. In the short term I have worked around the issue by hashing the row before sending it to distinct_count, because all I really care about is an approximate count, not an exact one. But I don't want to put anything into production until this gets fixed.
I did run compute-sanitizer on the failing test and got back a lot of issues; this is the first one.
This was on the latest 23.08 code.
Expected behavior
No crashes.