-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add column sanitization checks in CUDF_TEST_EXPECT_COLUMN_*
macros
#14559
Add column sanitization checks in CUDF_TEST_EXPECT_COLUMN_*
macros
#14559
Conversation
Signed-off-by: Suraj Aralihalli <[email protected]>
Signed-off-by: Suraj Aralihalli <[email protected]>
During local testing, the following tests did not pass:
|
It appears that the functions tested in the mentioned tests do not seem to produce sanitized columns. Each function needs a closer examination. I started by analyzing Returned:
Expected:
I suspect this discrepancy could be because the offsets are computed without considering the null mask. See |
Thanks for finding these. I can look into the strings and dictionary failures. |
CUDF_TEST_EXPECT_COLUMN_*
macros
Thank you @SurajAralihalli. This is such a great investigation! |
Thanks! This should really help us tighten up our null handling throughout libcudf. |
Signed-off-by: Suraj Aralihalli <[email protected]>
Your change looks good. ROLLING_TEST
RollingDictionaryTest.LeadLag I'm beginning to wonder if |
I verified that PR #14578 fixes this error. |
Fixes the strings specialization logic in `cudf::clamp` to not produce unsanitized null entries. The code was refactored and simplified as well. Also removed unsanitized nulls in test input in the `cudf::clamp` gtests. Reference: #14559 - fixes several of these gtests Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Mark Harris (https://github.com/harrism) - Nghia Truong (https://github.com/ttnghia) - Mike Wilson (https://github.com/hyperbolic2346) URL: #14580
Fixes `cudf::dictionary::decode` logic to produced sanitized null entries for compound column types. Reference: #14559 -- fixes many of the errors found here concerning dictionary column gtests. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - MithunR (https://github.com/mythrocks) URL: #14578
Fixes the string specialization logic in `cudf::segmented_reduce` to not produce unsanitized null entries. The functor used to build a gather map for argmin/argmax was corrected to handle include/exclude nulls correctly. Reference: #14559 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #14586
Removes unsanitized rows from input data in gtests for COPYING_TEST. This fixes some errors found in #14559 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #14600
Fixes the strings specialization logic in `cudf::clamp` to not produce unsanitized null entries. The code was refactored and simplified as well. Also removed unsanitized nulls in test input in the `cudf::clamp` gtests. Reference: rapidsai#14559 - fixes several of these gtests Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Mark Harris (https://github.com/harrism) - Nghia Truong (https://github.com/ttnghia) - Mike Wilson (https://github.com/hyperbolic2346) URL: rapidsai#14580
…#14578) Fixes `cudf::dictionary::decode` logic to produced sanitized null entries for compound column types. Reference: rapidsai#14559 -- fixes many of the errors found here concerning dictionary column gtests. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - MithunR (https://github.com/mythrocks) URL: rapidsai#14578
Fixes the string specialization logic in `cudf::segmented_reduce` to not produce unsanitized null entries. The functor used to build a gather map for argmin/argmax was corrected to handle include/exclude nulls correctly. Reference: rapidsai#14559 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: rapidsai#14586
Removes unsanitized rows from input data in gtests for COPYING_TEST. This fixes some errors found in rapidsai#14559 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#14600
Removes unsanitized rows from output result of `cudf::get_json_object` which may occur when querying an array path `$[*]` does not find any matches in the target string. In this case, a single `'['` remains in the output buffer (per row) though the row is marked as a null entry. This fixes the `JsonPathTests.GetJsonObjectInvalidQuery` error found in #14559 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) URL: #14609
The
|
I think that's fine. We did something similar in 25ebec7 from #14363, and we discussed that in https://nvidia.slack.com/archives/C01CW5L51QC/p1699980706006349. Since our policies prohibit libcudf APIs from producing unsanitized outputs, we don't need to test unsanitized inputs. It's up to the user not to construct such inputs if they're creating their inputs manually. |
@SurajAralihalli I would recommend you remove this test
in this PR. And if you merge with the latest branch-24.02, I expect this PR should then pass CI. |
Sure @davidwendt! I'll do that by the end of this week when I get back from my trip. Thanks so much for letting me know. |
Signed-off-by: Suraj Aralihalli <[email protected]>
/ok to test |
/merge |
This PR removes an extra code path used for checking the equality of the null count when verifying if columns are equivalent (not equal). The purpose of this code path was to verify a specific definition of equivalence for columns containing unsanitized nulls, i.e. by ignoring the stored null count and directly verifying the validity of the underlying null mask. This is no longer necessary because we required sanitized null masks to be output from all libcudf APIs now (see the "libcudf expects nested types to have sanitized null masks" section in the [developer guide](https://docs.rapids.ai/api/libcudf/stable/developer_guide)), and this requirement will be enforced with the merge of #14559. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Bradley Dice (https://github.com/bdice) URL: #13312
This PR addresses Issue #12786
The listed functions have been modified to incorporate a column sanitization check; otherwise, they will raise a
std::invalid_argument
error.expect_column_properties_equal
expect_column_properties_equivalent
expect_columns_equal
expect_columns_equivalent
Checklist