[FEA] Verify that no nonempty nulls exist in testing macros #12786

vyasr · 2023-02-16T00:21:05Z

Is your feature request related to a problem? Please describe.
libcudf functions expect nulls to be sanitized (see "libcudf expects nested types to have sanitized null masks" at https://docs.rapids.ai/api/libcudf/stable/developer_guide). In particular, a null element of a list or string column must be empty (its start and end offsets are equal) rather than a "nonempty null" whose offsets still span child data. Passing unsanitized inputs to libcudf functions may produce incorrect results. We are willing to put the onus on users to ensure that any manually constructed inputs are sanitized in this way. However, this approach can hide bugs when a libcudf developer implements an API that itself produces unsanitized columns, because none of our tests check for this. Adding sanitization tests for individual APIs seems helpful on its face, but in practice it requires identifying a priori which APIs need the check, which severely limits its effectiveness. We therefore need an approach that automatically verifies this invariant in all tests without any additional work from developers.
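To make the invariant concrete, here is a minimal host-side sketch that models a strings column as an offsets array, a character buffer, and a per-row validity vector. It assumes a simplified layout (plain `std::vector<bool>` validity instead of a device bitmask), and the names `toy_strings_column`, `has_nonempty_nulls_toy`, and `purge_nonempty_nulls_toy` are hypothetical illustrations, not libcudf's API. It only shows the detection and purging logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side model of a strings column (not libcudf's API):
// row i occupies chars[offsets[i], offsets[i + 1]) and is null when valid[i] is false.
struct toy_strings_column {
  std::vector<int> offsets;  // size() == valid.size() + 1, non-decreasing
  std::string chars;         // concatenated character data for all rows
  std::vector<bool> valid;   // per-row validity (true = non-null)
};

// Returns true if any null row still references character data (a "nonempty null").
bool has_nonempty_nulls_toy(toy_strings_column const& col)
{
  assert(col.offsets.size() == col.valid.size() + 1);
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (!col.valid[i] && col.offsets[i + 1] != col.offsets[i]) { return true; }
  }
  return false;
}

// Returns a sanitized copy: null rows stay null but own zero characters,
// and valid rows are copied over with rebuilt offsets.
toy_strings_column purge_nonempty_nulls_toy(toy_strings_column const& col)
{
  assert(col.offsets.size() == col.valid.size() + 1);
  toy_strings_column out;
  out.valid = col.valid;
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (col.valid[i]) {
      out.chars.append(col.chars,
                       static_cast<std::size_t>(col.offsets[i]),
                       static_cast<std::size_t>(col.offsets[i + 1] - col.offsets[i]));
    }
    out.offsets.push_back(static_cast<int>(out.chars.size()));
  }
  return out;
}
```

For example, the rows `{"ab", null, "c"}` stored with offsets `{0, 2, 5, 6}` over `"abxyzc"` contain a nonempty null (the null row still spans `"xyz"`). Purging yields offsets `{0, 2, 2, 3}` over `"abc"`, which has the same logical values but satisfies the invariant.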

Describe the solution you'd like
We should modify our CUDF_TEST_EXPECT_* assertion macros to verify that the input columns are properly sanitized before performing any comparison. If the input columns are unsanitized, the macro should raise a suitable error even if the intended assertion would otherwise have passed. This approach automatically injects sanitization validation into every test that uses these macros, so any libcudf API that produces unsanitized output is caught by its existing tests.
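The proposed "validate first, then compare" ordering can be sketched on a toy LIST<INT32> column. This is a hypothetical host-side model, not the actual libcudf test utilities: `toy_list_column`, `is_sanitized`, and `expect_columns_equal_toy` are illustrative names, GoogleTest integration is omitted, and the function simply throws `std::invalid_argument` on unsanitized input and returns whether the columns are equal.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of a LIST<INT32> column: row i is
// child[offsets[i], offsets[i + 1]) and is null when valid[i] is false.
struct toy_list_column {
  std::vector<int> offsets;  // size() == valid.size() + 1; assumed within child bounds
  std::vector<int> child;    // flattened child values
  std::vector<bool> valid;   // per-row validity (true = non-null)
};

// Checks the invariant: every null row must be empty.
bool is_sanitized(toy_list_column const& col)
{
  assert(col.offsets.size() == col.valid.size() + 1);
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (!col.valid[i] && col.offsets[i] != col.offsets[i + 1]) { return false; }
  }
  return true;
}

// Validates both inputs *before* comparing, so an unsanitized column fails
// even when the row-by-row comparison alone would have passed.
bool expect_columns_equal_toy(toy_list_column const& lhs, toy_list_column const& rhs)
{
  if (!is_sanitized(lhs) || !is_sanitized(rhs)) {
    throw std::invalid_argument("column contains nonempty nulls");
  }
  if (lhs.valid != rhs.valid) { return false; }  // also catches differing row counts
  for (std::size_t i = 0; i < lhs.valid.size(); ++i) {
    if (!lhs.valid[i]) { continue; }  // null rows are known to be empty here
    int const lb = lhs.offsets[i], le = lhs.offsets[i + 1];
    int const rb = rhs.offsets[i], re = rhs.offsets[i + 1];
    if (le - lb != re - rb) { return false; }
    for (int k = 0; k < le - lb; ++k) {
      if (lhs.child[lb + k] != rhs.child[rb + k]) { return false; }
    }
  }
  return true;
}
```

As a usage example, `{[1, 2], null, [3]}` stored with offsets `{0, 2, 2, 3}` and the same rows stored with offsets `{0, 2, 3, 4}` over `{1, 2, 99, 3}` are logically equal, so a comparison that skips null rows would pass. Because the check runs first, the second column is rejected instead, which is exactly the class of bug the proposal aims to surface.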


SurajAralihalli · 2023-11-27T12:18:54Z

I can look into this, thanks!

vyasr · 2023-11-29T22:21:58Z

@SurajAralihalli thanks! Let us know if you need any help with this.

…14559)

This PR addresses issue #12786. The listed functions have been modified to incorporate a column sanitization check; if an input column is not sanitized, they raise a `std::invalid_argument` error.

- `expect_column_properties_equal`
- `expect_column_properties_equivalent`
- `expect_columns_equal`
- `expect_columns_equivalent`

Authors:
- Suraj Aralihalli (https://github.com/SurajAralihalli)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- MithunR (https://github.com/mythrocks)

URL: #14559

GregoryKimball · 2024-01-05T05:11:00Z

Amazing work @SurajAralihalli !!

vyasr added the "feature request", "0 - Backlog", and "tests" labels Feb 16, 2023

vyasr added this to libcudf Feb 16, 2023

GregoryKimball mentioned this issue Apr 7, 2023

[FEA] Make calling to purge_nonempty_nulls optional in various places #12567

Closed

GregoryKimball moved this to Needs owner in libcudf Apr 7, 2023

GregoryKimball moved this from Needs owner to To be revisited in libcudf Oct 26, 2023

GregoryKimball added the "Spark" label Nov 17, 2023

SurajAralihalli mentioned this issue Dec 4, 2023

Add column sanitization checks in CUDF_TEST_EXPECT_COLUMN_* macros #14559

Merged


GregoryKimball closed this as completed Jan 5, 2024

