Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Verify that no nonempty nulls exist in testing macros #12786

Closed
vyasr opened this issue Feb 16, 2023 · 3 comments
Closed

[FEA] Verify that no nonempty nulls exist in testing macros #12786

vyasr opened this issue Feb 16, 2023 · 3 comments
Labels
0 - Backlog In queue waiting for assignment feature request New feature or request Spark Functionality that helps Spark RAPIDS tests Unit testing for project

Comments

@vyasr
Copy link
Contributor

vyasr commented Feb 16, 2023

Is your feature request related to a problem? Please describe.
libcudf functions expect nulls to be sanitized (see "libcudf expects nested types to have sanitized null masks" at https://docs.rapids.ai/api/libcudf/stable/developer_guide). Providing unsanitized inputs to libcudf functions may result in incorrect results. We are willing to accept putting the onus on the user to ensure that any manually constructed inputs are properly sanitized in this way. However, currently this approach can lead to hidden bugs if a libcudf developer implements an API that produces unsanitized columns, something that we do not test for. Adding tests for a particular API to ensure that nulls are sanitized seems helpful on its face, but in practice it requires identifying APIs that need this check a priori, severely reducing the effectiveness of this approach. Therefore, an alternative approach is needed that automatically verifies these invariants for all tests without any additional work on the part of developers.

Describe the solution you'd like
We should modify our CUDF_TESTS_EXPECTS_* assertion macros to verify that the input columns are properly sanitized prior to performing any comparison. If the inputs columns are unsanitized, a suitable error should be raised even if the intended assertion would otherwise have passed. This approach automatically injects sanitization validation into all tests and ensures that no libcudf APIs produce unsanitized outputs.

@vyasr vyasr added feature request New feature or request 0 - Backlog In queue waiting for assignment tests Unit testing for project labels Feb 16, 2023
@vyasr vyasr added this to libcudf Feb 16, 2023
@GregoryKimball GregoryKimball moved this to Needs owner in libcudf Apr 7, 2023
@GregoryKimball GregoryKimball moved this from Needs owner to To be revisited in libcudf Oct 26, 2023
@GregoryKimball GregoryKimball added the Spark Functionality that helps Spark RAPIDS label Nov 17, 2023
@SurajAralihalli
Copy link
Contributor

I can look into this, thanks!

@vyasr
Copy link
Contributor Author

vyasr commented Nov 29, 2023

@SurajAralihalli thanks! Let us know if you need any help with this.

rapids-bot bot pushed a commit that referenced this issue Dec 18, 2023
…14559)

This PR addresses Issue #[12786](#12786)

The listed functions have been modified to incorporate a column sanitization check; otherwise, they will raise a `std::invalid_argument` error.
- `expect_column_properties_equal`
- `expect_column_properties_equivalent`
- `expect_columns_equal`
- `expect_columns_equivalent`

Authors:
  - Suraj Aralihalli (https://github.com/SurajAralihalli)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - MithunR (https://github.com/mythrocks)

URL: #14559
@GregoryKimball
Copy link
Contributor

Amazing work @SurajAralihalli !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 - Backlog In queue waiting for assignment feature request New feature or request Spark Functionality that helps Spark RAPIDS tests Unit testing for project
Projects
Archived in project
Development

No branches or pull requests

3 participants