[FEA] Verify that no nonempty nulls exist in testing macros #12786
Labels: 0 - Backlog, feature request, Spark, tests
Is your feature request related to a problem? Please describe.
libcudf functions expect nulls to be sanitized (see "libcudf expects nested types to have sanitized null masks" at https://docs.rapids.ai/api/libcudf/stable/developer_guide). Passing unsanitized inputs to libcudf functions may produce incorrect results. We are willing to put the onus on the user to ensure that any manually constructed inputs are properly sanitized. However, this approach can currently lead to hidden bugs if a libcudf developer implements an API that produces unsanitized columns, which is something we do not test for. Adding tests to a particular API to ensure that its nulls are sanitized seems helpful on its face, but in practice it requires identifying a priori which APIs need the check, which severely limits its effectiveness. Therefore, we need an alternative approach that automatically verifies these invariants for all tests without any additional work on the part of developers.
Describe the solution you'd like
We should modify our CUDF_TESTS_EXPECTS_* assertion macros to verify that the input columns are properly sanitized before performing any comparison. If the input columns are unsanitized, a suitable error should be raised even if the intended assertion would otherwise have passed. This approach automatically injects sanitization validation into all tests and ensures that no libcudf APIs produce unsanitized outputs.
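To make the proposal concrete, here is a minimal host-side sketch of the idea. It does not use libcudf: `ToyListColumn`, `has_nonempty_nulls`, `logically_equal`, and `expect_columns_equal_sanitized` are all hypothetical names invented for illustration, and the column is modeled as plain `std::vector`s rather than device memory. It only shows the shape of the check: a null row whose offsets still span child elements is a nonempty null, and the assertion rejects such inputs before comparing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical toy stand-in for a list column (not a libcudf type).
// Row i spans child elements [offsets[i], offsets[i + 1]).
struct ToyListColumn {
  std::vector<std::int32_t> offsets;  // size == number of rows + 1
  std::vector<bool> valid;            // valid[i] == false means row i is null
  std::vector<int> child;             // flattened child elements
};

// Returns true if any null row still covers one or more child elements,
// i.e. the column is unsanitized in the sense discussed above.
bool has_nonempty_nulls(ToyListColumn const& col)
{
  assert(col.offsets.size() == col.valid.size() + 1);
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (!col.valid[i] && col.offsets[i + 1] != col.offsets[i]) { return true; }
  }
  return false;
}

// Logical equality: null rows compare equal no matter what data they hide.
bool logically_equal(ToyListColumn const& a, ToyListColumn const& b)
{
  if (a.valid != b.valid) { return false; }
  for (std::size_t i = 0; i < a.valid.size(); ++i) {
    if (!a.valid[i]) { continue; }
    auto const a_begin = a.child.begin() + a.offsets[i];
    auto const a_end   = a.child.begin() + a.offsets[i + 1];
    auto const b_begin = b.child.begin() + b.offsets[i];
    auto const b_end   = b.child.begin() + b.offsets[i + 1];
    if (!std::equal(a_begin, a_end, b_begin, b_end)) { return false; }
  }
  return true;
}

// Hypothetical assertion helper: validate sanitization first, then compare.
// Unsanitized input is an error even when the columns are logically equal.
void expect_columns_equal_sanitized(ToyListColumn const& a, ToyListColumn const& b)
{
  if (has_nonempty_nulls(a) || has_nonempty_nulls(b)) {
    throw std::logic_error("input column contains nonempty nulls");
  }
  if (!logically_equal(a, b)) { throw std::runtime_error("columns are not equal"); }
}
```

With offsets `{0, 2, 3, 4}`, validity `{true, false, true}`, and child data `{1, 2, 9, 3}`, the null middle row still hides the element `9`. That column is logically equal to the sanitized one with offsets `{0, 2, 2, 3}` and child data `{1, 2, 3}`, so a plain equivalence check would pass, while `expect_columns_equal_sanitized` raises an error. That is the behavior requested here for the test macros.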