[QST] expect_columns_equal() vs expect_columns_equivalent() #5867
Comments
"Equal" means two columns are exactly, bitwise equal. "Equivalent" means two columns are functionally equivalent. For example, given two columns
This was the original intention of the difference between
Equivalence can also have a special meaning for floating-point values, where elements can be considered equivalent if they are within some number of units of least precision (ULPs).
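The floating-point case is where "equal" and "equivalent" diverge most visibly. The following is a minimal sketch, assuming 32-bit IEEE-754 floats. bitwise_equal, ordered_bits and ulp_equivalent are hypothetical helpers written for illustration; they are not the cudf::test implementation, which may use a different tolerance scheme.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helpers (not part of cudf::test); assumes 32-bit IEEE-754 floats.

// Bitwise equality: true only if both floats have identical bit patterns.
// Under this rule +0.0f and -0.0f differ, while two NaNs with the same bits match.
bool bitwise_equal(float a, float b) {
  std::uint32_t ua = 0, ub = 0;
  std::memcpy(&ua, &a, sizeof a);
  std::memcpy(&ub, &b, sizeof b);
  return ua == ub;
}

// Maps a float onto an integer line on which adjacent representable floats
// differ by exactly 1 (+0.0f and -0.0f both map to 0).
std::int64_t ordered_bits(float f) {
  std::int32_t i = 0;
  std::memcpy(&i, &f, sizeof f);
  // Floats are sign-magnitude; rewrite negative values as -magnitude.
  return i < 0 ? static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::min()) - i
               : static_cast<std::int64_t>(i);
}

// Equivalence: a and b are at most max_ulps representable values apart.
// Treating any two NaNs as equivalent is a test-friendly choice, not IEEE semantics.
bool ulp_equivalent(float a, float b, std::int64_t max_ulps) {
  if (std::isnan(a) || std::isnan(b)) return std::isnan(a) && std::isnan(b);
  std::int64_t diff = ordered_bits(a) - ordered_bits(b);
  if (diff < 0) diff = -diff;
  return diff <= max_ulps;
}
```

Even a tolerance of 0 ULPs is looser than bitwise equality here, because it treats +0.0f and -0.0f as the same value.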
My opinion here is: "equals" is either semantically incorrect, or people are using it with the wrong expectations. 99.9% of the time, what you care about is "if I were to pass either of these columns along to further computation, I would get the same results", and I'm pretty sure that's how people are using it. But the current definition of equals is much stricter than this, which causes unexpected side effects and retroactive breakage. The clearest example is that an empty strings column may or may not have children. Depending on the whims of the function you called, you might get back a column that passes expect_columns_equal() today but fails it tomorrow. The concrete instance I'd point out is the change I have in flight to empty_like(). With the new change:
This is valid by the standards of cudf. But when I made this change, a few random tests started failing because they were using equals() when what they needed was equivalent(). The end result: 6 or 7 random checks in semi_join_tests.cpp mysteriously use
So I guess I would advocate one of two things:
or
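The empty-strings case can be illustrated with a toy model. This is a minimal sketch under stated assumptions: toy_strings_column, toy_strings_equal and toy_strings_equivalent are hypothetical and only loosely resemble cudf's strings layout (an offsets child plus character data). A layout-level check distinguishes an empty column with no children from one with a single-entry offsets child, while a row-by-row check does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified strings column (not cudf's layout or API):
// an optional offsets child plus a flat character buffer.
struct toy_strings_column {
  std::optional<std::vector<std::int32_t>> offsets;  // may be absent when empty
  std::string chars;

  std::size_t size() const {
    return (offsets && !offsets->empty()) ? offsets->size() - 1 : 0;
  }
  std::string element(std::size_t i) const {
    const auto& o = *offsets;
    return chars.substr(static_cast<std::size_t>(o[i]),
                        static_cast<std::size_t>(o[i + 1] - o[i]));
  }
};

// "Equal": the physical layout must match, including whether children exist.
bool toy_strings_equal(const toy_strings_column& a, const toy_strings_column& b) {
  return a.offsets == b.offsets && a.chars == b.chars;
}

// "Equivalent": same row count and the same string in every row, so any
// downstream computation would see the same values.
bool toy_strings_equivalent(const toy_strings_column& a, const toy_strings_column& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a.element(i) != b.element(i)) return false;
  }
  return true;
}
```

Under this model, an empty column built as {std::nullopt, ""} and one built as {std::vector<std::int32_t>{0}, ""} are equivalent but not equal, which is the kind of difference that can flip a strict test when a function such as empty_like() changes how it builds its output.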
That describes equivalence, not equality.
This is likely because
Equivalence vs. equality is a well-established concept in mathematics, programming languages, and even C++ (e.g.,
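One well-known C++ instance of the distinction (not necessarily the example the comment originally gave) is that ordered associative containers such as std::set use equivalence, !comp(a, b) && !comp(b, a), while algorithms such as std::find use operator==. The sketch below uses a hypothetical case-insensitive comparator, ci_less, to make the two disagree.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical case-insensitive "less than": two strings are equivalent
// under this ordering when neither is less than the other.
struct ci_less {
  bool operator()(const std::string& a, const std::string& b) const {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
  }
};

// std::set::find locates keys by equivalence under the set's comparator.
bool found_by_equivalence(const std::set<std::string, ci_less>& s, const std::string& key) {
  return s.find(key) != s.end();
}

// std::find locates elements by equality (operator==).
bool found_by_equality(const std::set<std::string, ci_less>& s, const std::string& key) {
  return std::find(s.begin(), s.end(), key) != s.end();
}
```

With a set holding "Column", looking up "column" succeeds by equivalence but fails by equality.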
I'd definitely support this. I agree that most places should probably use
Thank you for the advice, @jrhemstad, @nvdbaranec. I will switch the
To throw another log on the fire here: I just realized that expect_columns_equal() already returns true in a case that is merely "equivalent". Specifically: sliced columns. The following all pass:
But these columns aren't truly equal. The sliced/split columns have an offset > 0, and their data buffers are different (the difference is more pronounced in columns that contain offsets, such as strings and lists). expected0 and expected1 have no offset, and their data buffers are different. Equivalent, absolutely, but not the same.
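The sliced-column observation can be modeled as a view with an offset into a shared buffer. This is a minimal sketch: toy_column_view, toy_slice, toy_layout_equal and toy_values_equal are hypothetical names, not cudf's column_view or slice API. A slice compares equal to a freshly built column row by row, yet its offset and underlying buffer differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified fixed-width column view (not cudf's API): a shared
// data buffer plus an offset and a row count, loosely modeled on a slice.
struct toy_column_view {
  std::shared_ptr<const std::vector<int>> data;
  std::size_t offset;
  std::size_t size;

  int element(std::size_t i) const { return (*data)[offset + i]; }
};

// Returns a view of rows [begin, end) that shares the parent's buffer.
toy_column_view toy_slice(const toy_column_view& col, std::size_t begin, std::size_t end) {
  return {col.data, col.offset + begin, end - begin};
}

// Strict, layout-level comparison: same offset, same row count, and
// identical underlying buffers.
bool toy_layout_equal(const toy_column_view& a, const toy_column_view& b) {
  return a.offset == b.offset && a.size == b.size && *a.data == *b.data;
}

// Value-level comparison: the same value in every logical row, regardless of
// where those rows live in memory.
bool toy_values_equal(const toy_column_view& a, const toy_column_view& b) {
  if (a.size != b.size) return false;
  for (std::size_t i = 0; i < a.size; ++i) {
    if (a.element(i) != b.element(i)) return false;
  }
  return true;
}
```

Slicing rows [2, 4) out of {10, 20, 30, 40} yields a view that is value-equal to a fresh column {30, 40} but not layout-equal to it, which is the gap between what the test passes on and what "equal" literally promises.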
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
I'm going to close this, as the question of equal vs. equivalent has been answered.
I'm seeking guidance on which test scenarios we should be using cudf::test::expect_columns_equal() vs. cudf::test::expect_columns_equivalent(). I ran into this while writing structs_column_test, as part of #5807, and was advised to put up an issue for discussion here.
As discussed in #5700, when constructing a structs column, the parent column's null mask is ANDed with the children's, but the data streams are not modified.
The values in the resulting child column may be checked for correctness against an expected_column (via the appropriate column wrapper). For primitive column types, both equivalence and equality checks succeed. For list columns, they fail. :/
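The null-mask superimposition described above can be sketched with a toy nullable column. This is an illustration under stated assumptions: toy_nullable_column, superimpose_parent_nulls, raw_buffers_match and rows_match_ignoring_nulls are hypothetical and do not reflect cudf's bitmask layout or comparator code. It shows that the value left under a new null differs from whatever placeholder an expected column holds there; a comparison that skips null rows still matches, which is one plausible reading of why both checks succeed for primitive columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified nullable column (not cudf's bitmask layout or API):
// one validity flag per row plus the raw values.
struct toy_nullable_column {
  std::vector<bool> valid;
  std::vector<int> data;
};

// Mirrors the behavior described in #5700: the child's validity is ANDed with
// the parent's, while the child's data is left untouched.
toy_nullable_column superimpose_parent_nulls(const toy_nullable_column& child,
                                             const std::vector<bool>& parent_valid) {
  toy_nullable_column out = child;
  for (std::size_t i = 0; i < out.valid.size(); ++i) {
    out.valid[i] = out.valid[i] && parent_valid[i];
  }
  return out;
}

// Raw comparison: validity and every stored value must match, including
// whatever value sits under a null.
bool raw_buffers_match(const toy_nullable_column& a, const toy_nullable_column& b) {
  return a.valid == b.valid && a.data == b.data;
}

// Row comparison: validity must match, and values are compared only for
// non-null rows.
bool rows_match_ignoring_nulls(const toy_nullable_column& a, const toy_nullable_column& b) {
  if (a.valid != b.valid || a.data.size() != b.data.size()) return false;
  for (std::size_t i = 0; i < a.data.size(); ++i) {
    if (a.valid[i] && a.data[i] != b.data[i]) return false;
  }
  return true;
}
```

For example, superimposing parent validity {true, false, true} on a child holding {1, 2, 3} leaves 2 under the new null; against an expected column holding {1, 0, 3} with the same validity, the raw comparison fails while the row comparison passes.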
It turns out that cudf::test::column_comparator_impl<list_view> compares list columns by recursively comparing their data/offset contents. Constructing a list containing nulls via list_column_wrapper does not produce the same contents as a column whose null mask has been twiddled.
Should one modify column_comparator_impl<list_view> to check for equivalence differently from equality?
Also, how would the caller of expect_columns_equal() discern that they should instead be calling expect_columns_equivalent()?
I think @nvdbaranec ran into a similar (or "equivalent" :]) issue when testing list<string>.
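The list case fails for a structural reason: a recursive comparison of offsets and values sees whatever lies under a null row. The sketch below is a toy model, not cudf's list_view comparator; toy_list_column, toy_lists_equal and toy_lists_equivalent are hypothetical. It assumes, for illustration only, that a builder gives a null row an empty range, whereas superimposing a parent null leaves the row's original range in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified list<int> column (not cudf's list_view layout or
// API): per-row validity, an offsets child, and a flat values child.
struct toy_list_column {
  std::vector<bool> valid;
  std::vector<std::int32_t> offsets;  // size() == rows + 1
  std::vector<int> values;

  std::size_t rows() const { return valid.size(); }
  std::vector<int> row(std::size_t i) const {
    return std::vector<int>(values.begin() + offsets[i], values.begin() + offsets[i + 1]);
  }
};

// Structural comparison, similar in spirit to recursively comparing the
// offsets and values children: any difference under a null row is a mismatch.
bool toy_lists_equal(const toy_list_column& a, const toy_list_column& b) {
  return a.valid == b.valid && a.offsets == b.offsets && a.values == b.values;
}

// Equivalence: same validity, and the same list contents for every valid row.
bool toy_lists_equivalent(const toy_list_column& a, const toy_list_column& b) {
  if (a.valid != b.valid) return false;
  for (std::size_t i = 0; i < a.rows(); ++i) {
    if (a.valid[i] && a.row(i) != b.row(i)) return false;
  }
  return true;
}
```

Here a column built as validity {true, false, true}, offsets {0, 2, 2, 3}, values {1, 2, 5} and one with the same validity but offsets {0, 2, 4, 5}, values {1, 2, 3, 4, 5} hold [1, 2], null, [5] in both cases, so they are equivalent but not structurally equal. Comparing only valid rows, recursively, is one way a comparator could separate the two notions.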