-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: enable fuzz for filtered anti-semi NLJoin #12360
Conversation
.await | ||
} | ||
|
||
fn less_than_100_join_filter(schema1: Arc<Schema>, _schema2: Arc<Schema>) -> JoinFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
during NLjoin construction, join filter was completely ignored, and all NLjoin tests were running with only equijoin conditions in filter.
Just to double check my understanding, your analysis is that this test is invalid because NestedLoopJoinExec
doesn't appy the less_than_100_join_filter
correctly?
I reread the docs for NestedLoopJoinExec
and it seems to imply the point is to run joins without a equality predicate 🤔
https://docs.rs/datafusion/latest/datafusion/physical_plan/joins/struct.NestedLoopJoinExec.html
So is the difference that less_than_100_join_filter
only referred to one join input (aka only refers to a
) where NLJ needs filters that refer to both sides ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to imply the point is to run joins without a equality predicate
That's true, but in order to somehow perform fuzz testing against other join implementations, NLJoin runs (intended to run) in fuzz tests with the following filter <on / equijoin conditions> and <optional join filter>
Just to double check my understanding, your analysis is that this test is invalid because NestedLoopJoinExec doesn't appy the
less_than_100_join_filter
correctly?
The test is invalid because of how NLJoin was built in these tests -- if optional join filter
was passed as a test case argument, it was not included into NLJoin filter expression, and the test compared join on <equijoin conditions> and <filter>
for HashJoin vs join on <equijoin condition>
for NLJoin -- this problem is fixed by the first commit.
So is the difference that
less_than_100_join_filter
only referred to one join input (aka only refers to a) where NLJ needs filters that refer to both sides ?
less_than_100_join_filter
is removed because it's expression (t.a < 100
) doesn't make much sense, since a
produced by make_staggered_batches
is always less than 100 -- that's why only semi/anti_filtered joins were failing (they used different filter expression -- col_lt_col_filter
, which is not always true), and all other filtered test cases (inner/left/right/full) were passing even with missing filter for NLJoin (from p.1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think I introduced the col_lt_col_filter
when testing flaky semi join.
We probably should have used this filter for other filtered use cases.
But if the tests becoming flaky, it would be dangerous to merge the PR without ignoring tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the responses @korowa -- this PR makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To find if test is flaky I followed the approach of running the test 1000 times in the loop. Like for AntiJoin
async fn test_anti_join_1k_filtered() {
// NLJ vs HJ gives wrong result
// Tracked in https://github.com/apache/datafusion/issues/11537
for i in 0..1000 {
println!("Iteration {i}");
JoinFuzzTestCase::new(
make_staggered_batches(1000),
make_staggered_batches(1000),
JoinType::LeftAnti,
Some(Box::new(col_lt_col_filter)),
)
.run_test(&[JoinTestType::HjSmj], false)
.await;
}
}
But again if the test becomes flaky with col_lt_col_filter
we should probably ignore the test to avoid random broken CI
I've run all them in a loop with hundred iterations, and all currently present tests are passing. The only failed tests after filter change were SMJ related -- they are disabled in f9d2a90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @korowa
Which issue does this PR close?
Closes #11537.
Rationale for this change
It turned out to be not a NLJoin issue, but fuzz tests -- during NLjoin construction, join filter was completely ignored, and all NLjoin tests were running with only equijoin conditions in filter.
In the same time filtered inner/left/right/full join fuzz tests were passing even without the filter -- the reason is that they've been using
less_than_100_join_filter
, which always returns True, as generation of input data for column, used in this filter, always returns values less than 100 --make_staggered_batches
comment:so it seems to be more useful to place
col_lt_col_filter
in all tests, since it has non 100% selectivity.Finally, after replacing filter in all test cases, some of HjSmj tests turned out to be flaky, so they are temporarily disabled, until #12359 fix.
What changes are included in this PR?
_filtered
test cases now usecol_lt_col_filter
_filtered
testcases disabled for SortMergeJoin due to their flakinessAre these changes tested?
Yes
Are there any user-facing changes?
No, it's related exclusively to the tests