tests: enable fuzz for filtered anti-semi NLJoin #12360

korowa · 2024-09-06T18:00:10Z

Which issue does this PR close?

Rationale for this change

It turned out to be not a NLJoin issue, but fuzz tests -- during NLjoin construction, join filter was completely ignored, and all NLjoin tests were running with only equijoin conditions in filter.

In the same time filtered inner/left/right/full join fuzz tests were passing even without the filter -- the reason is that they've been using less_than_100_join_filter, which always returns True, as generation of input data for column, used in this filter, always returns values less than 100 -- make_staggered_batches comment:

/// two sorted int32 columns 'a', 'b' ranged from 0..99 as join columns

so it seems to be more useful to place col_lt_col_filter in all tests, since it has non 100% selectivity.

Finally, after replacing filter in all test cases, some of HjSmj tests turned out to be flaky, so they are temporarily disabled, until #12359 fix.

What changes are included in this PR?

NLJoin construction in fuzz tests now takes into account join_filter, and constructs composite filter as a union of join_filter and equijoin conditions, all fuzz tests are enabled for NljHj case
all _filtered test cases now use col_lt_col_filter
some _filtered testcases disabled for SortMergeJoin due to their flakiness

Are these changes tested?

Yes

Are there any user-facing changes?

No, it's related exclusively to the tests

alamb · 2024-09-07T10:34:43Z

datafusion/core/tests/fuzz_cases/join_fuzz.rs

-    .await
-}
-
-fn less_than_100_join_filter(schema1: Arc<Schema>, _schema2: Arc<Schema>) -> JoinFilter {


during NLjoin construction, join filter was completely ignored, and all NLjoin tests were running with only equijoin conditions in filter.

Just to double check my understanding, your analysis is that this test is invalid because NestedLoopJoinExec doesn't appy the less_than_100_join_filter correctly?

I reread the docs for NestedLoopJoinExec and it seems to imply the point is to run joins without a equality predicate 🤔

https://docs.rs/datafusion/latest/datafusion/physical_plan/joins/struct.NestedLoopJoinExec.html

So is the difference that less_than_100_join_filter only referred to one join input (aka only refers to a) where NLJ needs filters that refer to both sides ?

it seems to imply the point is to run joins without a equality predicate

That's true, but in order to somehow perform fuzz testing against other join implementations, NLJoin runs (intended to run) in fuzz tests with the following filter <on / equijoin conditions> and <optional join filter>

Just to double check my understanding, your analysis is that this test is invalid because NestedLoopJoinExec doesn't appy the less_than_100_join_filter correctly?

The test is invalid because of how NLJoin was built in these tests -- if optional join filter was passed as a test case argument, it was not included into NLJoin filter expression, and the test compared join on <equijoin conditions> and <filter> for HashJoin vs join on <equijoin condition> for NLJoin -- this problem is fixed by the first commit.

So is the difference that less_than_100_join_filter only referred to one join input (aka only refers to a) where NLJ needs filters that refer to both sides ?

less_than_100_join_filter is removed because it's expression (t.a < 100) doesn't make much sense, since a produced by make_staggered_batches is always less than 100 -- that's why only semi/anti_filtered joins were failing (they used different filter expression -- col_lt_col_filter, which is not always true), and all other filtered test cases (inner/left/right/full) were passing even with missing filter for NLJoin (from p.1).

yeah, I think I introduced the col_lt_col_filter when testing flaky semi join.
We probably should have used this filter for other filtered use cases.

But if the tests becoming flaky, it would be dangerous to merge the PR without ignoring tests

alamb

Thank you for the responses @korowa -- this PR makes sense to me

comphead

To find if test is flaky I followed the approach of running the test 1000 times in the loop. Like for AntiJoin

async fn test_anti_join_1k_filtered() {
    // NLJ vs HJ gives wrong result
    // Tracked in https://github.com/apache/datafusion/issues/11537
   for i in 0..1000 {
        println!("Iteration {i}");
        JoinFuzzTestCase::new(
            make_staggered_batches(1000),
            make_staggered_batches(1000),
            JoinType::LeftAnti,
            Some(Box::new(col_lt_col_filter)),
        )
        .run_test(&[JoinTestType::HjSmj], false)
        .await;
    }
}

But again if the test becomes flaky with col_lt_col_filter we should probably ignore the test to avoid random broken CI

korowa · 2024-09-09T05:08:28Z

To find if test is flaky I followed the approach of running the test 1000 times in the loop. Like for AntiJoin

I've run all them in a loop with hundred iterations, and all currently present tests are passing. The only failed tests after filter change were SMJ related -- they are disabled in f9d2a90

comphead

lgtm thanks @korowa

korowa added 3 commits September 6, 2024 20:29

tests: enable fuzz for filtered anti-semi NLJoin

74600f7

tests: update filters in join fuzz tests

bdf7bb1

tests: disable flaky tests for SortMergeJoin

f9d2a90

github-actions bot added the core Core DataFusion crate label Sep 6, 2024

alamb reviewed Sep 7, 2024

View reviewed changes

alamb approved these changes Sep 8, 2024

View reviewed changes

comphead reviewed Sep 8, 2024

View reviewed changes

comphead approved these changes Sep 9, 2024

View reviewed changes

comphead merged commit 4569cbb into apache:main Sep 9, 2024
24 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests: enable fuzz for filtered anti-semi NLJoin #12360

tests: enable fuzz for filtered anti-semi NLJoin #12360

korowa commented Sep 6, 2024

alamb Sep 7, 2024

korowa Sep 8, 2024

comphead Sep 8, 2024

alamb left a comment

comphead left a comment

korowa commented Sep 9, 2024 •

edited

Loading

comphead left a comment

tests: enable fuzz for filtered anti-semi NLJoin #12360

tests: enable fuzz for filtered anti-semi NLJoin #12360

Conversation

korowa commented Sep 6, 2024

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

alamb Sep 7, 2024

Choose a reason for hiding this comment

korowa Sep 8, 2024

Choose a reason for hiding this comment

comphead Sep 8, 2024

Choose a reason for hiding this comment

alamb left a comment

Choose a reason for hiding this comment

comphead left a comment

Choose a reason for hiding this comment

korowa commented Sep 9, 2024 • edited Loading

comphead left a comment

Choose a reason for hiding this comment

korowa commented Sep 9, 2024 •

edited

Loading