Move filtered SMJ Full filtered join out of join_partial phase #13369

Merged
merged 3 commits into from
Nov 13, 2024


@comphead comphead commented Nov 12, 2024

Which issue does this PR close?

Closes #12359
Closes #10659

Rationale for this change

Move the filtered Full Outer sort-merge join (SMJ) out of the `join_partial` phase so that filter expressions are evaluated correctly and the results already evaluated for the same row are tracked across matches.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Nov 12, 2024
 async fn test_full_join_1k_filtered() {
     JoinFuzzTestCase::new(
         make_staggered_batches(1000),
         make_staggered_batches(1000),
         JoinType::Full,
         Some(Box::new(col_lt_col_filter)),
     )
-    .run_test(&[JoinTestType::NljHj], false)
+    .run_test(&[NljHj, HjSmj], false)
@comphead (Contributor, Author):

This works now, so I am adding the HJ vs. SMJ test back.

#Alice 100 Alice 2
#Alice 50 NULL NULL
#Bob 1 NULL NULL
query TITI rowsort
@comphead (Contributor, Author):

This works now.

let mut first_row_idx = 0;
let mut seen_false = false;

for i in 0..row_indices_length {
@comphead (Contributor, Author):

The mask processing is more complex than for the other join types. I plan to add more tests and documentation, preferably in a follow-up PR, but if that blocks the review I'll update this PR.
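To make the mask processing easier to follow, here is a minimal sketch of the general idea. It is not the code from this PR. It assumes a three-state mask like the `Vec<Option<bool>>` in the diff, where `Some(true)` keeps the joined row, `Some(false)` emits the left row once with NULLs on the right, and `None` drops the row. The function name `correct_full_join_mask` and the `group_lens` parameter (the number of consecutive filter results that belong to the same left row) are hypothetical. The sketch covers only the left side; a real Full join must also emit right rows that never matched.

```rust
/// Illustrative only: corrects a per-pair filter result into a three-state mask.
///
/// `filter[i]` is `Some(true)`/`Some(false)` for an evaluated pair and `None`
/// for a pair that was already NULL-joined (assumption for this sketch).
/// `group_lens` lists how many consecutive entries belong to each left row;
/// its sum must equal `filter.len()`, otherwise the slicing below panics.
fn correct_full_join_mask(
    filter: &[Option<bool>],
    group_lens: &[usize],
) -> Vec<Option<bool>> {
    let mut mask = vec![Some(true); filter.len()];
    let mut start = 0;

    for &len in group_lens {
        let group = &filter[start..start + len];
        // Did any pair for this left row pass the filter?
        let any_true = group.iter().any(|v| *v == Some(true));
        let mut seen_false = false;

        for (offset, v) in group.iter().enumerate() {
            mask[start + offset] = match v {
                // Passing pairs and already NULL-joined pairs are kept as is.
                Some(true) | None => Some(true),
                // A failing pair is dropped if the row matched elsewhere, or
                // if an earlier failing pair already stands in for the row.
                Some(false) if any_true || seen_false => None,
                // First failing pair of a row with no match at all:
                // emit the left row once, padded with NULLs.
                Some(false) => {
                    seen_false = true;
                    Some(false)
                }
            };
        }
        start += len;
    }
    mask
}
```

For example, with filter results `[Some(false), Some(true), Some(false), Some(false), None]` and group lengths `[2, 2, 1]`, the sketch returns `[None, Some(true), Some(false), None, Some(true)]`: the first left row is output once via its passing pair, the second is output once with NULLs, and the third is passed through unchanged.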

@comphead comphead requested review from alamb and korowa November 12, 2024 01:20
@comphead (Contributor, Author):

While working on this I found several opportunities to clean up, document, and improve testing of the existing SMJ code. I plan to file a separate PR for that.

@comphead (Contributor, Author):

cc @andygrove

@@ -852,6 +852,54 @@ fn get_corrected_filter_mask(
             corrected_mask.extend(vec![Some(true); null_matched]);
             Some(corrected_mask.finish())
         }
+        JoinType::Full => {
+            let mut mask: Vec<Option<bool>> = vec![Some(true); row_indices_length];
Contributor:

Could this use `BooleanBuilder`?

@comphead (Contributor, Author), Nov 13, 2024:

Unlike the other join types, this one needs to update entries that were already written to the mask, and the builder is append-only as far as I know.
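The point about updating existing entries can be illustrated with a small single-pass sketch. This is an assumption-laden illustration, not the PR's implementation: the function name `mask_with_backpatch` and the `last_in_group` parameter (true on the final pair of each left row) are hypothetical. A provisional `Some(false)` written early may have to be rewritten to `None` once a later pair for the same row passes the filter, which is straightforward with an indexable `Vec` and awkward with an append-only builder.

```rust
/// Illustrative only: builds the three-state mask in one pass and rewrites an
/// earlier entry when a later pair of the same left row passes the filter.
///
/// `last_in_group[i]` marks the final pair of each left row (assumption).
fn mask_with_backpatch(filter: &[bool], last_in_group: &[bool]) -> Vec<Option<bool>> {
    let mut mask: Vec<Option<bool>> = Vec::with_capacity(filter.len());
    // Index of the provisional `Some(false)` for the current left row, if any.
    let mut pending_false: Option<usize> = None;
    let mut seen_true = false;

    for (i, (&passed, &last)) in filter.iter().zip(last_in_group).enumerate() {
        if passed {
            seen_true = true;
            // The row matched after all: retract the provisional NULL-joined
            // output written earlier. This is the in-place update.
            if let Some(j) = pending_false.take() {
                mask[j] = None;
            }
            mask.push(Some(true));
        } else if seen_true || pending_false.is_some() {
            // The row is already represented in the output, so drop this pair.
            mask.push(None);
        } else {
            // First failing pair: provisionally emit the row with NULLs.
            pending_false = Some(i);
            mask.push(Some(false));
        }

        if last {
            // Reset the per-row state before the next left row starts.
            seen_true = false;
            pending_false = None;
        }
    }
    mask
}
```

With `filter = [false, true, false, false]` and `last_in_group = [false, true, false, true]`, the entry at index 0 is first written as `Some(false)` and then rewritten to `None` when index 1 passes, giving `[None, Some(true), Some(false), None]`.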

@Dandandan (Contributor) left a comment:

👍

@Dandandan (Contributor):

Nice work @comphead

@alamb (Contributor) commented Nov 13, 2024:

Awesome -- thank you @comphead -- the effort you are making to get Sort merge join into shape is very cool.

Thanks also to @Dandandan for the review

@alamb alamb merged commit fd092e0 into apache:main Nov 13, 2024
26 checks passed
alamb pushed a commit to alamb/datafusion that referenced this pull request Nov 13, 2024
…che#13369)

* Move filtered SMJ Full filtered join out of `join_partial` phase

* Move filtered SMJ Full filtered join out of `join_partial` phase

* Move filtered SMJ Full filtered join out of `join_partial` phase