Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: join swap for projected semi/anti joins #13022

Merged
merged 1 commit into from
Oct 21, 2024

Conversation

korowa
Copy link
Contributor

@korowa korowa commented Oct 20, 2024

Which issue does this PR close?

Closes #10978.

Rationale for this change

On join swap embedded projection for anti/semi joins is getting modified just like for any other join types. This leads to resulting embedded projection, containing column indices out of bounds of join output schema.

What changes are included in this PR?

The fix is not to modify output projection for semi/anti joins -- they already have one-side schema as an output, and trying to add/subtract the number of fields from the other input side to column indices will break the projection.

Are these changes tested?

Existing test for swapping join with embedded projection is extended to test all join types.

Are there any user-facing changes?

Swapping inputs for semi/anti joins with embedded projection should start working.

@github-actions github-actions bot added the core Core DataFusion crate label Oct 20, 2024
@korowa
Copy link
Contributor Author

korowa commented Oct 20, 2024

cc @berkaysynnada @my-vegetable-has-exploded

Perhaps you could review this PR, due to still being in context after #12967?

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me & looks good. It would be good to get another review

case::left(JoinType::Left, vec![1], true),
case::right(JoinType::Right, vec![1], true),
case::full(JoinType::Full, vec![1], true),
case::left_anti(JoinType::LeftAnti, vec![0], false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be better to add a swap test for LeftAnti & LeftSemi?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a separate one? Initially I was going to do so, but realized that that it'll result in 3 almost equivalent tests ("normal" joins / left semi-anti / right semi-anti) except for minor differences, so decided that it could be better to parameterize test code to highlight these differences and reduce code duplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since small_on_right is false for LeftAnti & LeftSemi, the swap rule don't work for those tests. But it doesn't matters.

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @korowa, nice catch

I see you've already did what #13022 (comment) mentions, but perhaps I couldn't understand the point, so not merging this yet

})
.collect()
})
match join_type {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @korowa

@comphead comphead merged commit 701cb00 into apache:main Oct 21, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent behavior in HashJoin Projections
5 participants