Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support RightAnti for SortMergeJoin #13680

Merged
merged 6 commits into from
Jan 5, 2025
Merged

Conversation

irenjj
Copy link
Contributor

@irenjj irenjj commented Dec 7, 2024

Which issue does this PR close?

Closes #13472

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Dec 7, 2024
@github-actions github-actions bot added the core Core DataFusion crate label Dec 9, 2024
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 11, 2024
@irenjj irenjj marked this pull request as ready for review December 13, 2024 15:20
@Omega359
Copy link
Contributor

@comphead - are you able to review this?

@comphead
Copy link
Contributor

Thanks @Omega359 and @irenjj somehow I missed that, I'll check it out

select 41 a, 42 b union all
select 41 a, 45 b
)
select t2.* from t1 right anti join t2 on t1.a = t2.a and t1.b = t2.b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting thing: right ANTI/SEMI naturally not supported by SQL text, however DF supports it, DuckDB/PG will fail on query like that because of parse error.

DF runs it and picks the join type properly

| plan_type     | plan                                                                                        |
+---------------+---------------------------------------------------------------------------------------------+
| logical_plan  | Sort: t2.a ASC NULLS LAST, t2.b ASC NULLS LAST                                              |
|               |   RightAnti Join: t1.a = t2.a, t1.b = t2.b                                                  |

From one side it is really convenient to test out queries with RIGHT ANTI/SEMI using SQL, from another it is not very expected behavior comparing to other engines.

WDYT should we do something with that? @alamb @Dandandan @ozankabak @jayzhan-synnada @findepi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DuckDB/PG failed because of not supported, in this case, I think it is fine that we have different result because we support the query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to support the syntax, although it's not common for other engines to support it.

@@ -2910,6 +2984,310 @@ mod tests {
Ok(())
}

#[tokio::test]
async fn join_right_anti_one() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what one means here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning is "a column." This naming has already been updated.

@@ -1920,6 +1939,10 @@ impl SortMergeJoinStream {
let output_column_indices = (0..left_columns_length).collect::<Vec<_>>();
filtered_record_batch =
filtered_record_batch.project(&output_column_indices)?;
} else if matches!(self.join_type, JoinType::RightAnti) {
let output_column_indices = (0..right_columns_length).collect::<Vec<_>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can factor out this part into separate method

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @irenjj I think it is very very close

@irenjj
Copy link
Contributor Author

irenjj commented Dec 30, 2024

Thanks @comphead . I've made the changes based on your suggestions. Could you please take another look?

@@ -2910,6 +2992,310 @@ mod tests {
Ok(())
}

#[tokio::test]
async fn join_right_anti_one_one() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @irenjj sorry I still not very sure what is one_one?
is it join by one column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @irenjj

@comphead comphead merged commit 69687da into apache:main Jan 5, 2025
26 checks passed
@irenjj irenjj deleted the smj_right_anti branch January 6, 2025 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SortMergeJoin: Add RightAnti join support
5 participants