Add support for UNION [DISTINCT] sql #1068

Merged
merged 1 commit into from
Oct 4, 2021

Conversation

xudong963
Member

@xudong963 xudong963 commented Oct 3, 2021

Which issue does this PR close?

Closes #998

Earlier version #1029

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added datafusion Changes in the datafusion crate sql SQL Planner labels Oct 3, 2021
@xudong963
Member Author

Using the approach suggested by @Dandandan and @alamb, UNION is easy to implement! Thanks again. PTAL @Dandandan @alamb @houqp
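As a rough illustration of that approach (UNION expressed as a distinct over UNION ALL), here is a minimal row-level sketch. It does not use DataFusion; the `Row` type and the function names `union_all`, `distinct`, and `union_distinct` are hypothetical and exist only for this example.

```rust
use std::collections::HashSet;

/// A row modeled as (customer_id, revenue). Hypothetical schema, for illustration only.
type Row = (i64, i64);

/// UNION ALL: concatenate both inputs and keep every duplicate.
fn union_all(left: &[Row], right: &[Row]) -> Vec<Row> {
    left.iter().chain(right.iter()).copied().collect()
}

/// DISTINCT: keep only the first occurrence of each row.
fn distinct(rows: &[Row]) -> Vec<Row> {
    let mut seen = HashSet::new();
    // `insert` returns false when the row was already present.
    rows.iter().copied().filter(|row| seen.insert(*row)).collect()
}

/// UNION [DISTINCT]: composed from the two operators above.
fn union_distinct(left: &[Row], right: &[Row]) -> Vec<Row> {
    distinct(&union_all(left, right))
}
```

The point of the sketch is that no dedicated "union distinct" operator is needed: once concatenation and deduplication exist, the set operation is their composition.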

@xudong963 xudong963 force-pushed the union_impl_by_select_distinct branch from 19bf016 to 6f75105 Compare October 3, 2021 15:51
_ => Err(DataFusionError::NotImplemented(format!(
- "Only UNION ALL is supported, found {}",
+ "Only UNION ALL and UNION are supported, found {}",
Contributor

Suggested change
- "Only UNION ALL and UNION are supported, found {}",
+ "Only UNION ALL and UNION [DISTINCT] are supported, found {}",

@@ -3440,7 +3453,7 @@ mod tests {
let sql = "SELECT order_id from orders EXCEPT SELECT order_id FROM orders";
let err = logical_plan(sql).expect_err("query should have failed");
assert_eq!(
- "NotImplemented(\"Only UNION ALL is supported, found EXCEPT\")",
+ "NotImplemented(\"Only UNION ALL and UNION are supported, found EXCEPT\")",
Contributor

Suggested change
- "NotImplemented(\"Only UNION ALL and UNION are supported, found EXCEPT\")",
+ "NotImplemented(\"Only UNION ALL and UNION [DISTINCT] are supported, found EXCEPT\")",

@Dandandan
Contributor

This looks good, @xudong963. Thank you!

One suggestion I have is to also test the UNION DISTINCT syntax, to make sure it's supported and has the same result as just UNION.
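To make the suggested check concrete, here is a small standalone sketch of the property being tested: `UNION` and `UNION DISTINCT` should be treated identically, `UNION ALL` separately, and anything else rejected. It is not the planner code from this PR; `UnionKind` and `classify_set_operator` are hypothetical names, and only the error text mirrors the message discussed in the review above.

```rust
/// How a UNION should be planned. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum UnionKind {
    /// Keep duplicates (UNION ALL).
    All,
    /// Remove duplicates (UNION or UNION DISTINCT).
    Distinct,
}

/// Classify the set-operator keywords of a query.
/// Whitespace and letter case are normalized before matching.
fn classify_set_operator(text: &str) -> Result<UnionKind, String> {
    let normalized = text
        .split_whitespace()
        .map(|word| word.to_ascii_uppercase())
        .collect::<Vec<_>>()
        .join(" ");
    match normalized.as_str() {
        "UNION ALL" => Ok(UnionKind::All),
        // Plain UNION and UNION DISTINCT mean the same thing.
        "UNION" | "UNION DISTINCT" => Ok(UnionKind::Distinct),
        other => Err(format!(
            "Only UNION ALL and UNION [DISTINCT] are supported, found {}",
            other
        )),
    }
}
```

A real test in the repository would run both SQL spellings through the engine and compare result batches; this sketch only shows the equivalence at the keyword level.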

datafusion/src/execution/dataframe_impl.rs
}

#[tokio::test]

Member

Nitpick: to keep the style consistent, we typically don't leave a blank line between #[tokio::test] and the test implementation.

@houqp houqp added the enhancement New feature or request label Oct 3, 2021
@xudong963 xudong963 force-pushed the union_impl_by_select_distinct branch 2 times, most recently from ec10910 to 332e472 Compare October 4, 2021 10:19
@xudong963
Member Author

xudong963 commented Oct 4, 2021

All comments have been addressed. @Dandandan @houqp cc @alamb

@alamb alamb changed the title from "Add support for UNION sql" to "Add support for UNION [DISTINCT] sql" Oct 4, 2021
Contributor

@alamb alamb left a comment

This is a pretty awesome PR @xudong963 -- thank you! I really like how the support just sort of comes together from the existing parts and pieces

Comment on lines 294 to 295
/// apply union distinct
pub fn union_distinct(&self) -> Result<Self> {
Contributor

Suggested change
- /// apply union distinct
- pub fn union_distinct(&self) -> Result<Self> {
+ /// Apply deduplication: only distinct (different) values are returned
+ pub fn distinct(&self) -> Result<Self> {

I suggest renaming this to just `distinct`: there is only one input, so I find the use of the term "union" somewhat confusing.
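A toy plan builder may help show why the shorter name fits. In the sketch below, `Plan` and `PlanBuilder` are hypothetical simplifications and not DataFusion's actual `LogicalPlan` or `LogicalPlanBuilder`; the only point is that `union` takes a second input while `distinct` wraps the single plan built so far.

```rust
/// Hypothetical, simplified plan tree for illustration; not DataFusion's real LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Read all rows of a named table.
    Scan(String),
    /// Concatenate two inputs (UNION ALL semantics).
    Union(Box<Plan>, Box<Plan>),
    /// Remove duplicate rows from a single input.
    Distinct(Box<Plan>),
}

/// Hypothetical builder, loosely modeled on the builder style discussed in this PR.
struct PlanBuilder {
    plan: Plan,
}

impl PlanBuilder {
    fn new(plan: Plan) -> Self {
        PlanBuilder { plan }
    }

    /// Two inputs: the plan built so far and `other`.
    fn union(self, other: Plan) -> Self {
        PlanBuilder {
            plan: Plan::Union(Box::new(self.plan), Box::new(other)),
        }
    }

    /// One input: wraps whatever plan has been built so far.
    fn distinct(self) -> Self {
        PlanBuilder {
            plan: Plan::Distinct(Box::new(self.plan)),
        }
    }

    fn build(self) -> Plan {
        self.plan
    }
}
```

With this shape, `distinct` is usable on any plan, for example directly on a scan, which is why tying its name to "union" would be misleading.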

let left_plan = self.set_expr_to_plan(left.as_ref(), None, ctes)?;
let right_plan = self.set_expr_to_plan(right.as_ref(), None, ctes)?;
let union_plan = union_with_alias(left_plan, right_plan, alias)?;
LogicalPlanBuilder::from(union_plan)
Contributor

❤️

@@ -128,6 +128,8 @@ const QUERY1: &str = "SELECT * FROM sales limit 3";
const QUERY: &str =
"SELECT customer_id, revenue FROM sales ORDER BY revenue DESC limit 3";

const QUERY2: &str = "SELECT customer_id, revenue FROM sales UNION SELECT customer_id, revenue FROM sales ORDER BY revenue DESC limit 3";
Contributor

Can you please move this test into tests/sql.rs? I don't think UNION has anything to do with user defined plans.

Contributor

@alamb alamb left a comment

I think renaming the function on LogicalPlanBuilder and moving the test are all that is necessary prior to merging this PR.

@xudong963 xudong963 force-pushed the union_impl_by_select_distinct branch from 332e472 to d6647b3 Compare October 4, 2021 11:29
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 4, 2021
@xudong963 xudong963 force-pushed the union_impl_by_select_distinct branch from d6647b3 to 68277da Compare October 4, 2021 11:31
@xudong963
Member Author

Thanks for your great comments, which really help improve my engineering skills! @alamb @houqp @Dandandan

All comments have been addressed. @alamb

@xudong963
Member Author

> I really like how the support just sort of comes together from the existing parts and pieces

Yes, me too. The process helped me get to know the existing parts well!

Contributor

@alamb alamb left a comment

Great job @xudong963 !

@alamb alamb merged commit a8dedc8 into apache:master Oct 4, 2021
@houqp
Member

houqp commented Oct 4, 2021

Really nice work @xudong963 !

@Dandandan
Contributor

Thanks @xudong963 ! 👍

Labels
datafusion: Changes in the datafusion crate
documentation: Improvements or additions to documentation
enhancement: New feature or request
sql: SQL Planner
Development

Successfully merging this pull request may close these issues.

implement Set Operations UNION
4 participants