Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DataFrame support for INTERSECT and update readme #1258

Merged
merged 1 commit into from
Nov 9, 2021

Conversation

xudong963
Copy link
Member

Which issue does this PR close?

Closes #1082

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added datafusion Changes in the datafusion crate documentation Improvements or additions to documentation labels Nov 6, 2021
@xudong963
Copy link
Member Author

xudong963 commented Nov 6, 2021

Make DataFrame users have the intersect feature and Update the README.
PTAL❤️ @alamb @houqp @Dandandan

@@ -231,6 +231,29 @@ impl DataFrame for DataFrameImpl {
.build()?,
)))
}

fn intersect(&self, dataframe: Arc<dyn DataFrame>) -> Result<Arc<dyn DataFrame>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there is some code duplication between this and the one in sql planner, perhaps better to move this into logical plan builder so it can be reused in both places.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be valuable

@@ -375,4 +375,19 @@ pub trait DataFrame: Send + Sync {
/// # }
/// ```
fn registry(&self) -> Arc<dyn FunctionRegistry>;

/// Calculate the intersect two [`DataFrame`]s. The two [`DataFrame`]s must have exactly the same schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Calculate the intersect two [`DataFrame`]s. The two [`DataFrame`]s must have exactly the same schema
/// Calculate the intersection of two [`DataFrame`]s. The two [`DataFrame`]s must have exactly the same schema

Comment on lines 252 to 254
&LogicalPlanBuilder::from(left_plan)
.join_detailed(&right_plan, JoinType::Semi, join_keys, true)?
.build()?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think once we merge #1259 this could just call LogicalPlanBuilder::intersect

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, once #1259 is merged, I'll quickly fix the ticket and #1261. 😊

@xudong963
Copy link
Member Author

The ticket has been done!

@houqp houqp added the enhancement New feature or request label Nov 9, 2021
@alamb alamb changed the title DataFrame supports intersect and update readme Add DataFrame support for INTERSECT and update readme Nov 9, 2021
@alamb
Copy link
Contributor

alamb commented Nov 9, 2021

Looks great -- thank you @xudong963

@alamb alamb merged commit d28a2ac into apache:master Nov 9, 2021
@xudong963 xudong963 deleted the intersect_df branch November 9, 2021 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement the rest of Set Operators: INTERSECT, EXCEPT, etc
3 participants