Implement right semi join and support in HashBuildProbeorder #3958
| JoinType::Left
| JoinType::Right
| JoinType::Full
| JoinType::Semi
It's better to change JoinType::Semi to JoinType::LeftSemi explicitly?
That might be better for clarity, but also a breaking change.
I think we also should change Anti to LeftAnti if we want to do this.
FYI @alamb @andygrove
We could add the new variants and mark the old ones as deprecated?
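For reference, a minimal sketch of what the "add new variants, deprecate the old ones" option could look like. `JoinKind` and `normalize` are hypothetical names used only for illustration; this is not DataFusion's actual `JoinType` definition. It assumes Rust's standard `#[deprecated]` attribute on enum variants.

```rust
/// Hypothetical stand-in for a join-type enum (not DataFusion's real `JoinType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinKind {
    LeftSemi,
    RightSemi,
    LeftAnti,
    /// Old spelling, kept only for backwards compatibility.
    #[deprecated(note = "use JoinKind::LeftSemi")]
    Semi,
    /// Old spelling, kept only for backwards compatibility.
    #[deprecated(note = "use JoinKind::LeftAnti")]
    Anti,
}

/// Map the deprecated spellings onto their explicit replacements.
/// Every `match` over the enum needs arms like these while both spellings exist.
#[allow(deprecated)]
fn normalize(kind: JoinKind) -> JoinKind {
    match kind {
        JoinKind::Semi => JoinKind::LeftSemi,
        JoinKind::Anti => JoinKind::LeftAnti,
        other => other,
    }
}

#[allow(deprecated)]
fn main() {
    // The old names still compile (with a deprecation warning at call sites
    // that do not opt out) and resolve to the explicit variants.
    println!("{:?}", normalize(JoinKind::Semi));
    println!("{:?}", normalize(JoinKind::RightSemi));
}
```

The trade-off is that every place that matches on the enum has to handle both spellings until the old variants are removed.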
let task_ctx = session_ctx.task_ctx();
let left = build_table(
("a2", &vec![10, 20, 30, 40]),
("b1", &vec![4, 5, 6, 5]), // 5 is double on the left
Recommend calling this b2, not b1, for clarity.
makes sense
Thank you for writing up #3945 -- I was very confused about the use case for a "right semijoin" until I read that.
I learned something new today!
I like @xudong963 's suggestion:
It's better to change JoinType::Semi to JoinType::LeftSemi explicitly?
I think we also should change Anti to LeftAnti if we want to do this.
This makes sense to me -- I think the more explicit the join types the better 👍
I didn't quite understand the tests -- maybe I am confused.
Ok - I went ahead and renamed everything to LeftSemi and LeftAnti. I don't think it's worth the complexity to keep Semi and Anti around as deprecated (this would add extra cases everywhere and make the code error-prone).
LGTM -- thanks @Dandandan
parquet-testing was changed; is that expected?
Maybe the change will break some tests.
ab9c87d to ee86743
My mistake! Thanks, fixed.
Nice work, thank you @Dandandan
Lol, btw I just finished all kinds of right-related joins in the databend community a few days ago to make join reorder work smoothly.
Awesome - would be nice to see if we can learn something from databend 👍
365b3bd to 85201d8
@tustvold Is there anything I should do to update the generated pbjson and prost files? I have merged master to this version, but can't seem to generate the same code :/
@@ -43,7 +43,6 @@ impl<'de> serde::Deserialize<'de> for AggregateExprNode {
D: serde::Deserializer<'de>,
{
const FIELDS: &[&str] = &[
"aggr_function",
I only get aggrFunction
@tustvold Updating the lock-file solved the issue. Might be something to think about if that happens too often (at least document that this could be a reason)? I saw the related update:
But I didn't yet realize this was an influxdata dependency ;)
9877832 to 7963202
Filed - #3987
Benchmark runs are scheduled for baseline = e73a43c and contender = 002165b. 002165b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
…3958)
* Implement right semi join
* Change error a bit
* protobuf
* protobuf
* protobuf
* Change column name to b2
* Rename everything
* Rename & fmt
* Change display to leftanti
* Fix last expected plan
* Commit generated file
* generated
Which issue does this PR close?
Closes #3945
Rationale for this change
Possible perf / memory usage improvements when statistics are available (or when someone hard-codes the join type).
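To make the semantics concrete, here is a small sketch of left semi vs. right semi join on plain key slices. It only illustrates the output semantics and why swapping the inputs is possible; it is not DataFusion's hash join implementation. The function names and the right-hand data are made up for illustration; the left keys reuse the `[4, 5, 6, 5]` column from the test discussed above.

```rust
use std::collections::HashSet;

/// Left semi join on keys: keep each left row that has at least one match on the right.
fn left_semi(left: &[i32], right: &[i32]) -> Vec<i32> {
    let right_keys: HashSet<i32> = right.iter().copied().collect();
    left.iter().copied().filter(|k| right_keys.contains(k)).collect()
}

/// Right semi join on keys: keep each right row that has at least one match on the left.
fn right_semi(left: &[i32], right: &[i32]) -> Vec<i32> {
    let left_keys: HashSet<i32> = left.iter().copied().collect();
    right.iter().copied().filter(|k| left_keys.contains(k)).collect()
}

fn main() {
    let left = [4, 5, 6, 5]; // 5 appears twice on the left
    let right = [4, 5, 7]; // hypothetical right side

    // Left rows with a match on the right; the duplicate 5 is kept because
    // both left rows qualify.
    println!("{:?}", left_semi(&left, &right)); // [4, 5, 5]

    // Right rows with a match on the left; each right row is emitted at most
    // once even though 5 matches two left rows.
    println!("{:?}", right_semi(&left, &right)); // [4, 5]

    // Swapping the inputs turns one join type into the other, which is what
    // lets an optimizer pick which side to build the hash table on.
    assert_eq!(left_semi(&left, &right), right_semi(&right, &left));
}
```

Because `left_semi(a, b)` and `right_semi(b, a)` return the same rows, a planner that knows which input is smaller can swap the sides and flip the join type without changing the result.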
What changes are included in this PR?
Are there any user-facing changes?