Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Define and implement non equi join #3628

Closed

Conversation

ChuckHastings
Copy link
Contributor

Closes #2792

This should be merged after #3224 and #3538

At this point the PR only contains the API definition. I included the entire join.hpp file from PR 3538 for now. Once the dependent PRs are merged I'll merge any final updates.

@ChuckHastings ChuckHastings requested a review from a team as a code owner December 17, 2019 17:00
@ChuckHastings ChuckHastings self-assigned this Dec 17, 2019
@jrhemstad
Copy link
Contributor

#2792 is a pretty significant undertaking. I think we should spend some time talking about how it's going to be implemented before we attempt to define an API.

@codecov
Copy link

codecov bot commented Dec 17, 2019

Codecov Report

Merging #3628 (541b956) into branch-0.14 (6a26d65) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.14    #3628   +/-   ##
============================================
  Coverage        87.39%   87.39%           
============================================
  Files               49       49           
  Lines             9380     9380           
============================================
  Hits              8198     8198           
  Misses            1182     1182           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a26d65...541b956. Read the comment docs.

cudf::size_type left_column_idx; ///< index of left column to compare
cudf::size_type right_column_idx; ///< index of right column to compare
bool only_left_in_output; ///< if true, only output the left column
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style comment : I really like this style (justification of member names). I wish this was more common :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do specify this in our style guide for enums, but we should add it for struct members (work in progress): https://docs.google.com/document/d/10Ff4VZlSsizHdF5r9Zrw7dIwlPD0yWWSmNJ6qter7r8/edit#heading=h.u9cu9e14hobq

// the existing API for equi join and the new internal mechanism for non-equi join? Perhaps
// I implement this public API toward that purpose and we can decide whether to deprecate the
// old interface?
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these functions are supersets of inner/outer join, I would think this would replace their implementations entirely. So their interfaces in this header might still exist, but all they'd be doing is calling into this function with an EQUAL operation. Caveat : I am in no way an expert on database stuff so this comment may be moot :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to see how the implementations unfold. I was actually imagining the opposite... that perhaps the interfaces would combine but if the ONLY comparison operator is EQUAL then we would call the existing implementations. It is possible when all is said and done that this implementation would be comparable in performance to the original, but I'm not convinced at this point.

* join_ops: { {GREATER_THAN, 0, 0, false} }
* Result: { a: {2}, b: {1}, c: {5} }
*
* @throws cudf::logic_error if either table is empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there are some other possibilities here: Out of bounds indices in columns_in_common, out of bounds indices in the join_ops structs.

@harrism harrism added 2 - In Progress Currently a work in progress 4 - Needs Review Waiting for reviewer to review or respond libcudf Affects libcudf (C++/CUDA) code. labels Feb 9, 2020
@harrism harrism changed the title [WIP] Define and implement non equi join [WIP] Define and implement non equi join [skip ci] Mar 12, 2020
@harrism harrism changed the base branch from branch-0.12 to branch-0.14 March 12, 2020 00:52
@harrism harrism changed the title [WIP] Define and implement non equi join [skip ci] [WIP] Define and implement non equi join Mar 12, 2020
@GPUtester
Copy link
Collaborator

Can one of the admins verify this patch?

@github-actions
Copy link

This PR has been marked rotten due to no recent activity in the past 90d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates.

@harrism
Copy link
Member

harrism commented Feb 17, 2021

This is really stale. Closing, but we can reopen or cherry pick from it if needed.

@harrism harrism closed this Feb 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress 4 - Needs Review Waiting for reviewer to review or respond libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Inequality Joins
5 participants