[BUG] Legacy join APIs return extraneous columns #7762
Comments
@shwina did you see this?
This was by design, although I concede it was a lazy design on my part. @jrhemstad wanted the "common columns" parameter from the legacy APIs gone, and I didn't replace it with a reasonable default behaviour. One solution is to get rid of these APIs entirely; Python (and I believe Spark) don't use them anymore. If we still want them for the convenience of libcudf users, we could make it so that only the key columns from the left are preserved for inner and left joins.
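To make the last option concrete, here is a rough sketch of the column selection it implies. This is plain Python, not libcudf code: `columns_to_keep` and its parameters are hypothetical names, and it assumes the raw join output is laid out as all left columns followed by all right columns.

```python
def columns_to_keep(num_left, num_right, right_on, how):
    """Indices into the concatenated [left..., right...] join output to keep.

    Hypothetical helper: for inner and left joins the right key columns
    are dropped; any other join kind keeps every column in this sketch.
    """
    drop = set(right_on) if how in ("inner", "left") else set()
    keep = list(range(num_left))
    keep += [num_left + j for j in range(num_right) if j not in drop]
    return keep


# Left table has 2 columns, right table has 2 columns, joined on right column 0.
print(columns_to_keep(2, 2, [0], "inner"))  # [0, 1, 3]
print(columns_to_keep(2, 2, [0], "full"))   # [0, 1, 2, 3]
```

How a full join should treat the key columns is left open here, since the suggestion above only covers inner and left joins.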
Sorry for closing! Wrong button
This issue has been labeled
Still relevant. The API should either be fixed or removed.
This issue has been labeled
I think this is still relevant, but low priority.
I think this is the most appropriate solution, since I cannot see a valid reason callers would want or expect duplicated key columns in the join output.
Resolves #7762
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Mark Harris (https://github.com/harrism)
- Mike Wilson (https://github.com/hyperbolic2346)
- Jason Lowe (https://github.com/jlowe)
URL: #11274
Describe the bug
While updating the Java bindings to the new join gather map APIs added in #7454, I noticed that the old join APIs now return more columns than they did before. They seem to be returning a table that contains the key columns from both the left and right tables, whereas before they returned only the key columns as they appeared in the left table (updated for the join result).
For example, take two tables, the left having columns A and B and the right having columns C and D, and suppose we want to perform a join on A == C. Previously the join APIs would return a table with 3 columns (A, B, and D), but now they return 4 columns (A, B, C, and D).
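The difference can be illustrated with a small toy model. This is a plain-Python sketch of the example above, not a call into libcudf; the `inner_join` function and its `keep_right_key` flag are made up for illustration, and the data values are arbitrary.

```python
# Toy model of the A,B / C,D example; plain Python lists, not libcudf.
left = {"A": [1, 2, 3], "B": ["x", "y", "z"]}
right = {"C": [2, 3, 4], "D": [20.0, 30.0, 40.0]}


def inner_join(left, right, left_on, right_on, keep_right_key):
    """Nested-loop inner join; column names are assumed to be unique."""
    right_cols = [c for c in right if keep_right_key or c != right_on]
    out = {c: [] for c in list(left) + right_cols}
    for i, left_key in enumerate(left[left_on]):
        for j, right_key in enumerate(right[right_on]):
            if left_key == right_key:
                for c in left:
                    out[c].append(left[c][i])
                for c in right_cols:
                    out[c].append(right[c][j])
    return out


before = inner_join(left, right, "A", "C", keep_right_key=False)
after = inner_join(left, right, "A", "C", keep_right_key=True)
print(list(before))  # ['A', 'B', 'D']
print(list(after))   # ['A', 'B', 'C', 'D']
```

In the 4-column result, column C is an exact copy of column A, which is the redundancy being reported.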
Steps/Code to reproduce bug
Call the left_join, inner_join, and full_join APIs before and after #7454.
Expected behavior
Redundant key columns should not be manifested. Manifesting the right key columns on an inner or left join seems to serve no purpose, as the left key columns already contain the proper answer for the join.
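As a quick sanity check of that claim, here is a toy left join over key values only (plain Python, not libcudf; `left_join_keys` is a hypothetical helper and `None` stands in for a null). Every right key in the output is either null or equal to the left key in the same row, so the right key column carries no extra information.

```python
def left_join_keys(left_keys, right_keys):
    """Return (left_key, right_key) pairs for a left join.

    right_key is None when the left row has no match on the right.
    """
    pairs = []
    for left_key in left_keys:
        matches = [rk for rk in right_keys if rk == left_key]
        if matches:
            pairs.extend((left_key, rk) for rk in matches)
        else:
            pairs.append((left_key, None))
    return pairs


pairs = left_join_keys([1, 2, 3], [2, 3, 3, 4])
print(pairs)  # [(1, None), (2, 2), (3, 3), (3, 3)]

# The right key never disagrees with the left key on a left join.
redundant = all(rk is None or rk == lk for lk, rk in pairs)
print(redundant)  # True
```

The same holds for an inner join, which is just the subset of rows above where the right key is not null.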