-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Add equi-join support for structs #7543
Comments
We should clarify what this means. "Arbitrary depths" should not mean a struct that contains arbitrarily nested list type(s). But multiple levels of Arbitrary levels of structs can be handled trivially in the same way as #7422 by exploding the struct parent columns' null masks into |
Do you even need to expand the null masks into seperate columns? If a struct has a null mask, all of it's children will also have a null mask with everything pushed down. Seems like all you need to do is just flatten all the columns into one big table_view and use that for comparison. |
I had the same idea in sorting, but Bobby reminded me why that won't work. The key is that you need to be able to differentiate Though actually... I wonder if we actually have the same problem in join, especially if nulls are considered unequal. |
Adds support for equijoin on structs. This PR is leveraging the [struct PR](#7422) and the [rewrite for join API](#7454). It enables equijoin on structs by flattening the struct for the hash calculation. closes #7543 Authors: - Mike Wilson (https://github.com/hyperbolic2346) - Ashwin Srinath (https://github.com/shwina) - Karthikeyan (https://github.com/karthikeyann) - Vukasin Milovanovic (https://github.com/vuule) - Alessandro Bellina (https://github.com/abellina) - Devavret Makkar (https://github.com/devavret) - David Wendt (https://github.com/davidwendt) - Liangcai Li (https://github.com/firestarman) - Paul Taylor (https://github.com/trxcllnt) - Kumar Aatish (https://github.com/kaatish) - Jason Lowe (https://github.com/jlowe) - Dillon Cullinan (https://github.com/dillon-cullinan) - Raza Jafri (https://github.com/razajafri) - https://github.com/rwlee - Michael Wang (https://github.com/isVoid) - Dante Gama Dessavre (https://github.com/dantegd) - Keith Kraus (https://github.com/kkraus14) - Robert Maynard (https://github.com/robertmaynard) - GALI PREM SAGAR (https://github.com/galipremsagar) - https://github.com/ChrisJar - AJ Schmidt (https://github.com/ajschmidt8) - https://github.com/nvdbaranec - Nghia Truong (https://github.com/ttnghia) - https://github.com/chenrui17 - Conor Hoekstra (https://github.com/codereport) - Mike Wendt (https://github.com/mike-wendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Devavret Makkar (https://github.com/devavret) - Jake Hemstad (https://github.com/jrhemstad) URL: #7720
There is no current way to run a join using a struct column. It would be nice to support this in cudf. This will probably leverage some of the work from #7422.
Describe the solution you'd like
It would certainly be nice to support arbitrary depths, but based on #7422 it might require a first-pass at a single depth and a later PR to add support for more depths.
Of particular concern is handling of NaN, Inf, -Inf, etc. These should be handled exactly the same as if the comparison was made between columns of the types inside the structs, so this should be fairly straight forward. The other concern is null handling. Join has the option of nulls comparing equal or not based on incoming parameters. There is some potential for issues around nulls here and care needs to be taken with the implementation.
A null struct would not be the same as a struct with all null entries.
The text was updated successfully, but these errors were encountered: