[FEA] Sort list columns #5890
Comments
cc @revans2 from the Spark side as well
So even if we decide on what it means to sort a list of columns, implementing this in the context of a multi-column sort is going to be very difficult. What's more, I'm concerned adding the logic to the
Spark has a similar comparison for list types. There may be some discrepancies when null entries within a list are encountered. Spark treats a null entry within a list as less than any value in the other list, except another null, which is treated as equal.
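To make the null rule above concrete, here is a minimal Python sketch of such a comparator. `compare_lists` is a hypothetical helper written for illustration, not Spark or cuDF code, and the "shorter prefix sorts first" tie-break is an assumption borrowed from Python's own list ordering.

```python
from functools import cmp_to_key


def compare_lists(a, b):
    """Three-way lexicographic comparison where None sorts before any value.

    Hypothetical helper: returns -1, 0, or 1.
    """
    for x, y in zip(a, b):
        if x is None and y is None:
            continue  # two nulls are treated as equal
        if x is None:
            return -1  # null is less than any non-null value
        if y is None:
            return 1
        if x != y:
            return -1 if x < y else 1
    # Assumption: when all shared positions are equal, the shorter list sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


rows = [[1, 2], [None, 5], [1], [1, None], [None]]
sorted_rows = sorted(rows, key=cmp_to_key(compare_lists))
print(sorted_rows)
```

Plain `sorted(rows)` would raise a `TypeError` here as soon as Python has to compare `None` with an `int`, which is one place where the two behaviours would diverge.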
Can you elaborate? I'm curious if this will be like NaN handling, where the performance hit was negligible unless NaNs appeared in the data.
It will be similar to the AST interpretation Brad is doing, where adding increasingly complex and nested switch statements eventually causes the compiler to balk and start putting things on the stack frame or refuse to inline things, which significantly reduces performance. Even if list columns aren't present, the increased complexity of the comparator could very well reduce performance.
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Still relevant. And I believe we will be starting to investigate in the somewhat-near future.
Closed by #11129
It would be nice to be able to sort list columns. The use case I have encountered so far is in testing equality of two list columns with identical elements, but ordered differently, as occurs in a groupby COLLECT.
What does it mean to sort a list column?
Python does a "lexical" comparison between lists:
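As an illustration of that comparison (a small sketch using made-up values, not the example from the original post):

```python
# Lists compare element by element; the first unequal pair decides the result.
first_difference = [1, 2, 3] < [1, 3, 0]   # decided by 2 < 3

# If one list is a strict prefix of the other, the shorter one sorts first.
prefix_is_smaller = [1, 2] < [1, 2, 0]

# Sorting a "column" of lists therefore needs no custom key.
rows = [[3], [1, 2, 3], [1, 2], [], [1, 3]]
sorted_rows = sorted(rows)
print(sorted_rows)
```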
Accepting the above, it becomes easy to reason about comparing arbitrarily nested lists:
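For instance, the same rule applies recursively when the elements are themselves lists (again a sketch with made-up values):

```python
# Each inner list is compared with the same lexical rule, so nesting
# depth does not change the reasoning.
nested = [
    [[1, 2], [3]],
    [[1, 2]],
    [[1], [9, 9]],
    [[1, 2], [2, 5]],
]
sorted_nested = sorted(nested)
print(sorted_nested)
```

Here `[[1], [9, 9]]` sorts first because `[1]` is a strict prefix of `[1, 2]`, and `[[1, 2]]` comes next because it is a strict prefix of the two remaining rows.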
I wonder if Spark users would have similar expectations for sorting list columns. cc: @nvdbaranec @jlowe @jrhemstad @kkraus14 @harrism