[FEA] Add support for shallow lists in cudf::merge #13514
Comments
This should be covered by #13347. I believe this is targeting 23.08. cc: @divyegala
Slightly, but it may not be worth it. We can work around the issue by concatenating the tables and calling cudf::sorted_order, as mentioned above. That is slower, but it should hold us over until cudf::merge for lists is properly fixed.
I noticed that the mishandling of non-key nested types in cudf::merge is already tracked by #8050.
Thank you @jlowe @bdice for your input on this issue. +1 on "the logic for raising an error is incorrect (too aggressive, fails even on non-ordering columns which seems unnecessary)". If possible, we hope users do not experience unexpected job failures when moving Spark jobs from CPU to GPU. It may help to give them a clear message so they can check for unsupported operations or be aware of any fallback. We could not find any more information for this one beyond the exception (and stack trace):
FYI: There are messages for other unsupported operations, which fall back as expected. We will share more on them.
@bdice @divyegala are #13514 and #8050 closed by #14250?
@GregoryKimball thanks. Yes, they should be.
Closing as resolved by #14250.
Is your feature request related to a problem? Please describe.
Spark supports sorting on BinaryType, which is represented as a libcudf LIST column with a non-nullable UINT8 child column. The RAPIDS Accelerator leverages cudf::merge to perform out-of-core sort algorithms. cudf::sorted_order supports sorting on the LIST column directly, but cudf::merge does not, failing with the error:
Describe the solution you'd like
cudf::merge should support the same ordering types as cudf::sorted_order.
Describe alternatives you've considered
Applications would need to concatenate the tables to be merged into one big table and then call cudf::sorted_order, which is less efficient than performing the merge directly via cudf::merge.
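To illustrate why the two approaches should produce the same ordering, here is a minimal host-side sketch that uses only the C++ standard library. It does not call libcudf. Each binary value is modeled as a `std::vector<std::uint8_t>`, standing in for one row of a LIST column with a UINT8 child, and the names `Bytes`, `Column`, `bytes_less`, `merge_sorted`, and `concat_sort_gather` are hypothetical helpers introduced for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <numeric>
#include <vector>

// Host-side stand-ins (assumptions, not libcudf types):
// one binary value, and a column of binary values.
using Bytes  = std::vector<std::uint8_t>;
using Column = std::vector<Bytes>;

// Lexicographic "less than" over unsigned bytes; a shorter prefix sorts first.
bool bytes_less(const Bytes& a, const Bytes& b) {
  return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}

// Direct approach: a single linear pass over two inputs that are already sorted.
Column merge_sorted(const Column& left, const Column& right) {
  assert(std::is_sorted(left.begin(), left.end(), bytes_less));
  assert(std::is_sorted(right.begin(), right.end(), bytes_less));
  Column out;
  out.reserve(left.size() + right.size());
  std::merge(left.begin(), left.end(), right.begin(), right.end(),
             std::back_inserter(out), bytes_less);
  return out;
}

// Workaround: concatenate, compute a sorted order of row indices, then gather.
// This re-sorts everything, so it does more work than the merge above.
Column concat_sort_gather(const Column& left, const Column& right) {
  Column all(left);
  all.insert(all.end(), right.begin(), right.end());

  std::vector<std::size_t> order(all.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&all](std::size_t i, std::size_t j) {
                     return bytes_less(all[i], all[j]);
                   });

  Column out;
  out.reserve(all.size());
  for (std::size_t idx : order) out.push_back(all[idx]);
  return out;
}
```

Calling both functions on the same pair of sorted inputs should return identical columns. The merge touches each row once, while the workaround pays for a full sort of the concatenated data, which is the cost this request aims to avoid.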