[FEA] Support arrays_zip #5229
arrays_zip produces an array of structs as output. If all of the input arrays shared the same offsets column, it would just involve reordering the columns so that the data columns sit under a struct column, which in turn sits under the array column with the corresponding offsets. Because of nulls and arrays of different lengths, that does not work out of the box. We can probably make it work by finding the maximum array length in each row, creating a segmented gather list that inserts nulls where needed, and then gathering the child arrays and doing the column manipulation. We probably don't need cudf for this, but until we really start to write it and see all of the corner cases, I don't know for sure.
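The approach described above can be illustrated with a minimal sketch on plain Python lists. This is not the spark-rapids or cuDF implementation: the helper names (`max_lengths`, `segmented_gather`, `arrays_zip`) are hypothetical, lists of rows stand in for list columns, tuples stand in for structs, and `None` stands in for null. It also assumes that a null input array makes the whole output row null and that shorter arrays are padded with nulls.

```python
from typing import List, Optional

Row = Optional[List[object]]  # one list row; None models a null row


def max_lengths(columns: List[List[Row]]) -> List[Optional[int]]:
    """Per-row maximum array length; None if any input row is null (assumed semantics)."""
    out = []
    for rows in zip(*columns):
        if any(r is None for r in rows):
            out.append(None)
        else:
            out.append(max(len(r) for r in rows))
    return out


def segmented_gather(column: List[Row], gather_maps: List[Optional[List[int]]]) -> List[Row]:
    """Gather within each row; an out-of-range index yields None (a null element)."""
    out = []
    for row, gmap in zip(column, gather_maps):
        if row is None or gmap is None:
            out.append(None)
        else:
            out.append([row[i] if 0 <= i < len(row) else None for i in gmap])
    return out


def arrays_zip(*columns: List[Row]) -> List[Optional[List[tuple]]]:
    """Zip list columns row by row into lists of tuples, padding short arrays with None."""
    lengths = max_lengths(list(columns))
    # One gather map per row: indices 0..max_len-1, so short arrays get padded.
    gather_maps = [None if n is None else list(range(n)) for n in lengths]
    padded = [segmented_gather(c, gather_maps) for c in columns]
    result = []
    for rows in zip(*padded):
        if any(r is None for r in rows):
            result.append(None)
        else:
            # After padding, every array in the row has the same length,
            # so they can share one set of offsets.
            result.append(list(zip(*rows)))
    return result


a = [[1, 2, 3], [4], None, []]
b = [["x"], ["y", "z"], ["w"], []]
zipped = arrays_zip(a, b)
```

Once every row has been padded to the same length, the inputs line up element for element, which is the "same offsets" situation where regrouping the children under a struct is enough.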
This PR enables the cuDF API `segmented_gather` in the Java package. `segmented_gather` is essential for implementing Spark array functions like `arrays_zip` (NVIDIA/spark-rapids#5229).
Authors:
- Alfred Xu (https://github.com/sperlingxx)
Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
URL: #10669
Adds the generateListOffsets API, which converts list lengths to list offsets and is useful in the development of spark-rapids. For example, support for [array_repeat](NVIDIA/spark-rapids#5226) and [arrays_zip](NVIDIA/spark-rapids#5229) relies on this API.
Authors:
- Alfred Xu (https://github.com/sperlingxx)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Liangcai Li (https://github.com/firestarman)
URL: #10683
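To make the lengths-to-offsets conversion concrete, here is a small sketch in plain Python. It only illustrates the idea (offsets are a running sum of lengths with a leading zero); the function name `lengths_to_offsets` is hypothetical, it is not the Java API from the PR, and null handling is left out.

```python
from itertools import accumulate
from typing import List


def lengths_to_offsets(lengths: List[int]) -> List[int]:
    """Convert per-row list lengths to offsets: a leading 0 followed by the running sum."""
    return [0, *accumulate(lengths)]


# Row i of the list column spans child elements offsets[i] .. offsets[i + 1] - 1.
offsets = lengths_to_offsets([3, 2, 0, 1])
```

For `arrays_zip`, the per-row maximum lengths would play the role of `lengths`, giving the offsets of the zipped output column.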
I wish we could support arrays_zip.
Eg: