Improve performance of unnest even more #6961
Comments
One interesting special case is that if the list array does not have any nulls at all, then
Is this a good first issue? Would it be possible to take it?
Hi @smiklos -- thanks! I think this would be a reasonable first issue if you are willing to learn more about how arrow ListArrays work. The current code is well documented and tested, so I think it would be a good one. This work would effectively be to change the calculation of the take offsets.
FWIW I was making some ascii art that I wanted to share on this ticket: Given this setup:
We could compute the output of unnest by computing offsets
And then calling take on the
This makes sense, but note that FixedSizeListArray works differently. Also, we'll need different take indices for the column getting flattened and for the rest of the columns, which get expanded.
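To make the "different take indices" point concrete, here is a minimal sketch of the indices for the columns that get expanded. It models the list column with a plain offsets slice and a validity slice instead of the real arrow types. The names `repeat_take_indices` and `fixed_size_offsets` are hypothetical, and it assumes a null list row produces no output rows, which may not match what the unnest operator actually does.

```rust
/// Indices for the columns that are NOT being flattened: row `i` is repeated
/// once per element of list row `i`.
/// `offsets` has rows + 1 entries; `valid[i]` is false when list row `i` is null.
/// Assumption (not from the thread): a null list row yields zero output rows.
fn repeat_take_indices(offsets: &[i32], valid: &[bool]) -> Vec<u32> {
    assert_eq!(offsets.len(), valid.len() + 1, "offsets must have rows + 1 entries");
    let mut indices = Vec::new();
    for (row, window) in offsets.windows(2).enumerate() {
        // Length of this list row, or 0 if the row is null.
        let len = if valid[row] {
            (window[1] - window[0]) as usize
        } else {
            0
        };
        indices.extend(std::iter::repeat(row as u32).take(len));
    }
    indices
}

/// A fixed-size list stores no offsets; row `i` covers
/// `values[i * list_size..(i + 1) * list_size]`, so equivalent offsets
/// can be synthesized and fed to the same helper.
fn fixed_size_offsets(rows: usize, list_size: i32) -> Vec<i32> {
    (0..=(rows as i32)).map(|i| i * list_size).collect()
}
```

With offsets `[0, 2, 3, 5, 6]` and the second and fourth rows null, `repeat_take_indices` gives `[0, 0, 2, 2]`, and `fixed_size_offsets(3, 2)` gives `[0, 2, 4, 6]`. The flattened column needs a separate set of indices that point into the child values array.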
Basically the Unnest exec plan could be made faster if we reduced some copies. Here is the basic idea in case anyone wants to do that.
This looks very much the same to me as calling list_array.values() to get access to the underlying values: https://docs.rs/arrow/latest/arrow/array/struct.GenericListArray.html#method.values
In this case the values array would be more like
And the offsets of the list array would be like (I think):
With a null mask showing the second and fourth elements are null
So I was thinking you could calculate the take indices directly from the offsets / nulls without having to copy all the values out of the underlying array
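A rough sketch of that idea, using plain slices rather than the arrow types so it stays self-contained. The function name `value_take_indices` and the example data are made up for illustration, and skipping null rows entirely is an assumption about the desired semantics, not something confirmed here.

```rust
/// Compute indices into the child values array straight from the list
/// offsets and the null mask, without copying any values first.
/// `offsets` has rows + 1 entries; row `i` covers
/// `values[offsets[i]..offsets[i + 1]]`. `valid[i]` is false for a null row.
/// Assumption: a null row contributes no output elements.
fn value_take_indices(offsets: &[i32], valid: &[bool]) -> Vec<u32> {
    assert_eq!(offsets.len(), valid.len() + 1, "offsets must have rows + 1 entries");
    let mut indices = Vec::new();
    for (row, window) in offsets.windows(2).enumerate() {
        if !valid[row] {
            // A null slot may still span child values; those are skipped.
            continue;
        }
        indices.extend((window[0] as u32)..(window[1] as u32));
    }
    indices
}
```

For example, with offsets `[0, 2, 3, 5, 6]` and the second and fourth rows null, this returns `[0, 1, 3, 4]`. Those indices could then be handed to a take kernel applied to the child values array.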
Originally posted by @alamb in #6903 (comment)