Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The current implementation of `take_run` only handles `PrimitiveArray`. It is also slow because it compares values. Extending this approach to String and Binary values would make it much slower still.
Describe the solution you'd like
Instead of run-encoding the taken values, we can run-encode the taken physical indices. This will be significantly faster for String and Binary values because it avoids comparing values. The drawback is that in certain scenarios the output may not be optimally run-encoded. For example, given `RunArray { run_ends=[2,4,6,8], values=[1,2,1,2] }` and take indices `[2,3,6,7]`, the output will be `RunArray { run_ends=[2,4], values=[2,2] }` rather than `RunArray { run_ends=[4], values=[2] }`, because the taken elements come from two different physical runs (physical indices 1 and 3) that happen to hold the same value.
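A minimal sketch of the proposed approach in plain Rust, to make the trade-off concrete. It deliberately does not use the arrow-rs API: `RunArray` here is a hypothetical stand-in holding `run_ends` and `values` as `Vec`s, and `physical_index` / `take_by_physical_indices` are illustrative names. Consecutive taken elements are merged only when they map to the same physical run, so no values are ever compared. Taking `[2, 3, 6, 7]` from `run_ends=[2,4,6,8], values=[1,2,1,2]` yields `run_ends=[2,4], values=[2,2]`.

```rust
/// Hypothetical, simplified model of a run-end encoded array (not the arrow-rs type):
/// `run_ends[i]` is the exclusive logical end of run `i`, `values[i]` is its value.
#[derive(Debug, Clone, PartialEq)]
pub struct RunArray<T> {
    pub run_ends: Vec<usize>,
    pub values: Vec<T>,
}

impl<T: Clone> RunArray<T> {
    /// Maps a logical index to the physical index of the run containing it:
    /// the first run whose end is strictly greater than `logical`.
    fn physical_index(&self, logical: usize) -> usize {
        self.run_ends.partition_point(|&end| end <= logical)
    }

    /// Takes `indices` by run-encoding the *physical* indices they map to.
    /// Values are never compared, so equal values from different runs stay separate.
    /// Simplification: panics if an index is out of bounds.
    pub fn take_by_physical_indices(&self, indices: &[usize]) -> RunArray<T> {
        let mut run_ends: Vec<usize> = Vec::new();
        let mut physical: Vec<usize> = Vec::new();
        for (pos, &logical) in indices.iter().enumerate() {
            let p = self.physical_index(logical);
            if physical.last() == Some(&p) {
                // Same physical run as the previous taken element: extend the output run.
                *run_ends.last_mut().unwrap() = pos + 1;
            } else {
                // A different physical run starts a new output run, even if its value is equal.
                physical.push(p);
                run_ends.push(pos + 1);
            }
        }
        // Values are cloned once per output run, only after the runs are determined.
        let values = physical.iter().map(|&p| self.values[p].clone()).collect();
        RunArray { run_ends, values }
    }
}
```

The only per-element work is an integer search and an integer comparison, which is why this approach scales independently of the value type.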
Describe alternatives you've considered
We could continue with the current approach of comparing values, which in certain scenarios results in a more efficiently run-encoded array, at the cost of performance.
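For comparison, a sketch of the value-comparing alternative. As before, this is not the arrow-rs API: `RunArray` is a hypothetical simplified stand-in with `run_ends` and `values` stored as `Vec`s, and `take_by_values` is an illustrative name. The only change from index-based encoding is the merge condition: the current output run is extended whenever the taken value equals the previous run's value, which requires `T: PartialEq` and one value comparison per taken element (costly for strings and binary data).

```rust
/// Hypothetical, simplified model of a run-end encoded array (not the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct RunArray<T> {
    pub run_ends: Vec<usize>,
    pub values: Vec<T>,
}

impl<T: Clone + PartialEq> RunArray<T> {
    /// Maps a logical index to the physical index of the run containing it.
    fn physical_index(&self, logical: usize) -> usize {
        self.run_ends.partition_point(|&end| end <= logical)
    }

    /// Takes `indices` by run-encoding the taken *values*: equal consecutive values
    /// are merged even when they come from different physical runs.
    /// Simplification: panics if an index is out of bounds.
    pub fn take_by_values(&self, indices: &[usize]) -> RunArray<T> {
        let mut run_ends: Vec<usize> = Vec::new();
        let mut values: Vec<T> = Vec::new();
        for (pos, &logical) in indices.iter().enumerate() {
            let value = &self.values[self.physical_index(logical)];
            // One value comparison per taken element: this is the expensive step.
            if values.last() == Some(value) {
                *run_ends.last_mut().unwrap() = pos + 1;
            } else {
                values.push(value.clone());
                run_ends.push(pos + 1);
            }
        }
        RunArray { run_ends, values }
    }
}
```

On the same input (`run_ends=[2,4,6,8], values=[1,2,1,2]`, indices `[2,3,6,7]`) this produces the compact `run_ends=[4], values=[2]`.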
Additional context
#3622 (comment)