Support unary_dyn_mut in arith #3708
Comments
Could you provide a link to the benchmark you were running? When this was previously proposed, the numbers were not nearly so flattering - #3134 (comment). I'm mainly apprehensive because we already have issues with the amount of codegen for the arithmetic kernels 😅
Cannot it just use |
There is already a arrow-rs/arrow-arith/src/arity.rs Line 59 in d3fe797
PrimitiveArray as input
I used
pub(crate) fn add_scalar_cow<T>(
    array: PrimitiveArray<T>,
    scalar: T::Native,
) -> Result<PrimitiveArray<T>, ArrowError>
where
    T: ArrowNumericType,
    T::Native: ArrowNativeTypeOp,
{
    match unary_mut(array, |value| value.add_wrapping(scalar)) {
        Ok(array) => Ok(array),
        // Fall back to copying: the buffer is shared, so it cannot be mutated in place
        Err(array) => add_scalar(&array, scalar),
    }
}
But a lot of code has changed since then, such as support for DictionaryArray.... So I am trying to modify as little of the API in DataFusion as possible, support
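The fallback pattern in add_scalar_cow above can be tried out without arrow-rs. The following is a minimal stdlib-only sketch, assuming a reference-counted buffer stands in for the array's storage. `SharedArray` and its methods are hypothetical names for illustration, not arrow-rs's `PrimitiveArray` API: the in-place path succeeds only when no other handle to the buffer exists, otherwise the array is handed back and the caller copies.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a primitive array: values live in a
/// reference-counted buffer (this is NOT arrow-rs's `PrimitiveArray`).
#[derive(Debug, Clone, PartialEq)]
pub struct SharedArray {
    values: Arc<Vec<i64>>,
}

impl SharedArray {
    pub fn new(values: Vec<i64>) -> Self {
        SharedArray { values: Arc::new(values) }
    }

    pub fn values(&self) -> &[i64] {
        &self.values
    }

    /// Apply `op` in place if this is the only handle to the buffer;
    /// otherwise return the array untouched so the caller can fall back.
    pub fn unary_mut<F: Fn(i64) -> i64>(self, op: F) -> Result<SharedArray, SharedArray> {
        match Arc::try_unwrap(self.values) {
            Ok(mut owned) => {
                for v in owned.iter_mut() {
                    *v = op(*v);
                }
                Ok(SharedArray { values: Arc::new(owned) })
            }
            Err(shared) => Err(SharedArray { values: shared }),
        }
    }

    /// Copying path: always allocates a new buffer.
    pub fn unary<F: Fn(i64) -> i64>(&self, op: F) -> SharedArray {
        SharedArray::new(self.values.iter().map(|v| op(*v)).collect())
    }
}

/// Copy-on-write scalar add: mutate in place when possible, copy otherwise.
pub fn add_scalar_cow(array: SharedArray, scalar: i64) -> SharedArray {
    match array.unary_mut(|v| v.wrapping_add(scalar)) {
        Ok(array) => array,
        Err(array) => array.unary(|v| v.wrapping_add(scalar)),
    }
}
```

In this model, a caller that still holds a clone of the array always takes the copying path, so the in-place path only helps when the input is consumed exactly once.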
@tustvold The benchmark just add I lost this branch locally 😭
I think it would help move this forward if we could get a standalone benchmark in arrow-rs. The level of performance improvement makes me sceptical that there isn't something else going on - e.g. a switch from checked to unchecked overflow or something
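To illustrate the kind of confound mentioned above, here is a small stdlib-only sketch of the two overflow behaviours, using plain slices rather than arrow-rs kernels. The function names are made up for this example. If one side of a benchmark uses the checked form and the other the wrapping form, the measured difference may come from the overflow handling rather than from avoiding an allocation.

```rust
/// Wrapping add: overflow wraps around silently, with no per-element check.
pub fn add_scalar_wrapping(values: &[i32], scalar: i32) -> Vec<i32> {
    values.iter().map(|v| v.wrapping_add(scalar)).collect()
}

/// Checked add: every element is tested, and the first overflow is reported.
pub fn add_scalar_checked(values: &[i32], scalar: i32) -> Result<Vec<i32>, String> {
    values
        .iter()
        .map(|v| {
            v.checked_add(scalar)
                .ok_or_else(|| format!("overflow adding {} + {}", v, scalar))
        })
        .collect()
}
```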
Yep, I would prefer to reimplement it in DataFusion to find the cause (I didn't notice there was already a benchmark in arrow-rs). By the way, I think users still need this API even without a performance gain 😆
What makes you say this? The memory savings are likely irrelevant, so I'm not sure what the advantage would be if not performance.
I mean that even without a huge performance gain, it still saves users from having to re-allocate memory manually.
@tustvold But I did a PoC test in DataFusion: apache/datafusion#5285
One thing worth mentioning: I profiled https://github.com/apache/arrow-rs/pull/3709/files#diff-d31b0761ffe79b72672cd516aa8ef0792c004328f31994dc616e677e8eb3c50cR31-R59. I think both versions do the same amount of work:
@tustvold I am not an expert on this 😭, could you share your opinion 🆘
I also tested with datafusion-cli
There seems to be no error during the add
A difference with #3134 (comment) is that a batch size of 512 (in I would expect a larger difference to arise for simple arithmetic operations (like adding) when choosing a small batch size, as the operation itself could be fast compared to the allocation. Also, in the benchmark posted in the comments an allocation is done in the loop, whereas the query from @Ted-Jiang does a scalar addition.
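The batch-size point can be made concrete with a rough stdlib-only sketch (function names are illustrative, and plain `Vec`s stand in for record batches). The copying version allocates one output buffer per batch, so a smaller batch size means more allocations for the same number of rows, while the in-place version allocates nothing.

```rust
/// Add `scalar` batch by batch, allocating a fresh output buffer per batch.
/// Returns the output batches and how many output buffers were allocated.
/// `batch_size` must be non-zero (`chunks` panics on zero).
pub fn add_scalar_copying(data: &[i64], batch_size: usize, scalar: i64) -> (Vec<Vec<i64>>, usize) {
    let mut allocations = 0;
    let batches: Vec<Vec<i64>> = data
        .chunks(batch_size)
        .map(|batch| {
            allocations += 1; // one new output buffer for this batch
            batch.iter().map(|v| v.wrapping_add(scalar)).collect::<Vec<i64>>()
        })
        .collect();
    (batches, allocations)
}

/// Add `scalar` to already-owned batches without allocating output buffers.
pub fn add_scalar_in_place(batches: &mut [Vec<i64>], scalar: i64) {
    for batch in batches.iter_mut() {
        for v in batch.iter_mut() {
            *v = v.wrapping_add(scalar);
        }
    }
}
```

For 2048 rows this gives 4 output allocations at a batch size of 512 and 1 at a batch size of 2048. Whether that shows up in wall-clock time depends on the allocator and on how cheap the operation is, which is what a standalone benchmark would need to measure.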
@Dandandan Thanks for the info ❤️ I tried to update https://github.com/apache/arrow-rs/pull/3709/files#diff-d31b0761ffe79b72672cd516aa8ef0792c004328f31994dc616e677e8eb3c50cR31-R59 to
Got
Still only a small improvement 😢. It seems something was lost in my PoC in DataFusion.
Closing this one for now. Feel free to reopen if you wish to pursue this further.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
apache/datafusion@48732b4 all math compute kernel like add_dyn_scalar, multi_dyn_scalar...
I tried to use copy-on-write (a mock function add_scalar_mut using
arrow-rs/arrow-arith/src/arity.rs
Line 59 in 26ea71c
run
select t.a + 1 from t
got
Describe the solution you'd like
Add unary_dyn_mut, which takes ownership of the array and so supports copy-on-write math operators.
Describe alternatives you've considered
Additional context
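For context, one possible shape of an ownership-taking, type-dispatching kernel is sketched below using only the standard library. Everything here is an assumption for illustration: `DynArray`, `map_in_place`, and `add_one_dyn_mut` are hypothetical names, an enum stands in for a dynamically typed array, and the issue itself does not specify a signature.

```rust
use std::sync::Arc;

/// Hypothetical dynamically typed array: one variant per element type.
#[derive(Debug, Clone, PartialEq)]
pub enum DynArray {
    Int32(Arc<Vec<i32>>),
    Int64(Arc<Vec<i64>>),
}

/// Mutate the buffer in place if it is uniquely owned, else hand it back.
fn map_in_place<T: Copy, F: Fn(T) -> T>(
    buf: Arc<Vec<T>>,
    op: F,
) -> Result<Arc<Vec<T>>, Arc<Vec<T>>> {
    let mut owned = Arc::try_unwrap(buf)?; // Err(buf) if other handles exist
    for v in owned.iter_mut() {
        *v = op(*v);
    }
    Ok(Arc::new(owned))
}

/// Takes ownership, dispatches on the element type, and returns the
/// original array as `Err` when it is shared and cannot be mutated.
pub fn add_one_dyn_mut(array: DynArray) -> Result<DynArray, DynArray> {
    match array {
        DynArray::Int32(buf) => map_in_place(buf, |v: i32| v.wrapping_add(1))
            .map(DynArray::Int32)
            .map_err(DynArray::Int32),
        DynArray::Int64(buf) => map_in_place(buf, |v: i64| v.wrapping_add(1))
            .map(DynArray::Int64)
            .map_err(DynArray::Int64),
    }
}
```

Returning the untouched array in the `Err` case mirrors the unary_mut pattern quoted in the comments, so a caller can fall back to a copying kernel.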