Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] Add Decimal support #3523

Closed
6 of 8 tasks
liukun4515 opened this issue Sep 17, 2022 · 13 comments
Closed
6 of 8 tasks

[EPIC] Add Decimal support #3523

liukun4515 opened this issue Sep 17, 2022 · 13 comments
Labels
enhancement New feature or request

Comments

@liukun4515
Copy link
Contributor

liukun4515 commented Sep 17, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
(This section helps Arrow developers understand the context and why for this feature, in addition to the what)

I have implemented the feature about decimal in the datafusion. But didn't take care about some special case.
Specially when do the arithmetic operation:

  1. loss of the precision, In many case, the intermediate result will be overflow, So we use the float32 or float64 as the intermediate which will cause the the issue of loss of precision
  2. overflow/out of the range

Correctness Issues:

NULL cast/ NULL arithmetic/NULL comparator

String cast:

AGG decimal

User Experience:

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@liukun4515 liukun4515 added the enhancement New feature or request label Sep 17, 2022
@liukun4515
Copy link
Contributor Author

cc @liukun4515 @kmitchener

@kmitchener
Copy link
Contributor

Thanks @liukun4515. It's all the issues listed in #3480 as well:

Per other discussions, I'd like to help here. I think the general plan should be something like:

  • stabilize and add tests
  • move the decimal implementation code to arrow-rs
  • have the discussion about Decimal128/256 implementation that's going on arrow-rs and agree on path forward for implementation details
  • meanwhile in DataFusion, continue to improve use of Decimals in the math functions, sql literals, scientific notation support, etc.

I think there will be some churn .. for example, to fix #3521 , we could use the num crate that arrow-rs already uses and fix the implementation of the decimal divide method here. However, that's only a fine short-term solution for stabilization and we'd need to remove the num crate again when we finally move the Decimal kernel implementation over to arrow-rs. Is that acceptable?

Also, how do you deal with the synchronization between the projects? for example, some DF issues are due to the decimal cast code in arrow-rs .. how do we test the fixes? now that we're at the beginning of the dev cycle for 13, can we change DF to pin it to arrow-rs git revs?

cc @alamb

@kmitchener
Copy link
Contributor

Hm, I see the situation in arrow-rs is complicated re decimals and I want to retract all opinions at this time. :)

@liukun4515
Copy link
Contributor Author

Also, how do you deal with the synchronization between the projects? for example, some DF issues are due to the decimal cast code in arrow-rs .. how do we test the fixes? now that we're at the beginning of the dev cycle for 13, can we change DF to pin it to arrow-rs git revs?

Hi @kmitchener

I think we can resolve the arithmetic operation for decimal128 data type in this sprint, I will do this and move the operation to the arrow-rs.

At the same time, I will fix the issue about decimal in datafusion. From the bug issue list, I think the most issue about decimal operation in the datafusion are about the divide arithmetic operation.

And other issue is about feature, like casting between string and decimal which can be implement in the arrow-rs.

@liukun4515
Copy link
Contributor Author

@kmitchener if the casting feature is important for you, it's great to add it in the arrow-rs kernel.

@alamb
Copy link
Contributor

alamb commented Sep 19, 2022

Also, how do you deal with the synchronization between the projects? for example, some DF issues are due to the decimal cast code in arrow-rs .. how do we test the fixes? now that we're at the beginning of the dev cycle for 13, can we change DF to pin it to arrow-rs git revs?

@kmitchener I have seen two basic patterns for this:

  1. Make a PR that pins to a arrow-rs git revision and complete the changes you want (and this basically leaves the PR as a draft for a while). Example: Use arrow row format in SortPreservingMerge ~50-70% faster #3386
  2. Make a "wrapper" / "adapter" function for the arrow one, adding whatever new special logic is needed and calling into the arrow kernel if needed. For example https://github.com/apache/arrow-datafusion/blob/master/datafusion/physical-expr/src/expressions/binary/adapter.rs

@kmitchener
Copy link
Contributor

@liukun4515 I'll submit a couple PRs to move the decimal kernels to arrow-rs. I have the code moved locally and it's mostly working -- enough that I'm confident I can complete it.

@alamb
Copy link
Contributor

alamb commented Sep 21, 2022

That would be awesome @kmitchener -- and a long time coming 👏

@liukun4515
Copy link
Contributor Author

liukun4515 commented Sep 22, 2022

I'll submit a couple PRs to move the decimal kernels to arrow-rs. I have the code moved locally and it's mostly working -- enough that I'm confident I can complete it.

I think it is not easy work to move the decimal operation to arrow-rs.
We can not do the comparation and arithmetic operation based on the I128, we should support the decimal256 also.
I try it in my local, it works.
But We need do more work to make it grace, there are some pr about decimal in the arrow-rs you can take a look it.

cc @kmitchener

Can you submit your change in your branch? and give me link for your changes
I want to take a look your thought.

@kmitchener
Copy link
Contributor

@liukun4515 @alamb Here are the 2 PRs. The one for arrow-rs is ready for review. The one for DF will remain in draft until arrow-rs 24 is released.

apache/arrow-rs#2770
#3590

If/once apache/arrow-rs#2770 is merged in, I'll update the DF PR to use git revs.

@andygrove andygrove changed the title Decimal issue [EPIC] Add Decimal support Oct 30, 2022
@alamb
Copy link
Contributor

alamb commented Apr 22, 2024

I wonder if we should claim this ticket is complete? I wonder if there is anything else this ticket is tracking

@liukun4515
Copy link
Contributor Author

I wonder if we should claim this ticket is complete? I wonder if there is anything else this ticket is tracking

I will check above unclosed issue in this week.

@alamb
Copy link
Contributor

alamb commented Nov 21, 2024

I think I am going to claim this epic is complete and that DataFusion supports decimals. We can track additional work in other tickets if needed

@alamb alamb closed this as completed Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants