feat(rust,python): initial working version of Decimal Series #7220
@plaflamme @ritchie46 Pushing my WIP branch in its current state. This obviously needs further work before we even get to arithmetic, ops and the like, but first I think we should decide whether things are generally correct type-wise/dtype-wise, given that decimals are a bit weird compared to other types. // There's also a standing bug with creating decimal series from anyvalues: just inferring the type from the first decimal doesn't work; you have to scan the entire sequence first to determine the scale.
Maybe we could start with a negative scale (indicating that it still needs to be set) and then keep track of it while we scan the items. Another strategy is to always use a default scale if the user doesn't set one. We do the same with datetime units.
I think it's a bit different here from datetime units, at least if we restrict it to [
Decimal('1.23'), # scale = 2
Decimal('-1000'), # scale = 0
Decimal('0.0001'), # scale = 4
Decimal('0'), # scale = 0
] In this case you have to use (prec = None, scale = 4), ending up with the following i128 values: [1_2300, -1000_0000, 1, 0]. One exception would be if we allow mixing integers and Decimals in any-values; e.g., ideally this should produce the same output: [Decimal('1.23'), -1000, Decimal('0.0001'), Decimal('0')]. The logic is slightly different if a decimal dtype has already been provided to us. Then we have to check that (1) the resulting scale is >= the actual scale of each item, (2) there's no i128 overflow while rescaling, and (3) if precision is set, it is respected.
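The inference and validation rules described in this comment can be sketched in plain Python with the standard-library `decimal` module. This is only an illustration of the logic, not the code in this PR: the helper names `item_scale`, `infer_scale` and `to_unscaled` are hypothetical, and the strict rule that an item's own scale must never exceed the target scale is an assumption taken from point (1) above.

```python
from decimal import Decimal

# Bounds of a signed 128-bit integer, the physical storage type discussed above.
I128_MIN, I128_MAX = -(2**127), 2**127 - 1


def item_scale(value):
    """Number of fractional digits carried by a single int or Decimal."""
    exponent = Decimal(value).as_tuple().exponent
    if not isinstance(exponent, int):  # NaN / Infinity have non-integer exponents
        raise ValueError(f"non-finite decimal: {value!r}")
    return max(0, -exponent)


def infer_scale(values):
    """Scan the whole sequence; the first item alone is not enough."""
    return max((item_scale(v) for v in values), default=0)


def to_unscaled(value, scale, precision=None):
    """Rescale one item to `scale` and return its integer representation."""
    sign, digits, exponent = Decimal(value).as_tuple()
    if not isinstance(exponent, int):
        raise ValueError(f"non-finite decimal: {value!r}")
    shift = scale + exponent
    if shift < 0:
        # (1) the target scale must be >= the item's own scale
        raise ValueError(f"scale {scale} is too small for {value!r}")
    magnitude = int("".join(map(str, digits))) * 10**shift
    unscaled = -magnitude if sign else magnitude
    if not I128_MIN <= unscaled <= I128_MAX:
        # (2) rescaling must not overflow i128
        raise OverflowError(f"{value!r} does not fit in i128 at scale {scale}")
    if precision is not None and len(str(magnitude)) > precision:
        # (3) an explicit precision caps the total number of digits
        raise ValueError(f"{value!r} exceeds precision {precision}")
    return unscaled


values = [Decimal("1.23"), -1000, Decimal("0.0001"), Decimal("0")]
scale = infer_scale(values)
unscaled = [to_unscaled(v, scale) for v in values]
print(scale, unscaled)
```

With the mixed integer/Decimal input from the comment, the inferred scale is 4 and the unscaled values match the i128 list given above.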
Yep, I understand. What I mean is that upon inference we set the
@aldanor shall we try to get an absolute minimal version working and merge that in? I'd rather break this down into smaller steps and see what we encounter.
I think after writing at least some tests for the basic functionality that has been added here (formatting, parsing, converting to/from Python Decimal, from Arrow, to/from anyvalues, etc.); I would be pretty uncomfortable merging it otherwise. I can jump on adding some tests here. If you have any particular suggestions for test cases, please feel free to post. // Btw: the failing Python tests on CI are actually a sign that things sort of work as intended. They are the ones that checked that Decimal was converted to Float64, which isn't the case anymore.
The Python tests are the preferred test location. These are the ultimate integration tests and don't hurt compile times. If we want to test on the Rust side, the integration tests folder:
So, I guess conversion logic must go into The So, where does this I was thinking the logic should be something like this:
My 2 cents. This looks great BTW 👍
@ritchie46 @plaflamme Here's a new batch of updates (the whole thing is brittle as f, I'm surprised it actually works but it does; lots of work ahead to cover various corners!)
Next steps, I think: write tests for all possible ways to build series/frames with decimals in them from Python/Arrow, and to convert out of them. Might make sense to write them even if they fail and mark them as There are definitely gaps to fill and decisions to make (e.g., should pl.Series(['0.01'], dtype=pl.Decimal) yield a decimal series? Would be cool if it did. What's the supertype of Decimal and str? Not sure).
This is also the case for
Yes, that should be in
Alright. Thanks a lot @aldanor! This is great functionality to have. I have left a few minor remarks and then it is good to go.
@ritchie46 fixed/rebased, thanks for pointing a few things out 🙂 I think that even though there's tons of work ahead and it's in a barely working state, we should merge it in, because otherwise constant rebasing will become pretty painful given the large number of files affected here...
Thanks a lot @aldanor. Is it ok if I ping you in future decimal-related issues?
Some of my workloads are now breaking because polars infers things to be decimal type and then complains that some operations are not supported on the decimal column. This makes me unable to use the latest 0.16.10; everything works for me on 0.16.8. FYI @ritchie46 @aldanor
Can you tell which operations are missing?