-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rounding functions to the Column type #6817
Conversation
ceil : Column ! Invalid_Value_Type | ||
ceil self = Value_Type.expect_numeric self <| | ||
fun = _.ceil | ||
Column_Ops.map_over_storage self fun make_long_builder skip_nothing=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not sure if these operations should be implemented in Enso.
Currently most of the 'primitive' operations, especially on numeric columns are implemented in Java, for better efficiency.
Do we want to move towards doing these in Enso? I think we should measure what is the difference in performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue.
Currently I see that the ceil
method may return an EnsoBigInteger (refer to the decimal/CeilNode.java
).
However, currently the Table library storage does not support Big Integer storage - only LongStorage
. So this will likely fail in some way.
I think we need to have tests checking what happens in the scenario where the double
value magnitude exceeds the long
range, to verify the behaviour.
IMO currently it could be valid to return Nothing
in such a case and attach some kind of Arithmetic_Overflow
exception. Extending Table to handle Big Integers is a separate issue (currently unscheduled, @jdunkerley do we have plans to support this?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear about the first issue. The map_over_storage
is OK. Its usage is warranted when the operation we perform on each row must run some Enso logic (some complicated Enso function that is not worth replicating into Java, or has so expensive logic anyway the benefit of inlining it to Java would be negligible).
But I think we shouldn't use Enso operations for primitive operations and instead implement them directly in Java. That is the current architecture of the primitive-value storages in the Table library. We can discuss if it should be changed, but to do so I think we need to do some performance measurements to take informed decisions.
(For example one of the reasons we do the primitive operations in Java is that on
long
/double
types it allows to completely avoid boxing the values, which I don't think is avoidable when talking between Enso and Java).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think worth making this a vectorised operation on the NumberStorage.
We can use the Core_Utils to share the common implementation. Lets put it as a follow up ticket though please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use the Core_Utils to share the common implementation. Lets put it as a follow up ticket though please.
We could make the round
implementation in Core_Utils
, but then it would no longer be able to be written in Enso...
As for all the others - ceil
, floor
and I think truncate
- we seem to be doing just the basic Java operations Math.ceil
etc. in the 'primitive' path. The only additional handling is to handle promotion to BigInteger
if the result exceeds the range of long
. But BigInteger
is currently not supported by Table anyway, so I don't think sharing the implementation of these gives us any value at this moment - because the pure Enso side needs the additional BigInteger handling that has some Truffle magic, and the Table side needs to just report a warning/failure in case of an overflow - the common part is just delegating to Math.ceil
and co. which IMO isn't enough to make sharing worth it.
(But I'd make these vectorized nonetheless, I'm only unsure about round
as that would require us moving the whole implementation into Java - it could make it more efficient for tables though, so may as well be worth it, but it's quite a bit more work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - not need to put in Core_Utils
then.
Worth us seeing the relative performance of rounding 1,000,000 values as a map vs a vectorised op.
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Java_Exports.enso
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API and tests look ok.
- I think we need to have tests checking what is the behaviour when the
double
value exceeds thelong
range - since the Table currently does not support Big Integer storage, I'm not sure what will happen and I'd like to verify it. - Ideally, I don't think using
map_over_storage
for the primitive numeric operations likeceil
andfloor
, possibly alsotruncate
is the right choice. I think to be consistent with the architecture of the Table library, these should be implemented asUnaryMapOperation
.
I think it is fine to use map_over_storage
for round
since it includes pretty complex Enso logic which we don't want to replicate into Java. But the other operations, if I'm reading correctly, just delegate to Java methods, so I think they should be 'vectorized'.
I reckon the 'vectorization' of the operations into Java will happen as a separate task.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a follow up to look at vectorising the operations and the relative performance.
Otherwise LGTM.
ceil : Column ! Invalid_Value_Type | ||
ceil self = Value_Type.expect_numeric self <| | ||
fun = _.ceil | ||
Column_Ops.map_over_storage self fun make_long_builder skip_nothing=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - not need to put in Core_Utils
then.
Worth us seeing the relative performance of rounding 1,000,000 values as a map vs a vectorised op.
@@ -585,7 +585,7 @@ spec = | |||
231.2 . round . should_be_a Integer | |||
231.2 . round -1 . should_be_a Integer | |||
|
|||
Test.specify "Edge cases" <| | |||
Test.specify "Edge cases" pending="Re-enable this if the 15-digit restriction is removed" <| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to ever remove this restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I think we should just remove these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @radeusgd small comments - worth clearing those up before merging.
@@ -585,7 +585,7 @@ spec = | |||
231.2 . round . should_be_a Integer | |||
231.2 . round -1 . should_be_a Integer | |||
|
|||
Test.specify "Edge cases" <| | |||
Test.specify "Edge cases" pending="Re-enable this if the 15-digit restriction is removed" <| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
./run ide build
.