-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add trigonometry functions #241
Conversation
78838f9
to
6668bf1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main thoughts are:
- do we really need the narrower types?
- Can you make sure that we correctly state error conditions for each type of function (I don't remember my trig).
extensions/functions_arithmetic.yaml
Outdated
description: "Get the cosine of a value in radians." | ||
impls: | ||
- args: | ||
- value: i8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we should have signatures for all these narrower values? Is there some value in doing this over only supporting i64 and fp64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to this, in Arrow-c++, we only have kernels for fp32 and fp64 (I believe the docs may be wrong here and imply support for decimal inputs but we insert an implicit cast). You won't get an error if you try the integer types but it will insert an implicit cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only included all of the others to follow how some other functions have been specified, but I don't see any value in doing it this way if we don't need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, if the end result is no different than a cast the kernel is not needed. If there are other cases like this we should get rid of those kernels as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should allow an impl to specify multiple allowed types for a single value arg, e.g.
impls:
- args:
- value: [ i8, i16, i32, i64, fp32, fp64 ]
return: fp64
That would reduce the amount of boilerplate in these YAML files and at the same time avoid Substrait producers needing to insert lots of fussy explicit casts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the current pattern requires this.
Do you have an i32 + i64? Is it because of some performance benefit? I would expect that to be rare. The most common pattern systems have is an implementation of i64 + i64 and they cast i32 to i64 before doing i64 + i64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated to just support i64 and fp64 now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think fp32
is probably more valuable than i64
and I apologize for being obnoxious :). My goal isn't to limit the number of kernels but to create one kernel per distinct implementation. I think it is common for hardware/engines to have different implementations for float and double. By "different implementation" I mean "passing the same inputs yields a different result"
For an example, consider the C++ standard library implementation for cosine. There are three ways to call std::cos
, with a float
, with a double
, or with an integral
type. However, calling cosine with an integral type is identical to casting to double and calling cosine with that double value. We can see this both in the docs:
- A set of overloads or a function template accepting an argument of any integral type. Equivalent to 2) (the argument is cast to double).
and in practice:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for going into detail about that, makes sense! Updated to now have fp32
and fp64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of this discussion is somewhat related to #251, so linking for posterity.
extensions/functions_arithmetic.yaml
Outdated
description: "Get the tangent of a value in radians." | ||
impls: | ||
- args: | ||
- options: [ SILENT, SATURATE, ERROR ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the right error behaviors? (I can't remember my trig classes...)
Also, same question here wrt all the different input types. If all implementations do an unpromotion, let's not introduce signatures that do that implicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some good input from @jvanstraten on this topic (well, related topic at least) here: #230 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies; should have put this in the original PR. The reasoning for specifying the overflow behaviour options for tangent only was on the basis that the ranges for the returned value for this function are wider than the other ones and so I assumed it'd need it.
My trig is also a bit rusty, but from what I can gather from a bit of a search, in terms of output values:
- range of the cosine and sine functions: [-1, 1]
- range of tangent: [-infinity, infinity].
- range of arctangent, arcsine: [-pi/2, pi/2]
- range of arccosine: [0,pi]
Though as @westonpace has said, the approach suggested by @jvanstraten might be more appropriate here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tan
is technically undefined at (1/2 + k) pi
for all integer k
, because that's a division by zero when expressed as sin(x) / cos(x)
. However, due to pi's transcendental nature, none of those numbers can be exactly represented with floating-point numbers, so tan(x)
is defined for all finite floating-point values of x. So tan
only needs the rounding option.
asin
and acos
do have domain errors; they are undefined for x < -1
and x > 1
. So they should be defined however sqrt
ends up being defined. atan2
is also a bit weird when 0, 0
is passed, since it's basically atan(y / x)
(with some boundary conditions to deal with x = 0
when y != 0
and to get all quadrants to behave nicely), so I guess it should get the same treatment.
The other trig functions are defined for all finite real numbers, and thus only need rounding options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
due to pi's transcendental nature
Especially when topped with ice cream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the conversation regarding those other options now appears to be resolved, have updated the options here in line with those suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, minor nits
I don't know about the hyphen. Merriam-webster gives one word without hyphen. Wolfram Alpha and cpp reference give two words without hyphen.
f8bd33b
to
fa9f79c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
impls: | ||
- args: | ||
- name: rounding | ||
options: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as elsewhere, are these options in the order of most preferred to least preferred? Remember that an engine that support multiple option values on a non-specified value will pick them in the order they are declared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIE_TO_EVEN
a.k.a. "round to nearest" is the default mode used by glibc, and as such is probably what everyone is already implicitly doing, so I placed that first intentionally.
Side note; reading that description I guess my earlier assumptions that "even" means "even number" is wrong, and it actually refers to the least significant bit of the mantissa. I don't think that changes anything for us though.
This looks good to me. Looks like it needs a conflict resolved before merge, however. @jvanstraten , are you happy with this? |
Yep, LGTM |
@thisisnic, can you do a conflict resolution? |
fa9f79c
to
e08d615
Compare
@jvanstraten Apologies, for some reason when I fixed the conflict and force-pushed, it dismissed your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM 😄
Thanks @thisisnic ! |
This PR adds definitions for common trigonometry functions.