-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spurious overflow in Rational-to-Float16 conversion #52394
Comments
A simple workaround would be to define something like: Float16(x::Rational{<:Union{Int16,Int32,Int64,Int128,UInt16,UInt32,UInt64,UInt128}}) = Float16(Float32(x)) |
#49749 is supposed to fix this as far as I remember, but I still haven't gotten around to finishing it. |
Indeed, one has to go out of their way to cause it with
# unsigned wraparound makes this succinct. Or use `BigInt` and convert to `UInt128`.
Float32(-(UInt128(1) << 103) // (-(UInt128(1) << 103) + 1))
# Naturally, this only applies to numerators that satisfy the above condition
# after reduction to smallest terms.
Float32(-(UInt128(1) << 103) // (-(UInt128(1) << 103) + 2)) One should leave out |
Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
Constructing a floating-point number from a `Rational` should now be correctly rounded. Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
Constructing a floating-point number from a `Rational` should now be correctly rounded. Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
Fixes #52394. Also fixes `Float32` for `UInt128`, since currently `Float32((typemax(UInt128)-0x01) // typemax(UInt128))` gives `Nan32`.
Constructing a floating-point number from a `Rational` should now be correctly rounded. Implementation approach: 1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough. 2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`, otherwise a fallback using `ldexp` is used. As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings. Updates JuliaLang#45213 Updates JuliaLang#50940 Updates JuliaLang#52507 Fixes JuliaLang#52394 Closes JuliaLang#52395 Fixes JuliaLang#52859
It would be nice if this gave a correctly rounded result rather than overflowing:
It's not specific to
Float16
arithmetic, in principle, but that's where it shows up most easily sinceFloat32
andFloat64
cannot be overflowed byInt
.(I ran into this when implementing a recursive pairwise
mean
in #52365 (comment))The text was updated successfully, but these errors were encountered: