-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More efficient bisection for 1D Newton root finder #1012
base: develop
Are you sure you want to change the base?
Conversation
include/boost/math/tools/roots.hpp
Outdated
static_assert(!std::is_same<T, float>::value, "Need to use Midpoint754 solver when T is float"); | ||
static_assert(!std::is_same<T, double>::value, "Need to use Midpoint754 solver when T is double"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be an assertion for long double as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an explanation as to why long double
does not use the Midpoint754
solver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A guess there are a couple cases:
-
On some systems (e.g. Windows and MacOS)
long doubles
are just aliases todouble
. When long doubles are 64-bits we could just cast to double and useMidpoint754
to speed things up. -
On x86_64 linux systems you have the 80-bit
long double
. Normally you haveunsigned __int128
but you'd have to shift the bits around to cast it back properly. Likely not worth the magic required. -
On systems with IEEE 128-bit long doubles (e.g. ARM and S390x) you have access to
unsigned __int128
which should generally work. We have both of the aforementioned architectures running natively in the CI system.
Here is where I have defined macros before for determining the size of long double.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On some systems (e.g. Windows and MacOS)
long doubles
are just aliases todouble
. When long doubles are 64-bits we could just cast to double and useMidpoint754
to speed things up.
These systems will use the specialization for double
.
On x86_64 linux systems you have the 80-bit
long double
. Normally you haveunsigned __int128
but you'd have to shift the bits around to cast it back properly. Likely not worth the magic required.
I tried to make this work. Starting with numeric_limits<long double>::max()
and adding 1 bit gives NaN
, not Inf
as it does for float
, double
and __float128
. It's just not meant to be.
On systems with IEEE 128-bit long doubles (e.g. ARM and S390x) you have access to
unsigned __int128
which should generally work. We have both of the aforementioned architectures running natively in the CI system.
I got this to work for both _float128
and boost::multiprecision::float128
(wrapper for __float128
). The implementation does not require libquadmath
.
} | ||
|
||
template <typename U = T> | ||
static typename std::enable_if<std::numeric_limits<U>::is_specialized, U>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may be able to get away without checking has_denorm
because it was deprecated in C++23
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to be able to bisect with denormals because the solution to the root finding problem could be a denormal number.
Reading the depreciation document, it seems like the depreciation of has_denorm
has a lot to do with it being a static constexpr
. This is problematic because the actual behavior can change at runtime with e.g., _MM_SET_FLUSH_ZERO_MODE
.
I think we can actually delete std::numeric_limits::min()
because denorm_min
returns numeric_limits::min()
if denormals are off. Does that seen sensible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can actually delete
std::numeric_limits::min()
becausedenorm_min
returnsnumeric_limits::min()
if denormals are off. Does that seen sensible?
I think so. Since you are already check for is_specialized
you should avoid ever getting 0 instead of a finite minimum value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Since you are already check for
is_specialized
you should avoid ever getting 0 instead of a finite minimum value.
@NAThompson I think it would be worthwhile for you to take a look at this as well, availability dependent of course. |
@mborland thank you for the review. Please let me know if there are additional changes. |
I added IEEE754 bisection support for |
I realized that the code overlap between what is needed for find the midpoint between two IEEE754 floats and the code needed to find the fast float distance between two IEEE754 floats is high. I'm in the process of refactoring this functionality into a few classes that when operated on allow the following operations to be performed in a few lines of code: |
The introduced tools in template <typename T, typename U>
static T fast_float_distance(T x, T y, boost::math::tools::detail::ieee754_linear::Layout_IEEE754Linear<T, U>) {
using boost::math::tools::detail::ieee754_linear::BitsInflated;
using boost::math::tools::detail::ieee754_linear::Denormflator;
const auto df = Denormflator<T, U>(); // Calculates if `has_denorm` is true at this instant
// Step 1 -- Calculate inflated representations
const BitsInflated<T, U> y_inflated(y);
const BitsInflated<T, U> x_inflated(x);
// Step 2 -- Calculate deflated representations
const auto y_def = df.deflate(y_inflated);
const auto x_def = df.deflate(x_inflated);
// Step 3 -- Subtract the two deflated representations
const auto xy_def_distance = (y_def - x_def)
// Step 4 -- The distance between the deflated points is the fast float distance
return xy_def_distance.static_cast_int_value_to_float();
} Or it can be used to perform the float advance operation: template <typename T, typename U>
static T float_advance(T x, int n, boost::math::tools::detail::ieee754_linear::Layout_IEEE754Linear<T, U>) {
using boost::math::tools::detail::ieee754_linear::BitsInflated;
using boost::math::tools::detail::ieee754_linear::Denormflator;
const auto df = Denormflator<T, U>(); // Calculates if `has_denorm` is true at this instant
// Step 1 -- Calculate inflated representation
const BitsInflated<T, U> x_inflated(x);
// Step 2 -- Calculate deflated representation
const auto x_def = df.deflate(x_inflated);
// Step 3 -- Move over n bits
const auto x_plus_n_def = x_def + n;
// Step 4 -- Inflate
const auto x_plus_n = df.inflate(x_plus_n_def);
// Step 5 -- Convert to float
return x_plus_n.reinterpret_as_float();
} |
If you are looking for a different quick way to calculate float distance @NAThompson showed me this trick: https://github.com/boostorg/math/pull/814/files#diff-45f56077fb4aa15ca1aca5025321e5c275fd20b3e161585a25442eaa2c7952a3R724 |
The same idea of viewing floats in bitspace is used here. Seeing that trick used in #814 inspired this refactor. In addition, this implementation attempts to:
|
|
||
|
||
#ifdef BOOST_HAS_FLOAT128 | ||
#include <boost/multiprecision/float128.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this breaks standalone mode as BOOST_HAS_FLOAR128 may be set but the multiprecision headers will not be available. We also try to avoid mutual/cyclic dependencies like these. What is the include actually used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch with the dependency. I've removed this include and replaced it with the following includes:
#include <boost/math/tools/config.hpp> // For BOOST_MATH_FLOAT128_TYPE
#include <boost/cstdfloat.hpp> // For numeric_limits support for 128 bit floats
There may be a better way of including the needed files that I'm unaware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This include, or one like it appears to be necessary in order to pass the std_real_concept_check_128
test associated with the file compile_test/std_real_concept_check.cpp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you've changed something in that test, it uses nothing from multiprecision at all so the include should not be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated in a previous comment, I've removed the header #include <boost/multiprecision/float128.hpp>
and replaced it with #include <boost/cstdfloat.hpp> // For numeric_limits support for 128 bit floats
. The std_real_concept_check_128
test needs for std::numeric_limits
specializations to be defined for the BOOST_MATH_FLOAT128_TYPE
. If this header is removed, then test will not compile. (I tried this)
Why this occurs: the std_real_concept_check_128
test makes the std_real_concept
emulate a 128 bit floating point type. The tools in ieee754_linear
recognize this as an IEEE 754 floating point type. The comment on line 133 goes on to say:
// NOTE: returns Layout_IEEE754Linear<BOOST_MATH_FLOAT128_TYPE, ...> instead of Layout_IEEE754Linear<T, ...>
// in order to work with non-trivial types that wrap trivial 128 bit floating point types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see the issue, but I think that we're looking at this the wrong way:
- Not depending on boost/cstdfloat.hpp is a "good thing" because while that header is genuinely very useful, it's also rather fragile to changes in GCC releases.
- I think the assumptions in
IsFloat128
(and maybe elsewhere) are wrong: just because a type has the correct size and properties, does NOT mean it's a trivial wrapper around a native type. It could for example be a pointer (or two) to the actual implementation which just happens to have the same size. mpfr_t is 24 bytes on my system so doesn't quite qualify, but if I'm reading it correctly, MPFR could be configured in a such a way that it was a 16 bye / 128 bit type which was set up to emulate a 128 bit float. Another way of looking at this is that std_real_concept_type and the other concept checking types should be viewed as black boxes - their purpose is to emulate to some degree or another, some other random type about which we know nothing that the library may be instantiated with in the future. You may be able to actually see their internals, but you really shouldn't look ;)
So I would change
class IsFloat128 {
public:
template <typename T>
static constexpr bool value() {
return std::numeric_limits<T>::is_iec559 &&
std::numeric_limits<T>::radix == 2 &&
std::numeric_limits<T>::digits == 113 && // Mantissa has 112 bits + 1 implicit bit
sizeof(T) == 16;
}
};
To maybe:
class IsFloat128 {
public:
template <typename T>
static constexpr bool value() {
return std::numeric_limits<T>::is_iec559 &&
std::numeric_limits<T>::radix == 2 &&
std::numeric_limits<T>::digits == 113 && // Mantissa has 112 bits + 1 implicit bit
#ifdef BOOST_HAS_FLOAT128
(std::is_floating_Point<T>::value || std::is_same<__float128, T>::value)
#else
std::is_floating_Point<T>::value
#endif
}
};
Likewise for the other IsFloatXX classes.
Then change:
template <typename T, typename U>
class Layout_IEEE754Linear {
static_assert(std::numeric_limits<T>::is_iec559, "Type must be IEEE 754 floating point.");
static_assert(std::is_unsigned<U>::value, "U must be an unsigned integer type.");
static_assert(sizeof(T) == sizeof(U), "Type and uint size must be the same.");
public:
using type_float = T; // The linear floating point type
};
To:
template <typename T, typename U>
class Layout_IEEE754Linear {
static_assert(std::numeric_limits<T>::is_iec559
#ifdef BOOST_HAS_FLOAT128
|| std::is_same<T, __float128>::value
#endif
, "Type must be IEEE 754 floating point.");
static_assert(std::is_unsigned<U>::value, "U must be an unsigned integer type.");
static_assert(sizeof(T) == sizeof(U), "Type and uint size must be the same.");
public:
using type_float = T; // The linear floating point type
};
Which would allow the header dependency to be removed as well. I do appreciate it's less clean, but I suspect it's a safer option in the long run!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not depending on boost/cstdfloat.hpp is a "good thing" because while that header is genuinely very useful, it's also rather fragile to changes in GCC releases.
Does the following change make things less fragile? Changing the include from:
#include <boost/cstdfloat.hpp>
to
#if defined(BOOST_HAS_INT128) && defined(BOOST_HAS_FLOAT128)
#if !defined(BOOST_CSTDFLOAT_NO_LIBQUADMATH_LIMITS)
#include <boost/math/cstdfloat/cstdfloat_limits.hpp>
#endif
#endif
I think the assumptions in IsFloat128 (and maybe elsewhere) are wrong: just because a type has the correct size and properties, does NOT mean it's a trivial wrapper around a native type. It could for example be a pointer (or two) to the actual implementation which just happens to have the same size. [...] MPFR could be configured in a such a way [...] to emulate a 128 bit float.
For the value()
function of any of the IsFloat
classes to return true, the following statement needs to be true:
std::numeric_limits<T>::is_iec559
The 32, 64, and 128 bit IEEE floats have a specific layout consisting of a sign bit, an exponent, and a fraction. A number type containing two pointers will not meet this standard. Skimming the MPFR codebase, this library is clearly set up with an awareness of IEEE 754 arithmetic. They don't appear to specialize numeric limits, and therefore don't set std::numeric_limits<T>::is_iec559 = true
. I think the check that std::numeric_limits<T>::is_iec559 = true
provides reasonable protection against accidental use of this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup apologies, you're correct that std::numeric_limits<T>::is_iec559
should provide a reasonable check given that that std defines a binary layout.
I would still prefer to avoid the cstdfloat includes if possible though - and we know that __float128 is an iec559 compatible type, so including it for that one case is overkill when we can just use is_same<T, __float128>::value
?
Other than that and one minor nit about inclusion this looks good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a good way to do this. The code operates by duck typing a float type T
. For example, something is determined to be a 32 bit binary floating point type based on the value of IsFloat32::value<T>()
. Something is determined to be such a type if:
std::numeric_limits<T>::is_iec559 &&
std::numeric_limits<T>::radix == 2 &&
std::numeric_limits<T>::digits == 24 && // Mantissa has 23 bits + 1 implicit bit
sizeof(T) == 4;
evaluates to true. If any of the four conditions above cannot be evaluated, then the code will fail to compile. The include is needed for it to compile, even if the method were to be unused.
The alternative way of structuring this code is to manually match up binary floating point types with the appropriate methods, but this is messy and tends to miss valid use cases. is_same<T, __float128>::value
might work on GCC, but what about an Intel compiler?
I'd like to avoid this include here too as it shouldn't be necessary. Perhaps the issue occurs here:
math/include/boost/math/tools/config.hpp
Lines 399 to 415 in 09d82da
#ifdef BOOST_MATH_USE_FLOAT128 | |
// | |
// Only enable this when the compiler really is GCC as clang and probably | |
// intel too don't support __float128 yet :-( | |
// | |
# if defined(__INTEL_COMPILER) && defined(__GNUC__) | |
# if (__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 6)) | |
# define BOOST_MATH_FLOAT128_TYPE __float128 | |
# endif | |
# elif defined(__GNUC__) | |
# define BOOST_MATH_FLOAT128_TYPE __float128 | |
# endif | |
# ifndef BOOST_MATH_FLOAT128_TYPE | |
# define BOOST_MATH_FLOAT128_TYPE _Quad | |
# endif | |
#endif |
where a 128 bit floating point type is defined without defining
std::numeric_limits
for that type.
include/boost/math/tools/roots.hpp
Outdated
#include <boost/math/tools/toms748_solve.hpp> | ||
#include <boost/math/policies/error_handling.hpp> | ||
|
||
#ifdef BOOST_HAS_FLOAT128 | ||
#include <boost/multiprecision/float128.hpp> | ||
using boost::multiprecision::float128; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, we shouldn't be forcing multiprecision headers to be included here, plus the using declaration at global header scope is evil ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header and using declaration was unneeded. Good catch.
#include <boost/multiprecision/float128.hpp> | ||
#endif | ||
|
||
#if __has_include(<stdfloat>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate, but __has_include is an insufficient check, because it may pass even when the translation mode is not C++23, so we will need to check the value of BOOST_CXX_VERSION as well (unfortunately __cplusplus is non-reliable for MSVC). I yes, must get around to adding proper Boost.Config macros and tests for these new C++23 headers sometime!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The include pattern:
#if __has_include(<stdfloat>)
# include <stdfloat>
#endif
occurs in include/boost/math/tools/promotion.hpp
, include/boost/math/concepts/real_concept.hpp
, and about 30 test files including test/jacobi_test.cpp
. I think it makes sense to allow it here for consistency. If required it should be fixed across the codebase in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be desirable for the pattern:
#if defined __has_include
# if __cplusplus > 202002L || _MSVC_LANG > 202002L
# if __has_include (<X>)
# include <X>
# endif
# endif
#endif
to me made into a macro similar to
INCLUDE_CPP23_HEADER(X)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but you still need to be able to determine whether the header has been included or not (ie whether you can use those contents or not), so the normal method is:
#ifndef BOOST_NO_CXX23_STDFLOAT
#include <stdfloat>
#endif
#ifndef BOOST_NO_CXX23_STDFLOAT
// do something that uses the header
#endif
I'm working on the C++23 macros for Boost.Config now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach described above typically applies when one is manually defining methods. A manually defined method will only compile if certain headers have been included. For example, one might manually define a method for type __float128
, another for _Quad
, and another for std::float128_t
. Careful preprocessor use is required because certain methods will only compile if the correct headers have been included.
In contrast to this approach, this approach relies on duck typing. In C++ duck typing happens at compile time and is implemented with templates and SFINAE. It also allows a single templatized method to be used for various types. For example, the 128 bit type is used by the following types: __float128
, _Quad
, std::float128_t
, multiprecision::float128
, as well as std_real_concept
when it's set up to emulate a 128 bit float. I may be missing some, but the compiler will not.
The benefit of the duck type approach that the code only needs to compile for the template parameters that are actually used. For this reason, the approach does not require the careful preprocessor use of the manual approach.
Also, thank you for working on the C++23 macros.
This is chunk #4 for #1000. Also closes #1008.
This PR adds tools that enable efficient bisection for the 1D Newton root finder.
The Situation
The 1D Newton root finder has a bisection fallback mechanism. If one of the bounds is large (i.e.,
std::numeric_limits::max()
), naively bisecting the bounds is slow. For example, for typedouble
, bisecting from1.79769e+308
to1
takes1024
iterations. The situation is even worse for types with more precision. When the Newton solver runs out of (e.g., 200) iterations during bisection, the solver declares success even if far from a root. Again, see issue #1008.Functionality
What's the maximum number of bisections needed to root find a
double
? Adouble
is 64 bits. So the maximum number of iterations needed is 64. Efficient bisection is bisection in bit space. An interesting consequence of bisection in bitspace, is that infinity is a bisectable bound.For types that do not conform to the standard (i.e., IEEE 754), an approximate version of the bisection algorithm above is used. The approximate version supports bisection with infinity if the type is specialized (i.e., if
std::numeric_limits<T>::is_specialized = true
).Fun Facts
The mean of
0
andinfinity
is:infinity
(arithmetic mean)NaN
(geometric mean)double
andfloat
)long double
using the approximate algorithm)