Skew normal distribution PPF incorrect for high skew #1120

Closed
dschmitz89 opened this issue Apr 17, 2024 · 7 comments · Fixed by #1123

@dschmitz89 (Contributor)

This showed up in SciPy: scipy/scipy#20124

Minimal Python reproducer:

from scipy.stats import skewnorm
skewnorm.ppf(0.01, 500)  # 500 = skewness parameter

RuntimeWarning: Error in function boost::math::quantile(const skew_normal_distribution<d>&, %1%): Unable to locate solution in a reasonable time: either there is no answer to quantile or the answer is infinite.  Current best guess is %1%
-> 4.833181656374989e+142

SciPy recently updated to Boost's develop branch at this commit, which did not solve this issue but only changed the error message to the one above.

Thanks in advance!

@mborland (Member)

Confirmed with this reproducer:

#include <boost/math/distributions/skew_normal.hpp>
#include "math_unit_test.hpp"

int main()
{
    // Mirror SciPy's configuration: no internal promotion of double to long double.
    using scipy_policy = boost::math::policies::policy<boost::math::policies::promote_double<false>>;
    constexpr auto q = 0.012533469508013;  // reference quantile for p = 0.01

    boost::math::skew_normal_distribution<double, scipy_policy> dist(0, 1, 500);
    CHECK_ULP_CLOSE(0.009999999999999898, boost::math::cdf(dist, q), 1);
    CHECK_ULP_CLOSE(q, boost::math::quantile(dist, 0.01), 1);

    return boost::math::test::report_errors();
}

@mborland (Member) commented Apr 18, 2024

@jzmaddock here are the first couple of steps. The guess looks pretty good, but then the iteration takes a massive first step and attempts to work back downward, taking 3-4 steps per order of magnitude.

Newton_raphson_iterate, guess = -0.061091221067366294206202326222410193, min = -1.7976931348623157081452742373170436e+308, max = 1.7976931348623157081452742373170436e+308, digits = 53, max_iter = 200

Newton iteration 1, delta = -3.8833117392140707250772683903547802e+202, residual = -0.010000000000000000208166817117216851

Newton iteration 2, delta = 1.9416558696070353625386341951773901e+202, residual = 0.98999999999999999111821580299874768

Newton iteration 3, delta = 9.7082793480351768126931709758869504e+201, residual = 0.98999999999999999111821580299874768

Newton iteration 4, delta = 4.8541396740175884063465854879434752e+201, residual = 0.98999999999999999111821580299874768

Newton iteration 5, delta = 2.4270698370087942031732927439717376e+201, residual = 0.98999999999999999111821580299874768

Each delta is exactly half the previous one, so shedding an order of magnitude takes log2(10) ≈ 3.3 halvings. Increasing the maximum number of steps to 10000, this converges to the correct answer after 710 steps.
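
To see where that huge first delta comes from: at the logged guess the CDF residual is about -0.01 while the PDF there is on the order of 1e-205, so the Newton step f/f' is astronomically large. A minimal check, assuming the same distribution parameters as the reproducer above:

#include <boost/math/distributions/skew_normal.hpp>
#include <iostream>

int main()
{
    boost::math::skew_normal_distribution<double> dist(0, 1, 500);
    const double guess = -0.061091221067366294;  // starting guess from the log above
    const double p = 0.01;

    const double f  = boost::math::cdf(dist, guess) - p;  // ~ -0.01
    const double df = boost::math::pdf(dist, guess);      // ~ 1e-205: the density is essentially zero here

    // Same order of magnitude as the first logged delta (~ -4e+202).
    std::cout << "Newton step f/f' = " << f / df << "\n";
}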

@NAThompson (Collaborator) commented Apr 19, 2024

@mborland, @jzmaddock: The following Mathematica code gets the update for the Halley iterate; I wonder if it can be used to stabilize the "wildness" of the Newton iterate:

[Screenshot (2024-04-19): Mathematica derivation of the Halley update for the skew-normal quantile]

(You can see that the simplification could've been just a bit better there; most of the bell-curve factors cancel.) So each iteration will take two erf calls, one Owen's T call, and an exp.

I confess to quite enjoying the Halley iterate for scalar rootfinding problems. For example, if Φ'(x) = 0 at some point, Newton's method fails, but Halley's method can gracefully degrade to Halley's irrational method. And if the denominator 2Φ'(x)² - (Φ(x) - p)Φ''(x) in Halley's method becomes zero, we can gracefully degrade to a Newton step.
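
A minimal sketch of what that update could look like, using boost::math::tools::halley_iterate on f(x) = CDF(x) - p with ξ = 0, ω = 1 and the identities CDF(x) = Φ(x) - 2T(x, α) and PDF'(x) = 2φ(x)(αφ(αx) - xΦ(αx)). The bounds and starting guess are hand-picked for illustration, and whether this actually tames the high-skew case is exactly what is in question here:

#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/erf.hpp>
#include <boost/math/special_functions/owens_t.hpp>
#include <boost/math/tools/roots.hpp>
#include <cmath>
#include <iostream>
#include <tuple>

int main()
{
    using boost::math::constants::one_div_root_two_pi;
    using boost::math::constants::root_two;

    const double alpha = 500.0, p = 0.01;  // the failing case from above

    // f(x) = CDF(x) - p plus its first two derivatives; phi(x) is shared
    // between the PDF and the PDF's derivative, as noted above.
    auto f = [&](double x) {
        const double phi   = std::exp(-x * x / 2) * one_div_root_two_pi<double>();
        const double phi_a = std::exp(-alpha * alpha * x * x / 2) * one_div_root_two_pi<double>();
        const double Phi_a = boost::math::erfc(-alpha * x / root_two<double>()) / 2;
        const double Phi   = boost::math::erfc(-x / root_two<double>()) / 2;
        const double F     = Phi - 2 * boost::math::owens_t(x, alpha);  // CDF
        const double dF    = 2 * phi * Phi_a;                           // PDF
        const double d2F   = 2 * phi * (alpha * phi_a - x * Phi_a);     // PDF'
        return std::make_tuple(F - p, dF, d2F);
    };

    std::uintmax_t max_iter = 200;
    const double q = boost::math::tools::halley_iterate(f, 0.0, -20.0, 20.0, 53, max_iter);
    std::cout << "quantile ~= " << q << " after " << max_iter << " iterations\n";
}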

I also note that we might be able to do a bit better on constraining the domain:

const RealType search_min = support(dist).first;
const RealType search_max = support(dist).second;

We might be able to use the fact that the mean of the skew normal distribution is μ + ωδ√(2/π) with δ = α/√(1 + α²), which costs only an addition and a multiplication once δ is known. That would allow us to write:

if (p >= 0.5) {
    search_min = μ + ω*δ*sqrt(2/π);
} else {
    search_max = μ + ω*δ*sqrt(2/π);
}

Maybe not the tightest constraint, but perhaps better than the entire real line.
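
A literal rendering of that proposal as a sketch: boost::math::mean already evaluates μ + ωδ√(2/π) for the skew normal, and support() is the free function used in the snippet above.

#include <boost/math/distributions/skew_normal.hpp>
#include <iostream>

int main()
{
    boost::math::skew_normal_distribution<double> dist(0, 1, 500);
    const double p = 0.01;

    double search_min = support(dist).first;
    double search_max = support(dist).second;

    const double m = boost::math::mean(dist);  // = mu + omega * delta * sqrt(2/pi)
    if (p >= 0.5) { search_min = m; } else { search_max = m; }

    std::cout << "search interval: [" << search_min << ", " << search_max << "]\n";
}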

@jzmaddock (Collaborator)

Just looking at this now. That looks like a rather expensive way of computing derivatives but, more to the point, I have a hunch that for sufficiently high skewness you will always reach a point where the computation blows up: effectively the CDF becomes something like a square wave, with zero derivative(s) either side of a near-vertical crossover point. I'm wondering if the TOMS748 algorithm might be a better choice here: it has robust root bracketing (provided you start on the right side of zero). It's a little worse than Newton in the smooth cases (typically ~12 iterations for double precision), but against that there's no need to compute derivatives. I'm out today, but I'll try and experiment later...
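
For reference, a sketch of that idea on the failing case via boost::math::tools::toms748_solve, with a hand-picked bracket (f(0) < 0 < f(1)); a real implementation would derive the bracket from the distribution as discussed above:

#include <boost/math/distributions/skew_normal.hpp>
#include <boost/math/tools/roots.hpp>
#include <iostream>

int main()
{
    boost::math::skew_normal_distribution<double> dist(0, 1, 500);
    const double p = 0.01;

    // TOMS748 needs only sign changes of CDF(x) - p: no derivatives at all.
    auto f = [&](double x) { return boost::math::cdf(dist, x) - p; };

    std::uintmax_t max_iter = 100;
    boost::math::tools::eps_tolerance<double> tol;
    const auto r = boost::math::tools::toms748_solve(f, 0.0, 1.0, tol, max_iter);

    std::cout << "quantile in [" << r.first << ", " << r.second
              << "] after " << max_iter << " iterations\n";  // ~ 0.012533...
}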

@NAThompson (Collaborator)

that looks like a rather expensive way of computing derivatives

Well, the idea is that you have to compute these quantities just to evaluate the CDF anyway, so you might as well use them for the derivatives too. But I think you're probably right: This is a bracketing problem, not a convergence problem.

@jzmaddock (Collaborator)

TOMS748 gets that particular case in 20 iterations, which looks pretty good, but I need to devise some better tests and make sure it doesn't blow up anywhere...

jzmaddock added a commit that referenced this issue Apr 21, 2024
Rather than Newton iterations.
Add test case.
Fixes #1120
@maresb commented May 5, 2024

Hey, sorry I missed out on this conversation, but I have some thoughts regarding this. I'm fairly convinced that Newton's method would work great on the logit-CDF. Unfortunately, as far as I can tell, there is no numerically stable implementation of the skewnorm CDF in the literature. (For positive skew and mildly negative x, there is catastrophic cancellation when subtracting the Owen's T term, even accounting for the skew -> 1/skew transformation.)
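
A minimal illustration of that cancellation, assuming the standard identity CDF(x) = Φ(x) - 2T(x, α) with ξ = 0, ω = 1: for α = 5 and x = -1 the two terms agree to roughly seven or eight leading digits, so the subtraction discards about half of double precision.

#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/erf.hpp>
#include <boost/math/special_functions/owens_t.hpp>
#include <iomanip>
#include <iostream>

int main()
{
    const double x = -1.0, alpha = 5.0;
    const double Phi  = boost::math::erfc(-x / boost::math::constants::root_two<double>()) / 2;
    const double twoT = 2 * boost::math::owens_t(x, alpha);

    std::cout << std::setprecision(17)
              << "Phi(x)    = " << Phi        << "\n"   // ~0.15865525393145705
              << "2*T(x, a) = " << twoT       << "\n"   // nearly identical leading digits
              << "CDF(x)    = " << Phi - twoT << "\n";  // ~5e-9: most digits have cancelled
}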

I think I know how to compute it stably, but I'm unfortunately slammed with other work at the moment.
