Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible incorrect argument used in test of inc_beta_ddb #2417

Open
ricardoV94 opened this issue Mar 9, 2021 · 32 comments
Open

Possible incorrect argument used in test of inc_beta_ddb #2417

ricardoV94 opened this issue Mar 9, 2021 · 32 comments

Comments

@ricardoV94
Copy link

ricardoV94 commented Mar 9, 2021

I think an incorrect argument is being passed to the function calls inc_beta_ddb_test:

EXPECT_FLOAT_EQ(3.2996082e-05,
inc_beta_ddb(small_a, small_b, small_z, digamma(small_a),
digamma(small_a + small_b)))
<< "reasonable values for a, b, x";

Shouldn't the fourth argument be digamma(small_b) instead of digamma(small_a) (and similarly to the other tests)?

Obviously this does not mean anything for the function itself, but it makes it difficult to check independently what the expected value should have been.

Apologies if I misunderstood something :)

@bbbales2
Copy link
Member

bbbales2 commented Mar 9, 2021

Hmm, I don't really know about this function. I agree it looks strange.

Since you're asking the question I'm hoping maybe you can help me figure it out :D.

Do you know how to go about computing these things in R by any chance?

library(zipfR)

a = 1.5
b = 1.25
z = 0.5

dx = 1e-5

(zipfR::Ibeta(z, a, b + dx) - zipfR::Ibeta(z, a, b - dx)) / (2 * dx)

I thought this should be equivalent to inc_beta_ddb(1.5, 1.25, 0.5, digamma(1.25), digamma(2.75)), but it doesn't appear to be. Would there be different parameterizations of this type of function?

@ricardoV94
Copy link
Author

ricardoV94 commented Mar 10, 2021

Looking here in the source code it looks like it should be the beta digamma

+= inc_beta_ddb(alpha_dbl, beta_dbl, y_dbl, digamma_beta[n],

I actually compared it with your other function which suggests the same: https://github.com/stan-dev/math/blob/master/stan/math/prim/fun/grad_reg_inc_beta.hpp

I am more used to python, so I also tested via

from scipy.special import betainc
eps = 1e-5
a = 2
b = 1
z = 0.5

grad = (betainc(a, b+eps, z) - betainc(a, b-eps, z)) / (2*eps)
print(grad)

The gradient is much closer using the digamma beta

@ricardoV94
Copy link
Author

ricardoV94 commented Mar 10, 2021

(zipfR::Ibeta(z, a, b + dx) - zipfR::Ibeta(z, a, b - dx)) / (2 * dx)

It should be the zipf::Rbeta (regularized incomplete beta)

* Returns the partial derivative of the regularized
* incomplete beta function, I_{z}(a, b) with respect to b.

@bbbales2
Copy link
Member

@ricardoV94 thanks for putting the time into investigating this. I'll have a look at this hopefully later today.

@bbbales2
Copy link
Member

bbbales2 commented Mar 10, 2021

Hmm, the code here seems to work for me for small values, but for the large value tests something might be going on. It might be that this function just isn't valid out to extreme parts of parameter space (in which case we won't test it out there), or could be a bug for large values (in which case will need to fix this).

I can run this code on wolframcloud and get a different thing than what the code gives (but it agrees with the finite differences):

a = 15000
b = 1.25
z = 0.999
((Gamma[b] Gamma[a + b])/Gamma[a]) (1 - z)^b HypergeometricPFQRegularized[ {b, b, 1 - a}, {1 + b, 1 + b}, 1 - z] + BetaRegularized[1 - z, b, a] (PolyGamma[b] - PolyGamma[a + b] - Log[1 - z])

That code comes from: https://functions.wolfram.com/GammaBetaErf/BetaRegularized/20/01/03/0001/

R finite differences give: 2.00849792806346e-06
Wolfram gives: 2.0085e-6
Stan gives: -2.1355524

Will have to continue the investigation later.

R code is:

(zipfR::Rbeta(z, a, b + dx) - zipfR::Rbeta(z, a, b - dx)) / (2 * dx))

Stan code is:

inc_beta_ddb(large_a, small_b, large_z, digamma(small_b),
                               digamma(large_a + small_b));

@ricardoV94
Copy link
Author

ricardoV94 commented Mar 11, 2021

I noticed that when adapting your code to pymc3. The difference seems to go away if you make the threshold smaller in the STAN functions from 1e-10 to 1e-15

double threshold = 1e-10;

And on the dda

Here is the python output of the same algorithm with more stringent thresholds:

# threshold = 1e-10
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# -2.1355523032324224

# threshod = 1e-12
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# -2.3924212087640326e-07

# threshold = 1e-15
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# 2.0060173144203154e-06

# threshold = 1e-18
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# 2.008495010807732e-06

I assumed that was a speed tradeoff on your side. Or is it a red flag that the original difference was so large?

@ricardoV94
Copy link
Author

ricardoV94 commented Mar 11, 2021

Something looks a bit funny with the summand variable in that extreme case. It starts by getting larger from the initial value and then getting smaller. By the "looks" of the algorithm I expected it to be a non-increasing value:

_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
threshold=1e-12
summand=2.2496250468697938e-11  # initial summand is smaller than old threshold of 1e-10
summand=1.0384442264431505e-10
summand=3.665891374207815e-10
summand=1.047694298123003e-09
summand=2.5153463787056327e-09
summand=5.206333323510543e-09
summand=9.470635850818627e-09
summand=1.536623464951879e-08
summand=2.2501039747638058e-08
summand=3.002188727750973e-08
summand=3.6789065746491285e-08
summand=4.168201149077467e-08
summand=4.391455852728936e-08  # only here does it start decreasing
summand=4.323568264710523e-08
summand=3.995043593027368e-08
summand=3.477714397321581e-08
summand=2.8616825497631537e-08
summand=2.232595530024962e-08
summand=1.655896810584071e-08
summand=1.1704463091376664e-08
summand=7.90182769489177e-09
summand=5.1055153156812025e-09
summand=3.162945689332276e-09
summand=1.8820153177406136e-09
summand=1.0772476439193211e-09
summand=5.940180700712923e-10
summand=3.1598081569836583e-10
summand=1.6234662200064858e-10
summand=8.065943698192561e-11
summand=3.8794608086322824e-11
summand=1.8081595435582655e-11
summand=8.174648510873313e-12
summand=3.5880740066010608e-12
summand=1.5303262874820438e-12
summand=6.347265741755701e-13
Out[2]: -2.3924212087640326e-07  # Output

@bbbales2
Copy link
Member

Oof, yeah, I'm not sure where this power series came from or if someone derived it by hand, but I wasn't able to quickly identify the algorithm or anything here.

I wonder if the convergence criteria could be written in terms of sum_numer / sum_denom and make it more reliable? I'm not sure.

It seems like there's a problem here with the code, however I'm not quite sure what this is affecting yet. I went grepping in the code and I found grad_reg_inc_beta which from the comments seems like the same function but is computed a different way lol. I'm not sure what's going on. Anyway at this point this needs a more careful inspection in the code. If you figure out what's going on let me know.

Searching around I found this: https://core.ac.uk/download/pdf/82094101.pdf which might be something more concrete to base an implementation on :P.

@ricardoV94
Copy link
Author

ricardoV94 commented Mar 11, 2021

grad_reg_inc_beta is also different in that it computes the derivatives of a and b simultaneously.

In any case, the current implementation (_dd*) seems to give the right results for non extreme values and also for extreme values when using a smaller threshold. So it's definitely not completely out there. I have no intuition about what could be used as a better convergence criteria however. I thought of something silly like checking for the threshold but also making sure the summand has decreased since the last iteration.

I found that reference you linked to but I am not sure how useful it is, since the goal seems to extend the derivatives to negative values of the incomplete beta. I found this older reference (https://core.ac.uk/download/pdf/6303153.pdf) with a C implementation here (https://github.com/canerturkmen/betaincder/blob/master/betaincder/c/beta.c) more promising as a benchmark.

The current algorithm in STAN gives the right results for the tabulated values in Table 1 of that reference.

Edit: It seems the current functions were added by @betanalpha in here: b465d12 and tweaked later a couple of times (without a substantial change in their logic), possibly because they were deemed more robust than the grad_reg_inc_beta?

@bbbales2
Copy link
Member

bbbales2 commented Mar 11, 2021

I found this older reference

Woops, yeah, that's the one I meant to link. Thanks for finding that C implementation.

seems to give the right results for non extreme values and also for extreme values when using a smaller threshold. So it's definitely not completely out there

Yeah I'm just not sure how to think about this. It seems like this function gets used in a lot of places so I'd need to look at those places to get a sense how important the large values are.

possibly because they were deemed more robust

Oh oh nice, so maybe we should be replacing grad_reg_inc_beta with this. It looks like the negative binomial cdfs and lcdfs are doing different things, which doesn't seem great. (Edit: and it does seem like inc_beta_dd* replaced grad_reg_inc_beta)

@ricardoV94
Copy link
Author

Hi @bbbales2

I am just wondering if you had any insights about the "issues" above in the meanwhile. I still couldn't quite figure out the source of the algorithm, but I am happy to help if you have any ideas on how to proceed :)

@bbbales2
Copy link
Member

@ricardoV94 thanks for the ping. I don't have any particular insights :P.

From what you've dug up, I'm thinking:

  1. If there's no simple way to make these functions converge for the test cases, make the test cases easier and fix the bugs in them
  2. grad_reg_inc_beta should probably be replaced everywhere in our code with these functions if it was worth doing in negative binomial
  3. It would be nice to pin down the algorithm so that future people digging in the code know where to start looking but if it's not easy, as long as the results match the Mathematica reference in a useful range of values I'm happy :D.

but I am happy to help

If that's an offer to make a PR, by all means :D.

1 is a definite need to do and should be pretty straightforward.

2 would be more work -- it looks like this grad_reg_inc_beta gets used a lot -- would need to check the new algorithm works numerically everywhere and also doesn't cause big slowdowns.

@sakrejda
Copy link
Contributor

A couple of comments:

  1. the function states that it's only tested up to a/b = 12500 so 15000 is beyond that range (esp. with the large z). It's not surprising you'd lower accuracy outside that range
  2. it makes sense the initial summand might be small and then get larger since the initial value and the following steps come from different formulas, I think overall that's ok (unless there's a specific error)
  3. when I've changed related functions in the past it's definitely a lot of work to track down all the places they cause problems... so excellent warning :)
    I've also found it useful to think of these as functions that always fail beyond some range, so one useful question is whether we can write a specific test that fails to produce values matching some credible reference (finite diff, mathematica, etc..) That would make for a more specific target for improvement.

@ricardoV94
Copy link
Author

I've also found it useful to think of these as functions that always fail beyond some range, so one useful question is whether we can write a specific test that fails to produce values matching some credible reference (finite diff, mathematica, etc..) That would make for a more specific target for improvement.

@sakrejda Could you expand on this? We found by accident that the discussed test point (even if given the right arguments) fails to match other implementations (i.e., finite-differences, the previous grad_reg_inc_beta and the automatic differentiation obtained in PyMC3, which all agree on the same value). Do you suggest this test condition should be kept to highlight a known convergence failure beyond the "working domain"?

@ricardoV94
Copy link
Author

ricardoV94 commented Mar 24, 2021

@bbbales2

From what you've dug up, I'm thinking:

  1. If there's no simple way to make these functions converge for the test cases, make the test cases easier and fix the bugs in them

It actually seems pretty straightforward to make them converge by making the threshold more strict. But I am completely unable to assess whether this is a worthwhile trade-off, since there is no clear justification for the initial threshold in the documentation. Changing the tests sounds fine, altough @sakrejda seems to argue such "failing" tests may still be useful to keep around.

  1. grad_reg_inc_beta should probably be replaced everywhere in our code with these functions if it was worth doing in negative binomial

I don't have a good intuition. I can do some benchmarks to see if there is an obvious difference in computational cost for "usual values". One possible advantage of the old function is that it computes the derivatives of a and b simultaneously which might be faster than calling the separate methods one at a time.

  1. It would be nice to pin down the algorithm so that future people digging in the code know where to start looking but if it's not easy, as long as the results match the Mathematica reference in a useful range of values I'm happy :D.

I will have a look at this, together with point 1 above. Perhaps a simple numerical analysis to measure the largest error / disagreement observed with this function in a given a, b, z range would be useful to add to the inline documentation of the functions.

but I am happy to help

If that's an offer to make a PR, by all means :D.

1 is a definite need to do and should be pretty straightforward.

I can give it a try.

2 would be more work -- it looks like this grad_reg_inc_beta gets used a lot -- would need to check the new algorithm works numerically everywhere and also doesn't cause big slowdowns.

We can discuss this after collecting more info perhaps.

@sakrejda
Copy link
Contributor

Changing the tests sounds fine, altough @sakrejda seems to argue such "failing" tests may still be useful to keep around.

Since there's almost always a range beyond which these functions fail it seems like the standard is to document the range in the function doc (it's a soft failure, they get less accurate and there's error in everything else, it often doesn't matter much, it can often be avoided by rescaling). I would modify the test range so the function passes and leave an issue that states that beyond a certain point the function looses accuracy. If somebody needs that region we can add a PR to fix that region.

We found by accident that the discussed test point (even if given the right arguments) fails to match other implementations (i.e., finite-differences, the previous grad_reg_inc_beta and the automatic differentiation obtained in PyMC3, which all agree on the same value).

My suggestion is: 1) update the main post on the issue to document the reference values used (with code) and where you think the range should be extended; 2) make a branch that documents the failure; and 3) create a PR that fixes the branch to pass the new tests. So far I see mentions of various reference values and I'd be happy to help fix issues with the function but I'm not currently up to documenting the issue b/c you guys seem to have a handle on that already.

I don't have a good intuition. I can do some benchmarks to see if there is an obvious difference in computational cost for "usual values". One possible advantage of the old function is that it computes the derivatives of a and b simultaneously which might be faster than calling the separate methods one at a time.

Yes, there's usually some efficiency to be gained by computing both at the same time, and sometimes you can get better stability but you'd have to compare the code between the different functions (I can help with that once it gets there if you want).

I will have a look at this, together with point 1 above. Perhaps a simple numerical analysis to measure the largest error / disagreement observed with this function in a given a, b, z range would be useful to add to the inline documentation of the functions.

The algorithm may well be that some took the power-series used for the more general function and simplified the calculation where possible. I don't know what the current math lib standards are but from my point of view getting closer to known errors for specific domains would help the Stan math library.

@sakrejda Could you expand on this?

My main point was that all the other implementations fail beyond some domain too. They just do it more or less gracefully and some of them cover a larger domain before failing.

@sakrejda
Copy link
Contributor

Some history: so I looked at this function (https://github.com/stan-dev/math/blob/9895be8ee0a27a9c7935cf19345d76fcbc85a69d/stan/math/prim/fun/inc_beta_ddb.hpp) and at grad_inc_beta and I think the original code was Michael Betancourt, and that I modified some of the current code to improve accuracy in parts of the domain (that's those if statements at the head of the function which use standard identities). grad_inc_beta currently defers to grad_2F1 and that avoids duplicating this kind of code.. I wrote that version of the algorithm and we (me and @bbbales2 I think...) updated the convergence conditions there so it might make sense to look at that version of the calculation if we want to improve the convergence checks.

So tl;dr: @ricardoV94 if you're interested in improving this algorithm in a specific direction I think that would make the lib better and I'm happy to help with the implementation.

@ricardoV94
Copy link
Author

ricardoV94 commented Mar 24, 2021

@sakrejda Improving the convergence checks (assuming it can be done) sounds good. But does it make sense to maintain the two distinct approaches in the first place?

@bbbales2
Copy link
Member

I would modify the test range so the function passes and leave an issue that states that beyond a certain point the function looses accuracy. If somebody needs that region we can add a PR to fix that region.

+1

finite-differences, the previous grad_reg_inc_beta and the automatic differentiation obtained in PyMC3, which all agree on the same value)

grad_inc_beta currently defers to grad_2F1 and that avoids duplicating this kind of code.. I wrote that version of the algorithm

Yeah I was assuming that inc_beta_ddb was just better than grad_reg_inc_beta always. It sounds like maybe this is not true or has changed since the inc_beta_ddb pull if grad_reg_inc_beta has been improved.

But does it make sense to maintain the two distinct approaches at all?

I guess if one is better in different places then two makes sense, but it might be that one of these is just better than the other everywhere.

It seems wrong that we're using one for neg_binomial_cdf and another for neg_binomial_lcdf for sure.

@sakrejda
Copy link
Contributor

That's a good question. To answer it we would need to compare inc_beta_ddb to grad_inc_beta over the same domain both in terms of accuracy and runtime. Is that something you are interested in trying? That seems to be where Ben is going with this as well. Either we get a clear answer about the need for a better inc_beta_ddb or we get to remove extra code. Ultimately it's going a be a PR with a pretty significant testing burden..

@bbbales2
Copy link
Member

It looks like inc_beta_dda is used in neg_binomial_cdf, neg_binomial_2_cdf and beta_cdf. So it seems like the function with a smaller reach. If grad_inc_beta covers that domain, then yeah I think we can just use grad_inc_beta.

@ricardoV94
Copy link
Author

On the other hand the inc_beta_dd* seem to be tested in more extreme scenarios. Not sure whether grad_inc_beta has been applied to the same cases to see how it fares.

@bbbales2
Copy link
Member

Yeah I'm definitely into an ez cleanup pull to fix tests and adjust docs if necessary.

significant testing burden

I guess the things we'd need to check are:

  1. The problem inc_beta_ddb was introduced for is solved by current grad_reg_inc_beta. This is probably a combo of sleuthing the old pull + just running all our unit tests (a unit test should have been added to catch this, but it doesn't always happen)
  2. Check performance. There's a benchmark script now so doing that should be as simple as:
benchmarks/benchmark.py --overloads Rev --skip_similar_signatures neg_binomial_lcdf

Example output looks like:

----------------------------------------------------------------------------------------------------------------
Benchmark                                                                      Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------------
neg_binomial_lcdf_Prim_int_Rev_real_Rev_real/1/manual_time                 23838 ns        24529 ns        28707
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/1/manual_time               28881 ns        30994 ns        25135
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/4/manual_time               94405 ns        96667 ns         7047
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/16/manual_time             357939 ns       361341 ns         1897
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/64/manual_time            1438082 ns      1445610 ns          484
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/256/manual_time           5635943 ns      5659356 ns          113
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/1024/manual_time         22753745 ns     22839340 ns           31
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/4096/manual_time         89693252 ns     89981340 ns            7
...

@sakrejda
Copy link
Contributor

I think this issue is drifting so I wanted to address the original question: it does look like the wrong argument is used in the test.

@ricardoV94
Copy link
Author

ricardoV94 commented Mar 24, 2021

I just did a very rough comparison (in Python) and the inc_beta_dd* seems to be a couple order of magnitudes faster than the grad_reg_inc_beta (even when calling both _dda and _ddb separately):

a, b, z = 150, 1.25, 0.99
digamma_alpha = sp.digamma(a)
digamma_beta = sp.digamma(b)
digamma_sum = sp.digamma(a+b)

grad_reg_inc_beta(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# (-0.002718021453572821, 0.3368083597855926)

inc_beta_ddab(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# (-0.0027178389107593466, 0.3368034464501962)

%timeit grad_reg_inc_beta(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# 59.9 ms ± 806 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit inc_beta_ddab(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# 43.6 µs ± 5.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Of course this may not be the case for other values + the accuracy doubts raised above

@bbbales2
Copy link
Member

Did you try the implementation here: https://github.com/canerturkmen/betaincder/blob/master/betaincder/c/beta.c ? It looks like it comes with a Python library: https://github.com/canerturkmen/betaincder/

Maybe if this is both fast and covers the parameter space we should switch to a version of it instead of inc_beta_dda or grad_reg_inc_beta?

I plugged grad_reg_inc_beta into the benchmarks in neg_binomial_cdf and a grad_reg_inc_beta-solution was also quite a bit slower (on whatever generic values its being tested here):

inc_beta_dda
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
neg_binomial_cdf_Prim_int_Rev_real_Rev_real/1/manual_time                          2971 ns         3013 ns       233350
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/1/manual_time                        3070 ns         3133 ns       227581
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/4/manual_time                       11831 ns        11901 ns        57954
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/16/manual_time                      46401 ns        46484 ns        14662
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/64/manual_time                     185235 ns       185327 ns         3707
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/256/manual_time                    742690 ns       742856 ns          913
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/1024/manual_time                  2966910 ns      2965076 ns          236

grad_inc_beta
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
neg_binomial_cdf_Prim_int_Rev_real_Rev_real/1/manual_time                          9408 ns         9447 ns        72263
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/1/manual_time                        9548 ns         9617 ns        73250
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/4/manual_time                       39634 ns        39687 ns        17772
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/16/manual_time                     156383 ns       156474 ns         4386
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/64/manual_time                     610616 ns       610688 ns         1135
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/256/manual_time                   2516604 ns      2516469 ns          280
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/1024/manual_time                 10101337 ns     10096834 ns           61

@sakrejda
Copy link
Contributor

Benchmarks on a limited set of values are not going to be very informative...

@bbbales2
Copy link
Member

not going to be very informative...

I feel informed though! I now know that in two situations grad_inc_beta was slower.

But these are all iterative algorithms, so are you saying it is likely grad_reg_inc_beta is faster in other areas? Or just that differences this radical seem fishy to you?

I don't see tolerance arguments here so it's not even clear we're computing to similar precisions or anything.

@ricardoV94
Copy link
Author

Obviously we also need to check for the accuracy of the computations, but those speed benchmarks look promising.

I'll do a more careful test soon taking into consideration whether/where the outputs agree or diverge also with finite differences and autodiff (I don't have Mathematica to compare, but maybe someone here has?) I can also check for that other implementation on that repo.

@sakrejda
Copy link
Contributor

I feel informed though! I now know that in two situations grad_inc_beta was slower.

I'm saying the number of iterations it takes the iterative algorithm to arrive at a specific accuracy depends heavily on the point in parameter space you are running the calculation for. So every time I modified these functions I ended up having to run a grid or random sample with over-sampling at the boundaries to feel confident the code changes weren't just thrashing around. It's very common to have different implementations for different parts of the domain too, so it may make sense to keep both implementations, just re-organize when they're called!

Anyway, I wasn't trying to be negative about the work you are doing, just cautioning against making decisions that turn into thrashing

@bbbales2
Copy link
Member

bbbales2 commented Mar 25, 2021

I don't have Mathematica to compare, but maybe someone here has?

I used this to compute the earlier functions: https://www.wolframcloud.com/

Seems free to use.

Edit: And derivatives copy-pasted from: https://functions.wolfram.com/GammaBetaErf/BetaRegularized/20/01/03/0001/

every time I modified these functions I ended up having to run a grid or random sample with over-sampling at the boundaries to feel confident the code changes weren't just thrashing around

Yeah makes sense. Hopefully we added unit tests for all the numeric weirdnesses we've hit along the way so we don't accidentally shrink our function domains or anything :/.

I wasn't trying to be negative about the work you are doing

Yeah no prob

@ricardoV94
Copy link
Author

ricardoV94 commented May 11, 2021

I finally got some time to do some comparisons. You can check my repo here and the comparison notebook here:

I compared the following algorithms:

  1. grad_reg_inc_beta (STAN)
  2. inc_beta (STAN)
  3. inc_beta_strict (same as inc_beta with a threshold of 1e-18 instead of the now default 1e-10)
  4. inbeder (described in https://core.ac.uk/download/pdf/6303153.pdf)

All algorithms were implemented in vanilla non-vectorized python, which hopefully makes for a fair comparison.

I evaluated the derivative w.r.t. a on the following grid:

search_grid = dict(
    z = (0.001, 0.25, 0.5, 0.75, 0.999),
    a = (1.5, 15, 150, 1500, 15000),
    b = (1.25, 12.5, 125, 1250, 12500),
)

And used the following code to obtain reference values from Wolfram Cloud:

NumberForm[
  Table[
    (
      -((Gamma[a] Gamma[a + b]) / Gamma[b])) z^a HypergeometricPFQRegularized[{a, a, 1 - b}, {1 + a, 1 + a}, z] 
      + BetaRegularized[z, a, b] (Log[z] - PolyGamma[a] + PolyGamma[a + b]
    ),
    {z, { 0.001,0.25,0.5, 0.75, 0.999}},
    {a, {1.5, 15, 150, 1500, 15000}},
    {b, {1.25, 12.5, 125, 1250, 12500}}
  ], 
  16
]

Most of the times the 4 algorithms agree closely to each other. The results of the current inc_beta tend to be a bit less precise, but changing the threshold to 1e-18 (i.e., inc_beta_strict) fixes these differences.

The grad_reg_inc_beta fails to converge more often than the alternatives (see nans in the first plot), and tends to be considerably slower. It also takes "ages" when z=0.999 and b in (1.25, 12.5) (see second plot), but maybe this would be aborted early on by this line that I did not implement:

check_2F1_converges("grad_2F1", a1, a2, b1, z);

The output of inc_beta_strict and inbeder tend to be very similar (more so than those of inc_beta_strict and grad_reg_inc_beta). In terms of performance inc_beta_strict seems sligtly faster for extreme z in (0.001, 0.999), while incbeder is faster for intermediate z in (0.25, 0.5, 0.75).

Log absolute error of derivative wrt a (using Wolfram as a reference)
image

Log of average execution time in seconds (out of 100 reps; computing both derivatives wrt a and b simultaneously for fairness):
image


You may also notice large absolute errors relative to the wolfram reference that are shared by all implementations in the first plot (bright yellow squares). When inspecting these it seems instability on the side of wolfram. For instance for z= 0.25; a = 150; b = 1250, wolfram reports a crazy large derivative of -9.0079425013435e+215, while the 4 algorithms return a derivative close to zero, which I think is correct? Here is the plot of betainc for z=0.25, b=1250 with a on the x-axis and incbeta on the y-axis:

                  wolfram: -9.0079425013435e+215
   stan_grad_reg_inc_beta: 5.64240006350661e-13
            stan_inc_beta: -6.875931379629293e-42
     stan_inc_beta_strict: -6.8759313797203e-42
                  inbeder: -6.875906440587697e-42

image

In addition, whenever the grad_reg_inc_beta seems to disagree with the alternatives and be closer to the wolfram reference, it seems both are biased away from the correct answer (e.g., z=0.25, a=15; b=12500):

                  wolfram: 5.374002228495603
   stan_grad_reg_inc_beta: 5.307335561828936
            stan_inc_beta: -0.0
     stan_inc_beta_strict: -0.0
                  inbeder: -0.0

image


Overall, I would say the inc_beta algorithm looks preferable to the grad_reg_inc_beta, but perhaps it is justified to increase the threshold from 1e-10 closer to 1e-18 for more stable results.

This change also sounds less disruptive than replacing it by the alternative inbeder even though that one has the potential to be slightly faster in some scenarios (plus it's derivation being clearly documented in the reference above).

Let me know if you think there is something else / more I should have done :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants