-
-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible incorrect argument used in test of inc_beta_ddb #2417
Comments
Hmm, I don't really know about this function. I agree it looks strange. Since you're asking the question I'm hoping maybe you can help me figure it out :D. Do you know how to go about computing these things in R by any chance?
I thought this should be equivalent to |
Looking here in the source code it looks like it should be the beta digamma math/stan/math/prim/prob/beta_cdf.hpp Line 133 in 9895be8
I actually compared it with your other function which suggests the same: https://github.com/stan-dev/math/blob/master/stan/math/prim/fun/grad_reg_inc_beta.hpp I am more used to python, so I also tested via from scipy.special import betainc
eps = 1e-5
a = 2
b = 1
z = 0.5
grad = (betainc(a, b+eps, z) - betainc(a, b-eps, z)) / (2*eps)
print(grad) The gradient is much closer using the digamma beta |
It should be the zipf::Rbeta (regularized incomplete beta) math/stan/math/prim/fun/inc_beta_ddb.hpp Lines 20 to 21 in 9895be8
|
@ricardoV94 thanks for putting the time into investigating this. I'll have a look at this hopefully later today. |
Hmm, the code here seems to work for me for small values, but for the large value tests something might be going on. It might be that this function just isn't valid out to extreme parts of parameter space (in which case we won't test it out there), or could be a bug for large values (in which case will need to fix this). I can run this code on wolframcloud and get a different thing than what the code gives (but it agrees with the finite differences):
That code comes from: https://functions.wolfram.com/GammaBetaErf/BetaRegularized/20/01/03/0001/ R finite differences give: 2.00849792806346e-06 Will have to continue the investigation later. R code is:
Stan code is:
|
I noticed that when adapting your code to pymc3. The difference seems to go away if you make the threshold smaller in the STAN functions from 1e-10 to 1e-15 math/stan/math/prim/fun/inc_beta_ddb.hpp Line 59 in 9895be8
And on the dda Here is the python output of the same algorithm with more stringent thresholds: # threshold = 1e-10
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# -2.1355523032324224
# threshod = 1e-12
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# -2.3924212087640326e-07
# threshold = 1e-15
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# 2.0060173144203154e-06
# threshold = 1e-18
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# 2.008495010807732e-06 I assumed that was a speed tradeoff on your side. Or is it a red flag that the original difference was so large? |
Something looks a bit funny with the _betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
|
Oof, yeah, I'm not sure where this power series came from or if someone derived it by hand, but I wasn't able to quickly identify the algorithm or anything here. I wonder if the convergence criteria could be written in terms of It seems like there's a problem here with the code, however I'm not quite sure what this is affecting yet. I went grepping in the code and I found grad_reg_inc_beta which from the comments seems like the same function but is computed a different way lol. I'm not sure what's going on. Anyway at this point this needs a more careful inspection in the code. If you figure out what's going on let me know. Searching around I found this: https://core.ac.uk/download/pdf/82094101.pdf which might be something more concrete to base an implementation on :P. |
In any case, the current implementation (_dd*) seems to give the right results for non extreme values and also for extreme values when using a smaller threshold. So it's definitely not completely out there. I have no intuition about what could be used as a better convergence criteria however. I thought of something silly like checking for the threshold but also making sure the summand has decreased since the last iteration. I found that reference you linked to but I am not sure how useful it is, since the goal seems to extend the derivatives to negative values of the incomplete beta. I found this older reference (https://core.ac.uk/download/pdf/6303153.pdf) with a C implementation here (https://github.com/canerturkmen/betaincder/blob/master/betaincder/c/beta.c) more promising as a benchmark. The current algorithm in STAN gives the right results for the tabulated values in Table 1 of that reference. Edit: It seems the current functions were added by @betanalpha in here: b465d12 and tweaked later a couple of times (without a substantial change in their logic), possibly because they were deemed more robust than the |
Woops, yeah, that's the one I meant to link. Thanks for finding that C implementation.
Yeah I'm just not sure how to think about this. It seems like this function gets used in a lot of places so I'd need to look at those places to get a sense how important the large values are.
Oh oh nice, so maybe we should be replacing |
Hi @bbbales2 I am just wondering if you had any insights about the "issues" above in the meanwhile. I still couldn't quite figure out the source of the algorithm, but I am happy to help if you have any ideas on how to proceed :) |
@ricardoV94 thanks for the ping. I don't have any particular insights :P. From what you've dug up, I'm thinking:
If that's an offer to make a PR, by all means :D. 1 is a definite need to do and should be pretty straightforward. 2 would be more work -- it looks like this |
A couple of comments:
|
@sakrejda Could you expand on this? We found by accident that the discussed test point (even if given the right arguments) fails to match other implementations (i.e., finite-differences, the previous |
It actually seems pretty straightforward to make them converge by making the threshold more strict. But I am completely unable to assess whether this is a worthwhile trade-off, since there is no clear justification for the initial threshold in the documentation. Changing the tests sounds fine, altough @sakrejda seems to argue such "failing" tests may still be useful to keep around.
I don't have a good intuition. I can do some benchmarks to see if there is an obvious difference in computational cost for "usual values". One possible advantage of the old function is that it computes the derivatives of
I will have a look at this, together with point 1 above. Perhaps a simple numerical analysis to measure the largest error / disagreement observed with this function in a given
I can give it a try.
We can discuss this after collecting more info perhaps. |
Since there's almost always a range beyond which these functions fail it seems like the standard is to document the range in the function doc (it's a soft failure, they get less accurate and there's error in everything else, it often doesn't matter much, it can often be avoided by rescaling). I would modify the test range so the function passes and leave an issue that states that beyond a certain point the function looses accuracy. If somebody needs that region we can add a PR to fix that region.
My suggestion is: 1) update the main post on the issue to document the reference values used (with code) and where you think the range should be extended; 2) make a branch that documents the failure; and 3) create a PR that fixes the branch to pass the new tests. So far I see mentions of various reference values and I'd be happy to help fix issues with the function but I'm not currently up to documenting the issue b/c you guys seem to have a handle on that already.
Yes, there's usually some efficiency to be gained by computing both at the same time, and sometimes you can get better stability but you'd have to compare the code between the different functions (I can help with that once it gets there if you want).
The algorithm may well be that some took the power-series used for the more general function and simplified the calculation where possible. I don't know what the current math lib standards are but from my point of view getting closer to known errors for specific domains would help the Stan math library.
My main point was that all the other implementations fail beyond some domain too. They just do it more or less gracefully and some of them cover a larger domain before failing. |
Some history: so I looked at this function (https://github.com/stan-dev/math/blob/9895be8ee0a27a9c7935cf19345d76fcbc85a69d/stan/math/prim/fun/inc_beta_ddb.hpp) and at grad_inc_beta and I think the original code was Michael Betancourt, and that I modified some of the current code to improve accuracy in parts of the domain (that's those if statements at the head of the function which use standard identities). So tl;dr: @ricardoV94 if you're interested in improving this algorithm in a specific direction I think that would make the lib better and I'm happy to help with the implementation. |
@sakrejda Improving the convergence checks (assuming it can be done) sounds good. But does it make sense to maintain the two distinct approaches in the first place? |
+1
Yeah I was assuming that inc_beta_ddb was just better than grad_reg_inc_beta always. It sounds like maybe this is not true or has changed since the inc_beta_ddb pull if
I guess if one is better in different places then two makes sense, but it might be that one of these is just better than the other everywhere. It seems wrong that we're using one for neg_binomial_cdf and another for neg_binomial_lcdf for sure. |
That's a good question. To answer it we would need to compare |
It looks like inc_beta_dda is used in |
On the other hand the |
Yeah I'm definitely into an ez cleanup pull to fix tests and adjust docs if necessary.
I guess the things we'd need to check are:
Example output looks like:
|
I think this issue is drifting so I wanted to address the original question: it does look like the wrong argument is used in the test. |
I just did a very rough comparison (in Python) and the a, b, z = 150, 1.25, 0.99
digamma_alpha = sp.digamma(a)
digamma_beta = sp.digamma(b)
digamma_sum = sp.digamma(a+b)
grad_reg_inc_beta(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# (-0.002718021453572821, 0.3368083597855926)
inc_beta_ddab(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# (-0.0027178389107593466, 0.3368034464501962)
%timeit grad_reg_inc_beta(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# 59.9 ms ± 806 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit inc_beta_ddab(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# 43.6 µs ± 5.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) Of course this may not be the case for other values + the accuracy doubts raised above |
Did you try the implementation here: https://github.com/canerturkmen/betaincder/blob/master/betaincder/c/beta.c ? It looks like it comes with a Python library: https://github.com/canerturkmen/betaincder/ Maybe if this is both fast and covers the parameter space we should switch to a version of it instead of inc_beta_dda or grad_reg_inc_beta? I plugged grad_reg_inc_beta into the benchmarks in neg_binomial_cdf and a grad_reg_inc_beta-solution was also quite a bit slower (on whatever generic values its being tested here):
|
Benchmarks on a limited set of values are not going to be very informative... |
I feel informed though! I now know that in two situations grad_inc_beta was slower. But these are all iterative algorithms, so are you saying it is likely I don't see tolerance arguments here so it's not even clear we're computing to similar precisions or anything. |
Obviously we also need to check for the accuracy of the computations, but those speed benchmarks look promising. I'll do a more careful test soon taking into consideration whether/where the outputs agree or diverge also with finite differences and autodiff (I don't have Mathematica to compare, but maybe someone here has?) I can also check for that other implementation on that repo. |
I'm saying the number of iterations it takes the iterative algorithm to arrive at a specific accuracy depends heavily on the point in parameter space you are running the calculation for. So every time I modified these functions I ended up having to run a grid or random sample with over-sampling at the boundaries to feel confident the code changes weren't just thrashing around. It's very common to have different implementations for different parts of the domain too, so it may make sense to keep both implementations, just re-organize when they're called! Anyway, I wasn't trying to be negative about the work you are doing, just cautioning against making decisions that turn into thrashing |
I used this to compute the earlier functions: https://www.wolframcloud.com/ Seems free to use. Edit: And derivatives copy-pasted from: https://functions.wolfram.com/GammaBetaErf/BetaRegularized/20/01/03/0001/
Yeah makes sense. Hopefully we added unit tests for all the numeric weirdnesses we've hit along the way so we don't accidentally shrink our function domains or anything :/.
Yeah no prob |
I finally got some time to do some comparisons. You can check my repo here and the comparison notebook here: I compared the following algorithms:
All algorithms were implemented in vanilla non-vectorized python, which hopefully makes for a fair comparison. I evaluated the derivative w.r.t. a on the following grid: search_grid = dict(
z = (0.001, 0.25, 0.5, 0.75, 0.999),
a = (1.5, 15, 150, 1500, 15000),
b = (1.25, 12.5, 125, 1250, 12500),
) And used the following code to obtain reference values from Wolfram Cloud: NumberForm[
Table[
(
-((Gamma[a] Gamma[a + b]) / Gamma[b])) z^a HypergeometricPFQRegularized[{a, a, 1 - b}, {1 + a, 1 + a}, z]
+ BetaRegularized[z, a, b] (Log[z] - PolyGamma[a] + PolyGamma[a + b]
),
{z, { 0.001,0.25,0.5, 0.75, 0.999}},
{a, {1.5, 15, 150, 1500, 15000}},
{b, {1.25, 12.5, 125, 1250, 12500}}
],
16
] Most of the times the 4 algorithms agree closely to each other. The results of the current The math/stan/math/prim/fun/grad_2F1.hpp Line 41 in 9207570
The output of Log absolute error of derivative wrt a (using Wolfram as a reference) Log of average execution time in seconds (out of 100 reps; computing both derivatives wrt a and b simultaneously for fairness): You may also notice large absolute errors relative to the wolfram reference that are shared by all implementations in the first plot (bright yellow squares). When inspecting these it seems instability on the side of wolfram. For instance for
In addition, whenever the
Overall, I would say the This change also sounds less disruptive than replacing it by the alternative Let me know if you think there is something else / more I should have done :) |
I think an incorrect argument is being passed to the function calls inc_beta_ddb_test:
math/test/unit/math/prim/fun/inc_beta_ddb_test.cpp
Lines 18 to 21 in de404cc
Shouldn't the fourth argument be
digamma(small_b)
instead ofdigamma(small_a)
(and similarly to the other tests)?Obviously this does not mean anything for the function itself, but it makes it difficult to check independently what the expected value should have been.
Apologies if I misunderstood something :)
The text was updated successfully, but these errors were encountered: