
Add regularized incomplete beta and gamma functions. Use them and their auxiliary functions as much as possible. #148

Closed
wants to merge 1 commit

Conversation

andreasnoack
Member

This is a big one. The three major functions are the incomplete beta and gamma functions and the inverse of the latter, but they bring along some auxiliary functions that are of interest in their own right. For example, some are useful for precise calculation of pdfs. Please have a look at the headers and see if they can be used anywhere else.

I have done a lot of testing of the functions against Rmath and the Fortran version of NSWC, but there are so many possible combinations of parameter values that I cannot guarantee that all errors are gone. When the new implementation differed (relatively) from the old one by more than 1e-14, I compared against Mathematica, and in all cases both versions were inaccurate by more than they differed from each other.

I have also written fallback quantile functions. The continuous one is just Newton's method, and the discrete one is a normal approximation followed by a search. The two quantile functions also allow for a fallback random number method; I did this because I saw that a couple of R's quantile functions and RNGs work through "generic" methods. Please consider the choice of initial value in the continuous quantile function. I am not sure my choice is the right one.
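
A minimal sketch of the kind of Newton fallback described above (purely illustrative, not the code in this PR; newton_quantile_sketch is a made-up name, and the crude starting value stands in for whatever initial guess we settle on):

# Hypothetical sketch of a Newton-type fallback quantile for a continuous distribution d.
function newton_quantile_sketch(d, p::Real; tol=1e-12, maxiter=100)
    x = mean(d)                              # crude starting value; better choices are the open question above
    for i in 1:maxiter
        step = (cdf(d, x) - p) / pdf(d, x)   # Newton step for F(x) - p = 0
        x -= step
        abs(step) <= tol * (abs(x) + tol) && break
    end
    return x
end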

What remains is specialised RNGs for at least the binomial and Poisson distributions. The code R uses is not licensed freely enough for our needs.

@johnmyleswhite
Member

This is a pretty amazing body of work. Well done!

My first-order reaction is that (1) these new special functions should probably be in Base and (2) it would be great to use less Fortran-like names.

@dmbates
Contributor

dmbates commented Sep 17, 2013

Indeed, thanks @andreasnoackjensen. This is amazing.

@simonbyrne
Member

Agreed, thanks @andreasnoackjensen, it's some very impressive work.

I agree with @johnmyleswhite on the names. There are a couple of cases where multiple dispatch could be used to cut down on prefixes. We could also take hints from functions like log1p, so that dgmln1 could become lgamma1p.
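
To illustrate the dispatch idea (lgamma1p here is only the proposed name, and these naive one-liners would be replaced by accurate kernels):

# Sketch only: one exported name, with precision selected by dispatch instead of a Fortran-style prefix.
lgamma1p(x::Float64) = lgamma(1.0 + x)
lgamma1p(x::Float32) = lgamma(1.0f0 + x)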

@lindahua
Contributor

This is really impressive! The 3000+ lines of code in specialfuns.jl are amazing.

Some of these functions would have broader use than within this package -- actually, I think they deserve to be in Base or in a package of their own (say, SpecialFunctions.jl)?

cc @StefanKarpinski @ViralBShah

@simonbyrne
Member

Do we want to merge this straight into master, or into the rm-rmath branch first?

@simonbyrne
Member

These are probably the most useful functions, which we might want to consider renaming ('/' separates the Float32 and Float64 versions). I've also noted some possible names; a brief sketch of a couple of them follows the lists below:

  • gratio / dgrat = incomplete gamma function ratio
  • rcomp / drcomp = exp(-x) * x^a / gamma(a)
    • I don't know of a good name for this function
  • dpdel = lgamma(x) - (x-0.5)*log(x) + x - 0.5*log(2pi) (Stirling series remainder)
    • I've called this lstirling in the rm-rmath branch, with stirling(x) = exp(lstirling(x)), but I'm open to better names for this.
  • rlog / drlog = x - 1 - log(x)
    • I've called its negation logmxp1 in rm-rmath
  • gaminv / dginv = inverse of the incomplete gamma function ratio
  • bratio = incomplete beta function ratio
  • brcomp = x^a * y^b / beta(a,b)
  • drlog1 = x - log(1+x)
    • Its negation could be log1pmx?
  • dbcorr = dpdel(a) + dpdel(b) - dpdel(a+b)
  • dlgdiv = log(gamma(b)/gamma(a+b))
  • gam1 / dgam1 = 1/gamma(1+x) - 1
    • rgamma1p? (r for reciprocal)
  • gamln1 / dgmln1 = lgamma(1+x)
    • lgamma1p?
  • exparg / dxparg = log(realmin()) or log(realmax()) (depending on a Boolean argument)
    • logrealmax / logrealmin? It would also be great to define this for BigFloats (actually, what I really need is the largest x such that exp(x) < Inf).
    • UPDATE: it turns out that is how the documentation actually defines it. So how about realmaxexp?

One potential problem is that some of these functions are only valid for certain values (e.g. dpdel is only valid for arguments > 10).

Also, it seems the following already have equivalents in Base:

  • alnrel / dlnrel = log1p
  • dbetln = lbeta
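
For concreteness, a naive sketch of the two negated log functions proposed in the list above (log1pmx and logmxp1 are only suggested names, and accurate implementations would switch to series expansions near the points where these direct formulas cancel):

# Naive reference definitions only; they lose precision exactly where the NSWC kernels are careful.
log1pmx(x) = log1p(x) - x      # negation of drlog1 = x - log(1+x)
logmxp1(x) = log(x) - x + 1    # negation of rlog = x - 1 - log(x)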

@andreasnoack
Member Author

@simonbyrne I don't know what the easiest way to handle the merging is. Please tell me if it is better to make the pull request against rm-rmath.

I kept the old names to make it easier to compare with the source, but now that the translation is done, I think it is good to change the names and use multiple dispatch. I think your name suggestions are fine. Some specific comments:

  • rcomp and brcomp are the two pdfs multiplied by x and x(1-x), respectively. We could keep the names and export the pdfs for those two distributions, or just make them available through pdf.
  • Maybe we could check the argument in dpdel and throw an exception if it is outside the domain where the approximation is valid.
  • gamln1/dgmln1 is not worth exporting, as I just defined it as a dummy function. It was the only auxiliary function that was not more precise than just calculating the expression directly.
  • I would also like realmax and realmin for BigFloat, because then gratio and gaminv would work for that type as well. Maybe I should just remove exparg and call log(realmax()) / log(realmin()) directly.
  • I just missed log1p. I have tried to use built-in functions as much as possible, e.g. the error function, so I'll update that. However, dbetln is another matter. In Base, lbeta is calculated from lgamma, which loses precision, so dbetln is a more accurate version of that function (see the sketch below). Still, I guess that for BigFloat it is better to use the version from Base, because MPFR ships with a gamma function.
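
To see the lbeta point, one hypothetical way to display the digits lost by the lgamma-based formula, using BigFloat as a reference (naive_lbeta is just a stand-in for the current Base definition):

# lbeta computed from lgamma cancels significant digits for large arguments.
naive_lbeta(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
a, b = 1e4, 2e4
ref = lgamma(big(a)) + lgamma(big(b)) - lgamma(big(a) + big(b))  # BigFloat reference
println(naive_lbeta(a, b) - ref)                                 # the gap is the cancellation error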

@simonbyrne
Member

I think I've made some conflicting changes, so I'll try to merge it in manually.

  • Ideally, it would be great to extend the functions to the whole domain, but in the meantime I think throwing an error is probably best (a minimal sketch is below).
  • A true lgamma1p function could be useful, as lgamma(1+x) is inaccurate for small x.
  • In that case, perhaps we should change the lbeta in Base (leaving the existing definition for BigFloats only)?
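
A minimal sketch of such a domain check (lstirling_sketch and the error message are illustrative; the body is just the first few terms of the standard Stirling remainder series, not the NSWC coefficients):

# Sketch only: refuse arguments outside the range where the asymptotic series is valid.
function lstirling_sketch(x::Float64)
    x >= 10 || error("lstirling_sketch is only valid for x >= 10")
    t = 1 / (x * x)
    return (1/12 - t * (1/360 - t / 1260)) / x   # 1/(12x) - 1/(360x^3) + 1/(1260x^5)
end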

@simonbyrne
Member

How about:

import Base.realmax
realmax(::Type{BigFloat}) = prevfloat(inf(BigFloat))
realmaxexp(::Type{BigFloat}) = with_bigfloat_rounding(()-> log(realmax(BigFloat)), RoundDown)

@andreasnoack
Member Author

Just a clarification to avoid double work: are you manually copying this into rm-rmath? If so, I will not make further updates to this pull request.

I just had a look at the source: it defines dgmln(x) = -log1p(dgam1(x)) on the interval -0.5 <= x <= 1.5. Outside that interval I think lgamma(1+x) is fine.
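
A sketch of what that could look like under the proposed names (rgamma1pm1 and lgamma1p are hypothetical, and the naive 1/gamma(1+x) - 1 stand-in would of course defeat the accuracy purpose of the real dgam1 kernel):

# Naive stand-ins to show the structure; NSWC's dgam1 uses a careful rational approximation instead.
rgamma1pm1(x) = 1 / gamma(1 + x) - 1                                    # plays the role of dgam1
lgamma1p(x) = -0.5 <= x <= 1.5 ? -log1p(rgamma1pm1(x)) : lgamma(1 + x)  # matches the source's split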

The realmax for BigFloat seems fine. Can you also do a realmin?

@simonbyrne
Member

I've merged it in via git merge, so feel free to do it on either branch.

You could try nextfloat(zero(BigFloat)) for realmin.

@andreasnoack
Member Author

I think that is too small. At least for Float64, I get

julia> realmin()
2.2250738585072014e-308

julia> nextfloat(zero(Float64))
5.0e-324

@simonbyrne
Member

Ahh, I forgot about subnormals.

However, it seems that MPFR doesn't do subnormals (or at least, if it does, it doesn't do them the same way). For regular IEEE-754 floats:

julia> z = zero(Float64)
0.0

julia> nextfloat(zero(Float64))
5.0e-324

julia> nextfloat(nextfloat(zero(Float64)))
1.0e-323

the second value is double the first, whereas for BigFloats

julia> z = zero(BigFloat)
0e+00 with 256 bits of precision

julia> nextfloat(z)
2.382564904887951073216169781732674520415196125559239787955023752600945386104324e-323228497 with 256 bits of precision

julia> nextfloat(nextfloat(z))
2.382564904887951073216169781732674520415196125559239787955023752600945386104366e-323228497 with 256 bits of precision

there isn't a loss of granularity.

Might want to get someone who understands this stuff to confirm it though...

@lindahua
Contributor

As specialfuns.jl gets much bigger, it would be easier to read and maintain if we divided it into smaller files and put them in a specialfuns directory.

@dmbates
Contributor

dmbates commented Sep 18, 2013

Has there been further discussion of which of the special functions should go into Base? In particular, have @ViralBShah or @StefanKarpinski expressed opinions yet? It may be my parochial point of view, but I feel the incomplete beta and incomplete gamma functions should be in Base along with the Bessel functions, etc., though my anti-Bessel-function stance may have been influenced by a poorly taught undergraduate applied math course.

@StefanKarpinski

cc: @JeffBezanson as well. Have to catch up on this thread myself.

@StefanKarpinski

It may be desirable to open an issue against https://github.com/JuliaLang/julia, linking back to this discussion, in order to have a little more visibility.

@JeffBezanson

What is the original Fortran code some of this is based on? Can we call it?

@lindahua
Contributor

@andreasnoackjensen Would you please incorporate the latest changes from master to make this mergeable again?

@andreasnoack
Member Author

@simonbyrne You did some further work on this, right? What is the status on that?

@lindahua
Contributor

My refactoring was not related to this PR, but I have moved stuff around. In particular, I moved the generic functions (and fallbacks) to src/univariates.jl. I think that's what is blocking this PR.

@simonbyrne
Member

I did merge this into #138, but I've since abandoned that in favour of a piecewise approach. I've added some of the utility functions (log1pmx and lstirling_asym are based on functions from this), though I've found that the NSWC code is not always optimal. Part of the reason is that it predates the IEEE floating-point standard, and so was written without such niceties as gradual underflow and exact round-to-nearest arithmetic: you can see this in lines such as Q = 0.5 - (0.5 - P).

I've been slowly working on replacing these and comparing against alternative implementations (Rmath, Boost, GSL, etc.), but it's a slow process. We should probably implement some automatic tests that compute the error at a large number of points (similar to what Boost does): this would require a reference BigFloat implementation, but it might be worth the effort.
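
A rough sketch of what such a test could look like (max_rel_error and the log1p example are purely illustrative; a real harness would also track ulps and sweep many parameter combinations):

# Worst-case relative error of a Float64 implementation against a BigFloat reference.
function max_rel_error(f64, fref, xs)
    worst = zero(BigFloat)
    for x in xs
        exact = fref(big(x))
        err = abs((f64(x) - exact) / exact)   # promoted to BigFloat arithmetic
        worst = max(worst, err)
    end
    return worst
end
# e.g. max_rel_error(log1p, log1p, linspace(-0.5, 10.0, 10000))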

@lindahua
Contributor

@andreasnoack Can you close this and make a PR to the StatsFuns package?

@andreasnoack
Member Author

I'll close it, but please keep the branch. I don't know when I'll prepare the new pull request.

@matbesancon matbesancon deleted the anj/beta branch November 16, 2018 21:21
@andreasnoack andreasnoack restored the anj/beta branch September 22, 2020 07:05
Labels
feature: new distribution (feature request for a new distribution)
feature: new generic function (feature request for a new generic function, i.e. with methods for all distributions)
7 participants