Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite gamma_inc_taylor and gamma_inc_asym more concisely #443

Merged
merged 17 commits into from
Oct 29, 2024
Merged
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ IrrationalConstants = "0.1, 0.2"
LogExpFunctions = "0.3.2"
OpenLibm_jll = "0.7, 0.8"
OpenSpecFun_jll = "0.5"
julia = "1.3"
julia = "1.5"
BioTurboNick marked this conversation as resolved.
Show resolved Hide resolved

[extras]
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
Expand Down
98 changes: 39 additions & 59 deletions src/gamma_inc.jl
Original file line number Diff line number Diff line change
Expand Up @@ -426,39 +426,29 @@ See also: [`gamma_inc(a,x,ind)`](@ref SpecialFunctions.gamma_inc)
"""
function gamma_inc_taylor(a::Float64, x::Float64, ind::Integer)
acc = acc0[ind + 1]
wk = zeros(30)
flag = false
apn = a + 1.0
t = x/apn
wk[1] = t
loop = 2
for indx = 2:20
tolerance = 0.5acc

# compute first 21 terms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if summation in reverse order is necessary (something like #442 (comment) would be much simpler implementation-wise)? I wonder if there is any argument/comment regarding this choice in the original Fortran code - does anyone know where the current implementation was copied from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory better for numerical stability and precision to do it this way, because otherwise you can lose the accumulated effect of the small components when they're very much smaller than the large components, but I don't know in practice if/when it affects results of this function.

Copy link
Contributor Author

@BioTurboNick BioTurboNick Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found a couple possible sources, but I think I'm gathering that the impetus was to support arbitrarily low precision. Maybe with Float64 it works pretty much fine, which this package forces. But with lower precision, maybe the errors would add up too much if floating point math wasn't carefully managed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like @Sumegh-git and @simonbyrne were writing/reviewing the PR that introduced them, if they can comment :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does anyone know where the current implementation was copied from?

ACM TOMS (FREE ACCESS): https://dl.acm.org/doi/abs/10.1145/22721.23109
Fig. 2 in the paper shows the flowchart of the algorithm as a whole.

  • gamma_inc_taylor: Equ. 15
  • gamma_inc_asym: Equ. 16

The original paper is mentioned in the code comments.
Perhaps those comments should be moved to the function documentation.

# Reference : 'Computation of the incomplete gamma function ratios and their inverse' by Armido R DiDonato, Alfred H Morris.
# Published in Journal: ACM Transactions on Mathematical Software (TOMS)
# Volume 12 Issue 4, Dec. 1986 Pages 377-393
# doi>10.1145/22721.23109

Copy link
Contributor Author

@BioTurboNick BioTurboNick Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! In that case maybe this is the FORTRAN code itself? https://jblevins.org/mirror/amiller/inc_gam.f90

ts = cumprod(ntuple(i -> x / (a + i), Val(21)))

# sum the smaller terms directly
first_small_t = something(findfirst(<(1.0e-3), ts), 21)
sm = t = ts[first_small_t]
apn = a + first_small_t
while t > tolerance
BioTurboNick marked this conversation as resolved.
Show resolved Hide resolved
apn += 1.0
t *= x/apn
if t <= 1.0e-3
loop = indx
flag = true
break
end
wk[indx] = t
end
if !flag
loop = 20
end
sm = t
tol = 0.5*acc #tolerance
while true
apn += 1.0
t *= x/apn
t *= x / apn
sm += t
if t <= tol
break
end
end
for j = loop-1:-1:1
sm += wk[j]

# sum terms from small to large
last_large_t = first_small_t - 1
for j ∈ last_large_t:(-1):1
sm += ts[j]
end
p = (rgammax(a, x)/a)*(1.0 + sm)

p = (rgammax(a, x) / a) * (1.0 + sm)

return (p, 1.0 - p)
end

Expand All @@ -476,39 +466,29 @@ External links: [DLMF 8.11.2](https://dlmf.nist.gov/8.11.2)
See also: [`gamma_inc(a,x,ind)`](@ref SpecialFunctions.gamma_inc)
"""
function gamma_inc_asym(a::Float64, x::Float64, ind::Integer)
wk = zeros(30)
flag = false
acc = acc0[ind + 1]
amn = a - 1.0
t = amn/x
wk[1] = t
loop = 2
for indx = 2:20
amn -= 1.0
t *= amn/x
if abs(t) <= 1.0e-3
loop = indx
flag = true
break
end
wk[indx] = t
end
if !flag
loop = 20
end
sm = t
while true
if abs(t) < acc
break
end
amn -= 1.0
t *= amn/x
sm += t

# compute first 21 terms
ts = cumprod(ntuple(i -> (a - i) / x, Val(21)))

# sum the smaller terms directly
first_small_t = something(findfirst(x -> abs(x) < 1.0e-3, ts), 21)
Comment on lines +472 to +475
Copy link
Member

@andreasnoack andreasnoack Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to avoid the heap allocation but this unconditionally computes all 21 elements and then subsequently does a search while the original breaks early while providing the index for free. Would it be possible to keep the benefits of the old implementation and still avoid the heap allocation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not one I was able to think of at the time, but it didn't seem to much matter in practice because the allocation was more expensive than the extra computations. I even tried a version that was a bit more conservative and in practice didn't seem to matter much at all on my system.

sm = t = ts[first_small_t]
amn = a - first_small_t
while abs(t) ≥ acc
amn -= 1.0
t *= amn / x
sm += t
end
for j = loop-1:-1:1
sm += wk[j]

# sum terms from small to large
last_large_t = first_small_t - 1
for j in last_large_t:(-1):1
sm += ts[j]
end
q = (rgammax(a, x)/x)*(1.0 + sm)

q = (rgammax(a, x) / x) * (1.0 + sm)

return (1.0 - q, q)
end

Expand Down
Loading