Issue with support of Beta distribution when alpha < 1 or beta < 1 #1003

MFairley · 2019-10-29T18:10:47Z

I am having a numerical issue with sampling from a Beta distribution when alpha or beta is less than 1. E.g. sometimes a sample will come back as exactly 1.0 but the support of the Beta distribution does not include 1 when beta is less than 1 and the pdf evaluates to Inf due to division by 0.

devmotion · 2019-11-20T11:06:57Z

I just ran into the same issue:

using Distributions
using Random
using Test

Random.seed!(1234)
v = rand(Beta(0.01, 1.01), 10_000)
@test_broken !any(iszero, v)

The standard implementation in R does not suffer from the same problem:

set.seed(1234)
v <- rbeta(n=10000, shape1=0.01, shape2=1.01)
0 %in% v && stop("incorrect samples")

It seems the standard implementation in R uses the algorithms BB and BC from
R. C. H. Cheng (1978). Generating beta variates with nonintegral shape parameters. Communications of the ACM, 21, 317–322. In general, it might be interesting to have a look at Ying-Chao Hung, Narayanaswamy Balakrishnan & Yi-Te Lin (2009). Evaluation of Beta Generation Algorithms. Communications in Statistics - Simulation and Computation, 38:4, 750-770 for a comparison of different algorithms for sampling from the Beta distribution.

devmotion · 2019-11-20T11:44:41Z

I think the main issue with the current implementation is that the Gamma sampler returns incorrect values:

using Distributions
using Random
using Test

Random.seed!(1234)
v = rand(Gamma(0.01, 1.0), 10_000)
@test_broken !any(iszero, v)

andreasnoack · 2019-11-20T12:18:44Z

I think the main issue with the current implementation is that the Gamma sampler returns incorrect values:

Why do you conclude that the values are incorrect? The examples seems similar to

julia> any(iszero, rand(10000))
false

There is zero probability at zero. The probability in a small neighborhood looks okay

julia> x = rand(Gamma(0.01, 1.0), 10000);

julia> mean(t -> t < eps(), x)
0.7067

julia> cdf(Gamma(0.01, 1.0), eps())
0.7013514054165827

devmotion · 2019-11-20T12:28:24Z

The problem is that the sampler returns values of 0, which are not in the support of the Gamma distribution. In the case of the Beta distribution, for the parameter values mentioned above, the current sampling algorithm uses the Gamma sampler; if that one returns a value of 0 outside of the support of the Gamma distribution, the Beta sampler returns also a value of 0, which is not in the support of the Beta distribution.

andreasnoack · 2019-11-20T12:41:29Z

Doh. I misread your test. Indeed, it doesn't look right. Would be good to figure out why that happens.

andreasnoack · 2019-11-20T12:52:05Z

The problem is that the probability of observing a value smaller than the smallest (positive) Float64 is pretty large.

julia> cdf(Gamma(0.01, 1.0), floatmin(Float64))
0.0008432274068068664

so I'm not sure if there is much we can do for the Gamma. Maybe we can switch to a different algorithm for the Beta for these parameter values.

devmotion · 2019-11-20T12:52:16Z

According to some quick debugging, the problem seems to be

Distributions.jl/src/samplers/gamma.jl

Line 210 in a2304a8

x*exp(s.nia*e)

Here x is sampled from Gamma(a + 1, theta), s.nia = -1 / a, and e = randexp(). For a = 0.01 sampling of x seems to be fine, but exp(s.nia * e) evaluates to 0 from time to time.

andreasnoack · 2019-11-20T12:58:21Z

Looks like this carries over to the Beta

julia> cdf(Beta(0.01, 1.01), floatmin(Float64))
0.0008385787538914535

so I'm not really sure what an be done.

devmotion · 2019-11-20T13:35:07Z

Hmm I see the problem but I still think that a sampler should only return valid samples (or throw an error), but not yield samples that are not in the support of the distribution.

devmotion · 2019-11-21T08:39:59Z

In addition, depending on the use case, one could sample real-valued X with exp(X) ~ Gamma(0.01, 1) directly, i.e., without sampling from Gamma(0.01, 1) (by, e.g., sampling log(x) + s.nia * e in the example above). This would then allow to sample real-valued Y with logistic(Y) ~ Beta(0.01, 1.01) by sampling X1 with exp(X1) ~ Gamma(0.01, 1) and X2 with exp(X2) ~ Gamma(1.01, 1) and setting Y = X1 - X2.

Of course, that would only be helpful if the downstream calculations are based on the logarithm of the samples from the Gamma distribution or the logit of the samples from the Beta distribution (which they were in my particular use case).

devmotion · 2020-05-31T10:10:45Z

The algorithm in https://arxiv.org/pdf/1302.1884.pdf seems to be an efficient method for sampling X with exp(X) ~ Gamma(a, 1) for small shape parameter a directly, without sampling exp(X) first. Hence X would be guaranteed to be finite even for very small shape parameters. According to the publication, there is a cut-off at a = 0.3 where the acceptance rate falls below the one of algorithm 3 by Kundu and Gupta.

ParadaCarleton · 2023-09-04T00:47:23Z

Hmm I see the problem but I still think that a sampler should only return valid samples (or throw an error), but not yield samples that are not in the support of the distribution.

Seems like we could just replace 0 with the next-largest float. Is there a problem with this approach?

devmotion · 2023-09-04T08:41:33Z

This would correspond to a truncated Beta distribution. As mentioned e.g. in #1003 (comment), the mass between 0 and floatmin(Float64) can be non-negligible.

devmotion · 2023-09-04T08:46:05Z

Maybe we should just support sampling of inv-logistic(X) as discussed in #1003 (comment) and point users to it for the problematic parameter choices.

ParadaCarleton · 2023-09-06T17:59:40Z

As mentioned e.g. in #1003 (comment), the mass between 0 and floatmin(Float64) can be non-negligible.

Right, but is there anything we can do about that mass? We can't return numbers we can't represent.

The two options I can see are:

Rounding everything that's equal to zero(type) to eps(zero(type))
Throw an error whenever we have zero(type), so users know there's potential numerical inaccuracies.

(Unless you're saying the current method is inaccurate for subnormal numbers, and are proposing we return more accurate subnormal numbers?)

matbesancon added the bug label Mar 19, 2020

belledon mentioned this issue Jan 30, 2021

sampling from beta distribution leads to +Inf logpdf probcomp/Gen.jl#350

Closed

chelate mentioned this issue Dec 11, 2023

random log-gamma for taming underflow issues #1810

Open

yahrMason mentioned this issue Jan 24, 2024

Issue sampling from model using Beta distribution with small alpha values TuringLang/Turing.jl#2159

Open

devmotion mentioned this issue Aug 17, 2024

Fix Dirichlet rand overflows #1702 #1886

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue with support of Beta distribution when alpha < 1 or beta < 1 #1003

Issue with support of Beta distribution when alpha < 1 or beta < 1 #1003

MFairley commented Oct 29, 2019

devmotion commented Nov 20, 2019

devmotion commented Nov 20, 2019

andreasnoack commented Nov 20, 2019

devmotion commented Nov 20, 2019

andreasnoack commented Nov 20, 2019

andreasnoack commented Nov 20, 2019

devmotion commented Nov 20, 2019

andreasnoack commented Nov 20, 2019

devmotion commented Nov 20, 2019

devmotion commented Nov 21, 2019

devmotion commented May 31, 2020

ParadaCarleton commented Sep 4, 2023

devmotion commented Sep 4, 2023

devmotion commented Sep 4, 2023

ParadaCarleton commented Sep 6, 2023 •

edited

Loading

Issue with support of Beta distribution when alpha < 1 or beta < 1 #1003

Issue with support of Beta distribution when alpha < 1 or beta < 1 #1003

Comments

MFairley commented Oct 29, 2019

devmotion commented Nov 20, 2019

devmotion commented Nov 20, 2019

andreasnoack commented Nov 20, 2019

devmotion commented Nov 20, 2019

andreasnoack commented Nov 20, 2019

andreasnoack commented Nov 20, 2019

devmotion commented Nov 20, 2019

andreasnoack commented Nov 20, 2019

devmotion commented Nov 20, 2019

devmotion commented Nov 21, 2019

devmotion commented May 31, 2020

ParadaCarleton commented Sep 4, 2023

devmotion commented Sep 4, 2023

devmotion commented Sep 4, 2023

ParadaCarleton commented Sep 6, 2023 • edited Loading

ParadaCarleton commented Sep 6, 2023 •

edited

Loading