-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue with support of Beta distribution when alpha < 1 or beta < 1 #1003
Comments
I just ran into the same issue: using Distributions
using Random
using Test
Random.seed!(1234)
v = rand(Beta(0.01, 1.01), 10_000)
@test_broken !any(iszero, v) The standard implementation in R does not suffer from the same problem: set.seed(1234)
v <- rbeta(n=10000, shape1=0.01, shape2=1.01)
0 %in% v && stop("incorrect samples") It seems the standard implementation in R uses the algorithms BB and BC from |
I think the main issue with the current implementation is that the Gamma sampler returns incorrect values: using Distributions
using Random
using Test
Random.seed!(1234)
v = rand(Gamma(0.01, 1.0), 10_000)
@test_broken !any(iszero, v) |
Why do you conclude that the values are incorrect? The examples seems similar to julia> any(iszero, rand(10000))
false There is zero probability at zero. The probability in a small neighborhood looks okay julia> x = rand(Gamma(0.01, 1.0), 10000);
julia> mean(t -> t < eps(), x)
0.7067
julia> cdf(Gamma(0.01, 1.0), eps())
0.7013514054165827 |
The problem is that the sampler returns values of 0, which are not in the support of the Gamma distribution. In the case of the Beta distribution, for the parameter values mentioned above, the current sampling algorithm uses the Gamma sampler; if that one returns a value of 0 outside of the support of the Gamma distribution, the Beta sampler returns also a value of 0, which is not in the support of the Beta distribution. |
Doh. I misread your test. Indeed, it doesn't look right. Would be good to figure out why that happens. |
The problem is that the probability of observing a value smaller than the smallest (positive) julia> cdf(Gamma(0.01, 1.0), floatmin(Float64))
0.0008432274068068664 so I'm not sure if there is much we can do for the Gamma. Maybe we can switch to a different algorithm for the Beta for these parameter values. |
According to some quick debugging, the problem seems to be Distributions.jl/src/samplers/gamma.jl Line 210 in a2304a8
Here x is sampled from Gamma(a + 1, theta) , s.nia = -1 / a , and e = randexp() . For a = 0.01 sampling of x seems to be fine, but exp(s.nia * e) evaluates to 0 from time to time.
|
Looks like this carries over to the Beta julia> cdf(Beta(0.01, 1.01), floatmin(Float64))
0.0008385787538914535 so I'm not really sure what an be done. |
Hmm I see the problem but I still think that a sampler should only return valid samples (or throw an error), but not yield samples that are not in the support of the distribution. |
In addition, depending on the use case, one could sample real-valued Of course, that would only be helpful if the downstream calculations are based on the logarithm of the samples from the Gamma distribution or the logit of the samples from the Beta distribution (which they were in my particular use case). |
The algorithm in https://arxiv.org/pdf/1302.1884.pdf seems to be an efficient method for sampling |
Seems like we could just replace 0 with the next-largest float. Is there a problem with this approach? |
This would correspond to a truncated Beta distribution. As mentioned e.g. in #1003 (comment), the mass between 0 and |
Maybe we should just support sampling of |
Right, but is there anything we can do about that mass? We can't return numbers we can't represent. The two options I can see are:
(Unless you're saying the current method is inaccurate for subnormal numbers, and are proposing we return more accurate subnormal numbers?) |
I am having a numerical issue with sampling from a Beta distribution when alpha or beta is less than 1. E.g. sometimes a sample will come back as exactly 1.0 but the support of the Beta distribution does not include 1 when beta is less than 1 and the pdf evaluates to Inf due to division by 0.
The text was updated successfully, but these errors were encountered: