Fix mode of LKJCholesky #1938

Merged: 3 commits from dw/mode_lkjcholesky into master, Jan 17, 2025
Conversation

@devmotion (Member) commented Jan 14, 2025

For LKJ the mode is only defined (and then equal to the identity matrix) if η > 1: if 0 < η < 1, the identity matrix is a trough of the density, and if η = 1 the distribution is uniform over correlation matrices (see e.g. https://mc-stan.org/docs/functions-reference/correlation_matrix_distributions.html#probability-density-function). Therefore, for LKJ, mode is defined only if η > 1:

julia> mode(LKJ(1, 0.5))
ERROR: DomainError with 0.5:
LKJ: mode is defined only when η > 1.
Stacktrace:
 [1] #545
   @ ~/.julia/packages/Distributions/cWeit/src/matrix/lkj.jl:79 [inlined]
 [2] check_args
   @ ~/.julia/packages/Distributions/cWeit/src/utils.jl:89 [inlined]
 [3] #mode#544
   @ ~/.julia/packages/Distributions/cWeit/src/matrix/lkj.jl:79 [inlined]
 [4] mode(d::LKJ{Float64, Int64})
   @ Distributions ~/.julia/packages/Distributions/cWeit/src/matrix/lkj.jl:78
 [5] top-level scope
   @ REPL[48]:1

julia> mode(LKJ(1, 1.5))
1×1 Matrix{Float64}:
 1.0

For LKJCholesky, however, on the master branch mode always returns the identity matrix, irrespective of the parameters:

julia> mode(LKJCholesky(1, 0.5))
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
1×1 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0

julia> mode(LKJCholesky(1, 1.5))
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
1×1 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0

This PR fixes the problem and also defines mean for LKJCholesky, which is currently missing and should always return the identity matrix:

julia> mean(LKJ(1, 0.5))
1×1 Matrix{Float64}:
 1.0

julia> mean(LKJ(1, 1.5))
1×1 Matrix{Float64}:
 1.0

julia> mean(LKJCholesky(1, 0.5))
ERROR: MethodError: no method matching iterate(::LKJCholesky{Float64})
[...]

julia> mean(LKJCholesky(1, 1.5))
ERROR: MethodError: no method matching iterate(::LKJCholesky{Float64})
[...]
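
For context, a minimal sketch of the behaviour this PR aims for (this is not the PR's actual diff; the field names d and η of LKJCholesky are assumed from the current struct definition):

using Distributions
using LinearAlgebra

# Hypothetical sketch (not the PR's code): like mode(::LKJ), reject η ≤ 1 and
# otherwise return the identity correlation matrix in factorized form.
# Assumed fields: `d` is the matrix dimension, `η` the shape parameter; the
# `uplo` field is ignored here.
function lkjcholesky_mode_sketch(d::LKJCholesky)
    d.η > 1 || throw(DomainError(d.η, "LKJCholesky: mode is defined only when η > 1."))
    return cholesky(Matrix{Float64}(I, d.d, d.d))
end

lkjcholesky_mode_sketch(LKJCholesky(1, 1.5))   # 1×1 identity, as a Cholesky
# lkjcholesky_mode_sketch(LKJCholesky(1, 0.5)) # throws DomainError, matching LKJ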

@codecov-commenter commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.02%. Comparing base (957f0c0) to head (ed0ab6d).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1938   +/-   ##
=======================================
  Coverage   86.02%   86.02%           
=======================================
  Files         144      144           
  Lines        8699     8700    +1     
=======================================
+ Hits         7483     7484    +1     
  Misses       1216     1216           


@devmotion force-pushed the dw/mode_lkjcholesky branch from bc5dbac to c45af15 on January 14, 2025 14:32
@sethaxen (Contributor) left a comment


I don't think this is a bug or should be fixed. The logpdf of LKJ is defined wrt the Lebesgue measure on the strict upper/lower triangle of the matrix, while the logpdf of LKJCholesky is defined wrt the Lebesgue measure on the strict upper/lower triangle of the triangular factor:

julia> X = rand(LKJCholesky(10, 2.0));

julia> logpdf(LKJCholesky(10, 2.0), X)
-5.741870506362201

julia> logpdf(LKJ(10, 2.0), Matrix(X))
1.3678551789839233

This is important for e.g. Bijectors' logdetjac to work correctly.

As a result, the two distributions have different modes. Also, while LKJ has a reasonable notion of mean (sample infinitely many matrices and average them), LKJCholesky has no such reasonable notion (averaging the triangular factors produces an invalid and useless Cholesky object). One could interpret the mean as what one gets by calling Matrix on the Cholesky objects and then averaging, but just as Factorization is not a subtype of AbstractMatrix and so doesn't support e.g. addition, I think it makes more sense to leave it unimplemented. If we think one will really want the mean of LKJ from LKJCholesky, maybe it makes more sense to implement convert methods for switching between these distributions.
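
To illustrate that point (an example using only the public API, not part of the original discussion): averaging the lower-triangular factors of two valid correlation matrices gives a factor whose rows are no longer unit-norm, so the reconstructed matrix no longer has a unit diagonal.

using Distributions
using LinearAlgebra

# Illustration: the average of two Cholesky factors of correlation matrices is
# not the factor of a correlation matrix.
F1 = rand(LKJCholesky(3, 2.0))
F2 = rand(LKJCholesky(3, 2.0))
Lavg = (Matrix(F1.L) + Matrix(F2.L)) / 2
diag(Lavg * Lavg')   # generally not ones(3), i.e. the diagonal is not all 1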

@sethaxen (Contributor) commented:

Hm, but actually, I think the logpdf implementation might have an off-by-one error. I'll double-check the math later.

@sethaxen (Contributor) commented:

Yeah, the math is fine. But it's true that LKJCholesky doesn't have a defined mode for η < 1, since the mode is only unique and in the support when p + 2(η - 1) - i > 0 for all 2 <= i <= p, which implies η > 1 (the binding case is i = p, where the condition reduces to 2(η - 1) > 0). When η = 1, the marginal density of the pth row/column of the factor is uniform on the hemisphere. I'm not certain what Distributions' policy is about non-unique modes; if non-unique modes are allowed, then the identity matrix could still be returned for η = 1.
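
A quick numerical check of that condition (illustrative only, not part of the discussion):

# The uniqueness condition p + 2(η - 1) - i > 0 for all 2 <= i <= p holds
# exactly when η > 1, since the tightest case is i = p.
mode_condition(p, η) = all(p + 2(η - 1) - i > 0 for i in 2:p)

mode_condition(5, 1.0)    # false: at i = p the condition is 2(η - 1) > 0
mode_condition(5, 1.001)  # true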

@devmotion (Member, Author) commented:

Hmm... My interpretation and use of LKJCholesky so far has been that it is LKJ with a more convenient and numerically more efficient (already factorized) variate. This seems to be in line with the docstring that states

Variates or samples of the distribution are `LinearAlgebra.Cholesky` objects, as might
be returned by `F = LinearAlgebra.cholesky(R)`, so that `Matrix(F) ≈ R` is a variate or
sample of [`LKJ`](@ref). 

Sampling `LKJCholesky` is faster than sampling `LKJ`, and often having the correlation
matrix in factorized form makes subsequent computations cheaper as well.

Based on this interpretation (variates of type Cholesky as a "better" non-AbstractMatrix matrix type), I think it is natural to expect that mean, var, etc. of an LKJCholesky are identical to those of the corresponding LKJ distribution (since in that interpretation they are the "same" thing with only an implementation difference). Of course, that understanding breaks down if

the logpdf of LKJCholesky is defined wrt the Lebesgue measure on the strict upper/lower triangle of the triangular factor

I haven't been aware of this (or at least I don't remember it), and it still seems unclear (surprising?) given the docstring of LKJCholesky. [I have been wondering for a while even whether the variate type of LKJ should be changed to a PDMat, which possibly seemed to remove the need for a dedicated LKJCholesky.]

@sethaxen (Contributor) commented:

Hmm... My interpretation and use of LKJCholesky so far has been that it is LKJ with a more convenient and numerically more efficient (already factorized) variate.

the logpdf of LKJCholesky is defined wrt the Lebesgue measure on the strict upper/lower triangle of the triangular factor

I haven't been aware of this (or at least I don't remember it), and it still seems unclear (surprising?) given the docstring of LKJCholesky.

I'd originally proposed in #1336 adding a MatrixVariate distribution for the triangular factor of LKJ, and through discussion we agreed a Cholesky object would be more useful. But I'd always intended for the underlying random variable here to be the triangular factor (hence the different logpdf). The docstring of LKJCholesky doesn't mention the assumed reference measure. As noted in #224 (comment), each variate type assumes absolute continuity wrt some measure, but it's never documented. In this case the measure is the Lebesgue measure on the strict lower triangle. This is why we compute the logdetjac this way:

# Assumed imports for this test helper (not shown in the excerpt):
# FiniteDifferences for `jacobian`/`central_fdm`, LinearAlgebra for `norm`/`logabsdet`.
using FiniteDifferences: central_fdm, jacobian
using LinearAlgebra: logabsdet, norm

# Compute logdetjac of ϕ: L → L L' where only strict lower triangle of L and L L' are unique
function cholesky_inverse_logdetjac(L)
    size(L, 1) == 1 && return 0.0
    J = jacobian(central_fdm(5, 1), cholesky_vec_to_corr_vec, stricttril_to_vec(L))[1]
    return logabsdet(J)[1]
end

stricttril_to_vec(L) = [L[i, j] for i in axes(L, 1) for j in 1:(i - 1)]

function vec_to_stricttril(l)
    n = length(l)
    p = Int((1 + sqrt(8n + 1)) / 2)
    L = similar(l, p, p)
    fill!(L, 0)
    k = 1
    for i in 1:p, j in 1:(i - 1)
        L[i, j] = l[k]
        k += 1
    end
    return L
end

function cholesky_vec_to_corr_vec(l)
    L = vec_to_stricttril(l)
    for i in axes(L, 1)
        w = view(L, i, 1:(i - 1))
        wnorm = norm(w)
        if wnorm > 1
            w ./= wnorm
            wnorm = 1
        end
        L[i, i] = sqrt(1 - wnorm^2)
    end
    return stricttril_to_vec(L * L')
end
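
A brief usage sketch of these helpers (illustrative, not taken from the test file):

l = [0.3, -0.2, 0.5]            # strict lower triangle of a 3×3 factor
L = vec_to_stricttril(l)        # 3×3 strictly lower-triangular matrix
stricttril_to_vec(L) == l       # true: the two helpers invert each other
cholesky_vec_to_corr_vec(l)     # strict lower triangle of L L' after unit-norm rows
cholesky_inverse_logdetjac(L)   # log |det J| of the map L ↦ L L' at this point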

[I have been wondering for a while even whether the variate type of LKJ should be changed to a PDMat, which possibly seemed to remove the need for a dedicated LKJCholesky.]

I agree this would be better. But since this would be breaking, is this likely to happen anytime soon? FWIW, I've had a branch for some time with implementations of a WishartCholesky and InverseWishartCholesky I've been meaning to finish up; Wishart and InverseWishart could likewise get a PDMat variate type.

@devmotion changed the title from "Fix mode of LKJCholesky and define mean(::LKJCholesky)" to "Fix mode of LKJCholesky" on Jan 15, 2025
@devmotion (Member, Author) commented:

I removed the definition of mean(::LKJCholesky) from this PR.

I'm not certain what Distribution's policy is about non-unique modes; if non-unique modes are allowed, then identity matrix could still be returned for η = 1.

My understanding was that mode returns the mode of an unimodal distribution. This seems to be in line with the following comment:

mode, # the mode of a unimodal distribution

@sethaxen (Contributor) commented:

My understanding was that mode returns the mode of an unimodal distribution. This seems to be in line with the following comment:

mode, # the mode of a unimodal distribution

As a counterpoint, for a UnivariateDistribution the docstring of mode clarifies it returns the first mode (hence modes for getting all modes):

"""
modes(d::UnivariateDistribution)
Get all modes (if this makes sense).
"""
modes(d::UnivariateDistribution) = [mode(d)]
"""
mode(d::UnivariateDistribution)
Returns the first mode.
"""
mode(d::UnivariateDistribution)

There's no corresponding docstring for mode for other variates, but Dirichlet does raise an error if requesting the mode of a uniform Dirichlet distribution. So for consistency I think it makes the most sense to require η > 1 here.
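
For reference, the Dirichlet behaviour mentioned above looks roughly like this (illustrative; the exact error message depends on the Distributions release):

using Distributions

# Dirichlet's mode is defined only when every concentration parameter exceeds 1.
mode(Dirichlet([2.0, 3.0, 4.0]))   # well-defined: (αᵢ - 1) / (α₀ - k)
# mode(Dirichlet(ones(3)))         # errors: uniform Dirichlet has no unique mode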

@sethaxen (Contributor) left a comment


LGTM, just a minor suggestion.

Review suggestion on test/cholesky/lkjcholesky.jl (outdated, resolved)
@sethaxen (Contributor) commented:

I have been wondering for a while even whether the variate type of LKJ should be changed to a PDMat, which possibly seemed to remove the need for a dedicated LKJCholesky

It would be great to have this as its own issue so it doesn't get lost

@devmotion (Member, Author) commented:

I opened #1939.

@devmotion merged commit 3b981a0 into master on Jan 17, 2025 (15 checks passed)
@devmotion deleted the dw/mode_lkjcholesky branch on January 17, 2025 09:44