Fix mode of LKJCholesky #1938

Merged: 3 commits from dw/mode_lkjcholesky into master, Jan 17, 2025
Conversation

@devmotion (Member) commented Jan 14, 2025

For LKJ the mode is only defined (and then equal to the identity matrix) if η > 1: if 0 < η < 1, the identity matrix is a trough of the density, and if η = 1 the distribution is uniform over correlation matrices (see e.g. https://mc-stan.org/docs/functions-reference/correlation_matrix_distributions.html#probability-density-function). Therefore, for LKJ, mode is defined only if η > 1:

julia> mode(LKJ(1, 0.5))
ERROR: DomainError with 0.5:
LKJ: mode is defined only when η > 1.
Stacktrace:
 [1] #545
   @ ~/.julia/packages/Distributions/cWeit/src/matrix/lkj.jl:79 [inlined]
 [2] check_args
   @ ~/.julia/packages/Distributions/cWeit/src/utils.jl:89 [inlined]
 [3] #mode#544
   @ ~/.julia/packages/Distributions/cWeit/src/matrix/lkj.jl:79 [inlined]
 [4] mode(d::LKJ{Float64, Int64})
   @ Distributions ~/.julia/packages/Distributions/cWeit/src/matrix/lkj.jl:78
 [5] top-level scope
   @ REPL[48]:1

julia> mode(LKJ(1, 1.5))
1×1 Matrix{Float64}:
 1.0

For LKJCholesky, however, on the master branch mode always returns the identity matrix, irrespective of the parameters:

julia> mode(LKJCholesky(1, 0.5))
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
1×1 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0

julia> mode(LKJCholesky(1, 1.5))
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
1×1 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0

This PR fixes the problem and also defines mean for LKJCholesky, which is currently missing and should always return the identity matrix:

julia> mean(LKJ(1, 0.5))
1×1 Matrix{Float64}:
 1.0

julia> mean(LKJ(1, 1.5))
1×1 Matrix{Float64}:
 1.0

julia> mean(LKJCholesky(1, 0.5))
ERROR: MethodError: no method matching iterate(::LKJCholesky{Float64})
[...]

julia> mean(LKJCholesky(1, 1.5))
ERROR: MethodError: no method matching iterate(::LKJCholesky{Float64})
[...]
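
For context, a minimal sketch of the behaviour this PR aims for (this is not the PR's actual diff; the field names d and η of LKJCholesky are assumed from the current struct definition):

using Distributions
using LinearAlgebra

# Hypothetical sketch (not the PR's code): like mode(::LKJ), reject η ≤ 1 and
# otherwise return the identity correlation matrix in factorized form.
# Assumed fields: `d` is the matrix dimension, `η` the shape parameter; the
# `uplo` field is ignored here.
function lkjcholesky_mode_sketch(d::LKJCholesky)
    d.η > 1 || throw(DomainError(d.η, "LKJCholesky: mode is defined only when η > 1."))
    return cholesky(Matrix{Float64}(I, d.d, d.d))
end

lkjcholesky_mode_sketch(LKJCholesky(1, 1.5))   # 1×1 identity, as a Cholesky
# lkjcholesky_mode_sketch(LKJCholesky(1, 0.5)) # throws DomainError, matching LKJ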

@codecov-commenter commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.02%. Comparing base (957f0c0) to head (ed0ab6d).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1938   +/-   ##
=======================================
  Coverage   86.02%   86.02%           
=======================================
  Files         144      144           
  Lines        8699     8700    +1     
=======================================
+ Hits         7483     7484    +1     
  Misses       1216     1216           


@devmotion force-pushed the dw/mode_lkjcholesky branch from bc5dbac to c45af15 on January 14, 2025 14:32
@sethaxen (Contributor) left a comment


I don't think this is a bug or should be fixed. The logpdf of LKJ is defined wrt the Lebesgue measure on the strict upper/lower triangle of the matrix, while the logpdf of LKJCholesky is defined wrt the Lebesgue measure on the strict upper/lower triangle of the triangular factor:

julia> X = rand(LKJCholesky(10, 2.0));

julia> logpdf(LKJCholesky(10, 2.0), X)
-5.741870506362201

julia> logpdf(LKJ(10, 2.0), Matrix(X))
1.3678551789839233

This is important for e.g. Bijectors' logdetjac to work correctly.

As a result, the two distributions have different modes. Also, while LKJ has a reasonable notion of mean (sample infinitely many matrices and average them), LKJCholesky has no such reasonable notion (averaging the triangular factors produces an invalid and useless Cholesky object). One could interpret the mean as what one gets by calling Matrix on the Cholesky objects and then averaging, but just as Factorization is not a subtype of AbstractMatrix and so doesn't support e.g. addition, I think it makes more sense to leave it unimplemented. If we think one will really want the mean of LKJ from LKJCholesky, maybe it makes more sense to implement convert methods for switching between these distributions.
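
To illustrate that point (an example using only the public API, not part of the original discussion): averaging the lower-triangular factors of two valid correlation matrices gives a factor whose rows are no longer unit-norm, so the reconstructed matrix no longer has a unit diagonal.

using Distributions
using LinearAlgebra

# Illustration: the average of two Cholesky factors of correlation matrices is
# not the factor of a correlation matrix.
F1 = rand(LKJCholesky(3, 2.0))
F2 = rand(LKJCholesky(3, 2.0))
Lavg = (Matrix(F1.L) + Matrix(F2.L)) / 2
diag(Lavg * Lavg')   # generally not ones(3), i.e. the diagonal is not all 1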

@sethaxen (Contributor) commented:

Hm, but actually, I think the logpdf implementation might have an off-by-one error. I'll double-check the math later.

@sethaxen (Contributor) commented:

Yeah, the math is fine. But it's true that LKJCholesky doesn't have a defined mode for η < 1, since the mode is only unique and in the support when p + 2(η - 1) - i > 0 for all 2 <= i <= p, which implies η > 1 (the binding case is i = p, where the condition reduces to 2(η - 1) > 0). When η = 1, the marginal density of the pth row/column of the factor is uniform on the hemisphere. I'm not certain what Distributions' policy is about non-unique modes; if non-unique modes are allowed, then the identity matrix could still be returned for η = 1.
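
A quick numerical check of that condition (illustrative only, not part of the discussion):

# The uniqueness condition p + 2(η - 1) - i > 0 for all 2 <= i <= p holds
# exactly when η > 1, since the tightest case is i = p.
mode_condition(p, η) = all(p + 2(η - 1) - i > 0 for i in 2:p)

mode_condition(5, 1.0)    # false: at i = p the condition is 2(η - 1) > 0
mode_condition(5, 1.001)  # true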

@devmotion (Member, Author) commented:

Hmm... My interpretation and use of LKJCholesky so far has been that it is LKJ with a more convenient and numerically more efficient (already factorized) variate. This seems to be in line with the docstring that states

Variates or samples of the distribution are `LinearAlgebra.Cholesky` objects, as might
be returned by `F = LinearAlgebra.cholesky(R)`, so that `Matrix(F) ≈ R` is a variate or
sample of [`LKJ`](@ref). 

Sampling `LKJCholesky` is faster than sampling `LKJ`, and often having the correlation
matrix in factorized form makes subsequent computations cheaper as well.

Based on this interpretation (variates of type Cholesky as a "better" non-AbstractMatrix matrix type), I think it is natural to expect that mean, var, etc. of an LKJCholesky are identical to those of the corresponding LKJ distribution (since in that interpretation they are the "same" thing with only an implementation difference). Of course, that understanding breaks down if

the logpdf of LKJCholesky is defined wrt the Lebesgue measure on the strict upper/lower triangle of the triangular factor

I haven't been aware of this (or at least I don't remember it), and it still seems unclear (surprising?) given the docstring of LKJCholesky. [I have been wondering for a while even whether the variate type of LKJ should be changed to a PDMat, which possibly seemed to remove the need for a dedicated LKJCholesky.]

@sethaxen (Contributor) commented:

Hmm... My interpretation and use of LKJCholesky so far has been that it is LKJ with a more convenient and numerically more efficient (already factorized) variate.

the logpdf of LKJCholesky is defined wrt the Lebesgue measure on the strict upper/lower triangle of the triangular factor

I haven't been aware of this (or at least I don't remember it), and it still seems unclear (surprising?) given the docstring of LKJCholesky.

I'd originally proposed in #1336 adding a MatrixVariate distribution for the triangular factor of LKJ, and through discussion we agreed a Cholesky object would be more useful. But I'd always intended for the underlying random variable here to be the triangular factor (hence the different logpdf). The docstring of LKJCholesky doesn't mention the assumed reference measure. As noted in #224 (comment), each variate type assumes absolute continuity wrt some measure, but it's never documented. In this case the measure is the Lebesgue measure on the strict lower triangle. This is why we compute the logdetjac this way:

# Assumed imports for this test helper (not shown in the excerpt):
# FiniteDifferences for `jacobian`/`central_fdm`, LinearAlgebra for `norm`/`logabsdet`.
using FiniteDifferences: central_fdm, jacobian
using LinearAlgebra: logabsdet, norm

# Compute logdetjac of ϕ: L → L L' where only strict lower triangle of L and L L' are unique
function cholesky_inverse_logdetjac(L)
    size(L, 1) == 1 && return 0.0
    J = jacobian(central_fdm(5, 1), cholesky_vec_to_corr_vec, stricttril_to_vec(L))[1]
    return logabsdet(J)[1]
end

stricttril_to_vec(L) = [L[i, j] for i in axes(L, 1) for j in 1:(i - 1)]

function vec_to_stricttril(l)
    n = length(l)
    p = Int((1 + sqrt(8n + 1)) / 2)
    L = similar(l, p, p)
    fill!(L, 0)
    k = 1
    for i in 1:p, j in 1:(i - 1)
        L[i, j] = l[k]
        k += 1
    end
    return L
end

function cholesky_vec_to_corr_vec(l)
    L = vec_to_stricttril(l)
    for i in axes(L, 1)
        w = view(L, i, 1:(i - 1))
        wnorm = norm(w)
        if wnorm > 1
            w ./= wnorm
            wnorm = 1
        end
        L[i, i] = sqrt(1 - wnorm^2)
    end
    return stricttril_to_vec(L * L')
end
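
A brief usage sketch of these helpers (illustrative, not taken from the test file):

l = [0.3, -0.2, 0.5]            # strict lower triangle of a 3×3 factor
L = vec_to_stricttril(l)        # 3×3 strictly lower-triangular matrix
stricttril_to_vec(L) == l       # true: the two helpers invert each other
cholesky_vec_to_corr_vec(l)     # strict lower triangle of L L' after unit-norm rows
cholesky_inverse_logdetjac(L)   # log |det J| of the map L ↦ L L' at this point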

[I have been wondering for a while even whether the variate type of LKJ should be changed to a PDMat, which possibly seemed to remove the need for a dedicated LKJCholesky.]

I agree this would be better. But since this would be breaking, is this likely to happen anytime soon? FWIW, I've had a branch for some time with implementations of a WishartCholesky and InverseWishartCholesky I've been meaning to finish up; Wishart and InverseWishart could likewise get a PDMat variate type.

@devmotion changed the title from "Fix mode of LKJCholesky and define mean(::LKJCholesky)" to "Fix mode of LKJCholesky" on Jan 15, 2025
@devmotion (Member, Author) commented:

I removed the definition of mean(::LKJCholesky) from this PR.

I'm not certain what Distribution's policy is about non-unique modes; if non-unique modes are allowed, then identity matrix could still be returned for η = 1.

My understanding was that mode returns the mode of an unimodal distribution. This seems to be in line with the following comment:

mode, # the mode of a unimodal distribution

@sethaxen (Contributor) commented:

My understanding was that mode returns the mode of an unimodal distribution. This seems to be in line with the following comment:

mode, # the mode of a unimodal distribution

As a counterpoint, for a UnivariateDistribution the docstring of mode clarifies it returns the first mode (hence modes for getting all modes):

"""
modes(d::UnivariateDistribution)
Get all modes (if this makes sense).
"""
modes(d::UnivariateDistribution) = [mode(d)]
"""
mode(d::UnivariateDistribution)
Returns the first mode.
"""
mode(d::UnivariateDistribution)

There's no corresponding docstring for mode for other variates, but Dirichlet does raise an error if requesting the mode of a uniform Dirichlet distribution. So for consistency I think it makes the most sense to require η > 1 here.
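
For reference, the Dirichlet behaviour mentioned above looks roughly like this (illustrative; the exact error message depends on the Distributions release):

using Distributions

# Dirichlet's mode is defined only when every concentration parameter exceeds 1.
mode(Dirichlet([2.0, 3.0, 4.0]))   # well-defined: (αᵢ - 1) / (α₀ - k)
# mode(Dirichlet(ones(3)))         # errors: uniform Dirichlet has no unique mode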

@sethaxen (Contributor) left a comment


LGTM, just a minor suggestion.

Review suggestion on test/cholesky/lkjcholesky.jl (outdated, resolved)
@sethaxen (Contributor) commented:

I have been wondering for a while even whether the variate type of LKJ should be changed to a PDMat, which possibly seemed to remove the need for a dedicated LKJCholesky

It would be great to have this as its own issue so it doesn't get lost

@devmotion (Member, Author) commented:

I opened #1939.

@devmotion merged commit 3b981a0 into master on Jan 17, 2025 (15 checks passed)
@devmotion deleted the dw/mode_lkjcholesky branch on January 17, 2025 09:44