Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big variation of the 'loglik' after one additional VE step #91

Closed
scj-robin opened this issue Aug 8, 2022 · 7 comments
Closed

Big variation of the 'loglik' after one additional VE step #91

scj-robin opened this issue Aug 8, 2022 · 7 comments
Assignees

Comments

@scj-robin
Copy link

scj-robin commented Aug 8, 2022

The ELBO (named 'loglik' in the output of the PLN function) seem to vary dramatically when performing one additional VE step, while convergence is supposed to be reached.

Considering the Barents data with no offset, Y = counts (n=59, p=30)and X = intercept + covariates (d=5), we get:

fit <- PLN(Y ~ -1+X, control=list(trace=0))
VE <- fit$VEstep(covariates=X, offsets=matrix(0, n, p), responses=Y, weights = rep(1,n))
print(c(fit$loglik, sum(VE$log.lik)))
> [1] -8989.666 -6655.017

Using a homemade function to calculate the ELBO:

ELBO <- function(Y, X, M, S2, Theta, Sigma){
   n <- nrow(Y); p <- ncol(Y); Omega <- solve(Sigma); Beta <- t(Theta)
   EqLogPZ <- -n/2 * log(det(Sigma)) -0.5 * sum(diag(M %*% Omega %*% t(M))) - 
     0.5 * sum(sapply(1:n, function(i){diag(Omega%*%diag(S2[i, ]))}))
   EqLogPY.Z <- -sum(exp(X%*%Beta + M + S2/2)) + sum(Y * (X %*% Beta + M)) - sum(lgamma(Y+1))
   EqLogQZ <- -0.5 * sum(log(S2)) -n*p/2
   return(list(EqLogPZ=EqLogPZ, EqLogPY.Z=EqLogPY.Z, EqLogQZ=EqLogQZ, 
               elbo=EqLogPZ + EqLogPY.Z - EqLogQZ))
 }

we get different but similar results:

print(c(ELBO(Y, X, fit$var_par$M, fit$var_par$S2, fit$model_par$Theta, fit$model_par$Sigma)$elbo, 
        ELBO(Y, X, VE$M, VE$S2, fit$model_par$Theta, fit$model_par$Sigma)$elbo))
> [1] -8989.132 -5630.619

It seems that the variational parameters M and S2 vary quite a lot during the additional VE step:

print(c(max(abs(fit$var_par$M - VE$M)), max(abs(fit$var_par$S2 - VE$S2))))
> [1] 1.964532 1.083126
@mahendra-mariadassou mahendra-mariadassou self-assigned this Aug 8, 2022
@mahendra-mariadassou
Copy link
Collaborator

mahendra-mariadassou commented Aug 8, 2022

2 remarks:

  • There was indeed a bug when computing the log-likelihood at the end of VEstep() (a S which should have been a S2)
  • I'm completely puzzled by the lack of convergence. We can even reproduce the problem by calling optimize() twice (once implicitly during the first call to PLN and then once explicitly)
library(PLNmodels)
load("../Data/BarentsFish/BarentsFish.Rdata")
n <- nrow(Data$count)
p <- ncol(Data$count)
X <- Data$covariates
Y <- Data$count
O <- matrix(0, n, p)
w <- rep(1, n)
fit <- PLN(count ~ -1 + covariates, control=list(trace=1), data = Data)
fit_new <- fit$clone(deep = TRUE)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.610 -8755.656

And produce an even greater jump by manually updating the variational parameters (with VEstep()) before calling optimize()

VE <- fit$VEstep(covariates= X, offsets= O, responses= Y, weights = w)
fit_new$update(M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.61 -5343.67

And repeat the process

VE <- fit_new$VEstep(covariates= X, offsets= O, responses= Y, weights = w)
fit_new$update(M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(NULL, n, p))
print(c(fit$loglik, fit_new$loglik))
[1] -9035.610 -5077.922

This looks (at least for the first block) like a problem with the optimizer although for all model we have

fit$optim_par$message
[1] "xtol_rel or xtol_abs was reached"
fit_new$optim_par$message
[1] "xtol_rel or xtol_abs was reached"

@mahendra-mariadassou
Copy link
Collaborator

After a bit of investigation, it seems like convergence is incomplete with the default parameters. If I change xtol_rel to 1e-6 (instead of the default 1e-4)

  • convergence is achieved because the ELBO doesn't increase anymore ("ftol_rel or ftol_abs was reached") although the parameters may still change slightly
  • the ELBO is much better (-4400) using the smaller xtol_rel than the default one (-9035.610)
library(PLNmodels)
#> This is packages 'PLNmodels' version 0.11.6-9200
#> Use future::plan(multicore/multisession) to speed up PLNPCA/PLNmixture/stability_selection.
load("~/Documents/PLN/Data/BarentsFish/BarentsFish.Rdata")
n <- nrow(Data$count)
p <- ncol(Data$count)
X <- Data$covariates
Y <- Data$count
O <- matrix(0, n, p)
w <- rep(1, n)
fit <- PLN(count ~ -1 + covariates, data = Data, control = list(xtol_rel = 1e-6, trace = 1))
#> 
#>  Initialization...
#>  Adjusting a PLN model with full covariance model
#>  Post-treatments...
#>  DONE!
fit$optim_par
#> $iterations
#> [1] 3533
#> 
#> $message
#> [1] "ftol_rel or ftol_abs was reached"
  • Calling optimize() once again doesn't drastically affect the value of the ELBO and the values of the parameters (model and variational) don't change ("xtol_rel or xtol_abs was reached")
fit_new <- fit$clone(deep = TRUE)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$optim_par
#> $iterations
#> [1] 11
#> 
#> $message
#> [1] "xtol_rel or xtol_abs was reached"
print(c(fit$loglik, fit_new$loglik))
#> [1] -4401.694 -4401.691
  • Doing one VE step and calling optimize() with the model parameters from fit and the variational parameters from VE doesn't affect the results ("xtol_rel or xtol_abs was reached")
VE <- fit$VEstep(covariates= X, offsets= O, responses= Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$update(Theta = fit$model_par$Theta, Sigma = fit$model_par$Sigma, M = VE$M, S2 = VE$S2)
fit_new$optimize(covariates=X, offsets=O, responses=Y, weights = w, control = PLNmodels:::PLN_param(list(xtol_rel = 1e-6), n, p))
fit_new$optim_par
#> $iterations
#> [1] 11
#> 
#> $message
#> [1] "xtol_rel or xtol_abs was reached"
print(c(fit$loglik, fit_new$loglik))
#> [1] -4401.694 -4401.673

We can check visually that the parameters are qualitatively comparable in fit and fit_new.

par(mfrow = c(2, 2))
plot(fit$var_par$M ~ fit_new$var_par$M);abline(a = 0,b = 1, col = "red")
plot(fit$var_par$S2 ~ fit_new$var_par$S2);abline(a = 0,b = 1, col = "red")
plot(fit$model_par$Theta ~ fit_new$model_par$Theta);abline(a = 0,b = 1, col = "red")
par(mfrow = c(1, 1))

In summary, the main problem with the code appears to be the default value of xtol_rel which leads to faster but incomplete optimization (+ the typo that will be fixed)

Created on 2022-08-09 by the reprex package (v2.0.1)

@scj-robin
Copy link
Author

Thank you for your superfast answer. I actually do not get the same numerical value when running your code on my laptop (I wonder if some difference may exist between our respective datafiles).
In conclusion, do you suggest that, when mostly interested in the value of the elbo and/or the variational parameters, we always run the code with the control option xtol_rel = 1e-6, or even less?

@mahendra-mariadassou
Copy link
Collaborator

The differences are probably due to my computer having a slightly different version of R/C++ than yours. The question about the elbo is a tricky one ! I don't have general suggestions apart from checking convergence like you did:

  • if I call optimize() again, does it change the likelihood ?
  • if I do a VEstep, does it change the likelihood ?
  • was convergence achieved because the parameters don't change too much or because the objective reached a plateau ?
    Setting a lower value for xtol_rel = 1e-6 is good in principle but will result in longer computation times and may not always be the best trade off between time and accuracy.

@scj-robin
Copy link
Author

Using homemade functions for both the VE and the M step, it seems that the biggest changes occur in S2.
Could the problem be related to some boundary imposed to S2 in the VEM algorithm?

@jchiquet
Copy link
Member

Hi,

Thank Stéphane for raising this issue and to Mahendra for his responsiveness.

Some comments afterwards:

When we set the parameters of the optimization methods in PLNmodels, we did our best to reach a good compromise between good optimization and performance. We have checked on a number of problems that the optimum was reached. This is obviously not the case for barent fish data. I am not too much in favor of lowering the tolerance threshold of the convergence of parameters (xtol). We use the default of the NLOpt optimization library and I think it's fine like that.

On the other hand, I'm considering integrating Bastien's code as an alternative solution for the optimization, which uses the Rprop optimizer from torch, now well interfaced with R... It works very well, but it probably means more effort on the user's side to check that the model has been properly adjusted.

Finally, concerning the remark about the optimization in S²: in fact, we optimize in S such that S*S = S² by not putting any constraint on S, so the problem pointed out by Stéphane is not directly linked to the big variations of the ELBO, but maybe indirectly.

@jchiquet
Copy link
Member

jchiquet commented Sep 2, 2022

Ok, just to let you know that, after a couple of tests,

  • we now set xtol_rel = 1e-6 by default
  • I started the implementation of an alternative backend for optimization using torch + rprop, which seems to work for PLN and its variants (diagonal, spherical and full matrix), to be continued...

@jchiquet jchiquet closed this as completed Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants