Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable compatibility with loo package #526

Closed
athowes opened this issue Feb 11, 2025 · 15 comments
Closed

Enable compatibility with loo package #526

athowes opened this issue Feb 11, 2025 · 15 comments
Labels
enhancement New feature or request

Comments

@athowes
Copy link
Collaborator

athowes commented Feb 11, 2025

The loo package I believe is compatible with base-brms models. It looks like we might need to modify our custom likelihoods / families in order to make them compatible. The error I get is for a simple example traces back to the pdist_draw function within epidist_gen_log_lik and most immediately looks to involve cens being missing.

loo is a leading package for Bayesian model comparison, so I think it's relatively important that we support it. I would like to use it in my work to compare different model specifications. (I had also wanted to extend the Ebola vignette to perform model comparison using loo.)

Here is a reproducible example:

library(reprex)
library(epidist)
library(loo)
#> This is loo version 2.8.0
#> - Online documentation and vignettes at mc-stan.org/loo
#> - As of v2.0.0 loo defaults to 1 core but we recommend using as many as possible. Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session.

set.seed(101)

obs_time <- 25
sample_size <- 500

meanlog <- 1.8
sdlog <- 0.5

sim_obs <- simulate_gillespie() |>
  simulate_secondary(
    dist = rlnorm,
    meanlog = meanlog,
    sdlog = sdlog
  ) |>
  dplyr::mutate(
    ptime_lwr = floor(.data$ptime),
    ptime_upr = .data$ptime_lwr + 1,
    stime_lwr = floor(.data$stime),
    stime_upr = .data$stime_lwr + 1,
    obs_time = obs_time
  ) |>
  dplyr::filter(.data$stime_upr <= .data$obs_time) |>
  dplyr::slice_sample(n = sample_size, replace = FALSE)

sim_obs <- as_epidist_linelist_data(
  sim_obs$ptime_lwr,
  sim_obs$ptime_upr,
  sim_obs$stime_lwr,
  sim_obs$stime_upr,
  sim_obs$obs_time
)

prep_obs <- as_epidist_latent_model(sim_obs)

fit <- epidist(
  data = prep_obs, seed = 1, chains = 2, cores = 2, silent = 2, refresh = 0,
  backend = "cmdstanr"
)
#> Loading required package: rstan
#> Loading required package: StanHeaders
#> 
#> rstan version 2.32.6 (Stan version 2.32.2)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)

loo <- loo(fit, save_psis = TRUE)
#> Error in `purrr::map_dbl()` at epidist/R/gen.R:42:5:
#> ℹ In index: 1.
#> Caused by error in `purrr::map_dbl()` at epidist/R/gen.R:45:9:
#> ℹ In index: 1.
#> Caused by error in `if (is.null(cens) || cens == 0) ...`:
#> ! missing value where TRUE/FALSE needed

Created on 2025-02-11 with reprex v2.1.1

@athowes athowes added the bug Something isn't working label Feb 11, 2025
@seabbs seabbs added enhancement New feature or request and removed bug Something isn't working labels Feb 11, 2025
@seabbs
Copy link
Contributor

seabbs commented Feb 11, 2025

Thanks for the report. Do you see this across all models and multiple families? Note I have relabelled this as a enhancement vs a bug as I think that is a better fit

@athowes
Copy link
Collaborator Author

athowes commented Feb 11, 2025

Agree it could be thought of as enhancement over bug. There is some possibility that this incompatibility is suggestive of a wider issue which could be thought of as a bug.

@athowes
Copy link
Collaborator Author

athowes commented Feb 11, 2025

  • The issue did not occur for the naive model with lognormal family
  • The issue did occur for the latent model with lognormal and gamma families
  • I did not test the marginal model as it did not compile on my Mac (new issue?)

@athowes
Copy link
Collaborator Author

athowes commented Feb 11, 2025

Following chat with @seabbs it's possible that this could be fixed by #477.

@seabbs

This comment has been minimized.

@athowes

This comment has been minimized.

@seabbs

This comment has been minimized.

@seabbs
Copy link
Contributor

seabbs commented Feb 12, 2025

This is very likely related to #479

@seabbs

This comment has been minimized.

@athowes

This comment has been minimized.

@seabbs

This comment has been minimized.

@seabbs
Copy link
Contributor

seabbs commented Feb 19, 2025

Using the code as present on issue447 this now works but is prohibitively slow.

works in the sense of giving this:

r$> loo(fit, cores = 10)

Computed from 2000 by 91 log-likelihood matrix.

         Estimate   SE
elpd_loo   -393.1 22.8
p_loo         4.4  0.9
looic       786.1 45.5
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.1]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.

After about 5 minutes. The issue was that the cens modification needs to be as long as the data.

@seabbs
Copy link
Contributor

seabbs commented Feb 19, 2025

I will add a test for the loo package in order to close this (once the runtime has been brought down)

@seabbs
Copy link
Contributor

seabbs commented Feb 19, 2025

See the update in this draft PR: #540

TLDR with the changes made their and for the distributions for which there are analytical solutions this now works in a tractable amount of time. For dists where this is not the case I think #476 gates real world use.

@seabbs
Copy link
Contributor

seabbs commented Feb 26, 2025

Closed by #540. Note checking this functionality more deeply in #476

@seabbs seabbs closed this as completed Feb 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants