Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with aggregate data and pp_check() methods #541

Open
athowes opened this issue Feb 20, 2025 · 2 comments
Open

Issue with aggregate data and pp_check() methods #541

athowes opened this issue Feb 20, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@athowes
Copy link
Collaborator

athowes commented Feb 20, 2025

Describe the bug

The pp_check methods from brms work incorrectly with the epidist_aggregate_data.

To Reproduce

set.seed(1)

meanlog <- 1.5
sdlog <- 0.5

# Number of observations
N <- 1000

delay <- primarycensored::rpcens(
  n = N, rdist = stats::rlnorm, meanlog = meanlog, sdlog = sdlog,
  pwindow = 1, swindow = 1, D = Inf
)

aggregate_data <- data.frame(
    delay = delay, reference_date = as.Date("2024-01-01")
  ) |>
  group_by(delay, reference_date) |>
  summarise(count = n(), .groups = "drop") |>
  mutate(report_date = reference_date + delay, .before = delay) |>
  epidist::as_epidist_aggregate_data(
    n = "count",
    pdate_lwr = "reference_date",
    sdate_lwr = "report_date",
    obs_date = NULL
  )

prep <- epidist::as_epidist_marginal_model(aggregate_data)

fit <- epidist::epidist(prep)

epidist_diagnostics(fit)
# A tibble: 1 × 8
   time samples max_rhat divergent_transitions per_divergent_transitions max_treedepth no_at_max_treedepth per_at_max_treedepth
  <dbl>   <dbl>    <dbl>                 <dbl>                     <dbl>         <dbl>               <int>                <dbl>
1  1.38    4000     1.00                     0                         0             3                 746                0.186

brms::pp_check(fit, ndraws = 50, type = "bars")

Image

fit$data$delay_lwr
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 21 28

Expected behavior

The figure should have a barplot with the data observed fit$data$n times.

Session information

> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.3.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bayesplot_1.11.1      primarycensored_1.1.0 ggplot2_3.5.1         dplyr_1.1.4           brms_2.22.0           Rcpp_1.0.12          
[7] epidist_0.2.0.1000   

loaded via a namespace (and not attached):
 [1] gtable_0.3.5         tensorA_0.36.2.1     xfun_0.45            QuickJSR_1.2.2       processx_3.8.4       inline_0.3.19       
 [7] lattice_0.22-6       callr_3.7.6          ps_1.7.7             vctrs_0.6.5          tools_4.4.1          generics_0.1.3      
[13] stats4_4.4.1         curl_5.2.1           parallel_4.4.1       tibble_3.2.1         fansi_1.0.6          pkgconfig_2.0.3     
[19] Matrix_1.7-0         checkmate_2.3.1      distributional_0.4.0 RcppParallel_5.1.8   lifecycle_1.0.4      farver_2.1.2        
[25] compiler_4.4.1       stringr_1.5.1        Brobdingnag_1.2-9    munsell_0.5.1        codetools_0.2-20     htmltools_0.5.8.1   
[31] yaml_2.3.9           pillar_1.9.0         StanHeaders_2.32.10  bridgesampling_1.1-2 abind_1.4-5          nlme_3.1-164        
[37] posterior_1.6.0      rstan_2.32.6         digest_0.6.36        tidyselect_1.2.1     mvtnorm_1.2-5        stringi_1.8.4       
[43] reshape2_1.4.4       purrr_1.0.2          labeling_0.4.3       fastmap_1.2.0        grid_4.4.1           colorspace_2.1-0    
[49] cli_3.6.3            magrittr_2.0.3       loo_2.8.0            pkgbuild_1.4.4       utf8_1.2.4           withr_3.0.0         
[55] scales_1.3.0         backports_1.5.0      lubridate_1.9.3      timechange_0.3.0     estimability_1.5.1   rmarkdown_2.27      
[61] matrixStats_1.3.0    emmeans_1.10.3       gridExtra_2.3        evaluate_0.24.0      coda_0.19-4.1        knitr_1.48          
[67] V8_4.4.2             rstantools_2.4.0     rlang_1.1.4          xtable_1.8-4         glue_1.7.0           reprex_2.1.1        
[73] rstudioapi_0.16.0    jsonlite_1.8.8       plyr_1.8.9           R6_2.5.1             fs_1.6.4
@athowes athowes added the bug Something isn't working label Feb 20, 2025
@athowes
Copy link
Collaborator Author

athowes commented Feb 20, 2025

I assume that this works for brms models where the weights argument is used. Perhaps some kind of modification required to allow pp_check() to see that.

@seabbs
Copy link
Contributor

seabbs commented Feb 20, 2025

Have you checked to see if it works with linelist and weighted data? My expectation would be it doesn't

I assume that this works for brms models where the weights argument is used.

I think the first step is checking this and thinking about how it works.

What happens if you pass new data? What happens if you don't enforce a type?

The references to rows in https://mc-stan.org/bayesplot/reference/PPC-overview.html makes me wonder what the general support for this function for weighted data is like?

@seabbs seabbs added enhancement New feature or request and removed bug Something isn't working labels Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants