
int_pctl() should pick the _first_ eval time if eval_time = NULL #809

Closed
hfrick opened this issue Jan 16, 2024 · 2 comments
hfrick commented Jan 16, 2024

Part of #766

int_pctl() should pick the first eval time stored in the object when the user leaves eval_time = NULL.

Additionally, choosing an eval time in this case is currently broken (see the reprex below), so we need tests for this in extratests, now filed as tidymodels/extratests#163.
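A minimal sketch of the proposed fallback (a hypothetical helper for illustration, not tune's actual internals): when eval_time is NULL, take the first stored time and warn, mirroring the message that show_best() and friends emit in the reprex below.

# Hypothetical helper sketching the proposed fallback; the function name and
# warning wording are illustrative, not tune's actual internals.
choose_eval_time <- function(stored_eval_times, eval_time = NULL) {
  if (is.null(eval_time) && length(stored_eval_times) > 0) {
    eval_time <- stored_eval_times[1]
    cli::cli_warn(
      "{length(stored_eval_times)} evaluation times are available;
       the first ({eval_time}) will be used."
    )
  }
  eval_time
}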

library(tidymodels)
library(censored)
#> Loading required package: survival

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2) %>%
  as_tibble()

set.seed(2)
sim_rs <- vfold_cv(sim_dat)

time_points <- c(10, 1, 5, 15)

mod_spec <-
  proportional_hazards(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet") %>%
  set_mode("censored regression")

grid <- tibble(penalty = 10^c(-4, -2, -1))

gctrl <- control_grid(save_pred = TRUE, save_workflow = TRUE)

mix_mtrc <- metric_set(brier_survival, brier_survival_integrated, concordance_survival)

set.seed(2193)
tune_res <-
  mod_spec %>%
  tune_grid(
    event_time ~ X1 + X2,
    resamples = sim_rs,
    grid = grid,
    metrics = mix_mtrc,
    eval_time = time_points,
    control = gctrl
  )


# `eval_time` across different functions --------------------------------

res <- show_best(tune_res)
#> Warning in show_best(tune_res): No value of `metric` was given;
#> "brier_survival" will be used.
#> Warning: 4 evaluation times are available; the first (10) will be used.

res <- select_best(tune_res)
#> Warning in select_best(tune_res): No value of `metric` was given; "brier_survival" will be used.
#> 4 evaluation times are available; the first (10) will be used.

res <- select_by_one_std_err(tune_res, penalty)
#> Warning in select_by_one_std_err(tune_res, penalty): No value of `metric` was given; "brier_survival" will be used.
#> 4 evaluation times are available; the first (10) will be used.

res <- select_by_pct_loss(tune_res, penalty)
#> Warning in select_by_pct_loss(tune_res, penalty): No value of `metric` was given; "brier_survival" will be used.
#> 4 evaluation times are available; the first (10) will be used.

res <- autoplot(tune_res)
#> Warning in filter_plot_eval_time(dat, eval_time): No evaluation time was set; a
#> value of 5 was used.

res <- int_pctl(tune_res)
#> Warning in int_pctl(tune_res): No evaluation time was set; a value of 5 was
#> used.
#> Error in `purrr::map2()`:
#> ℹ In index: 1.
#> Caused by error in `dplyr::rename()`:
#> ! Can't rename columns that don't exist.
#> ✖ Column `term` doesn't exist.

res <- augment(tune_res)
#> Warning: 4 evaluation times are available; the first (10) will be used.

res <- fit_best(tune_res)

Created on 2024-01-16 with reprex v2.0.2

@topepo
Copy link
Member

topepo commented Jan 24, 2024

I think that we should make the default (i.e., eval_time = NULL) compute the intervals for all evaluation times.

For the select/show APIs, we have to pick a single value for those operations. In autoplot(), the main reason to plot a single time’s results is so that we don’t have a ton of facets (perhaps making the visualization unreadable).
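For instance, the single-value selectors can be given an explicit choice to avoid the fallback warnings (a sketch assuming the metric and eval_time arguments referenced in the warnings above):

# Passing the metric and evaluation time explicitly avoids the fallback
# warnings shown in the reprex (sketch; values taken from the warnings above).
res <- show_best(tune_res, metric = "brier_survival", eval_time = 10)
res <- select_best(tune_res, metric = "brier_survival", eval_time = 10)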

I think the use case is different for int_pctl(). If the user specifies multiple times, they are probably interested in the results across times, and percentile intervals across time would be very informative:

library(tidymodels)
library(censored)
#> Loading required package: survival
library(doMC)
#> Loading required package: foreach
#> 
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#> 
#>     accumulate, when
#> Loading required package: iterators
#> Loading required package: parallel
tidymodels_prefer()
theme_set(theme_bw())
options(pillar.advice = FALSE, pillar.min_title_chars = Inf)
registerDoMC(cores = parallel::detectCores())
set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

set.seed(2)
split <- initial_split(sim_dat)
sim_tr <- training(split)
sim_te <- testing(split)
sim_rs <- vfold_cv(sim_tr)

time_points <- 1:20

mod_spec <-
  bag_tree() %>%
  set_mode("censored regression")

rsctrl <- control_resamples(save_pred = TRUE)

mix_mtrc  <- metric_set(brier_survival)

set.seed(2193)
rs__res <-
  mod_spec %>%
  fit_resamples(
    event_time ~ X1 + X2,
    resamples = sim_rs,
    metrics = mix_mtrc,
    eval_time = time_points,
    control = rsctrl
  )

set.seed(1)
int_pctl(rs__res, times = 2000, eval_time = time_points) %>% 
  ggplot(aes(.eval_time)) + 
  geom_line(aes(y = .estimate)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper), alpha = 1 / 10) +
  labs(x = "Time", y = "Brier Score")

If they don't want them all, they could opt in to excluding time points.
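For example, passing a subset of the stored times to the eval_time argument would compute intervals only for those (a sketch based on the int_pctl() call above; the specific times are illustrative):

# Opting in to a subset of the stored evaluation times (sketch; reuses the
# `eval_time` argument from the call above).
set.seed(1)
int_pctl(rs__res, times = 2000, eval_time = c(1, 5, 10))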

I think that multiple metrics is a good analogy; if they specify all of our event time metrics, they probably want intervals for all of them.

@topepo topepo closed this as completed Jan 25, 2024
github-actions bot commented Feb 9, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 9, 2024