Improve Survival stuff #1833
Conversation
R/measures.R
cindex.uno = makeMeasure(id = "cindex.uno", minimize = FALSE, best = 1, worst = 0,
  properties = c("surv", "req.pred", "req.truth", "req.model"),
  name = "Uno's Concordance index",
  note = "Fraction of all pairs of subjects whose predicted survival times are correctly ordered among all subjects that can actually be ordered. In other words, it is the probability of concordance between the predicted and the observed survival. Corrected by weighting with IPCW as suggested by Uno.",
Is there a paper reference for this?
I've added references.
  },
  extra.args = list(max.time = NULL, resolution = 1000)
)
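For context, a hedged usage sketch of this measure, assuming mlr's bundled lung.task example task (the call pattern follows from the req.pred/req.model properties above; this is illustrative, not code from the PR):

library(mlr)
mod = train(makeLearner("surv.coxph"), lung.task)
pred = predict(mod, lung.task)
# model and task are passed because the measure carries the req.model property
performance(pred, measures = cindex.uno, model = mod, task = lung.task)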
Could you add hand-constructed tests for the new measures please?
What do you mean by hand-constructed? Calculating these measures without a package would require a few hundred lines of code.
Most of the other measure tests are along the lines of: 5 incorrect predictions, 10 correct predictions, therefore an error rate of 33%; check that the implemented measure gets that number. The point is to verify that the number is correct for specific cases (and these can be constructed, i.e. you know what the answer should be).
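For concreteness, a minimal sketch of such a hand-constructed test, using mlr's exported measureMMCE helper (the data is made up to mirror the 5-wrong/10-right example above):

library(testthat)
library(mlr)

test_that("mmce matches a hand-computed error rate", {
  # 10 correct predictions, 5 incorrect ones
  truth = factor(c(rep("a", 10), rep("b", 5)), levels = c("a", "b"))
  response = factor(rep("a", 15), levels = c("a", "b"))
  # 5 of 15 predictions are wrong, so the error rate must be 1/3
  expect_equal(measureMMCE(truth, response), 5 / 15)
})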
It's complicated. I've added a small test to check whether perfect predictions lead to (nearly) perfect performance when there is no censoring. For all other cases, I'd need an external package because you cannot compute this by hand (in a reasonable time frame). I guess we have to rely on the survAUC package authors for correctness.
@PhilippPro Do you have any ideas on how to test those measures?
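A hedged sketch of the kind of no-censoring sanity check described above, calling survAUC directly rather than going through the mlr measure (data and threshold are illustrative, not the exact test in this PR):

library(testthat)
library(survival)
library(survAUC)

test_that("Uno's C is ~1 for perfect risk scores without censoring", {
  set.seed(1)
  time = rexp(100)
  status = rep(1, 100)  # no censoring at all
  surv = Surv(time, status)
  lp = -time            # shorter survival gets a higher risk score
  # with no censoring all IPCW weights equal 1, so perfect ordering gives C = 1
  expect_gt(UnoC(surv, surv, lp), 0.99)
})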
No, not really. Of course one can construct simple cases without censoring that can be calculated by hand, but with censoring we have to use the complicated formulas from Uno's paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3079915/), which do not look very simple at first glance.
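For reference, the IPCW-weighted C-statistic from that paper, transcribed from my reading of Uno et al. (2011), so treat the exact form as a sketch:

$$
\hat{C}_\tau =
  \frac{\sum_{i}\sum_{j} \Delta_i\, \hat{G}(T_i)^{-2}\, I(T_i < T_j,\, T_i < \tau)\, I(\hat{\eta}_i > \hat{\eta}_j)}
       {\sum_{i}\sum_{j} \Delta_i\, \hat{G}(T_i)^{-2}\, I(T_i < T_j,\, T_i < \tau)}
$$

where $T_i$ are observed times, $\Delta_i$ event indicators, $\hat{\eta}_i$ predicted risk scores, $\hat{G}$ the Kaplan-Meier estimate of the censoring distribution, and $\tau$ a truncation time (presumably what the max.time argument above maps to). The $\hat{G}^{-2}$ weights are exactly what makes a hand computation impractical.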
-    predict(.model$learner.model, newdata = .newdata, type = "link")
-  else
-    stop("Unknown predict type")
+  predict(.model$learner.model, newdata = .newdata, type = "link")
Why is the if no longer necessary here (and below)?
Survival learners currently do not support multiple predict types. There was an attempt to support survival probabilities, but it was never implemented. The calling function checks the predict type and matches it against the learner's properties, so this was dead code.
@@ -53,7 +53,8 @@ trainLearner.surv.cforest = function(.learner, .task, .subset,

 #' @export
 predictLearner.surv.cforest = function(.learner, .model, .newdata, ...) {
-  predict(.model$learner.model, newdata = .newdata, ...)
+  # cforest returns median survival times; multiply by -1 so that high values correspond to high risk
+  -1 * predict(.model$learner.model, newdata = .newdata, type = "response", ...)
Could you add a test for this please?
There is a test in this PR which detects whether the predictions are reversed/inverted.
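A hedged sketch of such a directionality check, using mlr's bundled lung.task (the task choice and the 0.5 threshold are illustrative; the actual test in the PR may differ):

library(testthat)
library(mlr)

test_that("surv.cforest predictions are not inverted", {
  mod = train(makeLearner("surv.cforest"), lung.task)
  pred = predict(mod, lung.task)
  # inverted predictions would push Harrell's C well below 0.5
  expect_true(performance(pred, measures = cindex) > 0.5)
})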
@@ -92,6 +92,13 @@ getSurvData = function(n = 100, p = 10) {
   cens.time = rexp(n, rate = 1 / 10)
   status = ifelse(real.time <= cens.time, TRUE, FALSE)
   obs.time = ifelse(real.time <= cens.time, real.time, cens.time) + 1

   # mark large outliers in survival as censored
Why?
Some models do not behave well otherwise. Additionally, it is closer to what we usually see in reality 😉.
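A hedged sketch of what that truncation step might look like, continuing the generated data above (the 90% cutoff is an assumption, not necessarily the value used in the PR):

# obs.time and status as generated in getSurvData() above
set.seed(1)
obs.time = rexp(100, rate = 1 / 10)
status = rep(TRUE, 100)
cutoff = quantile(obs.time, probs = 0.9)
status[obs.time > cutoff] = FALSE   # treat large outliers as censored observations
obs.time = pmin(obs.time, cutoff)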
test_that("setMeasurePars", { | ||
mm = mmce | ||
expect_list(mm$extra.args, len = 0L, names = "named") | ||
mm = setMeasurePars(mm, foo = 1, bar = 2) |
The example with mmce is a bit pointless? (Maybe that's ok, it is just for testing.) Are there other measures (apart from cindex.uno and iauc.uno) where setting parameters could be useful at the moment?
I don't know if there are others. The parameters of cindex.uno and iauc.uno are tested in the other file.
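For illustration, a hedged usage sketch of the new API with the parameters exposed in the cindex.uno definition above (the values are arbitrary):

library(mlr)
# override the truncation time and the time-grid resolution
m = setMeasurePars(cindex.uno, max.time = 90, resolution = 500)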
@@ -0,0 +1,43 @@
context("survival measures")
Why an extra file instead of putting it into test_base_measures.R?
This is just personal preference (to let the base group of tests run quickly); we could also join the two files.
I prefer splitting into more files as well. We're already running into issues with tests being too large.
* Check that fixup.data is used
* Use fixup.data and check.data in cluster, multilabel
* Make make[Un]SupervisedTask parameters non-optional

* xgboost: expose watchlist and callbacks; remove silent from params; set default lambda = 1; add tweedie_variance_power param
* Disable TODO linter

…ng and to prevent errors in wrapped feature selection and imputation (#1871)
* Allow nested feature selection and tuning
* Only restrict tune wrappers from being wrapped
* Add tests for nested nested resampling: positive case and negative case for wrapping a tuning wrapper
* Fix lintrbot issues
Is this ready to be merged?
Tests have been failing randomly (connection issues, timeouts). But now everything looks good, so yes, merging. Thanks for reviewing.
Summary:

* Measures can now have parameters (set via setMeasurePars()).
* New measures cindex.uno and iauc.uno. We had more measures in the other PR (Survival measures #1372), but they turned out to be numerically unstable. They can still be integrated in separate PRs in the future (if required).
* surv.cforest predicted the median survival times of the respective terminal nodes (note that this is undocumented). But we expect survival learners to predict a numerical value which is high if the individual is expected to die quickly, so this learner yielded reversed predictions. Fixed by multiplying with -1. Additionally, I've added a test to check that the performance on a really simple task is closer to the best possible value than to the worst possible.
* surv.penalized: the method suggested by the package authors to predict risks seems to be broken, yielding constant predictions for very basic tasks. I've removed the learner; glmnet seems to be a viable alternative for penalized survival regression (see the sketch below).
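A hedged sketch of that alternative, fitting a penalized Cox model with glmnet on made-up data (all names and values are illustrative):

library(glmnet)
library(survival)

set.seed(1)
x = matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y = Surv(rexp(100), rbinom(100, 1, 0.7))

# ridge-penalized Cox regression; alpha = 0 roughly mirrors what
# surv.penalized was meant to provide
fit = glmnet(x, y, family = "cox", alpha = 0)
# the linear predictor serves as the risk score mlr expects from survival learners
risk = predict(fit, newx = x, s = 0.1, type = "link")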