
Improve Survival stuff #1833

Merged · 19 commits · Jul 11, 2017
Conversation

mllg (Member) commented Jun 9, 2017

Summary:

  • New helper to set measure parameters (setMeasurePars()).
  • Two new survival measures: cindex.uno and iauc.uno. We had more measures in the other PR (Survival measures #1372), but they turned out to be numerically unstable. They can still be integrated in separate PRs in the future (if required).
  • surv.cforest predicted the median survival times of the respective terminal nodes (note that this is undocumented). But we expect survival learners to predict a numerical value which is high if the individual is expected to die quickly, so this learner yielded reversed predictions. Fixed by multiplying the predictions by -1. Additionally, I've added a test which checks that the performance on a really simple task is closer to the best possible value than to the worst.
  • There is currently no way to let surv.penalized predict risks. The method suggested by the package authors seems to be broken, yielding constant predictions on very basic tasks. I've removed the learner; glmnet seems to be a viable alternative for penalized survival regression.
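The sign convention behind the surv.cforest fix can be illustrated with a small hand-rolled sketch (illustrative Python, not mlr code): under the convention that a higher predicted value means higher risk, a learner that returns survival times directly produces a perfectly reversed ordering, and flipping the sign restores it.

```python
# Illustrative sketch: Harrell-style concordance on uncensored toy data,
# using the convention that a HIGHER predicted value means HIGHER risk.

def concordance(time, risk):
    """Fraction of comparable pairs whose risk ordering matches the time ordering."""
    concordant = comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(i + 1, n):
            if time[i] == time[j]:
                continue  # tied times are not comparable here
            comparable += 1
            # the subject that dies earlier should have the higher risk
            if (time[i] < time[j]) == (risk[i] > risk[j]):
                concordant += 1
    return concordant / comparable

times = [2, 5, 9, 14]
risk = [4.0, 3.0, 2.0, 1.0]   # earliest death gets the highest risk
print(concordance(times, risk))                 # 1.0 (perfect)
print(concordance(times, [-r for r in risk]))   # 0.0 (reversed predictions)
```

Predicting median survival times directly behaves like the reversed case, which is why multiplying by -1 fixes the learner.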

R/measures.R Outdated
cindex.uno = makeMeasure(id = "cindex.uno", minimize = FALSE, best = 1, worst = 0,
properties = c("surv", "req.pred", "req.truth", "req.model"),
name = "Uno's Concordance index",
note = "Fraction of all pairs of subjects whose predicted survival times are correctly ordered among all subjects that can actually be ordered. In other words, it is the probability of concordance between the predicted and the observed survival. Corrected by weighting with IPCW as suggested by Uno.",
Member:
Is there a paper reference for this?

Member Author:
I've added references.

},
extra.args = list(max.time = NULL, resolution = 1000)
)

Member:
Could you add hand-constructed tests for the new measures please?

Member Author:
What do you mean by hand-constructed? Calculating these measures without a package would require a few hundred lines of code.

Member:
Most of the other measure tests are along these lines: 5 incorrect predictions, 10 correct predictions, therefore an error rate of 33%; check that the implemented measure returns that number. The point is to verify that the value is correct for specific cases that can be constructed by hand (i.e. where you know what the answer should be).
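The kind of hand-constructed test described above can be sketched like this (illustrative Python; mmce is stood in for by a plain misclassification count):

```python
# Hand-constructed measure test: build a case where the expected value is
# known in advance, then assert the implementation reproduces it.
truth = [0] * 10 + [1] * 5
pred  = [0] * 10 + [0] * 5   # 10 correct, 5 incorrect

mmce = sum(t != p for t, p in zip(truth, pred)) / len(truth)
assert abs(mmce - 1/3) < 1e-12   # 5 of 15 wrong -> error rate 33.3%
```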

Member Author:
It's complicated. I've added a small test checking that perfect predictions lead to (nearly) perfect performance when there is no censoring. For all other cases I'd need an external package, because you cannot compute this by hand in a reasonable time frame. I guess we have to rely on the authors of survAUC for correctness.

@PhilippPro Do you have any ideas how to test those measures?

Member:
No, not really. Of course one can construct simple cases without censoring that can be calculated by hand, but with censoring we have to use the complicated formulas from Uno's paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3079915/), which do not look very simple at first glance.
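For orientation, the overall shape of Uno's IPCW-weighted concordance can be sketched roughly as follows (illustrative Python, heavily simplified: it ignores the truncation time tau, evaluates the censoring survival G at T_i rather than its left limit, and is no substitute for survAUC -- see the linked paper for the exact estimator):

```python
# Rough sketch of Uno's C: event pairs are reweighted by the inverse
# squared Kaplan-Meier estimate of the CENSORING distribution (IPCW).

def km_censoring(time, event):
    """Kaplan-Meier estimate G(t) of the censoring survival function."""
    def G(t):
        surv = 1.0
        for s in sorted(set(time)):
            if s > t:
                break
            at_risk = sum(u >= s for u in time)
            cens = sum(u == s and not e for u, e in zip(time, event))
            if at_risk > 0:
                surv *= 1 - cens / at_risk
        return surv
    return G

def uno_c(time, event, risk):
    G = km_censoring(time, event)
    num = den = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # only observed events can anchor a comparable pair
        w = 1.0 / G(time[i]) ** 2
        for j in range(n):
            if time[i] < time[j]:
                den += w
                if risk[i] > risk[j]:
                    num += w
    return num / den
```

Without censoring all weights equal 1 and this reduces to the ordinary concordance fraction, which matches the "perfect predictions, no censoring" test mentioned above.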

predict(.model$learner.model, newdata = .newdata, type = "link")
else
stop("Unknown predict type")
predict(.model$learner.model, newdata = .newdata, type = "link")
Member:
Why is the if no longer necessary here (and below)?

Member Author:
Survival learners currently do not support multiple predict types. There was an attempt to support survival probabilities, but it was never implemented. The calling function checks the predict type and matches it against the learner's properties, so this was dead code.

@@ -53,7 +53,8 @@ trainLearner.surv.cforest = function(.learner, .task, .subset,

#' @export
predictLearner.surv.cforest = function(.learner, .model, .newdata, ...) {
predict(.model$learner.model, newdata = .newdata, ...)
# cforest returns median survival times; multiply by -1 so that high values correspond to high risk
-1 * predict(.model$learner.model, newdata = .newdata, type = "response", ...)
Member:
Could you add a test for this please?

Member Author:
There is a test in this PR which detects if the predictions are reversed/inverted.

@@ -92,6 +92,13 @@ getSurvData = function(n = 100, p = 10) {
cens.time = rexp(n, rate = 1 / 10)
status = ifelse(real.time <= cens.time, TRUE, FALSE)
obs.time = ifelse(real.time <= cens.time, real.time, cens.time) + 1

# mark large outliers in survival as censored
Member:
Why?

Member Author:
Some models do not behave well otherwise. Additionally, it is closer to what we usually see in reality 😉.
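The simulation idea in getSurvData (draw an exponential event time and an independent exponential censoring time, then observe whichever comes first) can be sketched as follows (illustrative Python; the rate parameters are assumptions for the sketch, not mlr's exact values):

```python
# Simulating right-censored survival data: status records whether the
# true event time was observed before the censoring time.
import random

def sim_censored(n, event_rate=1.0, cens_rate=0.1, seed=1):
    random.seed(seed)
    data = []
    for _ in range(n):
        real_time = random.expovariate(event_rate)
        cens_time = random.expovariate(cens_rate)
        status = real_time <= cens_time      # True = event observed
        obs_time = min(real_time, cens_time)
        data.append((obs_time, status))
    return data

sample = sim_censored(200)
frac_events = sum(s for _, s in sample) / len(sample)
# with these rates roughly cens_rate / (event_rate + cens_rate) ~ 9% of
# observations are expected to be censored
```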

test_that("setMeasurePars", {
mm = mmce
expect_list(mm$extra.args, len = 0L, names = "named")
mm = setMeasurePars(mm, foo = 1, bar = 2)
Member:
The example with mmce is a bit pointless? (Maybe that's ok, it is just for testing.) Are there other measures (apart from cindex.uno and iauc.uno) where setting parameters could be useful at the moment?

Member Author:
I don't know if there are others. The parameters of cindex.uno and iauc.uno are tested in the other file.
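The semantics exercised by the test above can be sketched like this (illustrative Python; the real setMeasurePars() is R code operating on mlr Measure objects):

```python
# Sketch of the setMeasurePars() pattern (assumed semantics): merge
# user-supplied named parameters into the measure's stored extra args,
# without mutating the original measure.

def set_measure_pars(measure: dict, **pars) -> dict:
    updated = dict(measure)
    updated["extra_args"] = {**measure.get("extra_args", {}), **pars}
    return updated

mmce = {"id": "mmce", "extra_args": {}}
mm = set_measure_pars(mmce, foo=1, bar=2)
# mm carries {"foo": 1, "bar": 2}; mmce itself is unchanged
```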

@@ -0,0 +1,43 @@
context("survival measures")

Member:
Why an extra file instead of putting it into test_base_measures.R?

Member Author:
This is just personal preference (to keep the base group of tests running quickly); we could also join the two files.

Member:
I prefer splitting into more files as well. We're already running into issues with tests being too large.

mllg and others added 6 commits June 18, 2017 23:49
* Check that fixup.data is used

* Use fixup.data and check.data in cluster, multilabel

* Make make[Un]SupervisedTask parameters non-optional
* xgboost: expose watchlist and callbacks; remove silent from params; set default lambda=1; add tweedie_variance_power param

* disable TODO linter
…ng and to prevent errors in wrapped feature selection and imputation (#1871)

* Allows nested feature selection and tuning

* Only restrict tune wrappers from being wrapped

* Add tests for nested nested resampling positive case and negative case for wrapping tuning wrapper

* Fix lintrbot issues
larskotthoff (Member):
Is this ready to be merged?

mllg merged commit 90f7422 into master Jul 11, 2017
mllg (Member Author) commented Jul 11, 2017

Tests have been failing randomly (connection issues, timeouts). But now everything looks good, so yes, merging. Thanks for reviewing.

mllg deleted the survival branch Jul 11, 2017 18:20
mllg added a commit that referenced this pull request Jul 11, 2017
mb706 mentioned this pull request Dec 13, 2017
7 participants