Improve Survival stuff #1833
Conversation
R/measures.R
cindex.uno = makeMeasure(id = "cindex.uno", minimize = FALSE, best = 1, worst = 0,
  properties = c("surv", "req.pred", "req.truth", "req.model"),
  name = "Uno's Concordance index",
  note = "Fraction of all pairs of subjects whose predicted survival times are correctly ordered among all subjects that can actually be ordered. In other words, it is the probability of concordance between the predicted and the observed survival. Corrected by weighting with IPCW as suggested by Uno.",
Is there a paper reference for this?
I've added references.
  },
  extra.args = list(max.time = NULL, resolution = 1000)
)
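For context, a hedged usage sketch of this measure, assuming mlr's bundled lung.task example task (the call pattern follows from the req.pred/req.model properties above; this is illustrative, not code from the PR):

library(mlr)
mod = train(makeLearner("surv.coxph"), lung.task)
pred = predict(mod, lung.task)
# model and task are passed because the measure carries the req.model property
performance(pred, measures = cindex.uno, model = mod, task = lung.task)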
Could you add hand-constructed tests for the new measures please?
What do you mean by hand-constructed? Calculating these measures without a package would require a few hundred lines of code.
Most of the other measure tests are along the lines of: 5 incorrect predictions, 10 correct predictions, therefore an error rate of 33%; check that the implemented measure gets that number. The point is to verify that the number is correct for specific cases (and these can be constructed, i.e. you know what the answer should be).
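For concreteness, a minimal sketch of such a hand-constructed test, using mlr's exported measureMMCE helper (the data is made up to mirror the 5-wrong/10-right example above):

library(testthat)
library(mlr)

test_that("mmce matches a hand-computed error rate", {
  # 10 correct predictions, 5 incorrect ones
  truth = factor(c(rep("a", 10), rep("b", 5)), levels = c("a", "b"))
  response = factor(rep("a", 15), levels = c("a", "b"))
  # 5 of 15 predictions are wrong, so the error rate must be 1/3
  expect_equal(measureMMCE(truth, response), 5 / 15)
})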
It's complicated. I've added a small test to check whether perfect predictions lead to (nearly) perfect performance when there is no censoring. For all other cases, I'd need an external package because you cannot compute this by hand (in a reasonable time frame). I guess we have to rely on the survAUC package authors for correctness.
@PhilippPro Do you have any ideas on how to test those measures?
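A hedged sketch of the kind of no-censoring sanity check described above, calling survAUC directly rather than going through the mlr measure (data and threshold are illustrative, not the exact test in this PR):

library(testthat)
library(survival)
library(survAUC)

test_that("Uno's C is ~1 for perfect risk scores without censoring", {
  set.seed(1)
  time = rexp(100)
  status = rep(1, 100)  # no censoring at all
  surv = Surv(time, status)
  lp = -time            # shorter survival gets a higher risk score
  # with no censoring all IPCW weights equal 1, so perfect ordering gives C = 1
  expect_gt(UnoC(surv, surv, lp), 0.99)
})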
No, not really. Of course one can construct simple cases without censoring that can be calculated by hand, but with censoring we have to use the complicated formulas from Uno's paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3079915/), which do not look very simple at first glance.
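For reference, the IPCW-weighted C-statistic from that paper, transcribed from my reading of Uno et al. (2011), so treat the exact form as a sketch:

$$
\hat{C}_\tau =
  \frac{\sum_{i}\sum_{j} \Delta_i\, \hat{G}(T_i)^{-2}\, I(T_i < T_j,\, T_i < \tau)\, I(\hat{\eta}_i > \hat{\eta}_j)}
       {\sum_{i}\sum_{j} \Delta_i\, \hat{G}(T_i)^{-2}\, I(T_i < T_j,\, T_i < \tau)}
$$

where $T_i$ are observed times, $\Delta_i$ event indicators, $\hat{\eta}_i$ predicted risk scores, $\hat{G}$ the Kaplan-Meier estimate of the censoring distribution, and $\tau$ a truncation time (presumably what the max.time argument above maps to). The $\hat{G}^{-2}$ weights are exactly what makes a hand computation impractical.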
-    predict(.model$learner.model, newdata = .newdata, type = "link")
-  else
-    stop("Unknown predict type")
+  predict(.model$learner.model, newdata = .newdata, type = "link")
Why is the if no longer necessary here (and below)?
Survival learners currently do not support multiple predict types. There was an attempt to support survival probabilities, but it was never implemented. The calling function checks the predict type and matches it against the learner's properties, so this was dead code.
@@ -53,7 +53,8 @@ trainLearner.surv.cforest = function(.learner, .task, .subset,

 #' @export
 predictLearner.surv.cforest = function(.learner, .model, .newdata, ...) {
-  predict(.model$learner.model, newdata = .newdata, ...)
+  # cforest returns median survival times; multiply by -1 so that high values correspond to high risk
+  -1 * predict(.model$learner.model, newdata = .newdata, type = "response", ...)
Could you add a test for this please?
There is a test in this PR which detects whether the predictions are reversed/inverted.
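A hedged sketch of such a directionality check, using mlr's bundled lung.task (the task choice and the 0.5 threshold are illustrative; the actual test in the PR may differ):

library(testthat)
library(mlr)

test_that("surv.cforest predictions are not inverted", {
  mod = train(makeLearner("surv.cforest"), lung.task)
  pred = predict(mod, lung.task)
  # inverted predictions would push Harrell's C well below 0.5
  expect_true(performance(pred, measures = cindex) > 0.5)
})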
@@ -92,6 +92,13 @@ getSurvData = function(n = 100, p = 10) {
   cens.time = rexp(n, rate = 1 / 10)
   status = ifelse(real.time <= cens.time, TRUE, FALSE)
   obs.time = ifelse(real.time <= cens.time, real.time, cens.time) + 1

   # mark large outliers in survival as censored
Why?
Some models do not behave well otherwise. Additionally, it is closer to what we usually see in reality 😉.
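A hedged sketch of what that truncation step might look like, continuing the generated data above (the 90% cutoff is an assumption, not necessarily the value used in the PR):

# obs.time and status as generated in getSurvData() above
set.seed(1)
obs.time = rexp(100, rate = 1 / 10)
status = rep(TRUE, 100)
cutoff = quantile(obs.time, probs = 0.9)
status[obs.time > cutoff] = FALSE   # treat large outliers as censored observations
obs.time = pmin(obs.time, cutoff)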
test_that("setMeasurePars", { | ||
mm = mmce | ||
expect_list(mm$extra.args, len = 0L, names = "named") | ||
mm = setMeasurePars(mm, foo = 1, bar = 2) |
The example with mmce is a bit pointless? (Maybe that's ok, it is just for testing.) Are there other measures (apart from cindex.uno and iauc.uno) where setting parameters could be useful at the moment?
I don't know if there are others. The parameters of cindex.uno and iauc.uno are tested in the other file.
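For illustration, a hedged usage sketch of the new API with the parameters exposed in the cindex.uno definition above (the values are arbitrary):

library(mlr)
# override the truncation time and the time-grid resolution
m = setMeasurePars(cindex.uno, max.time = 90, resolution = 500)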
@@ -0,0 +1,43 @@
context("survival measures")
Why an extra file instead of putting it into test_base_measures.R?
This is just personal preference (to let the base group of tests run quickly); we could also join the two files.
I prefer splitting into more files as well. We're already running into issues with tests being too large.
* Check that fixup.data is used
* Use fixup.data and check.data in cluster, multilabel
* Make make[Un]SupervisedTask parameters non-optional

* xgboost: expose watchlist and callbacks; remove silent from params; set default lambda = 1; add tweedie_variance_power param
* Disable TODO linter

…ng and to prevent errors in wrapped feature selection and imputation (#1871)
* Allow nested feature selection and tuning
* Only restrict tune wrappers from being wrapped
* Add tests for nested nested resampling: positive case and negative case for wrapping a tuning wrapper
* Fix lintrbot issues
Is this ready to be merged?
Tests have been failing randomly (connection issues, timeouts). But now everything looks good, so yes, merging. Thanks for reviewing.
Summary:

* Measures can now have parameters (set via setMeasurePars()).
* New measures cindex.uno and iauc.uno. We had more measures in the other PR (Survival measures #1372), but they turned out to be numerically unstable. They can still be integrated in separate PRs in the future (if required).
* surv.cforest predicted the median survival times of the respective terminal nodes (note that this is undocumented). But we expect survival learners to predict a numerical value which is high if the individual is expected to die quickly, so this learner yielded reversed predictions. Fixed by multiplying with -1. Additionally, I've added a test to check that the performance on a really simple task is closer to the best possible value than to the worst possible.
* surv.penalized: the method suggested by the package authors to predict risks seems to be broken, yielding constant predictions for very basic tasks. I've removed the learner; glmnet seems to be a viable alternative for penalized survival regression (see the sketch below).
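A hedged sketch of that alternative, fitting a penalized Cox model with glmnet on made-up data (all names and values are illustrative):

library(glmnet)
library(survival)

set.seed(1)
x = matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y = Surv(rexp(100), rbinom(100, 1, 0.7))

# ridge-penalized Cox regression; alpha = 0 roughly mirrors what
# surv.penalized was meant to provide
fit = glmnet(x, y, family = "cox", alpha = 0)
# the linear predictor serves as the risk score mlr expects from survival learners
risk = predict(fit, newx = x, s = 0.1, type = "link")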