Add criterion to sksurv.ensemble.RandomSurvivalForest #108

arturomoncadatorres · 2020-04-16T10:36:53Z

It would be fantastic to have criterion (i.e., the function that measures the quality of a split) as a parameter of RandomSurvivalForest. I know that currently only the log-rank splitting rule is supported; for now, it could be the default (and only) option. In the future, this could be expanded to cover other options, for example those from the original paper (conservation, log_rank_score_rule, log_rank_random), with the corresponding splitting code changed as well. This would also make RandomSurvivalForest more similar to its scikit-learn counterparts (e.g., RandomForestRegressor), making it (even) more compatible with other packages that build on scikit-learn's standard structure.

I think this could be done easily in forest.py:

    def __init__(self,
                 n_estimators=100,
                 criterion="log_rank",  # new: log-rank is the default (and only) option for now
                 max_depth=None,
                 min_samples_split=6,
                 min_samples_leaf=3,
                 min_weight_fraction_leaf=0.,
                 max_features="auto",
                 max_leaf_nodes=None,
                 bootstrap=True,
                 oob_score=False,
                 n_jobs=None,
                 random_state=None,
                 verbose=0,
                 warm_start=False):
        super().__init__(
            base_estimator=SurvivalTree(),
            n_estimators=n_estimators,
            # new: BaseForest.__init__ has no criterion argument, so criterion is
            # forwarded to each tree via estimator_params, as scikit-learn's
            # RandomForestRegressor does (assumes SurvivalTree accepts criterion too)
            estimator_params=("criterion",
                              "max_depth",
                              "min_samples_split",
                              "min_samples_leaf",
                              "min_weight_fraction_leaf",
                              "max_features",
                              "max_leaf_nodes",
                              "random_state"),
            bootstrap=bootstrap,
            oob_score=oob_score,
            n_jobs=n_jobs,
            random_state=random_state,
            verbose=verbose,
            warm_start=warm_start)

        # new: store criterion so get_params()/set_params() and the trees can see it
        self.criterion = criterion
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.min_samples_leaf = min_samples_leaf
        self.min_weight_fraction_leaf = min_weight_fraction_leaf
        self.max_features = max_features
        self.max_leaf_nodes = max_leaf_nodes

If you think this might be interesting, I would be more than happy to help with a proper PR.
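To make the log-rank splitting rule concrete, here is a plain NumPy sketch of how a candidate split could be scored with the standardized two-sample log-rank statistic, plus how a criterion string could be validated and dispatched to a scoring function. The names log_rank_split_score, SPLIT_CRITERIA and check_criterion are hypothetical; this is not the code scikit-survival uses internally, only an illustration of the technique.

```python
import numpy as np


def log_rank_split_score(time, event, go_left):
    """Absolute standardized log-rank statistic for a two-way split.

    time: observed times; event: 1 if the event occurred, 0 if censored;
    go_left: boolean mask of samples sent to the (hypothetical) left child.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    go_left = np.asarray(go_left, dtype=bool)

    numerator = 0.0
    variance = 0.0
    # Loop over distinct times at which at least one event occurred
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                   # total number at risk at t
        n_left = (at_risk & go_left).sum()  # number at risk in the left child
        died = (time == t) & event
        d = died.sum()                      # total events at t
        d_left = (died & go_left).sum()     # events in the left child
        # Observed minus expected events in the left child
        numerator += d_left - d * n_left / n
        if n > 1:
            # Hypergeometric variance of d_left
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return abs(numerator) / np.sqrt(variance)


# Hypothetical registry: more rules could be added here later
SPLIT_CRITERIA = {"log_rank": log_rank_split_score}


def check_criterion(criterion):
    """Validate a criterion name and return its scoring function."""
    if criterion not in SPLIT_CRITERIA:
        raise ValueError(
            f"criterion must be one of {sorted(SPLIT_CRITERIA)}, got {criterion!r}")
    return SPLIT_CRITERIA[criterion]


time = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
event = np.array([1, 1, 1, 1, 0, 1])
go_left = np.array([True, True, True, False, False, False])
score = check_criterion("log_rank")(time, event, go_left)
```

A larger score means the two children have more different survival curves, so the tree would pick the candidate split with the highest score. The dictionary lookup keeps adding a new rule to a one-line change.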


james-sexton96 · 2023-04-10T19:47:59Z

Since this was posted, a growing body of literature has suggested that the time-varying nature of some features calls for alternative splitting strategies in RSFs.

Having only a single strategy (log-rank), which is subject to some of the same proportionality assumptions as Cox regression, might defeat the purpose of a model ideally designed for non-linear problems.

Having at least one alternative option, such as a Poisson regression log-likelihood, could offer an intermediate solution until open-ended splitting strategies become available.
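One simple reading of a Poisson log-likelihood splitting criterion is sketched below, assuming a constant hazard within each child node and no time-varying covariates. It is not taken from the RF-SLAM paper or any existing package, and the function names are hypothetical. Within a node with d events and total follow-up time T, the maximum-likelihood hazard is d / T, and the split gain is the improvement in maximized log-likelihood from fitting separate hazards to the two children.

```python
import numpy as np


def poisson_node_loglik(time, event):
    """Maximized log-likelihood of a constant-hazard (exponential) model.

    This equals the Poisson log-likelihood with a log(time) offset, up to
    per-sample terms that cancel when computing a split gain.
    """
    d = float(np.sum(event))        # number of events in the node
    exposure = float(np.sum(time))  # total follow-up time in the node
    if d == 0:
        return 0.0                  # limit of d * log(d / T) - d as d -> 0
    return d * np.log(d / exposure) - d


def poisson_split_gain(time, event, go_left):
    """Log-likelihood improvement from splitting a node into two children."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    go_left = np.asarray(go_left, dtype=bool)
    parent = poisson_node_loglik(time, event)
    left = poisson_node_loglik(time[go_left], event[go_left])
    right = poisson_node_loglik(time[~go_left], event[~go_left])
    return left + right - parent


time = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0])
event = np.array([1, 1, 1, 1, 0, 0])
gain = poisson_split_gain(time, event, time < 5)
```

The gain is never negative, because the parent model is the special case in which both children share one hazard; it is zero when both children have the same estimated hazard.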

See the following examples of varying splitting strategies:

sebp · 2023-04-14T11:26:49Z

@james-sexton96 The number of splitting rules proposed in the literature is quite large. I haven't followed it closely over the last couple of years, so I'm not sure whether a consensus has emerged by now. Conditional Inference Forests would definitely be interesting (see #341).

Do you have a reference for the Poisson regression log-likelihood you mentioned?

james-sexton96 · 2023-04-21T15:57:23Z

@sebp
Sure thing. See references below.

A Poisson regression log-likelihood is well suited for real-world data, as opposed to data with structured follow-up.
There was an attempt to branch the survival functionality of the R package randomForestSRC (see the RF-SLAM paper by Wongvibulsin et al. below). However, both this branch and the original package appear to be unsupported.

It would be nice to mirror the parameters of scikit-learn's RandomForestRegressor by including a criterion kwarg, and if I have time, I can draft an implementation of a Poisson splitting criterion!

Crowther et al. 2012
Austin P. 2017
Wongvibulsin et al. 2019

james-sexton96 · 2023-04-21T16:55:15Z

See also the Poisson criterion added to scikit-learn.
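For comparison, this is roughly what the scikit-learn side looks like from a user's perspective: the splitting rule is selected with a criterion string passed to the constructor, which is the pattern this issue asks RandomSurvivalForest to mirror. The sketch assumes a scikit-learn version whose forests accept criterion="poisson" (1.0 or later, as far as I know) and uses synthetic count data rather than survival data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
# Synthetic non-negative counts whose mean depends on the first feature
y = rng.poisson(lam=np.exp(1.0 + 2.0 * X[:, 0]))

# The splitting rule is chosen via the criterion parameter
forest = RandomForestRegressor(n_estimators=50, criterion="poisson", random_state=0)
forest.fit(X, y)
pred = forest.predict(X)
```

Because criterion is an ordinary constructor parameter, it also shows up in get_params() and can be tuned with GridSearchCV like any other hyperparameter, which is the compatibility benefit mentioned at the top of the thread.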

sebp added the enhancement label Jun 27, 2020
