forked from moreno-betancur/survMCOD
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
130 lines (84 loc) · 7.38 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# survMCOD
[![License](https://img.shields.io/badge/License-GPL%20%28%3E=%203%29-brightgreen.svg)](http://www.gnu.org/licenses/gpl-3.0.html)
**survMCOD** is an R package to fit Cox regression models for the hazard of death due to a disease of interest
based on multiple cause-of-death data ("Multiple-cause analysis"). Such data are usually obtained by extracting all the diseases mentioned on the death certificate, not only the so-called "underlying cause of death". This multiple-cause mortality model thus acknowledges that death may be caused by several disease processes acting together. It was proposed by [Moreno-Betancur et al. (2017)](http://journals.lww.com/epidem/Abstract/2017/01000/Survival_Analysis_with_Multiple_Causes_of_Death_.3.aspx), and it is the first formal extension of the single-cause competing risks model to that setting.
Specifically, using all deaths partially attributed to the disease of interest, we model the pure hazard rate of the disease of interest. This is the rate of deaths caused exclusively by that disease, and is thus a quantity that is conceptually closer to the marginal "causal" hazard than the cause-specific hazard. The latter is the quantity modeled when using competing risks Cox regression based on the underlying cause of death and ignoring all other diseases mentioned on the death certificate ("Single-cause analysis").
**Note:** Please note that the version available on GitHub is the most up-to-date *development* version of the package. A stable version of the package will be available from CRAN once it is released.
## Getting Started
### Prerequisites
The **survMCOD** package requires the **survival** and **Matrix** packages.
### Installation
The **survMCOD** package can be installed directly from GitHub using the **devtools** package. To do this you should first check you have devtools installed by executing the following commands from within your R session:
```r
if (!require(devtools)) {
install.packages("devtools")
}
```
Then execute the following commands to install **survMCOD**:
```r
library(devtools)
install_github("moreno-betancur/survMCOD")
```
### Reference manual
The full reference manual of the package can be dowloaded [here](https://rawgit.com/moreno-betancur/Reference_manuals/master/survMCOD.pdf).
## Example
To illustrate the use of the package, we use data simulated using the package's `simMCOD` function (use `?simMCOD` for more details):
```{r tidy=T}
library(survMCOD)
set.seed(234) #we set a seed to ensure we get the same simulated dataset everytime
datEx<-simMCOD(n=4000,xi=-1,rho=-2,phi=0,
pgen=c(1,0,0.75,0.25,0.125,0.083),
lambda=0.001,v=2,pUC=c(1,0.75))
head(datEx)
```
The dataset is of the format required by `survMCOD`. It contains one row per invididual, and the variables consist of:
* The covariates to include as regressors in the models (X1 and Z1).
* A time of entry and a time of exit corresponding to the set-up of a time-to-event outcome with delayed entry and right censoring (TimeEntry and TimeExit). N.B. The package also handles the simple right-censoring case.
* The vital status of the indvidual at TimeExit (Status), coded 1=died and 0=alive.
* A weight indicating the proportion of the death attributed to the disease of interest (Pi) which is a number between 0 and 1 if Status=1 and missing (NA) if Status=0.
* The underlying cause indicator (UC) which is 1 if the disease of interest was selected as underlying cause using WHO rules and 0 otherwise.
When analysing your own survival data with multiple causes of death, a key preliminary step to fitting models with `survMCOD` will be to assign a weight to each death that represents the proportion of the death attributed to the disease of interest. The user is referred to [Moreno-Betancur et al. (2017)](http://journals.lww.com/epidem/Abstract/2017/01000/Survival_Analysis_with_Multiple_Causes_of_Death_.3.aspx), [Piffaretti et al. (2016)](http://cdrwww.who.int/bulletin/volumes/94/12/16-172189.pdf) and [Rey et al. (2017)](http://invs.santepubliquefrance.fr/beh/2017/1/pdf/2017_1_2.pdf) for descriptions and discussions of various weight-attribution strategies.
The assumptions of the multiple-cause model and details of the estimation procedure are provided in [Moreno-Betancur et al. (2017)](http://journals.lww.com/epidem/Abstract/2017/01000/Survival_Analysis_with_Multiple_Causes_of_Death_.3.aspx). A key feature of the model is that a Cox model for the disease of interest and a Cox model for other causes need to be fitted simultaneously, and deaths with a weight between 0 and 1 will contribute to both. This is why the user needs to specify regressors for each of these models using the two arguments `formula` and `formOther` of the function as follows (use `?survMCOD` and `?SurvM` for more details).
```{r tidy=T}
fitMCOD<-survMCOD(SurvM(time=TimeEntry,time2=TimeExit,status=Status,
weight=Pi)~X1,
formOther=~Z1,data=datEx,UC_indicator="UC")
```
One aspect of the multiple-cause model is that a fully parametric model for the log ratio of the
baseline pure hazards needs to be posited. The current default is to parametrise this
a piecewise constant function with cut-offs at the 25th, 50th and 75th percentile of the
failure time distribution of pure events (with weight=1). Future versions will provide
the user control over this. Thus the model fit provides regression coefficient estimates for the models relating to the disease of interest and other diseases (as log-hazard ratios) and also estimates of the piecewise constant estimated log ratio of the baseline hazards. These results are extracted as follows:
```{r include = T, tidy=T}
fitMCOD[["Multiple-cause"]]
```
The results from the single-cause analysis can also be extracted from the object, as follows:
```{r include = T, tidy=T}
fitMCOD[["Single-cause"]]
```
The convergence of the multiple-cause analysis should be checked using (use `?check.survMCOD` for more details):
```{r include = T, tidy=T}
check.survMCOD(fitMCOD)
```
Healthy convergence is seen by curves showing variation in estimates across the first three points, followed by a stabilisation of the curve around the final estimate.
Future versions will define a proper class of for objects created by `survMCOD` and summary and other such methods.
## Bug Reports
If you find any bugs, please report them via email to [Margarita Moreno-Betancur](mailto:[email protected]).
## References
1. Moreno-Betancur M, Sadaoui H, Piffaretti C, Rey G. [Survival analysis with multiple causes of death:
Extending the competing risks model](http://journals.lww.com/epidem/Abstract/2017/01000/Survival_Analysis_with_Multiple_Causes_of_Death_.3.aspx). Epidemiology 2017; 28(1): 12-19.
2. Piffaretti C, Moreno-Betancur M, Lamarche-Vadel A, Rey G. [Quantifying cause-related mortality by
weighting multiple causes of death](http://cdrwww.who.int/bulletin/volumes/94/12/16-172189.pdf). Bulletin of the World Health Organization 2016; 94:870-879B.
3. Rey G, Piffaretti C, Rondet C, Lamarche-Vadel A, Moreno-Betancur M. [Analyse de la mortalite par cause :
ponderation des causes multiples](http://invs.santepubliquefrance.fr/beh/2017/1/pdf/2017_1_2.pdf). Bulletin Epidemiologique Hebdomadaire, 2017; (1): 13-9.