-
-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to reverse reference level in Cox regression output? #458
Comments
Hey @pavopax - just so I'm clear, can you show an example of what this might look like in R code? |
sure: library(dplyr)
library(forcats)
library(survival)
N <- 20
df <- data_frame(
time=rexp(N),
event=as.integer(rbinom(N, 1, .5)),
treatment_=c(rep("control", N/2), rep("active", N/2)))
cox <- coxph(Surv(time, event) ~ treatment_, data=df) In the model summary below, > cox
Call:
coxph(formula = Surv(time, event) ~ treatment_, data = df)
coef exp(coef) se(coef) z p
treatment_control -0.373 0.689 0.917 -0.41 0.68 And I'd like to see the reverse HR: active to control, where control is the reference ((common in pharma/biotech). So I reverse it like so: df$treatment_ <- df$treatment_ %>% as.factor() %>% forcats::fct_rev() And the result: cox <- coxph(Surv(time, event) ~ treatment_, data=df)
> cox
Call:
coxph(formula = Surv(time, event) ~ treatment_, data = df)
coef exp(coef) se(coef) z p
treatment_active 0.373 1.452 0.917 0.41 0.68 Now, this shows me the desired hazard ratio, which I obtained by reversing the "treatment_" factor variable via (It is actually easy to get this hazard ratio: |
Let me know if this solves your question: in lifelines, all the processing & encoding is done outside of lifelines. That is, before the dataframe even hits lifelines, it should be encoded as you wish. So it's up to you whether the baseline is control or treatment in the dataframe column. The inference should proceed as normal, and the resulting Popular encoding functions are available in the patsy library (which has a R-style formula system) and Pandas df = pd.DataFrame.from_records([
{'t': 1, 'e': 0, 'cata': 'control'},
{'t': 2, 'e': 0, 'cata': 'treatment'},
{'t': 3, 'e': 0, 'cata': 'treatment'},
{'t': 4, 'e': 0, 'cata': 'control'},
{'t': 5, 'e': 0, 'cata': 'treatment'},
{'t': 6, 'e': 0, 'cata': 'control'},
])
df = pd.get_dummies(df, columns='cata')
# choice is now yours to make either baseline, so either:
del df['cata_control']
# or
del df['cata_treatment']
cph = CoxPHFitter()
cpf.fit(df, 't', 'e')
... |
Yes, this makes complete sense 💯 As suggested, I will make dummies with And I fully agree with your philosophy of "do all processing/encoding outside of lifelines." So the modification could instead be made in Issue can be closed. |
With a categorical covariate with 2 levels: "A" and "B", it looks like "A" is kept as reference level, so that cox regression output shows hazard ratio of B to A. What if I'd prefer the reversed hazard ratio of A to B (so that "B" is reference level")
To clarify, in R, this is done via "factor" and then you can set a reference factor of "B", for example.
Is this possible in
lifelines
?The text was updated successfully, but these errors were encountered: