Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to reverse reference level in Cox regression output? #458

Closed
plpxsk opened this issue May 11, 2018 · 4 comments
Closed

How to reverse reference level in Cox regression output? #458

plpxsk opened this issue May 11, 2018 · 4 comments

Comments

@plpxsk
Copy link

plpxsk commented May 11, 2018

With a categorical covariate with 2 levels: "A" and "B", it looks like "A" is kept as reference level, so that cox regression output shows hazard ratio of B to A. What if I'd prefer the reversed hazard ratio of A to B (so that "B" is reference level")

To clarify, in R, this is done via "factor" and then you can set a reference factor of "B", for example.

Is this possible in lifelines?

@CamDavidsonPilon
Copy link
Owner

Hey @pavopax - just so I'm clear, can you show an example of what this might look like in R code?

@plpxsk
Copy link
Author

plpxsk commented May 11, 2018

sure:

library(dplyr)
library(forcats)
library(survival)

N <- 20
df <- data_frame(
  time=rexp(N),
  event=as.integer(rbinom(N, 1, .5)),
  treatment_=c(rep("control", N/2), rep("active", N/2)))

cox <- coxph(Surv(time, event) ~ treatment_, data=df)

In the model summary below, exp(coef) for "treatment_control" is the hazard ratio of control:active , where active is the "reference" since it is alphabetically before "control"

> cox
Call:
coxph(formula = Surv(time, event) ~ treatment_, data = df)

                   coef exp(coef) se(coef)     z    p
treatment_control -0.373     0.689    0.917 -0.41 0.68

And I'd like to see the reverse HR: active to control, where control is the reference ((common in pharma/biotech).

So I reverse it like so:

df$treatment_  <- df$treatment_ %>% as.factor() %>% forcats::fct_rev()

And the result:

cox <- coxph(Surv(time, event) ~ treatment_, data=df)

> cox
Call:
coxph(formula = Surv(time, event) ~ treatment_, data = df)

                 coef exp(coef) se(coef)    z    p
treatment_active 0.373     1.452    0.917 0.41 0.68

Now, this shows me the desired hazard ratio, which I obtained by reversing the "treatment_" factor variable via fct_rev()

(It is actually easy to get this hazard ratio: exp(coef) is just 1/exp(coef) from the first cox model )

@CamDavidsonPilon
Copy link
Owner

Let me know if this solves your question: in lifelines, all the processing & encoding is done outside of lifelines. That is, before the dataframe even hits lifelines, it should be encoded as you wish. So it's up to you whether the baseline is control or treatment in the dataframe column. The inference should proceed as normal, and the resulting cph.print_summary should represent the choice chosen.

Popular encoding functions are available in the patsy library (which has a R-style formula system) and Pandas get_dummies function (but this doesn't allow you wish to chose as the baseline.

df = pd.DataFrame.from_records([
  {'t': 1, 'e': 0, 'cata': 'control'},
  {'t': 2, 'e': 0, 'cata': 'treatment'},
  {'t': 3, 'e': 0, 'cata': 'treatment'},
  {'t': 4, 'e': 0, 'cata': 'control'},
  {'t': 5, 'e': 0, 'cata': 'treatment'},
  {'t': 6, 'e': 0, 'cata': 'control'},
])

df = pd.get_dummies(df, columns='cata')
# choice is now yours to make either baseline, so either:
del df['cata_control']
# or 
del df['cata_treatment']

cph = CoxPHFitter()
cpf.fit(df, 't', 'e')
...

@plpxsk
Copy link
Author

plpxsk commented May 11, 2018

Yes, this makes complete sense 💯

As suggested, I will make dummies with pd.get_dummies(drop_first=False) and then drop my "reference" level column from the data frame.

And I fully agree with your philosophy of "do all processing/encoding outside of lifelines."

So the modification could instead be made in pd.get_dummies() as discussed at pandas-dev/pandas#12042 (comment). For future viewers, a more general relevant thread is at pandas-dev/pandas#8918.

Issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants