DoWhy Logistic Regression with Stats Api #296

Open · cleli94 opened this issue Jul 12, 2021 · 1 comment

cleli94 commented Jul 12, 2021

Dear authors,

I am using dowhy for a project, and it is a GREAT tool!

I was comparing the results obtained with the backdoor method using logistic regression via the statsmodels API, as you suggest, against a method I built from scratch with scikit-learn. The results were very different, and mine seemed the more plausible. Moreover, if I am not mistaken, the result should be the same as an S-Learner with logistic regression: mine matched it, while the statsmodels-based estimate was very different.

I think there could be an issue with the GLM methods: when you call .predict on a statsmodels GLM, you do not obtain the class prediction (i.e., 0 or 1) but the probability, whereas scikit-learn's .predict returns the class directly:

  • model_sklearn.predict_proba(X)[:,1] == model_statsmodel.predict(X)
  • model_sklearn.predict(X) == (model_statsmodel.predict(X) > 0.5).astype(int)
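
For concreteness, here is a minimal sketch of that difference on toy data (my own construction, not DoWhy code; the near-equality assumes scikit-learn's regularization is made negligible so both fits approach the same maximum-likelihood solution):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# Toy binary-outcome data, just to exercise both APIs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

# scikit-learn: .predict returns the 0/1 class, .predict_proba the probability.
# C is set large so regularization is negligible and the fits match.
model_sklearn = LogisticRegression(C=1e6).fit(X, y)

# statsmodels: .predict on a fitted GLM returns the probability P(Y=1 | X).
model_statsmodel = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit()

proba_sk = model_sklearn.predict_proba(X)[:, 1]
proba_sm = model_statsmodel.predict(sm.add_constant(X))

print(np.allclose(proba_sk, proba_sm, atol=1e-3))                        # ~True
print((model_sklearn.predict(X) == (proba_sm > 0.5).astype(int)).all())  # True, barring points right at 0.5
```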

So, is it true that you're actually using .predict, which returns the probabilities? In that case, why do you use the probabilities to compute the ATE instead of the class predictions?

Thank you very much in advance!

amit-sharma (Member) commented
For most cases, probabilities are the correct output for computing the causal effect on a binary outcome. The expression is

E[Y|do(T=1)] - E[Y|do(T=0)] = P[Y=1|do(T=1)] - P[Y=1|do(T=0)]

so it makes sense to use the probabilities.

To see an extreme example, suppose T and Y are both binary and there are no confounders. The true generating equation for Y is y = Bernoulli(sigmoid(t*beta + N(0, 0.01))) with beta = 0, so the causal effect of T on Y is zero. A simulation sketch follows the list below.

  • Using logistic regression with the score/probability as the output, the estimated P(Y=1|T=1) and P(Y=1|T=0) will be nearly the same, and the causal estimate will be zero.
  • Using the 0/1 class as the output, the causal estimate can be 1, which is incorrect. This happens whenever one of the estimated P(Y=1|T=1) and P(Y=1|T=0) is below 0.5 and the other is above it: all inputs with T=1 are then predicted as 1, and all inputs with T=0 as 0.
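
Here is a minimal simulation sketch of that scenario (my own code, using statsmodels directly rather than DoWhy; N(0, 0.01) is read as Gaussian noise with variance 0.01, i.e., standard deviation 0.1):

```python
import numpy as np
import statsmodels.api as sm

# Simulate the extreme example: binary T, beta = 0, so the true effect is zero.
rng = np.random.default_rng(1)
n = 10_000
t = rng.integers(0, 2, size=n).astype(float)
noise = rng.normal(0.0, 0.1, size=n)
p_true = 1.0 / (1.0 + np.exp(-(t * 0.0 + noise)))  # sigmoid(t*beta + noise), beta = 0
y = rng.binomial(1, p_true)

# Fit a logistic regression of Y on T (with intercept).
glm = sm.GLM(y, np.column_stack([np.ones(n), t]), family=sm.families.Binomial()).fit()

# Predict for everyone under do(T=1) and do(T=0).
p1 = glm.predict(np.column_stack([np.ones(n), np.ones(n)]))
p0 = glm.predict(np.column_stack([np.ones(n), np.zeros(n)]))

ate_proba = (p1 - p0).mean()  # probability-based estimate: ~0, as it should be
ate_class = ((p1 > 0.5).astype(int) - (p0 > 0.5).astype(int)).mean()
# class-based estimate: -1, 0, or +1, depending on which side of 0.5
# the two (nearly identical) estimated probabilities happen to land
print(ate_proba, ate_class)
```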

Still, it can be useful to have the flexibility to output the class prediction directly, e.g., for comparison with a default logistic metalearner. I've opened PR #386 to add a predict_score argument to the GLM estimator. It can be specified in the method_params of estimate_effect and is True by default.
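
A sketch of how that could look once the PR is merged (the method name and glm_family usage are assumed from DoWhy's existing GLM estimator; `model` and `identified_estimand` come from the usual CausalModel workflow, and the exact keyword handling may differ from the final PR):

```python
import statsmodels.api as sm

# Hypothetical usage of the predict_score argument from PR #386 (sketch only).
estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.generalized_linear_model",
    method_params={
        "glm_family": sm.families.Binomial(),  # logistic regression
        "predict_score": False,  # return 0/1 classes instead of probabilities
    },
)
print(estimate.value)
```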
