I am using dowhy for a project, and it is a GREAT tool!
Basically, I was comparing the results obtained with the backdoor method using logistic regression from the statsmodels API, as suggested by you, with a method built from scratch using scikit-learn. The results were very different, and mine seemed to be the more plausible. Moreover, the result should be the same as an S-Learner with logistic regression, if I am not mistaken. Mine was equal to it, while the statsmodels-based result was very different.
I think there could be an issue with the GLM methods: when you call .predict on a GLM from statsmodels, you do not obtain the class prediction (i.e., 0 or 1) but the probability, while in scikit-learn .predict returns the class prediction directly:
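A minimal sketch of the difference I mean (illustrative data and variable names, not the actual dowhy code):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# Illustrative data: X is a feature matrix, y a binary outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)

# statsmodels GLM: .predict returns P(Y=1|X), i.e. probabilities in [0, 1]
glm = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit()
probs = glm.predict(sm.add_constant(X))   # values like 0.47, 0.61, ...

# scikit-learn: .predict returns the 0/1 class labels directly
clf = LogisticRegression().fit(X, y)
classes = clf.predict(X)                  # values like 0, 1, ...
# (probabilities are available separately via clf.predict_proba(X)[:, 1])
```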
So, is it true that you're actually using .predict, which returns the probabilities? In that case, why are you using the probabilities to compute the ATE instead of the class predictions?
Thank you very much in advance!
In most cases, probabilities are the correct output to use for computing the causal effect on a binary outcome. The expression is E[Y|do(T=1)] - E[Y|do(T=0)] = P[Y=1|do(T=1)] - P[Y=1|do(T=0)],
so it makes sense to use the probabilities.
To see an extreme example, consider that T and Y are both binary and there are no confounders. The true generating equation for Y is Y = Bernoulli(sigmoid(t*beta + N(0, 0.01))), with beta = 0. So the causal effect of T on Y is zero.
Using logistic regression and the score/probability as the output, the estimated P(Y=1|T=1) and P(Y=1|T=0) will be nearly the same, and the causal estimate will be close to zero.
Using the 0/1 class as output, the causal estimate can be 1, which is incorrect. This happens whenever one of the estimated probabilities P(Y=1|T=1) and P(Y=1|T=0) is below 0.5 and the other is above 0.5: all inputs with T=1 will be predicted as 1, and all inputs with T=0 will be predicted as 0.
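To make this concrete, here is a small simulation of that extreme example (a sketch with assumed variable names and a fixed seed, not code from dowhy itself):

```python
import numpy as np
import statsmodels.api as sm

# Simulate the extreme example: T binary, no confounders, beta = 0,
# so the true causal effect of T on Y is zero.
rng = np.random.default_rng(42)
n = 10_000
t = rng.integers(0, 2, size=n)
beta = 0.0
logits = t * beta + rng.normal(0, 0.01, size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# Fit a logistic regression of Y on T
X = sm.add_constant(t.astype(float))
model = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# Probability-based estimate: P(Y=1|T=1) - P(Y=1|T=0), close to zero (correct)
p1 = model.predict([[1.0, 1.0]])[0]
p0 = model.predict([[1.0, 0.0]])[0]
print(p1 - p0)

# Class-based estimate: thresholding at 0.5 gives a 0/1 prediction per arm,
# so the difference can jump to -1, 0, or +1 depending on sampling noise
print(int(p1 >= 0.5) - int(p0 >= 0.5))
```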
Still, it can be useful to have the flexibility to directly output the class prediction, e.g., for comparison with a default logistic-regression metalearner. I've added PR #386, which adds an argument predict_score to the GLM estimator. It can be specified in method_params of estimate_effect and is True by default.
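For reference, a hedged sketch of how this could be used once the PR is merged; here `causal_model` and `identified_estimand` are assumed to come from an earlier CausalModel(...) / identify_effect() call, and the method name and glm_family parameter follow the existing GLM estimator usage:

```python
import statsmodels.api as sm

# predict_score is the option added in PR #386 (True by default, i.e. use
# probabilities; set it to False to use 0/1 class predictions instead).
estimate = causal_model.estimate_effect(
    identified_estimand,
    method_name="backdoor.generalized_linear_model",
    method_params={
        "glm_family": sm.families.Binomial(),
        "predict_score": True,
    },
)
print(estimate.value)
```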