Commit 697c595
* Updated some examples and also updated the bibliography to include some papers at the core of the library that we were missing.
1 parent af054c4 · commit 697c595 · Showing 1 changed file with 42 additions and 4 deletions.
@@ -16,7 +16,7 @@ techniques with econometrics to bring automation to complex causal inference pro
 * Build on standard Python packages for Machine Learning and Data Analysis

 One of the biggest promises of machine learning is to automate decision making in a multitude of domains. At the core of many data-driven personalized decision scenarios is the estimation of heterogeneous treatment effects: what is the causal effect of an intervention on an outcome of interest for a sample with a particular set of features? In a nutshell, this toolkit is designed to measure the causal effect of some treatment variable(s) `T` on an outcome
-variable `Y`, controlling for a set of features `X`. The methods implemented are applicable even with observational (non-experimental or historical) datasets.
+variable `Y`, controlling for a set of features `X, W`, and how that effect varies as a function of `X`. The methods implemented are applicable even with observational (non-experimental or historical) datasets. For the estimation results to have a causal interpretation, some methods assume no unobserved confounders (i.e. there is no unobserved variable not included in `X, W` that simultaneously has an effect on both `T` and `Y`), while others assume access to an instrument `Z` (i.e. an observed variable `Z` that has an effect on the treatment `T` but no direct effect on the outcome `Y`). Most methods provide confidence intervals and inference results.

 For detailed information about the package, consult the documentation at https://econml.azurewebsites.net/.
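The no-unobserved-confounders assumption described above can be made concrete with a small synthetic sketch (plain numpy/scikit-learn, not the econml API; all names are illustrative): when an observed variable `W` drives both the treatment `T` and the outcome `Y`, a naive treatment/control comparison is biased, while adjusting for `W` recovers the true effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 20000
W = rng.normal(size=n)                           # observed confounder
T = (W + rng.normal(size=n) > 0).astype(float)   # treatment probability increases with W
Y = 2.0 * T + 3.0 * W + rng.normal(size=n)       # true causal effect of T is 2.0

# Naive contrast picks up W's effect on Y through treatment selection
naive = Y[T == 1].mean() - Y[T == 0].mean()

# Regression adjustment for the observed confounder recovers ~2.0
adj = LinearRegression().fit(np.column_stack([T, W]), Y).coef_[0]
```

With the confounder observed and included, the adjusted coefficient is close to the true effect; drop `W` from the regression and the bias of the naive contrast reappears.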
@@ -84,7 +84,7 @@ To install from source, see [For Developers](#for-developers) section below.
 ### Estimation Methods

 <details>
-<summary>Double Machine Learning (click to expand)</summary>
+<summary>Double Machine Learning (aka RLearner) (click to expand)</summary>

 * Linear final stage
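For intuition on what a linear final stage estimates, here is a minimal sketch (plain numpy on synthetic, already-residualized data; not the econml API): DML regresses the outcome residuals on the treatment residuals interacted with `[1, X]`, so the fitted coefficients trace out a linear CATE.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000
X = rng.uniform(-1, 1, size=n)
rt = rng.normal(size=n)                          # treatment residuals (after first stage)
ry = (1.0 + 2.0 * X) * rt + rng.normal(size=n)   # outcome residuals; true CATE = 1 + 2*X

# Linear final stage: OLS of ry on rt interacted with [1, X]
D = np.column_stack([rt, rt * X])
beta, *_ = np.linalg.lstsq(D, ry, rcond=None)    # beta approaches [1.0, 2.0]
cate = beta[0] + beta[1] * X                     # estimated effect at each X
```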
@@ -117,7 +117,7 @@ To install from source, see [For Developers](#for-developers) section below.
 lb, ub = est.effect_interval(X_test, alpha=0.05) # Confidence intervals via debiased lasso
 ```

-* Nonparametric last stage
+* Forest last stage

 ```Python
 from econml.dml import ForestDML
@@ -129,6 +129,20 @@ To install from source, see [For Developers](#for-developers) section below.
 # Confidence intervals via Bootstrap-of-Little-Bags for forests
 lb, ub = est.effect_interval(X_test, alpha=0.05)
 ```

+* Generic Machine Learning last stage
+
+  ```Python
+  from econml.dml import NonParamDML
+  from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
+
+  est = NonParamDML(model_y=RandomForestRegressor(),
+                    model_t=RandomForestClassifier(),
+                    model_final=RandomForestRegressor(),
+                    discrete_treatment=True)
+  est.fit(Y, T, X=X, W=W)
+  treatment_effects = est.effect(X_test)
+  ```

 </details>
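The generic final stage added in this hunk follows the R-learner recipe: residualize `Y` and `T` on the controls with cross-fitting, then fit a weighted regression of `ry/rt` on `X` with weights `rt**2`, whose weighted leaf averages target `E[ry*rt|X] / E[rt**2|X]`. A self-contained sketch of that idea with plain scikit-learn on synthetic data (illustrative only, not the econml implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 5000
X = rng.uniform(-1, 1, size=(n, 1))
T = rng.normal(size=n)
theta = 1.0 + 2.0 * X[:, 0]                  # true heterogeneous effect
Y = theta * T + X[:, 0] + rng.normal(size=n)

# Stage 1: residualize Y and T on X via cross-fitted nuisance models
ry = Y - cross_val_predict(RandomForestRegressor(n_estimators=50), X, Y, cv=2)
rt = T - cross_val_predict(RandomForestRegressor(n_estimators=50), X, T, cv=2)

# Stage 2: weighted regression of ry/rt on X with weights rt**2;
# each leaf average equals sum(ry*rt) / sum(rt**2), a local effect estimate
final = RandomForestRegressor(n_estimators=50, min_samples_leaf=50)
final.fit(X, ry / rt, sample_weight=rt ** 2)
theta_hat = final.predict(X)                 # tracks theta = 1 + 2*X
```

The `rt**2` weighting is what keeps near-zero treatment residuals (where `ry/rt` explodes) from dominating the fit.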
@@ -367,16 +381,28 @@ as p-values and z-statistics. When the CATE model is linear and parametric, then
 est.effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
 # Get the population summary for the entire sample X
 est.effect_inference(X_test).population_summary(alpha=0.1, value=0, decimals=3, tol=0.001)
-# Get the inference summary for the final model
+# Get the parameter inference summary for the final model
 est.summary()
 ```

+<details><summary>Example Output (click to expand)</summary>
+
+```Python
+# Get the effect inference summary, which includes the standard error, z test score, p value, and confidence interval given each sample X[i]
+est.effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
+```
+![image](notebooks/images/summary_frame.png)
+
+```Python
+# Get the population summary for the entire sample X
+est.effect_inference(X_test).population_summary(alpha=0.1, value=0, decimals=3, tol=0.001)
+```
+![image](notebooks/images/population_summary.png)
+
+```Python
+# Get the parameter inference summary for the final model
+est.summary()
+```
+![image](notebooks/images/summary.png)
+
+</details>
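The columns reported by `summary_frame` (standard error, z test score, p value, confidence interval) follow the usual normal approximation around the point estimate; a small stdlib-only sketch with hypothetical numbers for the point estimate and standard error:

```python
import math

theta_hat, stderr = 1.8, 0.4   # hypothetical point estimate and standard error
z = theta_hat / stderr         # z test score for H0: effect == 0
# Two-sided p value from the standard normal CDF, Phi(x) = (1 + erf(x/sqrt(2))) / 2
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
z95 = 1.959964                 # standard normal 97.5% quantile
lb, ub = theta_hat - z95 * stderr, theta_hat + z95 * stderr  # 95% confidence interval
```

Here `z = 4.5` and the interval is roughly `(1.02, 2.58)`; the library's summaries compute the same quantities from estimator-specific standard errors.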
@@ -448,6 +474,10 @@ contact [[email protected]](mailto:[email protected]) with any additio

 # References

+X. Nie, S. Wager.
+**Quasi-Oracle Estimation of Heterogeneous Treatment Effects.**
+[*Biometrika*](https://doi.org/10.1093/biomet/asaa076), 2020.
+
 V. Syrgkanis, V. Lei, M. Oprescu, M. Hei, K. Battocchi, G. Lewis.
 **Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments.**
 [*Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS)*](https://arxiv.org/abs/1905.10176), 2019.
@@ -466,10 +496,18 @@ S. Künzel, J. Sekhon, J. Bickel and B. Yu.
 **Metalearners for estimating heterogeneous treatment effects using machine learning.**
 [*Proceedings of the national academy of sciences, 116(10), 4156-4165*](https://www.pnas.org/content/116/10/4156), 2019.

+S. Athey, J. Tibshirani, S. Wager.
+**Generalized random forests.**
+[*Annals of Statistics, 47, no. 2, 1148--1178*](https://projecteuclid.org/euclid.aos/1547197251), 2019.
+
+V. Chernozhukov, D. Nekipelov, V. Semenova, V. Syrgkanis.
+**Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models.**
+[*ArXiv preprint arXiv:1806.04823*](https://arxiv.org/abs/1806.04823), 2018.
+
 S. Wager, S. Athey.
 **Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.**
 [*Journal of the American Statistical Association, 113:523, 1228-1242*](https://www.tandfonline.com/doi/citedby/10.1080/01621459.2017.1319839), 2018.

 Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy. **Deep IV: A flexible approach for counterfactual prediction.** [*Proceedings of the 34th International Conference on Machine Learning, ICML'17*](http://proceedings.mlr.press/v70/hartford17a/hartford17a.pdf), 2017.

 V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. Newey. **Double Machine Learning for Treatment and Causal Parameters.** [*ArXiv preprint arXiv:1608.00060*](https://arxiv.org/abs/1608.00060), 2016.