Update README.md (#324)
* Updated some examples and also updated the bibliography to include some papers at the core of the library that we were missing.
vsyrgkanis authored Nov 19, 2020
1 parent af054c4 commit 697c595
Showing 1 changed file with 42 additions and 4 deletions.
* Build on standard Python packages for Machine Learning and Data Analysis

One of the biggest promises of machine learning is to automate decision making in a multitude of domains. At the core of many data-driven personalized decision scenarios is the estimation of heterogeneous treatment effects: what is the causal effect of an intervention on an outcome of interest for a sample with a particular set of features? In a nutshell, this toolkit is designed to measure the causal effect of some treatment variable(s) `T` on an outcome
variable `Y`, controlling for a set of features `X, W`, and how that effect varies as a function of `X`. The methods implemented are applicable even with observational (non-experimental or historical) datasets. For the estimation results to have a causal interpretation, some methods assume no unobserved confounders (i.e. there is no unobserved variable, not included in `X, W`, that simultaneously affects both `T` and `Y`), while others assume access to an instrument `Z` (i.e. an observed variable `Z` that affects the treatment `T` but has no direct effect on the outcome `Y`). Most methods provide confidence intervals and inference results.
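
For orientation, the snippet below sketches how these variables map onto the estimator API; the synthetic data and the choice of `LinearDML` with `LassoCV` nuisance models are illustrative assumptions rather than a prescribed workflow.

```Python
import numpy as np
from econml.dml import LinearDML
from sklearn.linear_model import LassoCV

# Illustrative synthetic data: X drives effect heterogeneity, W are additional controls
n = 1000
X = np.random.normal(size=(n, 2))
W = np.random.normal(size=(n, 5))
T = X[:, 0] + W[:, 0] + np.random.normal(size=n)             # treatment depends on X and W
Y = (1 + X[:, 1]) * T + W[:, 1] + np.random.normal(size=n)   # effect of T on Y varies with X

est = LinearDML(model_y=LassoCV(), model_t=LassoCV())
est.fit(Y, T, X=X, W=W, inference='statsmodels')
X_test = np.random.normal(size=(10, 2))
treatment_effects = est.effect(X_test)              # CATE estimates as a function of X
lb, ub = est.effect_interval(X_test, alpha=0.05)    # confidence intervals
```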

For detailed information about the package, consult the documentation at https://econml.azurewebsites.net/.

To install from source, see [For Developers](#for-developers) section below.
### Estimation Methods

<details>
<summary>Double Machine Learning (aka RLearner) (click to expand)</summary>

* Linear final stage
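
A minimal sketch of this case, assuming the `LinearDML` estimator with `LassoCV` nuisance models (illustrative choices; see the documentation for the full example):

```Python
from econml.dml import LinearDML
from sklearn.linear_model import LassoCV

est = LinearDML(model_y=LassoCV(), model_t=LassoCV())
est.fit(Y, T, X=X, W=W, inference='statsmodels')  # W -> controls, X -> effect modifiers
treatment_effects = est.effect(X_test)
lb, ub = est.effect_interval(X_test, alpha=0.05)  # OLS-style confidence intervals
```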

* Sparse linear final stage

```Python
from econml.dml import SparseLinearDML
from sklearn.linear_model import LassoCV

# LassoCV first stages shown for illustration
est = SparseLinearDML(model_y=LassoCV(), model_t=LassoCV())
est.fit(Y, T, X=X, W=W, inference='debiasedlasso')
treatment_effects = est.effect(X_test)
lb, ub = est.effect_interval(X_test, alpha=0.05)  # Confidence intervals via debiased lasso
```

* Forest last stage

```Python
from econml.dml import ForestDML
from sklearn.ensemble import GradientBoostingRegressor

# Gradient boosting first stages shown for illustration
est = ForestDML(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor())
est.fit(Y, T, X=X, W=W, inference='blb')
treatment_effects = est.effect(X_test)
# Confidence intervals via Bootstrap-of-Little-Bags for forests
lb, ub = est.effect_interval(X_test, alpha=0.05)
```

* Generic Machine Learning last stage

```Python
from econml.dml import NonParamDML
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

est = NonParamDML(model_y=RandomForestRegressor(),
                  model_t=RandomForestClassifier(),
                  model_final=RandomForestRegressor(),
                  discrete_treatment=True)
est.fit(Y, T, X=X, W=W)
treatment_effects = est.effect(X_test)
```

</details>

Whenever inference is enabled, estimation results come with structured inference information, such as p-values and z-statistics. When the CATE model is linear and parametric, a summary of the parameters of the fitted final model is also available.

```Python
# Get the effect inference summary, which includes the standard error, z test score, p value, and confidence interval given each sample X[i]
est.effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
# Get the population summary for the entire sample X
est.effect_inference(X_test).population_summary(alpha=0.1, value=0, decimals=3, tol=0.001)
# Get the parameter inference summary for the final model
est.summary()
```

<details><summary>Example Output (click to expand)</summary>

```Python
# Get the effect inference summary, which includes the standard error, z test score, p value, and confidence interval given each sample X[i]
est.effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
```
![image](notebooks/images/summary_frame.png)

```Python
# Get the population summary for the entire sample X
est.effect_inference(X_test).population_summary(alpha=0.1, value=0, decimals=3, tol=0.001)
```
![image](notebooks/images/population_summary.png)

```Python
# Get the parameter inference summary for the final model
est.summary()
```
![image](notebooks/images/summary.png)

</details>

# References

X. Nie, S. Wager.
**Quasi-Oracle Estimation of Heterogeneous Treatment Effects.**
[*Biometrika*](https://doi.org/10.1093/biomet/asaa076), 2020.

V. Syrgkanis, V. Lei, M. Oprescu, M. Hei, K. Battocchi, G. Lewis.
**Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments.**
[*Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS)*](https://arxiv.org/abs/1905.10176), 2019.
S. Künzel, J. Sekhon, P. Bickel and B. Yu.
**Metalearners for estimating heterogeneous treatment effects using machine learning.**
[*Proceedings of the national academy of sciences, 116(10), 4156-4165*](https://www.pnas.org/content/116/10/4156), 2019.

S. Athey, J. Tibshirani, S. Wager.
**Generalized random forests.**
[*Annals of Statistics, 47, no. 2, 1148-1178*](https://projecteuclid.org/euclid.aos/1547197251), 2019.

V. Chernozhukov, D. Nekipelov, V. Semenova, V. Syrgkanis.
**Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models.**
[*ArXiv preprint arXiv:1806.04823*](https://arxiv.org/abs/1806.04823), 2018.

S. Wager, S. Athey.
**Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.**
[*Journal of the American Statistical Association, 113:523, 1228-1242*](https://www.tandfonline.com/doi/citedby/10.1080/01621459.2017.1319839), 2018.

J. Hartford, G. Lewis, K. Leyton-Brown, and M. Taddy.
**Deep IV: A flexible approach for counterfactual prediction.**
[*Proceedings of the 34th International Conference on Machine Learning, ICML'17*](http://proceedings.mlr.press/v70/hartford17a/hartford17a.pdf), 2017.

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. Newey.
**Double Machine Learning for Treatment and Causal Parameters.**
[*ArXiv preprint arXiv:1608.00060*](https://arxiv.org/abs/1608.00060), 2016.
