Update paper.md, mailmap and small URL fix #359

Merged (5 commits) on Feb 17, 2020
.mailmap (2 changes: 1 addition & 1 deletion)
@@ -21,7 +21,7 @@ Ravi Prakash Garg <[email protected]> Ravi Prakash Garg <[email protected]>
Ravi Prakash Garg <[email protected]> Ravi Prakash Garg <[email protected]>
Scott Otterson <[email protected]>
Talles Alves <[email protected]>
Titipat Achakulvisut <[email protected]>
Titipat Achakulvisut <[email protected]>
Tommy Odland <[email protected]> tommyod <[email protected]>
Tommy Odland <[email protected]> Tommy <[email protected]>
Yu Umegaki <[email protected]>
doc/README.md (8 changes: 4 additions & 4 deletions)
@@ -1,10 +1,10 @@
# pyglmnet documentation

See full documentation page [here](http://pavanramkumar.github.io/pyglmnet/).
Please see the full documentation page [here](http://glm-tools.github.io/pyglmnet/).

We use `sphinx` to generate documentation page.
We use [`sphinx`](https://www.sphinx-doc.org/en/master/) with [`sphinx-rtd-theme`](https://sphinx-rtd-theme.readthedocs.io/en/stable/) to generate a documentation page.

To build documentation page, run `make html`. All static files will be built in
To build the documentation page, run `make html`. All static files will be built in
`_build/html/` where you can open them using the web browser.

To push built documentation page to `gh-pages`, simply run `make install`
To push the built documentation page to the `gh-pages` branch, simply run `make install`
doc/contributing.rst (2 changes: 1 addition & 1 deletion)
@@ -128,7 +128,7 @@ If you think it is ready to merge, prefix with ``[MRG]``.

If it's a complicated feature that can evolve better with feedback, we highly
recommend making the PR when it's a work in progress (WIP). In the PR message box,
it's typically good to associate it with an issue (.e.g. "address #253")
it's typically good to associate it with an issue (e.g. "address #253")
in addition to concisely describing the most salient changes made.

Once your PR is made, the tests will run. If there are errors, they will
paper/paper.bib (11 changes: 6 additions & 5 deletions)
@@ -57,8 +57,9 @@ @inproceedings{sklearn_api
}

@misc{Dua:2019,
author = "Dua, Dheeru and Graff, Casey",
year = "2019",
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and Computer Sciences" }
author = "Dua, Dheeru and Graff, Casey",
year = "2019",
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and Computer Sciences"
}
paper/paper.md (26 changes: 13 additions & 13 deletions)
@@ -12,6 +12,7 @@ authors:
orcid: 0000-0002-3199-9027
affiliation: "1, 2"
- name: Titipat Achakulvisut
orcid: 0000-0002-2124-2979
affiliation: 3
- name: Aid Idrizović
affiliation: 4
@@ -123,14 +124,14 @@ where $\mathcal{L} (y_i, \beta_0 + \beta^T x_i)$ is the negative log-likelihood
observation ($x_i$, $y_i$), and $\lambda \mathcal{P}(\cdot)$ is the penalty that regularizes the solution,
with $\lambda$ being a hyperparameter that controls the amount of regularization.
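
For reference, these terms combine into a penalized objective of the standard glmnet form (reconstructed here, since the equation itself sits outside the lines shown in this hunk):

$$
\min_{\beta_0, \beta} \; \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}(y_i, \beta_0 + \beta^T x_i) + \lambda \mathcal{P}(\beta)
$$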

Modern datasets can contain a number of predictor variables, and data analysis is often exploratory. Under these conditions it is critically important to regularize the model to avoid overfitting the data. Regularization works by adding penalty terms that penalize the model parameters in a variety of ways and can be used to incorporate prior knowledge about the parameters in a structured form.
Modern datasets can contain a number of predictor variables, and data analysis is often exploratory. To avoid overfitting of the data under these circumstances, it is critically important to regularize the model. Regularization works by adding penalty terms that penalize the model parameters in a variety of ways. It can be used to incorporate our prior knowledge about the parameters' distribution in a structured form.
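
As one concrete and familiar choice of $\mathcal{P}(\cdot)$ (shown here only for illustration, not as part of the changed text), the elastic net penalty mixes L1 and L2 terms through a parameter $\alpha \in [0, 1]$:

$$
\mathcal{P}(\beta) = \alpha \|\beta\|_1 + \frac{1 - \alpha}{2} \|\beta\|_2^2,
$$

so that $\alpha = 1$ recovers the lasso and $\alpha = 0$ recovers ridge regression.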

Despite the attractiveness of regularized GLMs, the available tools in
the Python data science eco-system are highly fragmented. Specifically:
Despite the attractiveness and importance of regularized GLMs, the available tools in
the Python data science eco-system do not serve all common functionalities. Specifically:

- [statsmodels] provides a wide range of noise distributions but no regularization.
- [scikit-learn] provides elastic net regularization but only limited noise distribution options.
- [lightning] provides elastic net and group lasso regularization, but only for linear (Gaussian) and logistic (binomial) regression.
- [statsmodels] provides a wide range of noise distributions but no regularization.
- [scikit-learn] provides elastic net regularization but only limited noise distribution options.
- [lightning] provides elastic net and group lasso regularization, but only for linear (Gaussian) and logistic (binomial) regression.

## Pyglmnet is a response to a fragmented ecosystem

@@ -154,17 +155,16 @@ distributions. In particular, it implements a broader form of elastic net regularization

## Pyglmnet is an extensible pure Python implementation

Pyglmnet implements the algorithm described in [Friedman, J., Hastie, T., & Tibshirani, R. (2010)](https://web.stanford.edu/~hastie/Papers/ESLII.pdf) and the accompanying popular R package [glmnet].
As opposed to [python-glmnet] or [glmnet_python], which are wrappers around this package, pyglmnet is written in pure Python for Python 3.5+. Therefore it is easier to extend and more compatible with the existing data science ecosystem.
Pyglmnet implements the algorithm described in [Friedman, J., Hastie, T., & Tibshirani, R. (2010)](https://web.stanford.edu/~hastie/Papers/ESLII.pdf) and its accompanying popular R package [glmnet].
As opposed to [python-glmnet] or [glmnet_python], which are wrappers around this R package, pyglmnet is written in pure Python for Python 3.5+. Therefore, it is easier to extend and more compatible with the existing data science ecosystem.

## Pyglmnet is unit-tested and documented with examples

Pyglmnet has already been used in published work
[@bertran2018active; @rybakken2019decoding; @hofling2019probing; @benjamin2017modern]. It contains unit tests and includes [documentation] in the form of tutorials, docstrings and examples that are run through continuous integration.
Pyglmnet has already been used in published work [@bertran2018active; @rybakken2019decoding; @hofling2019probing; @benjamin2017modern]. It contains unit tests and includes [documentation] in the form of tutorials, docstrings and examples that are run through continuous integration.

# Example Usage

Here we apply pyglmnet to a real-world example. The Community and Crime dataset, one of 400+ datasets curated by the UC Irvine Machine Learning Repository [@Dua:2019] provides a highly curated set of 128 demographic attributes of US counties that may be used to predict incidence of violent crime. The target variable (violent crime per capita) is normalized to lie in $[0, 1]$. Below, we demonstrate the usage of a binomial-distributed GLM with elastic net regularization.
Here, we apply pyglmnet to predict incidence of violent crime from the Community and Crime dataset, one of 400+ datasets curated by the UC Irvine Machine Learning Repository [@Dua:2019], which provides a highly curated set of 128 demographic attributes of US counties. The target variable (violent crime per capita) is normalized to the range $[0, 1]$. Below, we demonstrate the usage of pyglmnet's binomial-distributed GLM with elastic net regularization.

```py
from sklearn.model_selection import train_test_split
# ... (intermediate lines collapsed in the diff view)
glm.fit(Xtrain, ytrain)
yhat = glm.predict_proba(Xtest)
```

As illustrated above, pyglmnet's API is designed to be compatible with scikit-learn [@sklearn_api]. Thus, it is possible to use standard idioms such as:
As illustrated above, pyglmnet's API is designed to be compatible with ``scikit-learn`` [@sklearn_api]. Thus, it is possible to use standard idioms such as:

```py
glm.fit(X, y)
glm.predict(X)
```

Further, as a result of this compatibility, ``scikit-learn`` tools for building pipelines, cross-validation and grid search can be employed by pyglmnet users.
Owing to this compatibility, tools from the ``scikit-learn`` ecosystem for building pipelines, applying cross-validation, and performing grid search over hyperparameters can also be employed with pyglmnet's estimators.
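
A minimal sketch of that usage, continuing from the snippet above (the step name ``glm``, the grid values, and the explicit scorer are illustrative assumptions; pyglmnet's estimator is assumed to expose scikit-learn's ``get_params``/``set_params`` interface):

```py
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from pyglmnet import GLM

# Standardize features, then fit a binomial GLM with elastic net regularization.
# make_pipeline names the GLM step "glm" (lowercased class name).
pipe = make_pipeline(StandardScaler(), GLM(distr="binomial", alpha=0.5))

# Grid-search the regularization strength with 3-fold cross-validation.
# An explicit scorer is passed so the search does not rely on a default score method.
param_grid = {"glm__reg_lambda": [0.01, 0.05, 0.1]}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(Xtrain, ytrain)

print(search.best_params_)
```

The fitted ``search.best_estimator_`` can then be used for prediction in the same way as the ``glm`` object in the earlier example.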

# Acknowledgements
