-
Notifications
You must be signed in to change notification settings - Fork 9
/
Copy pathREADME.Rmd
99 lines (71 loc) · 3.66 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# twidlr: consistent data.frame and formula API for models <img src="man/figures/logo.png" align="right" />
## Overview
twidlr is an R package that exposes a consistent API for model functions and their corresponding predict methods such that they are specified as:
```{r, eval = F}
fit <- model(data, formula, ...)
predict(fit, data, ...)
```
Where "data" is a **required** data.frame (or able to be coerced to one) and "formula" is a formula (or string able to be coerced to one) that describes the model to be fitted.
twidlr gets its name from the "twiddle" used in R formulas.
## Installation
twidlr is available to install from github by running:
```{r, eval = F}
# install.packages("devtools")
devtools::install_github("drsimonj/twidlr")
```
## Usage
`library(twidlr)` exposes model functions that you're already familiar with, but such that they accept a data.frame first, formula second, and then additional arguments. A robust method to `predict` data is also exposed.
For example, a typical linear model would be `lm(hp ~ mpg * wt, mtcars, ...)`. Once `twidlr` is loaded, the same model would be run via `lm(mtcars, hp ~ mpg * wt, ...)`.
## Motivation
Modelling in R is messy! Some models take formulas and data frames while others require matrices and vectors. The same can be said of corresponding `predict()` methods, which can also be impure, returning unexpected or inconsistent results.
twidlr seeks to overcome these problems be providing:
- **Consistent API** for model functions and their corresponding `predict` methods (helping to improve the generality of tidy modelling packages like [piplearner](https://github.com/drsimonj/pipelearner))
- **Pure and available predictions** by way of `predict` being made available for all methods (including unsupervised algorithms like kmeans) and making "data" a required argument
- **[Tidyverse](http://tidyverse.org/) philosophy** by working with data frames and being pipeable such as `mtcars %>% lm(hp ~ wt)`
- **Leverage formula operators** where they may be valid but not originally available. For example, to specify select variables or include additional terms like interactions and dummy-coded variables with syntax such as `glmnet(iris, Sepal.Width ~ Petal.Width * Petal.Length + Species)`. Formulas created as strings can always be used too!
## twidlr models
Model functions exposed by twidlr:
```{r, echo = F}
x <- data.frame(rbind(
c(Package = "stats", Function = "lm"),
## Add new model functions here as c("Package", "Function") ------ >
c("xgboost", "xgboost"),
c("glmnet", "glmnet"),
c("stats", "glm"),
c("rpart", "rpart"),
c("randomForest", "randomForest"),
c("lme4", "lmer"),
c("lme4", "glmer"),
c("quantreg","rq"),
c("quantreg","nlrq"),
c("quantreg","rqss"),
c("quantreg","crq"),
c("stats", "kmeans"),
c("stats", "t.test (now 'ttest')"),
c("stats", "prcomp"),
c("stats", "aov"),
c("glmnet", "cv.glmnet"),
c("stats", "factanal"),
c("e1071", "svm"),
c("e1071", "naiveBayes"),
c("gamlss","gamlss")
## < ---------------------------------------------------------------
))
x <- x[order(x$Package, x$Function), ]
x <- tapply(x$Function, x$Package, paste, collapse = ", ")
x <- data.frame(Package = names(x), Functions = x, row.names = NULL)
knitr::kable(x[order(x$Package, x$Function), ], row.names = FALSE)
```
## Contributing
For conventions and best-practices when contributing to twidlr, please see [CONTRIBUTING.md](https://github.com/drsimonj/twidlr/blob/master/CONTRIBUTING.md)