FORWARDPORT: Prettier Markdown (and thus better Documenter Support) (#478)

* Prettier Markdown (and thus better Documenter Support) (#476)

* REPL-like output for our objects

* describe show functionality

* model and var corr now as pretty markdown

* use Markdown stdlib for LRT

* nicer optsum markdown

* remove dead comments

* fix test

* fix doc build

* more use of Markdown stdlib

* append! version compat

* example of raw markdown

* The Zen of Alex

* woops

* fix exponent alignment

* update documenter ci

* remove unused option

* one more attempt at correct rendering

* maybe it's 1.6?

* The Zen Continues

Co-authored-by: Alex Arslan <[email protected]>

* slightly more complicated example (and bugfix!)

* patch bump

Co-authored-by: Alex Arslan <[email protected]>
(cherry picked from commit c7ff101)

* small edits

* Update allowable versions of dependencies

* Remove unnecessary using statement

* contrast coding

* statsmodels for statsmodels example

Co-authored-by: Douglas Bates <[email protected]>
palday and dmbates authored Mar 2, 2021
1 parent 26a84fc commit 8d92489
Showing 14 changed files with 293 additions and 199 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/documenter.yml
@@ -10,9 +10,12 @@ on:
jobs:
Documenter:
name: Documentation
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2
- uses: julia-actions/setup-julia@v1
with:
version: 1.6-nightly
- uses: julia-actions/julia-buildpkg@latest
- uses: julia-actions/julia-docdeploy@latest
env:
3 changes: 2 additions & 1 deletion Project.toml
@@ -9,6 +9,7 @@ DataAPI = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Markdown = "d6f4376e-aef5-505a-96c1-9c027394607a"
NLopt = "76087f3c-5699-56af-9a33-bf431cd00edd"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
PooledArrays = "2dfb63ee-cc39-5dd5-95bd-886bf059d720"
@@ -28,7 +29,7 @@ DataAPI = "1"
Distributions = "0.21, 0.22, 0.23, 0.24"
GLM = "1"
NLopt = "0.5, 0.6"
PooledArrays = "0.5, 1.0"
PooledArrays = "0.5, 1"
ProgressMeter = "1"
StaticArrays = "0.11, 0.12, 1"
StatsBase = "0.31, 0.32, 0.33"
6 changes: 4 additions & 2 deletions docs/Project.toml
@@ -2,17 +2,19 @@
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DataFramesMeta = "1313f7d8-7da2-5740-9ea0-a2ca25f37964"
DisplayAs = "0b91fe84-8a4c-11e9-3e1d-67c38462b6d6"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
FreqTables = "da1fdf0e-e0ff-5433-a45f-9bb5ff651cb1"
Gadfly = "c91e804a-d5a3-530f-b6f0-dfbca275c004"
MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsModels = "3eaba693-59b7-5ba5-a881-562e759f1c8d"

[compat]
BenchmarkTools = "0.5"
DataFrames = "0.21"
Documenter = "0.25"
DataFrames = "0.22"
Documenter = "0.26"
FreqTables = "0.4"
Gadfly = "1"
StatsBase = "0.33"
1 change: 1 addition & 0 deletions docs/make.jl
@@ -13,6 +13,7 @@ makedocs(
"GaussHermite.md",
"bootstrap.md",
"rankdeficiency.md",
"mime.md",
# "SimpleLMM.md",
# "MultipleTerms.md",
# "SingularCovariance.md",
4 changes: 4 additions & 0 deletions docs/src/GaussHermite.md
@@ -29,6 +29,9 @@ More formally, a `k`th order rule is exact when `f` is a `k-1` order polynomial.

In the [*Golub-Welsch algorithm*](https://en.wikipedia.org/wiki/Gaussian_quadrature#The_Golub-Welsch_algorithm) the abscissae for a particular Gaussian quadrature rule are determined as the eigenvalues of a symmetric tri-diagonal matrix and the weights are derived from the squares of the first row of the matrix of eigenvectors.
For a `k`th order normalized Gauss-Hermite rule the tridiagonal matrix has zeros on the diagonal and the square roots of `1:k-1` on the super- and sub-diagonal, e.g.
```@setup Main
using DisplayAs
```
```@example Main
using DataFrames, LinearAlgebra, Gadfly
sym3 = SymTridiagonal(zeros(3), sqrt.(1:2))
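# A hedged sketch of the Golub-Welsch step described above (the names `ev`,
# `abscissae`, and `weights` are illustrative additions, not part of the original
# example): the eigenvalues of `sym3` are the abscissae of the 3-point rule, and
# the squared first row of the eigenvector matrix gives the normalized weights.
ev = eigen(sym3)
abscissae = ev.values
weights = abs2.(ev.vectors[1, :])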
@@ -112,6 +115,7 @@ A model with fixed-effects for age, age squared, number of live children and urban
```@example Main
const form1 = @formula use ~ 1 + age + abs2(age) + livch + urban + (1|dist);
m1 = fit(MixedModel, form1, contra, Bernoulli(), fast=true)
DisplayAs.Text(ans) # hide
```

For a model such as `m1`, which has a single, scalar random-effects term, the unscaled conditional density of the spherical random effects variable, $\mathcal{U}$,
6 changes: 6 additions & 0 deletions docs/src/bootstrap.md
@@ -24,6 +24,10 @@ parameter, `θ`, that defines the variance-covariance matrices of the random eff
For example, a simple linear mixed-effects model for the `Dyestuff` data in the [`lme4`](http://github.com/lme4/lme4)
package for [`R`](https://www.r-project.org) is fit by

```@setup Main
using DisplayAs
```

```@example Main
using DataFrames
using DataFramesMeta # dplyr-like operations
@@ -35,6 +39,7 @@ using Random
```@example Main
dyestuff = MixedModels.dataset(:dyestuff)
m1 = fit(MixedModel, @formula(yield ~ 1 + (1 | batch)), dyestuff)
DisplayAs.Text(ans) # hide
```

To bootstrap the model parameters, first initialize a random number generator, then create a bootstrap sample
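A minimal sketch of that step (the seed and replicate count are illustrative; `parametricbootstrap` is the MixedModels function used later on this page):

```julia
using Random

rng = MersenneTwister(1234)                   # illustrative seed
samp = parametricbootstrap(rng, 10_000, m1)   # 10_000 parametric bootstrap replicates of m1
```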
@@ -101,6 +106,7 @@ m2 = fit(
@formula(reaction ~ 1+days+(1+days|subj)),
sleepstudy,
)
DisplayAs.Text(ans) # hide
```
```@example Main
samp2 = parametricbootstrap(rng, 10_000, m2, use_threads=true);
34 changes: 28 additions & 6 deletions docs/src/constructors.md
@@ -18,11 +18,11 @@ MixedModels.dataset

```@setup Main
using Test
using DisplayAs
```

```@example Main
using DataFrames, MixedModels
using StatsBase: describe
using DataFrames, MixedModels, StatsModels
dyestuff = DataFrame(MixedModels.dataset(:dyestuff))
describe(dyestuff)
```
@@ -37,7 +37,7 @@ MixedModels.jl builds on the *Julia* formula language provided by [StatsMode
The second way is to combine `Term`s with operators like `+`, `&`, `~`, and others at "run time". This is especially useful if you wish to create a formula from a list of variable names. For instance, the following are equivalent:

```@example Main
@formula(y ~ 1 + a + b + a & b) == term(:y) ~ term(1) + term(:a) + term(:b) + term(:a) & term(:b)
@formula(y ~ 1 + a + b + a & b) == (term(:y) ~ term(1) + term(:a) + term(:b) + term(:a) & term(:b))
```

MixedModels.jl provides additional formula syntax for representing *random-effects terms*. Most importantly, `|` separates random effects and their grouping factors (as in the formula extension used by the *R* package [`lme4`](https://cran.r-project.org/web/packages/lme4/index.html)). Much like with the base formula language, `|` can be used within the `@formula` macro and to construct a formula programmatically:
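A sketch of that run-time construction (the variable names are illustrative, and it relies on the programmatic `|` support for terms just described):

```julia
using MixedModels, StatsModels

# The macro form and a programmatic form built from `term`s combined with
# `+`, `~`, and `|` describe the same model specification.
f_macro = @formula(reaction ~ 1 + days + (1 + days | subj))
f_prog  = term(:reaction) ~ term(1) + term(:days) + (term(1) + term(:days) | term(:subj))
```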
@@ -60,13 +60,14 @@ A basic model with simple, scalar random effects for the levels of `batch` (the
```@example Main
fm = @formula(yield ~ 1 + (1|batch))
fm1 = fit(MixedModel, fm, dyestuff)
DisplayAs.Text(ans) # hide
```

```@setup Main
@testset "fm1" begin
@test deviance(fm1) ≈ 327.32706
@test varest(fm1) ≈ 2451.2500
@test VarCorr(fm1).σρ.batch.σ[1] ≈ 37.260345
@test isapprox(deviance(fm1), 327.32706, atol=1.e-3)
@test isapprox(varest(fm1), 2451.250, atol=1.e-3)
@test isapprox(VarCorr(fm1).σρ.batch.σ[1], 37.260, atol=1.e-3)
end
```

@@ -81,6 +82,7 @@ dyestuff2 = MixedModels.dataset(:dyestuff2)
By default, the model is fit by maximum likelihood. To use the `REML` criterion instead, add the optional named argument `REML=true` to the call to `fit`
```@example Main
fm1reml = fit(MixedModel, fm, dyestuff, REML=true)
DisplayAs.Text(ans) # hide
```
```@setup Main
@testset "fm1reml" begin
@@ -120,6 +122,7 @@ A model with random intercepts and random slopes for each subject, allowing for
```@example Main
sleepstudy = MixedModels.dataset(:sleepstudy)
fm2 = fit(MixedModel, @formula(reaction ~ 1 + days + (1 + days|subj)), sleepstudy)
DisplayAs.Text(ans) # hide
```
```@setup Main
@testset "fm2" begin
@@ -137,6 +140,7 @@ As every sample is used on every plate these two factors are *crossed*.
```@example Main
penicillin = MixedModels.dataset(:penicillin)
fm3 = fit(MixedModel, @formula(diameter ~ 1 + (1|plate) + (1|sample)), penicillin)
DisplayAs.Text(ans) # hide
```
```@setup Main
@testset "fm3" begin
@@ -155,12 +159,14 @@ describe(pastes)
This can be expressed using the solidus (the "`/`" character) to separate grouping factors, read "`cask` nested within `batch`":
```@example Main
fm4a = fit(MixedModel, @formula(strength ~ 1 + (1|batch/cask)), pastes)
DisplayAs.Text(ans) # hide
```

If the levels of the inner grouping factor are unique across the levels of the outer grouping factor, then this nesting does not need to be expressed explicitly in the model syntax. For example, defining `sample` to be the combination of `batch` and `cask` yields a naming scheme where the nesting is apparent from the data even if not expressed in the formula. (That is, each level of `sample` occurs in conjunction with only one level of `batch`.) As such, this model is equivalent to the previous one.
```@example Main
pastes.sample = (string.(pastes.cask, "&", pastes.batch))
fm4b = fit(MixedModel, @formula(strength ~ 1 + (1|sample) + (1|batch)), pastes)
DisplayAs.Text(ans) # hide
```
```@setup Main
@testset "implicit and explicit nesting" begin
@@ -176,6 +182,7 @@ Additional covariates include the academic department, `dept`, in which the cour
```@example Main
insteval = MixedModels.dataset(:insteval)
fm5 = fit(MixedModel, @formula(y ~ 1 + service * dept + (1|s) + (1|d)), insteval)
DisplayAs.Text(ans) # hide
```
```@setup Main
@testset "fm5" begin
@@ -196,24 +203,28 @@ This often arises when there are many random effects you want to estimate (as is
The special syntax `zerocorr` can be applied to individual random effects terms inside the `@formula`:
```@example Main
fm2zerocorr_fm = fit(MixedModel, @formula(reaction ~ 1 + days + zerocorr(1 + days|subj)), sleepstudy)
DisplayAs.Text(ans) # hide
```

Alternatively, correlations between parameters can be removed by including them as separate random effects terms:
```@example Main
fit(MixedModel, @formula(reaction ~ 1 + days + (1|subj) + (days|subj)), sleepstudy)
DisplayAs.Text(ans) # hide
```

Finally, for predictors that are categorical, MixedModels.jl will estimate correlations between each level.
Notice the large number of correlation parameters if we treat `days` as a categorical variable by giving it contrasts:
```@example Main
fit(MixedModel, @formula(reaction ~ 1 + days + (1 + days|subj)), sleepstudy,
contrasts = Dict(:days => DummyCoding()))
DisplayAs.Text(ans) # hide
```

Separating the `1` and `days` random effects into separate terms removes the correlations between the intercept and the levels of `days`, but not between the levels themselves:
```@example Main
fit(MixedModel, @formula(reaction ~ 1 + days + (1|subj) + (days|subj)), sleepstudy,
contrasts = Dict(:days => DummyCoding()))
DisplayAs.Text(ans) # hide
```
(Notice that the variance component for `days: 1` is estimated as zero, so the correlations for this component are undefined and expressed as `NaN`, not a number.)

@@ -224,6 +235,7 @@ fulldummy
```@example Main
fit(MixedModel, @formula(reaction ~ 1 + days + (1 + fulldummy(days)|subj)), sleepstudy,
contrasts = Dict(:days => DummyCoding()))
DisplayAs.Text(ans) # hide
```
This model produces a better fit as measured by the objective (negative twice the log-likelihood is 1610.8), but at the expense of adding many more parameters.
As a result, model comparison criteria, such as `AIC` and `BIC`, are inflated.
@@ -232,14 +244,17 @@ But using `zerocorr` on the individual terms does remove the correlations betwee
```@example Main
fit(MixedModel, @formula(reaction ~ 1 + days + zerocorr(1 + days|subj)), sleepstudy,
contrasts = Dict(:days => DummyCoding()))
DisplayAs.Text(ans) # hide
```
```@example Main
fit(MixedModel, @formula(reaction ~ 1 + days + (1|subj) + zerocorr(days|subj)), sleepstudy,
contrasts = Dict(:days => DummyCoding()))
DisplayAs.Text(ans) # hide
```
```@example Main
fit(MixedModel, @formula(reaction ~ 1 + days + zerocorr(1 + fulldummy(days)|subj)), sleepstudy,
contrasts = Dict(:days => DummyCoding()))
DisplayAs.Text(ans) # hide
```

## Fitting generalized linear mixed models
@@ -254,6 +269,7 @@ the distribution family for the response, and possibly the link function, must b
verbagg = MixedModels.dataset(:verbagg)
verbaggform = @formula(r2 ~ 1 + anger + gender + btype + situ + mode + (1|subj) + (1|item));
gm1 = fit(MixedModel, verbaggform, verbagg, Bernoulli())
DisplayAs.Text(ans) # hide
```

The canonical link, which is `LogitLink` for the `Bernoulli` distribution, is used if no explicit link is specified.
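To use a non-canonical link, it can be passed after the distribution; a hedged sketch (the probit choice here is purely illustrative):

```julia
using GLM: ProbitLink   # link types are defined in GLM.jl

# Same Bernoulli model as gm1, but with an explicitly chosen probit link
# rather than the canonical logit link.
gm1_probit = fit(MixedModel, verbaggform, verbagg, Bernoulli(), ProbitLink())
```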
@@ -422,6 +438,7 @@ coeftable
```
```@example Main
coeftable(fm2)
DisplayAs.Text(ans) # hide
```

## Covariance parameter estimates
@@ -432,9 +449,11 @@ VarCorr
```
```@example Main
VarCorr(fm2)
DisplayAs.Text(ans) # hide
```
```@example Main
VarCorr(gm1)
DisplayAs.Text(ans) # hide
```

Individual components are returned by other extractors
@@ -533,6 +552,7 @@ sum(leverage(fm2))
When a model converges to a singular covariance, such as
```@example Main
fm3 = fit(MixedModel, @formula(yield ~ 1+(1|batch)), MixedModels.dataset(:dyestuff2))
DisplayAs.Text(ans) # hide
```
the effective degrees of freedom is the lower bound.
```@example Main
@@ -543,6 +563,7 @@ Models for which the estimates of the variances of the random effects are large
```@example Main
fm4 = fit(MixedModel, @formula(diameter ~ 1+(1|plate)+(1|sample)),
MixedModels.dataset(:penicillin))
DisplayAs.Text(ans) # hide
```
```@example Main
sum(leverage(fm4))
@@ -552,6 +573,7 @@ Also, a model fit by the REML criterion generally has larger estimates of the va
```@example Main
fm4r = fit(MixedModel, @formula(diameter ~ 1+(1|plate)+(1|sample)),
MixedModels.dataset(:penicillin), REML=true)
DisplayAs.Text(ans) # hide
```
```@example Main
sum(leverage(fm4r))
1 change: 1 addition & 0 deletions docs/src/index.md
@@ -14,6 +14,7 @@ Pages = [
"GaussHermite.md",
"bootstrap.md",
"rankdeficiency.md",
"mime.md",
]
Depth = 2
```
74 changes: 74 additions & 0 deletions docs/src/mime.md
@@ -0,0 +1,74 @@
# Alternative display and output formats

In the documentation, we have presented the output from MixedModels.jl in the same format you will see when working in the REPL.
You may have noticed, however, that output from other packages receives pretty printing.
For example, DataFrames are converted into nice HTML tables.
In MixedModels, we recently (v3.2.0) introduced limited support for such pretty printing.
(For more details on how the print and display system in Julia works, check out [this NextJournal post](https://nextjournal.com/sdanisch/julias-display-system).)

In particular, we have defined Markdown output, i.e. `show` methods, for our types, which can be easily translated into HTML, LaTeX, or even an MS Word document using tools such as [pandoc](https://pandoc.org/).
Packages like `IJulia` and `Documenter` can often detect the presence of these display options and use them automatically.
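A small sketch of how you might check for and capture the Markdown rendering yourself, using only `Base` functions (the model below is illustrative; any fitted model works):

```julia
using MixedModels

# sleepstudy is one of the datasets bundled with MixedModels.jl
m = fit(MixedModel, @formula(reaction ~ 1 + days + (1 + days | subj)),
        MixedModels.dataset(:sleepstudy))

showable(MIME("text/markdown"), m)           # is a Markdown `show` method defined?
md = sprint(show, MIME("text/markdown"), m)  # capture it as a String, e.g. for pandoc
```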


```@example Main
using MixedModels
form = @formula(rt_trunc ~ 1 + spkr * prec * load +
(1 + load | item) +
(1 + spkr + prec + load | subj))
contr = Dict(:spkr => EffectsCoding(),
:prec => EffectsCoding(),
:load => EffectsCoding(),
:item => Grouping(),
:subj => Grouping())
kbm = fit(MixedModel, form, MixedModels.dataset(:kb07); contrasts=contr)
```

Note that the display here is more succinct than the standard REPL display:

```@example Main
using DisplayAs
kbm |> DisplayAs.Text
```

This brevity is intentional: we wanted these types to work well with traditional academic publishing constraints on tables.
The summary for a model fit presented in the REPL does not mesh well with being treated as a single table (with columns shared between the random and fixed effects).
In our experience, this leads to difficulties in typesetting the resulting tables.
We nonetheless encourage users to report fit statistics such as the log likelihood or AIC as part of the caption of their table.
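For example, a sketch of extracting those statistics from the fitted model (these extractors come from the StatsBase/StatsAPI interface that MixedModels implements; the exact set may vary by version):

```julia
using StatsBase: aic, bic, loglikelihood

# Quantities that fit naturally in a table caption rather than in the table itself.
(loglik = loglikelihood(kbm), AIC = aic(kbm), BIC = bic(kbm))
```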
If the correlation parameters in the random effects are of interest, then [`VarCorr`](@ref) can also be pretty printed:

```@example Main
VarCorr(kbm)
```

Similarly for [`BlockDescription`](@ref), `OptSummary` and `MixedModels.likelihoodratiotest`:

```@example Main
BlockDescription(kbm)
```

```@example Main
kbm.optsum
```

```@example Main
m0 = fit(MixedModel, @formula(reaction ~ 1 + (1|subj)), MixedModels.dataset(:sleepstudy))
m1 = fit(MixedModel, @formula(reaction ~ 1 + days + (1+days|subj)), MixedModels.dataset(:sleepstudy))
MixedModels.likelihoodratiotest(m0,m1)
```

To explicitly invoke this behavior, we must specify the right `show` method:
```julia
show(MIME("text/markdown"), m1)
```
```@example Main
println(sprint(show, MIME("text/markdown"), kbm)) # hide
```
(The raw, unrendered output is intentionally shown here.)

In the future, we may directly support HTML and LaTeX as MIME types.

This output can also be written directly to file:

```julia
show(open("model.md", "w"), MIME("text/markdown"), kbm)
```
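An equivalent variant with a `do` block, which closes the file handle automatically (the filename is illustrative):

```julia
open("model.md", "w") do io
    show(io, MIME("text/markdown"), kbm)
end
```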