-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix IC's email, add again the model diagnostics appendix, and add not…
…es in formula
- Loading branch information
1 parent
816d424
commit d0a26d4
Showing
12 changed files
with
261 additions
and
39 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,12 @@ | ||
FROM rocker/rstudio:4.2.2 | ||
|
||
|
||
LABEL "about" = "A Docker container for the cognate-begininings study" \ | ||
"author" = "Gonzalo Garcia-Castro <[email protected]>"\ | ||
"github" = "https://github.com/gongcastro/cognate-beginnings" \ | ||
"osf" = "https://osf.io/hy984/" \ | ||
"source"="https://github.com/gongcastro/cognate-beginnings/blob/main/Dockerfile" | ||
|
||
# add C++ dependencies | ||
# add system | ||
RUN apt-get update && \ | ||
apt-get install -y libxml2-dev \ | ||
libglpk-dev \ | ||
|
@@ -34,9 +33,7 @@ RUN apt-get update && \ | |
libicu-dev | ||
|
||
# copy the whole directory to /rstudio (working directory in Posit Cloud) | ||
#RUN mkdir /cognate-beginnings/ && chown -c rstudio /cognate-beginnings/ | ||
COPY . /home/rstudio/ | ||
#RUN cd /cognate-beginnings/ | ||
WORKDIR /home/rstudio/ | ||
|
||
# install Quarto | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
We used Stan [@carpenter2017stan] as the probabilistic language behind the estimation of our Bayesian models in this study, with `brms` as its R interface [@burkner2017brms]. This language implements the Markov Chain Monte-Carlo (MCMC) algorithm to explore the posterior distribution of the model. Broadly, this algorithm is used to iteratively sample the joint sampling space of the parameters to be estimated in the model, and compute, for each value sampled, its likelihood under some probability distribution previously defined. We run `r dim(model_fit$fit)[2]` MCMC chains, each `r dim(model_fit$fit)[1]` iterations long each. The correct performance of this algorithm is critical to the quality of the statistical evidence to which the outcomes of the model lead. | ||
|
||
One way to diagnose the behaviour of MCMC is to inspect whether the different MCMC chains (if more than one) have converged to a similar region of the posterior. The Gelman-Rubin diagnostic [$\hat{R}$ or R-hat @gelman1992inference] provides a measure of chain convergence by comparing the variance within each chain *versus* the variance between each chain. Both are expected to be identical when chains have perfectly converged, so that $\hat{R} = 1$. Values lower than 1.01 are recommended, while values higher than 1.05 indicate that chains might have trouble converging and therefore the estimated parameters must be taken with caution. @fig-rhats-neffs (A) shows the distribution of $\hat{R}$ values for the coefficients of the fixed effect of our models, which we used for statistical inference. Most values are lower than 1.01, and never higher than 1.05, which provides evidence of successful MCMC convergence. | ||
|
||
Another diagnostic of good MCMC converge is the ratio of effective sample size to total sample size ($N_{eff}/N$), which indicates the proportion of samples in the chain that resulted from a non-divergent transition. Values closer to 1 are ideal, as they indicate that all posterior samples from the MCMC were used to estimate the posterior distribution of the parameter. Values larger than 0.1 are recommended. @fig-rhats-neffs (B) shows the distribution of the effective sample sizes of the coefficients of the fixed effects in our models. Most values are larger than 0.1, although model 0 ($\mathcal{M}_0$) accumulates most effective sample sizes close to 0.1. | ||
|
||
```{r fig-rhats-neffs} | ||
#| label: fig-rhats-neffs | ||
#| fig-cap: "MCMC convergence diagnostic of all parameters in the model. Each dot represents the score of one parameter. (A) Distribution of the Gelman-Rubin (R-hat) scores. (B) Distribution of the ratio of effective sample size." | ||
#| echo: false | ||
#| message: false | ||
#| warning: false | ||
#| fig-height: 3 | ||
#| fig-width: 6 | ||
#| out-width: 100% | ||
model_convergence |> | ||
ggplot(aes(.rhat)) + | ||
geom_dots(colour = "black") + | ||
geom_vline(xintercept = 1.01, linetype = "dashed") + | ||
labs( | ||
x = "R-hat", | ||
y = "MCMC samples" | ||
) + | ||
model_convergence |> | ||
ggplot(aes(.neff)) + | ||
geom_dots(colour = "black") + | ||
geom_vline(xintercept = 1.01, linetype = "dashed") + | ||
labs( | ||
x = "Effective sample size ratio", | ||
y = "MCMC samples" | ||
) + | ||
plot_layout(ncol = 1) & | ||
plot_annotation(tag_levels = "A") & | ||
theme( | ||
legend.position = "none", | ||
panel.grid = element_blank() | ||
) | ||
``` | ||
|
||
{{< pagebreak >}} | ||
|
||
### Posterior draws and bi-variate scatterplots {.unnumbered} | ||
|
||
```{r} | ||
#| label: fig-model-pairs | ||
#| fig-cap: "Marginal distribution and bi-variate scatterplot of posterior samples for the fixed regression coefficients in Model 3." | ||
#| fig-height: 8 | ||
#| fig-width: 10 | ||
#| out-width: 100% | ||
#| out-height: 100% | ||
#| echo: false | ||
#| warning: false | ||
#| message: false | ||
titles <- c( | ||
"Comprehension", | ||
"Production", | ||
"Age", | ||
"Length", | ||
"Exposure", | ||
"Cognateness", | ||
"Age × Exposure", | ||
"Age × Cognateness", | ||
"Exposure × Cognateness", | ||
"Age × Exposure × Cognateness" | ||
) | ||
posterior <- as_draws_df(model_fit) |> | ||
select(`b_Intercept[1]`:`b_age_std:exposure_std:lv_std`) | ||
my_diag <- function(data, mapping, ...) { | ||
ggplot(data = data, mapping = mapping) + | ||
stat_slab( | ||
colour = "white", | ||
fill = "grey15" | ||
) | ||
} | ||
my_upper <- function(data, mapping, ...) { | ||
ggplot(data = data, mapping = mapping) + | ||
stat_density_2d(aes(fill = ..density..), | ||
geom = "raster", | ||
contour = FALSE | ||
) + | ||
scale_fill_distiller( | ||
palette = "Greys", | ||
direction = 1 | ||
) | ||
} | ||
posterior |> | ||
GGally::ggpairs( | ||
upper = list(continuous = my_upper), | ||
diag = list(continuous = my_diag), | ||
lower = NULL, | ||
columnLabels = c( | ||
"Intercept\n(Comprehension)", | ||
"Intercept\n(Production)", | ||
"Age", "Length", "Exposure", | ||
"Levenshtein", "Age × Exposure", | ||
"Age × Levenshtein", "Exposure × Leveshtein", | ||
"Age × Exposure ×\nLevenshtein" | ||
), | ||
axisLabels = "none" | ||
) + | ||
theme( | ||
strip.text = element_text(size = 5), | ||
panel.grid = element_blank(), | ||
panel.border = element_blank() | ||
) | ||
``` | ||
|
||
{{< pagebreak >}} | ||
|
||
### Posterior-predictive checks {.unnumbered} | ||
|
||
```{r tbl-ppc} | ||
#| label: fig-ppc | ||
#| fig-cap: "Model posterior predictive checks (PPC). Bars indicate the observed proportion of responses to each category (No, Understands, and Understands and Says). Blue dots and error bars represent the mean proportion of responses simulated from the posterior for each category, and its 95\\% interval." | ||
#| eval: true | ||
#| echo: false | ||
#| warning: false | ||
#| message: false | ||
#| fig-width: 6 | ||
#| fig-height: 3 | ||
#| out-width: 80% | ||
#| out-height: 80% | ||
y_int <- as.integer(as.integer(model_fit$data$response)) | ||
bayesplot::color_scheme_set(c(rep("grey60", 2), rep("black", 4))) | ||
bayesplot::ppc_bars( | ||
y_int, | ||
model_ppcs, | ||
prob = 0.95, | ||
linewidth = 1, | ||
size = 1.25, | ||
fatten = 2, | ||
freq = FALSE | ||
) + | ||
scale_x_continuous( | ||
breaks = 1:3, | ||
labels = c( | ||
"No", | ||
"Understands", | ||
"Understands and Says" | ||
) | ||
) + | ||
theme( | ||
legend.position = "none", | ||
axis.text = element_text(size = 11), | ||
panel.grid = element_blank() | ||
) | ||
``` |
Large diffs are not rendered by default.
Oops, something went wrong.
Binary file modified
BIN
+0 Bytes
(100%)
manuscript/_freeze/manuscript/figure-pdf/fig-marginal-1.pdf
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,7 +43,7 @@ author: | |
city: Lisbon, 1600-214 | ||
- name: Ignacio Castillejo | ||
orcid: 0000-0001-7445-0416 | ||
email: [email protected] | ||
email: [email protected] | ||
affiliations: | ||
- id: uam | ||
name: Universidad Autónoma de Madrid | ||
|
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file not shown.
Oops, something went wrong.