forked from mcfrank/sop
-
Notifications
You must be signed in to change notification settings - Fork 0
/
sop_model.Rmd
291 lines (217 loc) · 14.2 KB
/
sop_model.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
---
title: "A performance model for early cognitive development"
author: "Michael C. Frank"
output:
html_document:
toc: true
number_sections: true
fig_captions: true
bibliography: sop.bib
---
```{r, include=FALSE, echo=FALSE, warning=FALSE}
# Code preliminaries.
library(dplyr)
library(ggplot2)
library(langcog)
library(kmr)
theme_set(theme_bw())
knitr::opts_chunk$set(echo=FALSE, cache=TRUE, warning=FALSE)
knitr::knit_hooks$set(htmlcap = function(before, options, envir) {
if(!before) {
paste('<p class="caption" align="center">',options$htmlcap,"</p>",sep="")
}
})
```
# Introduction
Human beings begin their lives as helpless infants yet quickly become children who are able to perceive, act, and communicate. External, physical differences account for only a small amount of this fundamental change: Without a mature mind, a mature body cannot navigate the world. Thus, the fundamental question of developmental psychology is how external behavioral differences come about via internal processes of developmental change.
These processes are the focus of the current paper. What causes developmental changes over timescales of weeks, months, or years? While such timescales can be observed longitudinally, such studies are almost always correlational and hence confounded with other factors. In contrast, controlled, laboratory interventions that allow strong causal inference are typically limited to the timescale of minutes, hours, or at best a matter of weeks [@smith2002].
...
My goal in this paper is to describe a first stab at a simple performance model for early cognitive development, one that assumes that children's observed behavior is a probabilistic reflection of their knowledge and intentions to act. I'll first describe a simple model of this type. I'll then show how this kind of model might predict striking developmental changes in behavior in the absence of any change in representation. I'll end by trying to apply this model to the issue of early word learning and whether any representational change underlies the salient emergence of children's language comprehension abilities after the first birthday.
# Model
The basic assumptions of our performance model are similar to many other process models, namely that every cognitive operation has a processing time and a probability of failure [@anderson1996].
As is immediately clear, complex actions are describable at many different grain sizes. I see this as a strength rather than a weaknesses of our framework, which can be applied to units at any grain size. @smith2013 describe a similar analysis of reading times for words where they make clear that
We remain agnostic about the precise break
Our model assumes that nearly every mental process---especially those involved in learning---requires multiple operations. These operations are chained in a sequence. Some examples of such sequences:
+ _Word learning_: Follow an agent's gaze to a target, then associate a word that the agent speaks with the identity of the target.
+ _Imitation_: Store an agent's action, then carry out the same action.
+ _Object play_: Search for partially occluded target object, move aside occluder, grasp target object.
## Details
Consider a set of interacting mental processes. We assume that each of these has a Bernoulli success probability $s_p$, implying that they each fail with probability $1-s_p$, they fail. For simplicity, in this analysis, we assume chains where all of the operations have the same probability $s$, though with the appropriate data, we could estimate the failability of individual operations. We further assume that one failure in a series of operations leads to the failure of the chain $c$. Thus, the probability of a sequence of failures is exponential such that $P_{success} = s^{n}$, where $n$ is the length of the chain.
We also associate a time to completion with each operation. We assume that these are "reaction times" and are sampled from a log-normal distribution:
$$RT(s_p) \sim exp(N(\mu,\sigma))$$
The arithmetic mean of a lognormal is $e^{\mu + \sigma^2/2}$. Unfortunately, there is not a known parametric form for the sum of multiple lognormals. A variety of analytic approximations for these sums exist [@fenton1960], but they have some limitatios. I will use a numerical simulation approach here for simplicity.
Imagine a time-sensitive operation with a temporal threshold $\theta$ such that if the operations are not completed within this threhold, then there is no possibility of learning. Via simulation, we can approximate the probability that a chain is successful within the appropriate time period for learning. For siplicity, we assume a standard lognormal with $\mu = 0$, $\sigma = 1$.
```{r, htmlcap="Figure 1. Probability p of successsfully executing a chain of operations with $n$ steps, plotted by the probability of success $s$. Facets show diferent temporal thresholds $\theta$."}
thetas <- seq(1, 6)
ns <- 1:5
ss <- seq(.3,.9,.1)
nsims <- 1000
sims <- expand.grid(n = ns,
s = ss,
theta = thetas) %>%
group_by(n, s, theta) %>%
do(data.frame(rt = rlnorm(nsims, meanlog = 0, sdlog = 1)*.$n,
success = rbinom(nsims, .$n, .$s) == .$n)) %>%
mutate(relevant = rt < theta & success) %>%
group_by(n, s, theta) %>%
summarise(p = mean(relevant)) %>%
ungroup() %>%
mutate(s = factor(s))
ggplot(sims, aes(x = n, y = p, col = s)) +
geom_line() +
facet_wrap(~theta)
# geom_hline(yintercept = .5, lty = 2)
```
The simulatins in Figure 1 show a simple result: long chains of operations are unlikely to succeed unless individual operations are very fast and very accurate. Even assuming the median time to execution for a single step is 1s ($e^{\mu}=1$), successful and timely execution of multi-step sequences is very unlikely within a reasonable time frame. For example, with a cutoff of $\theta = 6s$, chains of length 3 were only successful half the time if each separate sub-action was extremely likely to succeed ($s > .9$). At shorter values of $\theta$, virtually no chains of length 3 were successful.
Of course, to be meaningful, these simulations require estimates for the values of $\mu$, $\sigma$, $n$, $s$, and $\theta$.
## Development within the model
The two posited capacities in our model are speed and accuracy. It seems clear that, for an individual cognitive operation or set of operations, both of these should change dramatically across development. How does developmental change look for these abilities?
We first consider the development of processing speed. Pioneering work by Kail [see e.g., @kail1991] describes the developmental trajectory of reaction times for individual tasks. He notes that empirically, the slope of these reaction times follows an exponential, such that
$$Y = a + b e^{-ci}$$
Where Y is the predicted variable, $a$ is the eventual (adult) asymptote, $b$ is the multiplier for the (infant) intercept, $c$ is the rate of development, and $i$ is age.
We replot the slowing function from @kail1991:
```{r fig.width=4, fig.height=3}
a <- 1
b <- 5.16
c <- 0.21
xs <- seq(4,20,.1)
d <- data.frame(x = xs,
y = a + b * exp(-c * xs))
qplot(x, y, geom=c("line"),
data=d) +
geom_hline(aes(yintercept=1), lty=2) +
xlab("Age (years)") +
ylab("Slope (sec)") +
ylim(c(0,6)) +
xlim(c(4,20)) +
ggtitle("Trajectory of RT multipliers (Kail, 1991)")
```
Consider when this model is projected into infancy:
```{r fig.width=4, fig.height=3}
xs <- seq(0,20,.1)
d <- data.frame(x = xs,
y = a + b * exp(-c * xs))
qplot(x, y, geom=c("line"),
data=d) +
geom_hline(aes(yintercept=1), lty=2) +
xlab("Age (years)") +
ylab("Slope (sec)") +
ylim(c(0,6)) +
xlim(c(0,20)) +
ggtitle("Trajectory of RT multipliers (infancy)")
```
This is a model of RT multipliers. By hypothesis, these multipliers should apply to individual operations or to chains of operations, since times are additive; if the multiplier is constant then it can be factored out.
How should we consider the probability of success on a single operation changing across time? For now let's consider this to be a logistic function, constrained to be 0 at infancy.
```{r fig.width=4, fig.height=3}
inv.logit <- boot::inv.logit # get inv.logit from boot library
a <- -2
b <- .3
xs <- seq(0,20,.1)
d <- data.frame(x = xs,
y = inv.logit(a + b * xs))
qplot(x, y, geom=c("line"),
data=d) +
xlab("Age (years)") +
ylab("Accuracy") +
ylim(c(0,1)) +
xlim(c(0,20)) +
ggtitle("Accuracy function")
```
So now let's generalize these functions so that we can look at their joint effects on performance.
```{r}
inv.logit <- boot::inv.logit # get inv.logit from boot library
accuracy <- function(x, a = -2, b = .3) {
inv.logit(a + b * x)
}
rt.multiplier <- function (x, a = 1, b = 5.16, c = .21) {
a + b * exp(-c * x)
}
thetas <- seq(1, 5, .5)
ns <- 1:5
nsims <- 1000
ages <- seq(0, 21, .5)
sims <- expand.grid(n = ns,
age = ages,
theta = thetas) %>%
group_by(n, age, theta) %>%
do(data.frame(rt = rlnorm(nsims, meanlog = rt.multiplier(.$age - 1))*.$n,
success = rbinom(nsims, .$n, accuracy(.$age)) == .$n)) %>%
mutate(relevant = rt < theta & success) %>%
group_by(n, age, theta) %>%
summarise(p = mean(relevant))
ggplot(sims, aes(x = age, y = p, col = factor(n))) +
geom_line() +
facet_wrap(~theta) +
geom_hline(yintercept = .5, lty = 2) +
ggtitle("Probability of successsfully executing a chain within a temporal threshold")
```
So this is nice, we get a joint yoking of speed and accuracy to age, with the ability to make generalization about the probability of success in chains of length $n$ within a temporal threshold.
To review: we now have a theoretical model that can make predictions about developmental changes in the speed and accuracy of executing chains of cognitive operations. The trouble is that to constrain these predictions, we require quite a bit of external information about changes in RT and accuracy.
# Case Study: Early Word Learning
Let's try applying this framework to early social word learning. One of the big puzzles of early language learning is *why* children begin to show evidence of language learning at the approximate developmental time they do, and not earlier or later [@tomasello1999]. Our framework provides a simple alternative: children may be trying to use language from very early, but the basic cognitive components may be too slow and too challenging to allow for consistent use (and consistent measurement by psychologists) until around the 1st birthday. The recent literature on early word learning gives some support for this contention, as more careful measurement has revealed some aspects of both receptive and productive language prior to the first birthday [@bergelson2012, @schneider2015].
We focus here on learning a word that is presented ostensively via a social cue like gaze or pointing, which we refer to as "social word learning." For simplicity, we decompose the task of social word learning into two abilities: 1) gaze following, and 2) word recognition. This task analysis is clearly incomplete or incorrect - for example, pointing is not the same as gaze following, and recognition is not the same as learning and retention. But it nevertheless captures some aspects of the task - following a socialcue to a distal target and processing some language associated with that target. And it has the major benefit for our purposes of providing data on development, since each of these tasks is well-studied.
Let's begin by attempting to estimate developmental changes in the speed of processing a word. Of course there are many factors involved, including the frequency and general familiarity of the word. But the literature on infants' early language processing generally attempts to select simple, easy words that should be accessible to almost all children, so we can use this literature to get a rough-and-ready estimate of declines in reaction time across ages.
```{r}
d <- read.csv("data/data.csv")
d.kid <- d %>%
filter(age < 5) %>%
mutate(weight = 1/n)
d.adult <- d %>%
filter(age > 20) %>%
summarise(rt = mean(rt))
ggplot(d.kid, aes(x = age, y = rt)) +
geom_point(aes(size = n, col = study),
position=position_dodge(width=.05)) +
geom_linerange(aes(ymin = rt - cih,
ymax = rt + cil,
col = study),
position=position_dodge(width=.05)) +
geom_smooth(aes(weight = weight),
method="lm", formula=y ~ log(x),
col="black", lty=3) +
geom_hline(aes(yintercept=d.adult$rt), lty=2) +
ylim(c(0,2)) +
xlim(c(0,5)) +
xlab("Age (years)") +
ylab("Reaction Time (s)")
```
Now fit a Kail curve to these data. Note that this is a simpler analysis - we're not fitting multipliers on the slope of a mental-rotation function, we're just fitting multipliers on the actual adult RT. The Kail curve looks almost identical to the log curve we fit to the kid data - although this is partially just that they are both exponential function fits, the asymptote didn't have to be the same (e.g., didn't have to be the adult RT).
```{r}
adult.rt <- d.adult$rt
fn <- function(x, a) {
a = .55
b = x[1]
c = x[2]
df <- d.kid
df$pred <- a + b * exp(-c * df$age)
err <- mean((df$rt - df$pred)^2)
return(err)
}
opt <- optim(c(1,1.5), fn, adult.rt)
d.kid$pred <- a + opt$par[1] * exp(-opt$par[2] * d.kid$age)
xs <- seq(0,5,.1)
d.pred <- data.frame(x = xs,
y = a + opt$par[1] * exp(-opt$par[2] * xs))
ggplot(d.kid, aes(x = age, y = rt)) +
geom_point(aes(size = n, col = study),
position=position_dodge(width=.05)) +
geom_linerange(aes(ymin = rt - cih,
ymax = rt + cil,
col = study),
position=position_dodge(width=.05)) +
geom_hline(aes(yintercept=adult.rt), lty=2) +
geom_line(data=d.pred, aes(x = x, y = y), col="red", lty=3) +
geom_smooth(aes(weight = weight),
method="lm", formula=y ~ log(x),
col="black", lty=3) +
ylim(c(0,2)) +
xlim(c(0,5)) +
xlab("Age (years)") +
ylab("Reaction Time (s)")
```
# General Discussion
+ Relationship to other models
+ Limitations
+ Relationship to neural development
+ Relationship to _g_ and other measures of intelligence
# References