-
Notifications
You must be signed in to change notification settings - Fork 1
/
10_StochasticContinuous_practical.qmd
335 lines (258 loc) · 10.7 KB
/
10_StochasticContinuous_practical.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
---
title: "10. Stochastic continuous models"
---
In this practical, we will implement a continuous-time, stochastic compartmental model of influenza transmission using the Gillespie algorithm.
## Practical 1. Stochastic simulation with the Gillespie algorithm
In this practical session, we will simulate the SIR model using the Gillespie algorithm.
### Setting up the model algorithm
The first bit of code below is a function that simulates an SIR model using the Gillespie algorithm.
```{r, output = FALSE}
library(ggplot2) ## for plotting
library(dplyr) ## for manipulation of results
library(tidyr) ## for storing multiple simulation runs in a data frame
## Function SIR_gillespie.
## This takes three arguments:
## - init_state: the initial state
## (a named vector containing the number in S, I and R)
## - parms: the parameters
## (a named vector containing the rates beta and gamma)
## - tf: the end time
SIR_gillespie <- function(init_state, parms, tf) {
time <- 0 ## initialise time to 0
## assign parameters to easy-access variables
beta <- parms[["beta"]]
gamma <- parms[["gamma"]]
## assign states to easy-access variables
S <- init_state[["S"]]
I <- init_state[["I"]]
R <- init_state[["R"]]
N <- S + I + R
## create results data frame
results_df <- data.frame(time = 0, t(init_state))
## loop until end time is reached
while (time < tf) {
## update current rates
rates <- c(
infection = beta * S * I / N,
recovery = gamma * I
)
if (sum(rates) > 0) { ## check if any event can happen
## time of next event
time <- time + rexp(n = 1, rate = sum(rates))
## check if next event is supposed to happen before end time
if (time <= tf) {
## determine the type of the next event
next_event <- sample(x = length(rates), size = 1, prob = rates)
## change to name
next_event <- names(rates)[next_event]
## determine type of next event
if (next_event == "infection") {
## infection
S <- S - 1
I <- I + 1
} else if (next_event == "recovery") {
## recovery
I <- I - 1
R <- R + 1
}
} else { ## next event happens after end time
time <- tf
}
} else { ## no event can happen - go straight to end time
time <- tf
}
## add new row to results data frame
results_df <- rbind(results_df, c(time = time, S = S, I = I, R = R))
}
## return results data frame
return(results_df)
}
```
Take 10 minutes to familiarise yourself with the `SIR_gillespie()` function and make sure you understand its inner workings.
Now, run the `SIR_gillespie()` function using the commands below.
```{r, output = FALSE}
init.values <- c(S = 249, I = 1, R = 0) ## initial state
parms <- c(beta = 1, gamma = 0.5) ## parameter vector
tmax <- 20 ## end time
## run Gillespie simulation
r <- SIR_gillespie(init_state = init.values, parms = parms, tf = tmax)
```
The data frame `r` now contains the results of the simulation.
::: {.callout-tip}
#### Exercise 1
Inspect the contents of the data frame and write some code to plot the results; you can use `plot` or `ggplot` for this. Run the simulation and plot the results multiple times to convince yourself the output is different every time.
:::
### Summarising multiple stochastic runs
Next, we'll run multiple simulations at once.
```{r, output = FALSE}
## Run multiple simulation runs and plot a few of them
nsim <- 100 ## number of trial simulations
traj <- tibble(i = 1:nsim) %>%
rowwise() %>%
mutate(trajectory = list(as.data.frame(
SIR_gillespie(init.values, parms, tmax)))) %>%
unnest(trajectory)
## convert to long data frame
mlr <- traj %>%
gather(compartment, value, 3:ncol(.))
```
The code above runs the simulation 100 times and stores the resulting model trajectories in a data frame, `traj`, which contains the results from multiple simulation runs and an additional column that represents the simulation index.
::: {.callout-tip}
#### Exercise 2
Write some code to plot the multiple simulation runs on the same plot.
:::
You'll notice that some outbreaks die out very quickly, while some others grow to affect large parts of the population.
::: {.callout-tip}
#### Exercise 3
Plot the distribution of overall outbreak sizes. What proportion of outbreaks dies out quickly?
:::
Next, we calculate the mean and standard deviation of each state across the multiple runs. To do that, we need the value of the different states at pre-defined time steps, whereas SIR_gillespie only returns the times at which certain events happened. In order to, for example, extract the values of the trajectory at integer time points, we can use the following:
```{r, output = FALSE}
## Extract the values of the trajectory at integer time points
timeTraj <- mlr %>%
group_by(i, compartment) %>%
summarise(traj = list(data.frame(
time = seq(0, tmax, by = 0.1),
value = approx(x = time, y = value, xout = seq(0, tmax, by = 0.1),
method="constant")$y))) %>%
unnest(traj)
```
Now, we calculate a summary trajectory containing the mean and standard deviation (sd) of the number of infectious people at every time step:
```{r, output = TRUE}
## Calculate summary trajectory with mean & sd of infectious people over time
sumTraj <- timeTraj %>%
filter(compartment == "I") %>%
group_by(time) %>%
summarise(mean = mean(value),
sd = sd(value))
## plot
ggplot(sumTraj, aes(x = time, y = mean, ymin = pmax(0, mean-sd), ymax = mean+sd)) +
geom_line() +
geom_ribbon(alpha = 0.3)
```
Your plot should look something like the above.
As a second summary, we can also consider only trajectories that have not gone extinct, that is, where $I>0$:
```{r, output = TRUE}
## Only consider trajectories that have not gone extinct:
sumTrajAll <- sumTraj %>%
mutate(trajectories="all")
sumTrajGr0 <- timeTraj %>%
filter(compartment == "I", value > 0) %>%
group_by(time) %>%
summarise(mean = mean(value),
sd = sd(value)) %>%
mutate(trajectories="greater_than_zero")
iTraj <- bind_rows(sumTrajAll, sumTrajGr0)
## plot
ggplot(iTraj, aes(x = time, y = mean, ymin = pmax(0, mean-sd), ymax = mean+sd,
colour = trajectories, fill = trajectories)) +
geom_line() +
geom_ribbon(alpha = 0.3) +
scale_color_brewer(palette="Set1")
```
::: {.callout-tip}
#### Exercise 4
We have plotted the mean $\pm$ standard deviation; is this a reasonable summary statistic? What else could one plot? Implement your own plot of trajectory plus uncertainty.
:::
### Comparing the stochastic model to the equivalent deterministic ODE model
We will now compare the stochastic model runs and their averages to the equivalent deterministic ODE model trajectory.
::: {.callout-tip}
#### Exercise 5
Using code from the ODE practical, load the **deSolve** package, define the model function (e.g. `SIR_model` from the ODE practical), run the model with the same parameters as the Gillespie model, and store the output in a data frame called `ode_output` (remember, you need to convert the matrix returned by **deSolve**'s `ode()` to a data frame using `as.data.frame()`).
:::
You can then plot both the stochastic and deterministic model runs using the following code:
```{r, eval = FALSE}
## Combine into one big data frame
allTraj <- ode_output %>%
gather(compartment, value, 2:ncol(.)) %>% ## convert to long format
filter(compartment == "I") %>%
rename(mean = value) %>% ## in deterministic, mean = value
mutate(trajectories="deterministic", ## label trajectories
sd = 0) %>% ## in deterministic, sd = 0
bind_rows(iTraj)
## plot
ggplot(allTraj, aes(x = time, y = mean, colour = trajectories)) +
geom_line() +
scale_color_brewer(palette="Set1")
```
::: {.callout-tip}
#### Exercise 6
Compare the deterministic trajectory to some of the individual stochastic trajectories and to their mean. Does the deterministic trajectory capture “typical” behaviour of the stochastic trajectories?
:::
::: {.callout-tip}
#### Exercise 7
Repeat the experiment with different values of the parameters. When does deterministic vs stochastic make more or less of a difference?
:::
::: {.callout-tip}
#### Exercise 8
Rewrite the model code to be an SEIR model. How do the results differ?
:::
## Practical 2. The `adaptivetau` package
In this practical session, we will use the `adaptivetau` package to run simulations. Start in a new file by loading the required packages:
```{r, output = FALSE, message = FALSE}
library(adaptivetau) ## for stochastic simulations
```
For adaptivetau, we first need to define the "transitions", or events that can happen in the model. In the SIR model, these are:
```{r, output = FALSE}
## Define transitions
transitions <- list(
c(S = -1, I = +1),
c(I = -1, R = +1))
```
We then need to specify a rate function:
```{r, output = FALSE}
## Specify rate function, giving rate for each transition
SIRrateF <- function(state, parms, time) {
beta <- parms[["beta"]]
gamma <- parms[["gamma"]]
S <- state[["S"]]
I <- state[["I"]]
R <- state[["R"]]
N <- S + I + R
rates <- c(beta * S * I/N,
gamma * I)
return(rates)
}
```
We then define our initial conditions and parameters:
```{r, output = FALSE}
## Initial values
init.values <- c(S = 249, ## number of susceptibles
I = 10, ## number infectious
R = 0) ## number immune
## Parameters
parms <- c(beta = 2, ## infection rate
gamma = 1) ## recovery rate
```
Now we will use the `ssa.adaptivetau()` function to run the model and plot the results.
```{r, echo = FALSE}
set.seed(123)
```
```{r, output = TRUE}
## Run a trial simulation for 60 time steps
tmax <- 60 ## number of time steps to simulate
r <- ssa.adaptivetau(init.values, transitions, SIRrateF, parms, tf = tmax)
## Plot results
r_df <- as.data.frame(r)
plot(r_df$time, r_df$I, type = "l")
```
Run the above multiple times -- as before, the results should change every time.
We will now again run multiple simulations and store them in a data frame, `traj`, which
contains multiple simulation runs and an additional column `i` that represents the simulation id:
```{r, output = TRUE}
## Run the simulation multiple times
nsim <- 100 ## number of trial simulations
traj <- tibble(i = 1:nsim) %>%
rowwise() %>%
mutate(trajectory = list(as.data.frame(
ssa.adaptivetau(init.values, transitions, SIRrateF, parms, tf = tmax)))) %>%
unnest(trajectory)
## Plot the resulting trajectories
ggplot(traj) +
geom_line(aes(x = time, y = I, group = i, colour = i))
```
::: {.callout-tip}
#### Exercise 1
Compare the run time of the command above to running 100 simulations using our Gillespie algorithm from Practical 1. How much is the speed gain?
:::
**Solutions to this practical can be accessed [here](10_StochasticContinuous_solutions.qmd).**