---
title: "An Infinitesimal Introduction to Stochastic Differential Equation Modeling"
author: "Michael Betancourt"
date: "October 2021"
bibliography: stochastic_differential_equations.bib
csl: institute-of-mathematical-statistics.csl
link-citations: yes
linkcolor: blue
output:
  html_document:
    fig_caption: yes
    theme: spacelab #sandstone #spacelab #flatly
    highlight: pygments
    toc: TRUE
    toc_depth: 3
    number_sections: TRUE
    toc_float:
      smooth_scroll: FALSE
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(comment=NA)
knitr::opts_knit$set(global.par = TRUE)
```
In this case study I introduce the very basics of modeling and inference with
stochastic differential equations. To demonstrate the concepts I also provide
examples of both prior and posterior inference in Stan.
# Stochastic Differential Equations
Stochastic differential equations and stochastic processes are built on rich and
sophisticated mathematics, and in this section we will introduce only the main
concepts. We will begin by framing ordinary differential equations as a way
to generate deterministic evolution and then introduce stochastic differential
equations as a way to generate probabilistic evolution. Finally we will discuss
how that probabilistic evolution can be approximated in practice.
There are many references for further exploration although I have found that
@Pavliotis:2006 is a particularly comprehensive resource.
## Deterministic Differential Equations
A first-order, ordinary differential equation defines how a real-valued _state_
$x \in X \subseteq \mathbb{R}^{D}$ varies with respect to infinitesimal changes
in some auxiliary, real-valued quantity $t \in \mathbb{R}$,
$$
\frac{ \mathrm{d} x }{ \mathrm{d} t } = f(x, t).
$$
Here I will consider this auxiliary quantity as time so that the differential
equation defines the temporal evolution of some state.
Writing this first-order, ordinary differential equation in terms of
differentials,
$$
\mathrm{d} x = \mathrm{d} t \, f(x, t),
$$
allows us to integrate the differential equation from some initial state
$x_{0}$ at time $t = 0$ to the configuration of the state at a later time $T$,
$$
\begin{align*}
\mathrm{d} x &= \mathrm{d} t \, f(x, t)
\\
\int_{x_{0}}^{x_{T}} \mathrm{d} x' &= \int_{0}^{T} \mathrm{d} t' \, f(x(t'), t')
\\
x_{T} - x_{0} &= \int_{0}^{T} \mathrm{d} t' \, f(x(t'), t')
\\
x_{T} &= x_{0} + \int_{0}^{T} \mathrm{d} t' \, f(x(t'), t')
\end{align*}
$$
In other words the differential equation defines a deterministic map from some
input state at time $t = 0$ to some output state at time $t = T$.
By separating the integration into smaller intervals we can also recover
intermediate states. Consider for example the sequence of states
$x_{1}, \ldots, x_{N}$ at the times $t_{1}, \ldots, t_{N}$ with
$$
0 = t_{0} \lt t_{1} \lt \ldots \lt t_{N} = T.
$$
Given the initial state $x_{0}$ at $t = t_{0} = 0$ we can derive the first state
$x_{1}$ by integrating to $t_{1}$,
$$
x_{1} = x_{0} + \int_{t_{0}}^{t_{1}} \mathrm{d} t' \, f(x(t'), t').
$$
At that point we can derive the second state by integrating from $t_{1}$ to
$t_{2}$,
$$
x_{2} = x_{1} + \int_{t_{1}}^{t_{2}} \mathrm{d} t' \, f(x(t'), t'),
$$
and so on until we derive the final state from
$$
x_{N} = x_{N - 1} + \int_{t_{N - 1}}^{T} \mathrm{d} t' \, f(x(t'), t').
$$
Consequently the differential equation defines a deterministic map from the
input state to any sequence of later states.
If we consider an infinite number of intermediate times then we can recover all
of the intermediate states at all times $t \in [0, T]$ while integrating
from $t = 0$ to $t = T$. From this perspective the differential equation and
initial state define a function from time to state configurations,
$$
\begin{alignat*}{6}
x :\; &[0, T]& &\rightarrow& \; &X&
\\
&t& &\mapsto& &x(t) = x_{0} + \int_{0}^{t} \mathrm{d} t' \, f(x(t'), t')&,
\end{alignat*}
$$
which we refer to as a _trajectory_.
<center>
<br>
```{r, out.width = "100%", echo=FALSE}
knitr::include_graphics("figures/states/states.png")
```
<br><br>
</center>
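
To make this construction concrete, here is a small R sketch of forward Euler
integration for a simple linear system, $\mathrm{d} x / \mathrm{d} t =
\gamma \, (\mu - x)$, with arbitrary illustrative values.  It is only a
schematic of how a trajectory is accumulated from many small deterministic
increments and is not used in any of the analyses below.

```{r}
# Schematic forward Euler integration of the linear system
# dx/dt = gamma * (mu - x); illustrative values only.
integrate_euler <- function(x0, t_final, n_steps, gamma, mu) {
  dt <- t_final / n_steps
  x <- x0
  for (n in 1:n_steps) {
    # accumulate one small deterministic increment, dx = dt * f(x, t)
    x <- x + dt * gamma * (mu - x)
  }
  x
}

# Compare to the exact solution x(t) = mu + (x0 - mu) * exp(-gamma * t)
c(euler=integrate_euler(-2, 10, 1000, 0.75, 4),
  exact=4 + (-2 - 4) * exp(-0.75 * 10))
```
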
## Stochastic Differential Equations
A first-order _stochastic_ differential equation introduces a probabilistic
element to the evolution of a system along some auxiliary quantity. In analogy
to the deterministic differential equation that we considered above we can
define a stochastic differential equation as
$$
\mathrm{d} x = \mathrm{d} t \, f(x, t) + \mathrm{d} W_{t} \, g(x, t),
$$
where $\mathrm{d} W_{t}$ in the new term denotes a _Wiener process_ that defines
an independent distribution of perturbing values at each time $t$.
Technically there are two common ways to specify stochastic differential
equations, depending on how that second probabilistic term is treated. The form
used above, and which we will use throughout this study, is known as the _Itô_ form
of a stochastic differential equation; a common alternative is the
_Stratonovich_ form. Both forms can be used to implement the same probabilistic
evolutions, but the precise mathematical details of those implementations are
different.
The words "stochastic" and "probabilistic" derive from the same linguistic root,
the former through Greek and the latter through Latin. They also refer to the
same underlying mathematics. Although "stochastic differential equation" has
become conventional in the academic literature these mathematical systems can
just as well be referred to as "probabilistic differential equations". In fact
"probabilistic" more directly describes how these systems transcend ordinary
differential equations.
Although even defining the integration of these systems is complicated, it can
be done with enough mathematical fortitude. What results are probabilistic
variants of the deterministic evolution we derived from an ordinary differential
equation.
Integrating a stochastic differential equation from the initial state $x'$ at
time $t'$ to the time $t$ yields not a single new state configuration but rather
a _probability distribution_ of state configurations represented by the
conditional probability density function
$$
\pi(x_{t} \mid x_{t'}) \equiv p(x, t; x', t').
$$
Formally this conditional probability density function is implicitly defined by
the _Fokker-Planck_ partial differential equation,
$$
\frac{ \partial}{\partial t} p(x, t; x', t')
=
- \frac{ \partial}{\partial x} \left( p(x, t; x', t') \cdot f(x, t) \right)
+ \frac{1}{2}
\frac{ \partial^{2}}{\partial x^{2}} \left( p(x, t; x', t') \cdot g^{2}(x, t) \right)
$$
with initial value
$$
p(x, t = t'; x', t') = \delta(x - x').
$$
Similarly integrating across multiple time intervals defines a joint probability
distribution over the corresponding intermediate states,
$$
\pi(x_{1}, \ldots, x_{N} \mid x_{0})
=
\prod_{n = 1}^{N} \pi(x_{n} \mid x_{n - 1}).
$$
In the same way integrating over infinitesimal time intervals defines a
probability distribution over the infinite-dimensional space of trajectories!
These infinite-dimensional probability distributions are referred to as
_stochastic processes_.
Equivalently we can consider the stochastic differential equation as first
defining an infinite-dimensional probability distribution over trajectories
which we can then marginalize to a finite-dimensional probability distribution
over the states at a finite set of times. The similarity to
[gaussian processes](https://betanalpha.github.io/assets/case_studies/gaussian_processes.html)
here is no coincidence; gaussian processes are a special case of stochastic
processes. Many gaussian processes have not only useful kernel representations
but also useful stochastic differential equation representations
[@LindgrenEtAl:2021].
In either case we never actually work with that infinite-dimensional model
directly. Instead we only ever consider the marginalization to a finite number
of states,
$$
\pi(x_{1}, \ldots, x_{N} \mid x_{0})
=
\prod_{n = 1}^{N} \pi(x_{n} \mid x_{n - 1}).
$$
## Numerical Integration of Stochastic Differential Equations
While the Fokker-Planck equation formally defines the conditional probability
density functions that specify the probabilistic evolution of a stochastic
differential equation, in practice we can rarely solve for that probabilistic
evolution in closed form.
In order to incorporate a stochastic differential equation
$$
\mathrm{d} x = \mathrm{d} t \, f(x, t) + \mathrm{d} W_{t} \, g(x, t),
$$
into a model we often have to consider tractable approximations to
$\pi(x_{1}, \ldots, x_{N} \mid x_{0})$. The most common approximation is the
_Euler-Maruyama approximation_ which models each conditional distribution as
$$
\pi(x_{n} \mid x_{n - 1})
\approx
\text{normal} \left( x_{n} \mid
x_{n - 1} + f(x_{n - 1}, t_{n - 1}) \cdot (t_{n} - t_{n - 1}),
g(x_{n - 1}, t_{n - 1}) \cdot \sqrt{ t_{n} - t_{n - 1} } \right).
$$
We can interpret this approximation as a first-order Euler approximation to
the deterministic evolution,
$$
\xi_{n} \approx x_{n - 1} + f(x_{n - 1}, t_{n - 1}) \cdot (t_{n} - t_{n - 1}),
$$
convolved with a distribution of perturbations,
$$
\begin{align*}
\delta_{n} &\sim \text{normal} \left(0,
g(x_{n - 1}, t_{n - 1}) \cdot \sqrt{ t_{n} - t_{n - 1} } \right)
\\
x_{n} &\rightarrow \xi_{n} + \delta_{n}.
\end{align*}
$$
While not the most sophisticated method the Euler-Maruyama approximation is
straightforward to implement in practice and can be reasonably accurate if the
time differences $t_{n} - t_{n - 1}$ are small enough. If the states of
practical interest are too far apart to achieve a reasonable accuracy then the
only way to utilize the Euler-Maruyama approximation is to introduce
intermediate states to fill in the gaps.
In other words we have to model
$$
\begin{align*}
\pi(x_{1}, \ldots, x_{N} \mid x_{0})
&=
\prod_{n = 1}^{N} \pi(x_{n} \mid x_{n - 1})
\\
&\approx
\prod_{n = 1}^{N}
\text{normal} \left( x_{n} \mid
x_{n - 1} + f(x_{n - 1}, t_{n - 1}) \cdot (t_{n} - t_{n - 1}),
g(x_{n - 1}, t_{n - 1}) \cdot \sqrt{ t_{n} - t_{n - 1} } \right)
\end{align*}
$$
for large enough $N$ such that each $t_{n} - t_{n - 1}$ is small enough for the
Euler-Maruyama approximation to be sufficiently accurate. Only afterwards can
we select the subset of $x_{n}$ that we'll need to evaluate the rest of the
model.
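
To make the mechanics concrete, the following R sketch implements
Euler-Maruyama forward simulation for a one-dimensional stochastic differential
equation with hypothetical drift and diffusion functions.  It is purely
illustrative and separate from the Stan implementations below.

```{r}
# Minimal Euler-Maruyama forward simulation of a one-dimensional stochastic
# differential equation; hypothetical drift and diffusion functions.
simulate_em <- function(x0, ts, f, g) {
  xs <- numeric(length(ts))
  xs[1] <- x0
  for (n in 2:length(ts)) {
    dt <- ts[n] - ts[n - 1]
    # Deterministic increment plus a normal perturbation of scale g * sqrt(dt)
    xs[n] <- xs[n - 1] + f(xs[n - 1], ts[n - 1]) * dt +
             g(xs[n - 1], ts[n - 1]) * sqrt(dt) * rnorm(1)
  }
  xs
}

fine_ts <- seq(0, 10, by=0.01)   # fine grid for accuracy
fine_xs <- simulate_em(-2, fine_ts,
                       f=function(x, t) 0.75 * (4 - x),
                       g=function(x, t) 1)

# Afterwards select only the states of practical interest
coarse_idx <- seq(1, length(fine_ts), by=100)
head(cbind(t=fine_ts[coarse_idx], x=fine_xs[coarse_idx]))
```
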
# Implementing Stochastic Differential Equations
In this section we'll discuss how to incorporate stochastic differential
equations into probabilistic models and _infer_ which state configurations are
consistent with observed data, both theoretically and in practice with Stan.
In preparation let's set up our `R` environment for running Stan,
```{r, warning=FALSE, message=FALSE}
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
util <- new.env()
source('stan_utility.R', local=util)
set.seed(5584339)
```
and configure our graphical settings,
```{r}
library(colormap)
c_light <- c("#DCBCBC")
c_light_highlight <- c("#C79999")
c_mid <- c("#B97C7C")
c_mid_highlight <- c("#A25050")
c_dark <- c("#8F2727")
c_dark_highlight <- c("#7C0000")
nom_colors <- c("#DCBCBC", "#C79999", "#B97C7C", "#A25050", "#8F2727", "#7C0000")
c_light_teal <- c("#6B8E8E")
c_mid_teal <- c("#487575")
c_dark_teal <- c("#1D4F4F")
par(family="CMU Serif", las=1, bty="l", cex.axis=1, cex.lab=1, cex.main=1,
xaxs="i", yaxs="i", mar = c(5, 5, 3, 1))
```
## The Ornstein–Uhlenbeck Process
To demonstrate the basic concepts of stochastic differential equations let's
consider a mean-reverting _Ornstein–Uhlenbeck process_ defined over the state
space $X = \mathbb{R}$ with the stochastic differential equation
$$
\mathrm{d} x = \mathrm{d} t \, \gamma \, (\mu - x) + \mathrm{d} W_{t} \, \tau.
$$
Here the deterministic evolution defined by
$$
\mathrm{d} x = \mathrm{d} t \, \gamma \, (\mu - x)
$$
alone eventually converges to $\mu$ with $\gamma$ determining the speed of that
convergence. The probabilistic contribution introduces perturbations around
that deterministic behavior that propagate along the evolution. Because of
these perturbations almost all of the trajectories that can be realized from
this process are continuous but not differentiable.
A huge benefit of the mean-reverting Ornstein–Uhlenbeck process is that the
corresponding Fokker-Planck equation can be solved in closed form to give an
exact description of the probabilistic evolution from a fixed initial point,
$$
\pi(x_{t} \mid x_{t'})
=
\text{normal} \left( x_{t} \mid
\mu + (x_{t'} - \mu) e^{- \gamma \, (t - t')},
\frac{\tau}{\sqrt{2}} \sqrt{ \frac{ 1 - e^{- 2 \, \gamma \, (t - t')}}{\gamma} }
\right),
$$
or equivalently
$$
\pi(x_{t + \delta t} \mid x_{t})
=
\text{normal} \left( x_{t + \delta t} \mid
\mu + (x_{t} - \mu) e^{-\gamma \, \delta t},
\frac{\tau}{\sqrt{2}} \sqrt{ \frac{ 1 - e^{- 2 \, \gamma \, \delta t}}{\gamma} }
\right).
$$
From this conditional probability density function between neighboring states we
can then build up the probability density function for all states given the
initial state.
The Euler-Maruyama approximation instead replaces this exact conditional density
function with
$$
\pi(x_{t + \delta t} \mid x_{t})
=
\text{normal} \left( x_{t + \delta t} \mid
x_{t} + \gamma \, (\mu - x_{t}) \, \delta t,
\tau \, \sqrt{\delta t}
\right).
$$
In this case the Euler-Maruyama approximation is given by the same normal
family of probability density functions as the exact solution. Moreover the
location and scale parameters of the approximation with the normal family are
equivalent to the best linear approximations of the exact location and scale
parameters,
$$
\begin{align*}
\mu + (x_{t} - \mu) e^{-\gamma \, \delta t}
&\approx \mu + (x_{t} - \mu) \, (1 - \gamma \, \delta t)
\\
&\approx \mu + x_{t} - \mu - \gamma \, (x_{t} - \mu) \, \delta t
\\
&\approx x_{t} + \gamma \, (\mu - x_{t}) \, \delta t,
\end{align*}
$$
and
$$
\begin{align*}
\frac{\tau}{\sqrt{2}} \sqrt{ \frac{ 1 - e^{-2 \gamma \, \delta t}}{\gamma} }
&=
\tau \, \sqrt{ \frac{ 1 - e^{-2 \gamma \, \delta t}}{2 \, \gamma} }
\\
&\approx
\tau \, \sqrt{ \frac{ 1 - 1 + 2 \gamma \, \delta t}{2 \, \gamma} }
\\
&\approx
\tau \, \sqrt{ \frac{ 2 \gamma \, \delta t}{2 \, \gamma} }
\\
&\approx
\tau \, \sqrt{ \delta t }.
\end{align*}
$$
This coincidence, however, will not hold in general.
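
To get a feel for how closely the two agree, the following R sketch evaluates
the exact Ornstein-Uhlenbeck transition location and scale alongside their
Euler-Maruyama counterparts for a few step sizes, using arbitrary illustrative
values of $x_{t}$, $\mu$, $\gamma$, and $\tau$.

```{r}
# Exact Ornstein-Uhlenbeck transition location and scale versus their
# Euler-Maruyama counterparts for a few step sizes; illustrative values only.
ou_transition <- function(dt, x, mu, gamma, tau) {
  c(exact_loc=mu + (x - mu) * exp(-gamma * dt),
    em_loc=x + gamma * (mu - x) * dt,
    exact_scale=(tau / sqrt(2)) * sqrt((1 - exp(-2 * gamma * dt)) / gamma),
    em_scale=tau * sqrt(dt))
}

t(sapply(c(1, 0.1, 0.01),
         function(dt) c(dt=dt, ou_transition(dt, x=-2, mu=4, gamma=0.75, tau=1))))
```
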
## Simulating Probabilistic Evolution
In order to simulate the probabilistic evolution defined by a stochastic
differential equation we have to sample from the corresponding joint
distribution of states. Here let's consider states on a relatively fine
partition of times between $t = 0$ and $t = 10$.
```{r}
N <- 100
t <- 10 * (0:N) / N
```
For convenience I've explicitly included the initial state so that there are
$N + 1$ total times.
Fixing the initial condition $x_{0}$ allows us to evolve to the $N$ states that
follow in time. If we also fix the Ornstein-Uhlenbeck parameters $\mu$,
$\gamma$, and $\tau$ then we can use the conditional density function derived
from the Fokker-Planck equation to construct a joint density function for all of
the following states,
$$
\begin{align*}
\pi(x_{1}, \ldots, x_{N} \mid x_{0}, \mu, \gamma, \tau)
&=
\prod_{n = 1}^{N} \pi(x_{n} \mid x_{n - 1}, \mu, \gamma, \tau)
\\
&=
\prod_{n = 1}^{N}
\text{normal} \left( x_{n} \mid
\mu + (x_{n - 1} - \mu) e^{-\gamma \, (t_{n} - t_{n - 1})},
\frac{\tau}{\sqrt{2}} \sqrt{ \frac{ 1 - e^{- 2 \gamma \, (t_{n} - t_{n - 1})}}{\gamma} }
\right).
\end{align*}
$$
Conveniently this joint density function is straightforward to implement in
Stan.
```{r}
writeLines(readLines("stan_programs/ou_analytic.stan"))
```
Given the Stan program we can use Markov chain Monte Carlo to simulate
realized trajectories evolving from $x_{0}$.
```{r}
x0 <- -2
mu <- 4
gamma <- 0.75
tau <- 1
simu_data <- list(N=N, t=t, x0=x0, mu=mu, gamma=gamma, tau=tau)
```
```{r, warning=FALSE, message=FALSE}
fit <- stan(file='stan_programs/ou_analytic.stan', data=simu_data,
seed=494838, refresh=1000)
```
The warning here arises because we're using the concatenated array `x` to build
up the joint density function instead of the declared parameters `x_free`. In
this case the warning is a false positive.
```{r}
util$check_all_diagnostics(fit)
```
Because `x[1]` here is just the fixed initial condition we can ignore the
effective sample size and $\hat{R}$ warnings and move on.
```{r}
ou_anal_mcmc_samples <- extract(fit)
```
Because our times are so close together we might be tempted to interpolate
between each sampled state configuration with a line in order to approximately
visualize the trajectories supported by this Ornstein-Uhlenbeck process.
Technically the Ornstein-Uhlenbeck process allocates zero probability to
piecewise linear trajectories, including these linear interpolations, but this
visualization device is still common despite it not being formally correct. We
just have to be careful to recognize that if we zoomed into any line segment
then we would see more variation.
```{r}
plot_realizations <- function(t, x, name, ylim=c(-3, 7)) {
  I <- length(x[,1])
  plot_idx <- seq(1, I, 80)
  J <- length(plot_idx)
  line_colors <- colormap(colormap=nom_colors, nshades=J)

  plot(1, type="n", xlab="t", ylab="x", main=name,
       xlim=c(t[1], t[length(t)]), ylim=ylim)
  for (j in 1:J)
    lines(t, x[plot_idx[j],], col=line_colors[j], lwd=2)
}

plot_marginal_quantiles <- function(t, x, name, ylim=c(-3, 7)) {
  probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
  cred <- sapply(1:length(t), function(n) quantile(x[,n], probs=probs))

  plot(1, type="n", main=name,
       xlab="t", ylab="x", xlim=c(t[1], t[length(t)]), ylim=ylim)
  polygon(c(t, rev(t)), c(cred[1,], rev(cred[9,])),
          col = c_light, border = NA)
  polygon(c(t, rev(t)), c(cred[2,], rev(cred[8,])),
          col = c_light_highlight, border = NA)
  polygon(c(t, rev(t)), c(cred[3,], rev(cred[7,])),
          col = c_mid, border = NA)
  polygon(c(t, rev(t)), c(cred[4,], rev(cred[6,])),
          col = c_mid_highlight, border = NA)
  lines(t, cred[5,], col=c_dark, lwd=2)
}
par(mfrow=c(1, 2))
plot_realizations(simu_data$t, ou_anal_mcmc_samples$x, "Trajectory Realizations")
plot_marginal_quantiles(simu_data$t, ou_anal_mcmc_samples$x, "Marginal Quantiles")
```
While Markov chain Monte Carlo worked well here we can also exploit the
conditional structure of the joint density function to generate exact samples
with ancestral sampling, allowing us to use pure Monte Carlo.
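To illustrate the idea, here is a minimal pure-R sketch of that ancestral
sampling which draws each state from its exact conditional density given the
previously drawn state; it relies only on the `simu_data` list defined above
and is independent of the Stan implementation.

```{r}
# Pure-R sketch of ancestral sampling from the exact Ornstein-Uhlenbeck
# conditionals, using the simu_data list defined above.
simu_ou_exact <- function(sdata) {
  xs <- numeric(sdata$N + 1)
  xs[1] <- sdata$x0
  for (n in 2:(sdata$N + 1)) {
    dt <- sdata$t[n] - sdata$t[n - 1]
    loc <- sdata$mu + (xs[n - 1] - sdata$mu) * exp(-sdata$gamma * dt)
    scale <- (sdata$tau / sqrt(2)) *
             sqrt((1 - exp(-2 * sdata$gamma * dt)) / sdata$gamma)
    xs[n] <- rnorm(1, loc, scale)   # draw the next state given the previous one
  }
  xs
}

r_samples <- t(sapply(1:100, function(s) simu_ou_exact(simu_data)))
quantile(r_samples[, simu_data$N + 1], probs=c(0.1, 0.5, 0.9))
```

The corresponding Stan program follows.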
```{r}
writeLines(readLines("stan_programs/simu_ou_analytic.stan"))
```
```{r, warning=FALSE, message=FALSE}
fit <- stan(file='stan_programs/simu_ou_analytic.stan', data=simu_data,
warmup=0, iter=4000, chains=1, seed=494838,
algorithm="Fixed_param", refresh=4000)
ou_anal_mc_samples <- extract(fit)
```
While the resulting simulations are cheaper to generate they yield the same
qualitative results as the Markov chain Monte Carlo sampling.
```{r}
par(mfrow=c(1, 2))
plot_realizations(simu_data$t, ou_anal_mc_samples$x, "Trajectory Realizations")
plot_marginal_quantiles(simu_data$t, ou_anal_mc_samples$x, "Marginal Quantiles")
```
With the states so close in time we should also be able to approximate this
probabilistic evolution with the Euler-Maruyama approximation. As with the
analytic model we can implement the Euler-Maruyama joint density function in
Stan and then use Markov chain Monte Carlo to generate samples.
```{r}
writeLines(readLines("stan_programs/ou_em.stan"))
```
```{r, warning=FALSE, message=FALSE}
fit <- stan(file='stan_programs/ou_em.stan', data=simu_data,
seed=494838, refresh=1000)
util$check_all_diagnostics(fit)
```
These warnings are safe to ignore here for the same reasons as above.
```{r}
ou_em_mcmc_samples <- extract(fit)
par(mfrow=c(1, 2))
plot_realizations(simu_data$t, ou_em_mcmc_samples$x, "Trajectory Realizations")
plot_marginal_quantiles(simu_data$t, ou_em_mcmc_samples$x, "Marginal Quantiles")
```
The iterative nature of the Euler-Maruyama approximation also naturally admits
ancestral sampling.
```{r}
writeLines(readLines("stan_programs/simu_ou_em.stan"))
```
```{r, warning=FALSE, message=FALSE}
fit <- stan(file='stan_programs/simu_ou_em.stan', data=simu_data,
warmup=0, iter=4000, chains=1, seed=494838,
algorithm="Fixed_param", refresh=4000)
ou_em_mc_samples <- extract(fit)
```
```{r}
par(mfrow=c(1, 2))
plot_realizations(simu_data$t, ou_em_mc_samples$x, "Trajectory Realizations")
plot_marginal_quantiles(simu_data$t, ou_em_mc_samples$x, "Marginal Quantiles")
```
Because we have analytic results we can visually confirm that the Euler-Maruyama
approximation is doing quite well for this time discretization, although in
general validating the accuracy of the approximation is significantly more
challenging.
```{r}
par(mfrow=c(2, 2))
plot_marginal_quantiles(simu_data$t, ou_anal_mcmc_samples$x,
"Analytic Markov Chain Monte Carlo")
plot_marginal_quantiles(simu_data$t, ou_anal_mc_samples$x,
"Analytic Monte Carlo")
plot_marginal_quantiles(simu_data$t, ou_em_mcmc_samples$x,
"Euler-Maruyama Markov Chain Monte Carlo")
plot_marginal_quantiles(simu_data$t, ou_em_mc_samples$x,
"Euler-Maruyama Monte Carlo")
```
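As a quick numerical complement to these visual comparisons we can, for
example, compare the marginal mean and standard deviation of the final state
across the four sets of samples; this is just an informal check on top of the
plots above.

```{r}
# Informal numerical comparison: marginal mean and standard deviation of the
# final state under each of the four sets of samples.
final_summary <- function(x) c(mean=mean(x[, ncol(x)]), sd=sd(x[, ncol(x)]))
rbind(analytic_mcmc=final_summary(ou_anal_mcmc_samples$x),
      analytic_mc=final_summary(ou_anal_mc_samples$x),
      em_mcmc=final_summary(ou_em_mcmc_samples$x),
      em_mc=final_summary(ou_em_mc_samples$x))
```
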
When we're interested in inferring the initial condition and the
Ornstein-Uhlenbeck parameters we need to give them their own probabilistic
structure. This results in a full latent model that covers all of the states
and the stochastic process parameters,
$$
\begin{align*}
\pi(x_{0}, x_{1}, \ldots, x_{N}, \mu, \gamma, \tau)
&=
\pi(x_{1}, \ldots, x_{N} \mid x_{0}, \mu, \gamma, \tau) \, \pi(x_{0}) \,
\pi(\mu, \gamma, \tau)
\\
&=
\left[ \prod_{n = 1}^{N} \pi(x_{n} \mid x_{n - 1}, \mu, \gamma, \tau) \right] \,
\pi(x_{0}) \, \pi(\mu, \gamma, \tau).
\end{align*}
$$
With just a few modifications of our initial program we can implement this
expanded model in Stan.
```{r}
writeLines(readLines("stan_programs/simu_ou_analytic_full_latent.stan"))
```
```{r, warning=FALSE, message=FALSE}
fit <- stan(file='stan_programs/simu_ou_analytic_full_latent.stan', data=simu_data,
warmup=0, iter=4000, chains=1, seed=494838,
algorithm="Fixed_param", refresh=4000)
prior_samples <- extract(fit)
```
By incorporating an entire space of initial conditions and Ornstein-Uhlenbeck
configurations we drastically increase the diversity of trajectory behaviors
that we can integrate into inferential models.
```{r}
par(mfrow=c(1, 2))
plot_realizations(simu_data$t, prior_samples$x, "Trajectory Realizations", ylim=c(-30, 30))
plot_marginal_quantiles(simu_data$t, prior_samples$x, "Marginal Quantiles", ylim=c(-30, 30))
```
```{r, eval=FALSE, echo=FALSE}
s <- 84
mu <- prior_samples$mu[s]
gamma <- prior_samples$gamma[s]
tau <- prior_samples$tau[s]
x0 <- prior_samples$x0[s]
x <- prior_samples$x[s,]
stan_rdump(c("mu", "gamma", "tau", "x0", "x"), file="truth.data.R")
N <- simu_data$N
t <- simu_data$t
N_obs <- 20
obs_idx <- c(2, 4, 15, 18, 20, 24, 28, 29, 36, 37, 63, 66, 67, 68, 69, 76, 87, 89, 91, 96)
t_obs <- simu_data$t[obs_idx]
sigma <- 0.25
y_obs <- x[obs_idx] + rnorm(N_obs, 0, sigma)
stan_rdump(c("N", "t", "N_obs", "obs_idx", "t_obs", "y_obs", "sigma"), file="obs.data.R")
```
## Inferring Stochastic Processes
Speaking of inferential models let's consider observations made at different
times, and hence generated from some of the latent states evolving according to
our assumed stochastic differential equation,
$$
\pi(y_{1}, \ldots, y_{K} | x_{n(1)}, \ldots x_{n(K)}).
$$
In this context we can ask which configurations of the latent stochastic process
are compatible with realized observations
$\tilde{y}_{1}, \ldots, \tilde{y}_{K}$.
To quantify this consistency with Bayesian inference we first construct the full
Bayesian model,
$$
\begin{align*}
\pi(y_{1}, \ldots, y_{K}, x_{0}, x_{1}, \ldots, x_{N}, \mu, \tau, \gamma)
&= \;\;
\pi(y_{1}, \ldots, y_{K} | x_{n(1)}, \ldots x_{n(K)})
\\
&\quad \cdot
\pi(x_{0}, x_{1}, \ldots, x_{N}, \mu, \tau, \gamma),
\end{align*}
$$
and then let Stan condition on the observed data and generate samples from the
corresponding posterior distribution,
$$
\pi(x_{0}, x_{1}, \ldots, x_{N}, \mu, \gamma, \tau
\mid \tilde{y}_{1}, \ldots, \tilde{y}_{K}).
$$
Even if we are interested in only the Ornstein-Uhlenbeck parameters $\mu$,
$\tau$, and $\gamma$ we first have to construct the full posterior distribution
and then marginalize,
$$
\begin{align*}
\pi(\mu, \gamma, \tau \mid y_{1}, \ldots, y_{K})
&=
\int \prod_{n = 0}^{N} \mathrm{d} x_{n} \,
\pi(x_{1}, \ldots, x_{N}, x_{0}, \mu, \gamma, \tau
\mid \tilde{y}_{1}, \ldots, \tilde{y}_{K})
\\
&=
\int \mathrm{d} x_{0} \prod_{k = 1}^{K} \mathrm{d} x_{n(k)} \,
\pi(x_{n(1)}, \ldots, x_{n(K)}, x_{0}, \mu, \gamma, \tau
\mid \tilde{y}_{1}, \ldots, \tilde{y}_{K})
\end{align*}
$$
Whenever we condition on observed data only the state configurations
$x_{1}, \ldots, x_{N}$ realized from the _posterior_ stochastic process, not the
_prior_ stochastic process, are relevant. In particular the simulations we
generated from $\pi(x_{0}, x_{1}, \ldots, x_{N}, \mu, \gamma, \tau)$ above will
not be helpful in quantifying our posterior inferences.
To demonstrate proper posterior inference let's bring in some simulated data,
along with the true realization from which the data were simulated for
comparison.
```{r}
data <- read_rdump("data/obs.data.R")
truth <- read_rdump("data/truth.data.R")
par(mfrow=c(1, 1))
plot(data$t, truth$x, type="l", lwd=2, main="", col="black",
xlab="t", ylab="x", xlim=c(0, 10), ylim=c(-8, 10))
points(data$t_obs, data$y_obs, pch=16, cex=1.00, col="white")
points(data$t_obs, data$y_obs, pch=16, cex=0.75, col=c_mid_teal)
```
Here we will assume that the observations are given by independent normal
density functions,
$$
\pi(y_{1}, \ldots, y_{K} | x_{n(1)}, \ldots x_{n(K)})
=
\prod_{k = 1}^{K} \text{normal}(y_{k} \mid x_{n(k)}, \sigma).
$$
Because the measurement variability $\sigma$ is often difficult to distinguish
from the Ornstein-Uhlenbeck variability $\tau$ I will fix $\sigma$ to its true
value to simplify the demonstration here. The potential degeneracies of common
stochastic processes are their own long discussion.
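Schematically, observations of this kind can be simulated from a known
trajectory by adding independent normal noise at the observed times; the
following R sketch uses a made-up trajectory and arbitrary values purely to
illustrate the observational model.

```{r}
# Schematic simulation of noisy observations of a known trajectory at a
# subset of times; made-up trajectory and values, for illustration only.
t_grid <- seq(0, 10, by=0.1)
x_true_demo <- 4 + (-2 - 4) * exp(-0.75 * t_grid)   # hypothetical latent trajectory

idx_demo <- sort(sample(seq_along(t_grid), 20))     # observed time indices
sigma_demo <- 0.25                                  # measurement scale

y_demo <- rnorm(length(idx_demo), x_true_demo[idx_demo], sigma_demo)
head(cbind(t=t_grid[idx_demo], y=y_demo))
```
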
With the exact probabilistic evolution model we can construct the full Bayesian
model in Stan using only the states at the observed times. This is useful if we
are interested in only the Ornstein-Uhlenbeck parameters and not the underlying
trajectories.
```{r}
writeLines(readLines("stan_programs/fit_ou_analytic_obs_t.stan"))
```
```{r, warning=FALSE, message=FALSE}
data$t0 <- data$t[1]
fit <- stan(file='stan_programs/fit_ou_analytic_obs_t.stan',
data=data, seed=5838299, refresh=1000)
util$check_all_diagnostics(fit)
posterior_samples <- extract(fit)
```
The marginal posterior distributions demonstrate that the posterior captures the
true configuration of the underlying Ornstein-Uhlenbeck process well.
```{r}
plot_marginal <- function(name, prior_samples, posterior_samples, display_xlims) {
  prior_values <- prior_samples[name][[1]]
  posterior_values <- posterior_samples[name][[1]]

  bin_lims <- range(c(prior_values, posterior_values))
  delta <- diff(range(posterior_values)) / 50
  breaks <- seq(bin_lims[1], bin_lims[2] + delta, delta)

  y_max <- max(hist(prior_values, breaks=breaks, plot=F)$counts,
               hist(posterior_values, breaks=breaks, plot=F)$counts)

  hist(prior_values, breaks=breaks,
       main="", xlab=name, xlim=display_xlims,
       ylab="", ylim=c(0, 1.05 * y_max), yaxt='n',
       col=c_light, border=c_light_highlight)
  hist(posterior_values, breaks=breaks,
       col=c_dark, border=c_dark_highlight, add=T)

  abline(v=truth[name], col="white", lty=1, lwd=3)
  abline(v=truth[name], col="black", lty=1, lwd=2)
}
par(mfrow=c(2, 2))
plot_marginal("mu", prior_samples, posterior_samples, c(-20, 20))
plot_marginal("gamma", prior_samples, posterior_samples, c(0, 3))
plot_marginal("tau", prior_samples, posterior_samples, c(0, 4))
plot_marginal("x0", prior_samples, posterior_samples, c(-10, 10))
```
If we want to investigate the latent trajectories consistent with the
observations, and not just the state behavior at the observed times, then we
need to consider a fine temporal grid and the many state configurations along
that grid.
```{r}
writeLines(readLines("stan_programs/fit_ou_analytic_all_t.stan"))
```
As this case study is meant to be a brief discussion I won't walk us through
every detail of this more sophisticated Stan program. In summary the analytic
probabilistic evolution model introduces many latent normal density functions
which can be implemented with the centered and non-centered parameterizations we
discussed in my
[hierarchical modeling case study](https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html). The best geometry is given by centering only those
states that are strongly informed by the realized likelihood function, which in
this case include the states at the observed times.
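For readers less familiar with those parameterizations, the following toy R
snippet, entirely separate from the Stan program above, illustrates that the
non-centered parameterization is just a reparameterization: shifting and
scaling standardized draws yields the same distribution as drawing the latent
state directly.  The location and scale values here are arbitrary.

```{r}
# Toy illustration of the non-centered parameterization: shifted and scaled
# standard normal draws have the same distribution as direct ("centered")
# draws from normal(loc_x, scale_x).
loc_x <- 1.5
scale_x <- 0.3

centered_draws <- rnorm(10000, loc_x, scale_x)

eta <- rnorm(10000, 0, 1)                    # standardized latent variable
non_centered_draws <- loc_x + scale_x * eta  # deterministic reconstruction

rbind(centered=quantile(centered_draws, probs=c(0.1, 0.5, 0.9)),
      non_centered=quantile(non_centered_draws, probs=c(0.1, 0.5, 0.9)))
```

Returning to the actual model, we can now run the fit.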
```{r, warning=FALSE, message=FALSE}
data$unobs_idx <- setdiff(2:(data$N + 1), data$obs_idx)  # latent state indices without direct observations
fit <- stan(file='stan_programs/fit_ou_analytic_all_t.stan',
data=data, seed=5838299, refresh=1000)
util$check_all_diagnostics(fit)
anal_post_samples <- extract(fit)
```
With a normal density function connecting each latent state there are many
opportunities for funnel degeneracies, but our careful parameterization seems to
have been sufficient here. Again a full discussion of the degeneracies inherent
to Ornstein-Uhlenbeck processes, let alone other common stochastic processes, is
beyond the scope of this case study.
Without any indication of computational problems we can move on to examining our
inferences. The marginal posterior distributions are consistent with the
previous fit.
```{r}
par(mfrow=c(2, 2))
plot_marginal("mu", prior_samples, anal_post_samples, c(-20, 20))
plot_marginal("gamma", prior_samples, anal_post_samples, c(0, 3))
plot_marginal("tau", prior_samples, anal_post_samples, c(0, 4))
plot_marginal("x0", prior_samples, anal_post_samples, c(-10, 10))
```
Now, however, we can visualize the posterior stochastic process using the
technically-incorrect-but-practically-common linear interpolation heuristic.
```{r}
par(mfrow=c(2, 2))
plot_realizations(data$t, prior_samples$x,
"Prior Realizations", ylim=c(-12, 12))
plot_realizations(data$t, anal_post_samples$x,
"Posterior Realizations", ylim=c(-12, 12))
points(data$t_obs, data$y_obs, pch=16, cex=1.00, col="white")
points(data$t_obs, data$y_obs, pch=16, cex=0.75, col=c_mid_teal)
plot_marginal_quantiles(data$t, prior_samples$x,
"Prior Quantiles", yli=c(-12, 12))
plot_marginal_quantiles(data$t, anal_post_samples$x,
"Posterior Quantiles", yli=c(-12, 12))
points(data$t_obs, data$y_obs, pch=16, cex=1.00, col="white")
points(data$t_obs, data$y_obs, pch=16, cex=0.75, col=c_mid_teal)
```
Because we have access to the actual realized trajectory we can also see that
our inferred posterior distribution recovers that latent truth.
```{r}
par(mfrow=c(1, 2))
plot_realizations(data$t, anal_post_samples$x,
"Posterior Realizations", ylim=c(-6, 11))
lines(data$t, truth$x, type="l", lwd=3, main="", col="white")
lines(data$t, truth$x, type="l", lwd=2, main="", col="black")
points(data$t_obs, data$y_obs, pch=16, cex=1.00, col="white")
points(data$t_obs, data$y_obs, pch=16, cex=0.75, col=c_mid_teal)
plot_marginal_quantiles(data$t, anal_post_samples$x,
"Posterior Quantiles", yli=c(-6, 11))
lines(data$t, truth$x, type="l", lwd=3, main="", col="white")
lines(data$t, truth$x, type="l", lwd=2, main="", col="black")
points(data$t_obs, data$y_obs, pch=16, cex=1.00, col="white")
points(data$t_obs, data$y_obs, pch=16, cex=0.75, col=c_mid_teal)
```
Finally let's consider how well we can infer these behaviors using the
Euler-Maruyama approximation to which we're often limited in practice. In this
case the fine temporal grid between the observation times isn't just a
convenience but rather a requirement for reasonable accuracy.
```{r}
writeLines(readLines("stan_programs/fit_ou_em_all_t.stan"))
```
No matter the structure of the generating stochastic differential equation the
Euler-Maruyama approximation always yields latent normal density functions, and
the parameterization considerations from above will always be relevant.
```{r, warning=FALSE, message=FALSE}
fit <- stan(file='stan_programs/fit_ou_em_all_t.stan', data=data, seed=5838299, refresh=1000)
util$check_all_diagnostics(fit)
em_post_samples <- extract(fit)
```
The diagnostics look clean, allowing us to move on to investigation of the
posterior distribution. Despite the approximate model the resulting marginal
posterior distributions agree with those from the exact model.
```{r}
par(mfrow=c(2, 2))
plot_marginal("mu", prior_samples, em_post_samples, c(-20, 20))
plot_marginal("gamma", prior_samples, em_post_samples, c(0, 3))
plot_marginal("tau", prior_samples, em_post_samples, c(0, 4))
plot_marginal("x0", prior_samples, em_post_samples, c(-10, 10))
```
Similarly we are able to recover the true trajectory realization.
```{r}
par(mfrow=c(1, 2))
plot_realizations(data$t, em_post_samples$x, "Posterior Realizations", ylim=c(-6, 11))
lines(data$t, truth$x, type="l", lwd=3, main="", col="white")
lines(data$t, truth$x, type="l", lwd=2, main="", col="black")
points(data$t_obs, data$y_obs, pch=16, cex=1.00, col="white")
points(data$t_obs, data$y_obs, pch=16, cex=0.75, col=c_mid_teal)
plot_marginal_quantiles(data$t, em_post_samples$x, "Posterior Quantiles", yli=c(-6, 11))
lines(data$t, truth$x, type="l", lwd=3, main="", col="white")
lines(data$t, truth$x, type="l", lwd=2, main="", col="black")
points(data$t_obs, data$y_obs, pch=16, cex=1.00, col="white")
points(data$t_obs, data$y_obs, pch=16, cex=0.75, col=c_mid_teal)
```
Visually comparing inferences from the exact and approximate models side by side
we don't see any substantial differences, indicating that the Euler-Maruyama
approximation is doing well.
```{r}
par(mfrow=c(2, 2))
plot_realizations(data$t, anal_post_samples$x,
"Analytic Posterior Realizations", ylim=c(-6, 11))
plot_realizations(data$t, em_post_samples$x,
"Euler-Maruyama Posterior Realizations", ylim=c(-6, 11))
plot_marginal_quantiles(data$t, anal_post_samples$x,
"Analytic Posterior Quantiles", yli=c(-6, 11))
plot_marginal_quantiles(data$t, em_post_samples$x,
"Euler-Maruyama Posterior Quantiles", yli=c(-6, 11))
```
When we observe only a few states and aren't interested in inferring latent
trajectories the fine temporal grid needed for the Euler-Maruyama approximation
to be reasonably accurate can be an expensive burden. If we are interested in
recovering the realized trajectory to high temporal resolution anyway, however,
then the Euler-Maruyama approximation is quite natural.
That said, quantifying the accuracy of the Euler-Maruyama approximation, and how
finely we need to discretize the time intervals between observed states, is a
much more subtle and challenging problem when we can't compare to the exact
model.
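One informal heuristic, sketched below under the assumption that forward
simulation from representative parameter values is cheap, is to simulate the
Euler-Maruyama evolution at the current resolution and at a finer resolution
and compare the resulting marginal behavior; substantial disagreement suggests
that the discretization is too coarse.  The drift, diffusion, and parameter
values in the sketch are purely illustrative.

```{r}
# Informal grid-refinement check: simulate Euler-Maruyama forward at two
# resolutions and compare the marginal distribution of the final state.
# Hypothetical drift, diffusion, and parameter values for illustration only.
em_final_state <- function(dt, t_final, x0, mu, gamma, tau) {
  x <- x0
  for (n in 1:round(t_final / dt))
    x <- x + gamma * (mu - x) * dt + tau * sqrt(dt) * rnorm(1)
  x
}

coarse <- replicate(2000, em_final_state(0.1,  10, -2, 4, 0.75, 1))
fine   <- replicate(2000, em_final_state(0.05, 10, -2, 4, 0.75, 1))

rbind(coarse=quantile(coarse, probs=c(0.1, 0.5, 0.9)),
      fine=quantile(fine, probs=c(0.1, 0.5, 0.9)))
```
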
# Alternative Computational Methods
In this introduction I have considered only Monte Carlo and Markov chain Monte
Carlo methods for quantifying the prior and posterior distributions induced by
a stochastic differential equation model, but there are a variety of other
computational methods that can be useful in certain circumstances.
For example if we are interested not in the full posterior distribution,
$$
\pi(x_{0}, x_{1}, \ldots, x_{N}, \mu, \gamma, \tau
\mid \tilde{y}_{1}, \ldots, \tilde{y}_{K}),
$$
but only the marginal posterior distribution,
$$
\pi(\mu, \gamma, \tau \mid y_{1}, \ldots, y_{K})
=