---
format: html
execute:
  cache: true
filters:
  - quarto
  - nameref
---
# Small and Large Worlds {#sec-chap02}
## ORIGINAL {.unnumbered}
::: my-objectives
::: my-objectives-header
Learning Objectives
:::
::: my-objectives-container
> "This chapter focuses on the small world. It explains probability
> theory in its essential form: counting the ways things can happen.
> Bayesian inference arises automatically from this perspective. Then
> the chapter presents the stylized components of a Bayesian statistical
> model, a model for learning from data. Then it shows you how to
> animate the model, to produce estimates.
> All this work provides a foundation for the next chapter, in which
> you'll learn to summarize Bayesian estimates, as well as begin to
> consider large world obligations." ([McElreath, 2020, p.
> 20](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=39&annotation=I8UG2HET))
:::
:::
The text refers many times to the notions of small and large worlds:
- The **small world** is the self-contained logical world of the
model.
- The **large world** is the broader context in which one deploys a
model.
## TIDYVERSE {.unnumbered}
The work by Solomon Kurz contains many references to R specifics so that
people new to R can follow the course. Most of these references are not
new to me, so I will not include them in my personal notes. There are
also very important references to other relevant articles I do not know
yet. But I will set these kinds of references aside for now and
concentrate mostly on replicating and understanding the code examples.
One challenge for Kurz was to replicate *all* the graphics of the
original version, even those that were produced merely to illustrate
procedures and argumentation without underlying R code. By contrast, I
will use only those code lines that are essential to display Bayesian
results. For example, I will not replicate the very extensive explanation
of how to produce the graphics of the garden of forking data with
`tidyverse` tools.
## The Garden of Forking Data {#sec-chap02-forking-data}
### ORIGINAL
**Bayesian inference is counting of possibilities**
> "Bayesian inference is really just counting and comparing of
> possibilities." ([McElreath, 2020, p.
> 20](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=39&annotation=JYNC25AE))
> "In order to make good inference about what actually happened, it
> helps to consider everything that could have happened. A Bayesian
> analysis is a garden of forking data, in which alternative sequences
> of events are cultivated. As we learn about what did happen, some of
> these alternative sequences are pruned. In the end, what remains is
> only what is logically consistent with our knowledge." ([McElreath,
> 2020, p. 21](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=40&annotation=Q2R6DERJ))
#### Counting possibilities
> "Suppose there's a bag, and it contains four marbles. These marbles
> come in two colors: blue and white. We know there are four marbles in
> the bag, but we don't know how many are of each color. We do know that
> there are five possibilities: (1) \[⚪⚪⚪⚪\], (2) \[⚫⚪⚪⚪\],
> (3)\[⚫⚫⚪⚪\], (4) \[⚫⚫⚫⚪\], (5) \[⚫⚫⚫⚫\]. These are t These
> These are the only possibilities consistent with what we know about
> the contents of the bag. Call these five possibilities the
> *conjectures*.
>
> Our goal is to figure out which of these conjectures is most
> plausible, given some evidence about the contents of the bag. We do
> have some evidence: A sequence of three marbles is pulled from the
> bag, one at a time, replacing the marble each time and shaking the bag
> before drawing another marble. The sequence that emerges is: ⚫⚪⚫,
> in that order. These are the data.
>
> So now let's plant the garden and see how to use the data to infer
> what's in the bag. Let's begin by considering just the single
> conjecture, \[⚫⚪⚪⚪\], that the bag contains one blue and three white
> marbles." ([McElreath, 2020, p.
> 21](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=40&annotation=RRD6HU9Y))
| Conjecture   | Ways to produce ⚫⚪⚫ |
|--------------|------------------------|
| \[⚪⚪⚪⚪\] | 0 × 4 × 0 = 0 |
| \[⚫⚪⚪⚪\] | 1 × 3 × 1 = 3 |
| \[⚫⚫⚪⚪\] | 2 × 2 × 2 = 8 |
| \[⚫⚫⚫⚪\] | 3 × 1 × 3 = 9 |
| \[⚫⚫⚫⚫\] | 4 × 0 × 4 = 0 |
I have bypassed the counting procedure related to the step-by-step
visualization of the garden of forking data. It is important to
understand that the multiplication in the above table is still just a
summarized form of counting:
::: my-important
::: my-important-header
Multiplication is just a shortcut for counting
:::
::: my-important-container
> "Notice that the number of ways to produce the data, for each
> conjecture, can be computed by first counting the number of paths in
> each"ring" of the garden and then by multiplying these counts
> together." ([McElreath, 2020, p.
> 23](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=42&annotation=R9UIIW9R))
> "Multiplication is just a shortcut to enumerating and counting up all
> of the paths through the garden that could produce all the
> observations." ([McElreath, 2020, p.
> 25](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=44&annotation=KUC4ZHP4))
> "Multiplication is just compressed counting." ([McElreath, 2020, p.
> 37](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=56&annotation=IH6782AE))
:::
:::
The multiplication in the table above has to be interpreted in the
following way:
1. The number of ways for the conjecture that the bag contains four white marbles is zero because the observed sequence also contains black marbles. The same argument, the other way around, rules out the last conjecture of four blue/black marbles.
2. The number of ways for the conjecture that the bag contains one black and three white marbles is calculated as follows: The first marble of the observed sequence is black, and --- according to our conjecture --- there is only one way (= 1) to produce a black marble. The next marble we have drawn is white, which is consistent with three (= 3) different marbles (ways) of our conjecture. The last drawn marble is again black, which again corresponds to just one way (= 1) under our conjecture. So the garden of forking data yields `1 × 3 × 1`.
3. The calculation for the other conjectures follows the same pattern (see the small R sketch below).
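The same counting shortcut can be written in a couple of lines of R. This is my own minimal sketch, not code from the book or from Kurz:
```{r}
#| label: counting-ways-sketch
# number of blue marbles in each of the five conjectures
blue <- 0:4
white <- 4 - blue
# ways each conjecture can produce the observed sequence black, white, black
ways <- blue * white * blue
ways  # 0 3 8 9 0, matching the table above
```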
#### Combining Other Information
> "We may have additional information about the relative plausibility of
> each conjecture. This information could arise from knowledge of how
> the contents of the bag were generated. It could also arise from
> previous data. Whatever the source, it would help to have a way to
> combine different sources of information to update the plausibilities.
> Luckily there is a natural solution: Just multiply the counts.
>
> To grasp this solution, suppose we're willing to say each conjecture
> is equally plausible at the start. So we just compare the counts of
> ways in which each conjecture is compatible with the observed data.
> This comparison suggests that \[⚫⚫⚫⚪\] is slightly more plausible
> than \[⚫⚫⚪⚪\], and both are about three times more plausible than
> \[⚫⚪⚪⚪\]. Since these are our initial counts, and we are going to
> update them next, let's label them *prior*." ([McElreath, 2020, p.
> 25](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=44&annotation=WECXDWEV))
::: my-procedure
::: my-procedure-header
::: {#prp-bayesian-updating}
: Bayesian Updating
:::
:::
::: my-procedure-container
1. First we count the numbers of ways each conjecture could produce the
new observation, for instance drawing a blue marble.
2. Then we multiply each of these new counts by the prior numbers of
ways for each conjecture.
:::
:::
| Conjecture   | Ways to produce ⚫ | Prior counts | New count  |
|--------------|--------------------|--------------|------------|
| \[⚪⚪⚪⚪\] | 0                  | 0            | 0 × 0 = 0  |
| \[⚫⚪⚪⚪\] | 1                  | 3            | 3 × 1 = 3  |
| \[⚫⚫⚪⚪\] | 2                  | 8            | 8 × 2 = 16 |
| \[⚫⚫⚫⚪\] | 3                  | 9            | 9 × 3 = 27 |
| \[⚫⚫⚫⚫\] | 4                  | 0            | 0 × 4 = 0  |
::: callout-caution
In the book, the table header "Ways to produce" shows ⚪ where --- as I
think is correct --- it should be ⚫.
:::
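The updating step in this table can be reproduced with a minimal R sketch of my own, using the prior counts from the ⚫⚪⚫ table and the ways each conjecture produces a new ⚫:
```{r}
#| label: updating-counts-sketch
prior_counts <- c(0, 3, 8, 9, 0)  # ways to produce the earlier sequence ⚫⚪⚫
ways_new     <- 0:4               # ways each conjecture produces one new ⚫
prior_counts * ways_new           # updated counts: 0 3 16 27 0
```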
#### From Counts to Probability
::: my-important
::: my-important-header
Principle of honest ignorance (McElreath)
:::
::: my-important-container
> "When we don't know what caused the data, potential causes that may
> produce the data in more ways are more plausible." ([McElreath, 2020,
> p. 26](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=45&annotation=9EKVZNHR))
:::
:::
Two reasons for using probabilities instead of counts:
1. Only the relative values matter.
2. Counts quickly grow very large and become difficult to manipulate.
$$
\begin{align*}
\text{plausibility of } p \text{ after } D_{New} = \frac{\text{ways } p \text{ can produce } D_{New} \times \text{prior plausibility of } p}{\text{sum of products}}
\end{align*}
$$
::: my-example
::: my-example-header
Constructing plausibilities by standardizing
:::
::: my-example-container
| Possible composition | p | Ways to produce data | Plausibility |
|----------------------|------|----------------------|--------------|
| \[⚪⚪⚪⚪\] | 0 | 0 | 0 |
| \[⚫⚪⚪⚪\] | 0.25 | 3 | 0.15 |
| \[⚫⚫⚪⚪\] | 0.5 | 8 | 0.40 |
| \[⚫⚫⚫⚪\] | 0.75 | 9 | 0.45 |
| \[⚫⚫⚫⚫\] | 1 | 0 | 0 |
:::
:::
::: my-r-code
::: my-r-code-header
::: {#cnj-2-1}
: Compute these plausibilities in R
:::
:::
::: my-r-code-container
```{r}
#| label: code-2-1
## R code 2.1 #############
ways <- c(0, 3, 8, 9, 0)
ways / sum(ways)
```
:::
:::
> "These plausibilities are also *probabilities*---they are non-negative
> (zero or positive) real numbers that sum to one. And all of the
> mathematical things you can do with probabilities you can also do with
> these values. Specifically, each piece of the calculation has a direct
> partner in applied probability theory. These partners have stereotyped
> names, so it's worth learning them, as you'll see them again and
> again.
>
> - A conjectured proportion of blue marbles, p, is usually called a
> `r glossary("parameter")` value. It's just a way of indexing
> possible explanations of the data.
> - The relative number of ways that a value p can produce the data is
> usually called a `r glossary("likelihood")`. It is derived by
> enumerating all the possible data sequences that could have
> happened and then eliminating those sequences inconsistent with
> the data.
> - The prior plausibility of any specific p is usually called the
> `r glossary("prior probability")`.
> - The new, updated plausibility of any specific p is usually called
> the `r glossary("posterior probability")`." ([McElreath, 2020, p.
> 27](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=46&annotation=ZDYLMTTI))
### TIDYVERSE (empty)
## Building a Model
### ORIGINAL
We are going to use a toy example, but it has the same structure as a
typical statistical analysis. The first nine samples produce the
following data:
`W L W W W L W L W` (`W` indicates water and `L` indicates land.)
::: my-procedure
::: my-procedure-header
::: {#prp-designing-bayesian-model}
: Designing a simple Bayesian model
:::
:::
::: my-procedure-container
> "Designing a simple Bayesian model benefits from a design loop with
> three steps.
>
> 1. Data story: Motivate the model by narrating how the data might
> arise.
> 2. Update: Educate your model by feeding it the data.
> 3. Evaluate: All statistical models require supervision, leading to
> model revision." ([McElreath, 2020, p.
> 28](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=47&annotation=KFNYKUC2))
:::
:::
#### A Data Story
> "Bayesian data analysis usually means producing a story for how the
> data came to be. This story may be *descriptive*, specifying
> associations that can be used to predict outcomes, given observations.
> Or it may be *causal*, a theory of how some events produce other
> events." ([McElreath, 2020, p.
> 28](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=47&annotation=D2IR5ZX3))
> "... all data stories are complete, in the sense that they are
> sufficient for specifying an algorithm for simulating new data."
> ([McElreath, 2020, p.
> 28](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=47&annotation=A3EQXFBY))
::: my-example
::: my-example-header
Data story for our case
:::
::: my-example-container
> "The data story in this case is simply a restatement of the sampling
> process:
>
> (1) The true proportion of water covering the globe is $p$.
> (2) A single toss of the globe has a probability p of producing a
> water ($W$) observation. It has a probability $1 − p$ of producing
> a land ($L$) observation.
> (3) Each toss of the globe is independent of the others." ([McElreath,
> 2020, p. 29](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=48&annotation=YPPK73D9))
:::
:::
Data stories are important:
> "Most data stories are much more specific than are the verbal
> hypotheses that inspire data collection. Hypotheses can be vague, such
> as"it's more likely to rain on warm days." When you are forced to
> consider sampling and measurement and make a precise statement of how
> temperature predicts rain, many stories and resulting models will be
> consistent with the same vague hypothesis. Resolving that ambiguity
> often leads to important realizations and model revisions, before any
> model is fit to data." ([McElreath, 2020, p.
> 29](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=48&annotation=852Q3WN5))
#### Bayesian Updating
> "Each possible proportion may be more or less plausible, given the
> evidence. A Bayesian model begins with one set of plausibilities
> assigned to each of these possibilities. These are the prior
> plausibilities. Then it updates them in light of the data, to produce
> the posterior plausibilities. This updating process is a kind of
> learning, called `r glossary("Bayesian updating")`." ([McElreath,
> 2020, p. 29](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=48&annotation=XIYIZHVD))
::: my-important
::: my-important-header
How a Bayesian model learns
:::
::: my-important-container
@fig-2-5-book-copy helps to understand the Bayesian updating process. In
@cnj-fig-bayesian-update we will learn how to write R code that
reproduces the book's figure as @fig-bayesian-update. Inspecting the
different steps of the updating process is essential for understanding
Bayesian statistics!
:::
:::
![Copy of Figure 2.5: **How a Bayesian model learns**. In each plot,
previous plausibilities (dashed curve) are updated in light of the
latest observation to produce a new set of plausibilities (solid
curve).](img/bayesian_model_learns_step_by_step-min.png){#fig-2-5-book-copy
fig-alt="Nine small diagrams to show the relationship between plausibility against proportion of water after each sample."}
#### Evaluate
Keep in mind two cautious principles:
> "First, the model's certainty is no guarantee that the model is a good
> one." ([McElreath, 2020, p.
> 31](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=50&annotation=QYALLSPY))
> "Second, it is important to supervise and critique your model's work."
> ([McElreath, 2020, p.
> 31](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=50&annotation=M7UKAL3D))
::: my-watch-out
::: my-watch-out-header
Test Before You Est(imate)
:::
::: my-watch-out-container
In the planned third edition of the book McElreath wants to include the
evaluation part of the process from the beginning. It is already
mentioned in chapter 2 of the current book, but without practical
implementation and code examples. We can, however, find some remarks in
his [Statistical Rethinking Videos 2023b
37:40](https://www.youtube.com/watch?v=R1vcdhPBlXA&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus&index=2&t=37m40s).
I will not go into details here because my focus is on the second
edition, but I will add, as a kind of summary, some general advice for
the testing procedure:
::: my-procedure
::: my-procedure-header
<div>
: Model evaluation
</div>
:::
::: my-procedure-container
1. Code a generative simulation
2. Code an estimator
3. Test the estimator with (1)
- where the answer is known
- at extreme values
- explore different sampling designs
- develop generally an intuition for sampling and estimation
:::
:::
Here are some references to get an idea of the necessary R code (a minimal sketch of my own follows after this box):
::: my-resource
::: my-resource-header
R Code snippets for testing procedure
:::
::: my-resource-container
- **Simulating globe tossing**: Statistical Rethinking 2023 - Lecture
02, [Slide
54](https://speakerdeck.com/rmcelreath/statistical-rethinking-2023-lecture-02?slide=54)
- **Simulate the experiment arbitrary times for any particular
proportion**: This is a way to explore the design of an experiment
as well as debug the code. ([Video 2023-02
39:55](https://www.youtube.com/watch?v=R1vcdhPBlXA&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus&index=2&t=39m55s))
- **Test the simulation at extreme values**: R code snippets at [Slide
57](https://speakerdeck.com/rmcelreath/statistical-rethinking-2023-lecture-02?slide=57).
:::
:::
:::
:::
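Following the procedure above, here is a minimal sketch of a generative simulation plus a simple estimator test. It is my own illustration modeled on the spirit of the lecture slides; the function names `sim_globe()` and `est_globe()` and the chosen values are my own, not code from the book:
```{r}
#| label: sim-globe-sketch
# generative simulation: N tosses of a globe with true water proportion p
sim_globe <- function(p = 0.7, N = 9) {
  sample(c("W", "L"), size = N, replace = TRUE, prob = c(p, 1 - p))
}
# a simple estimator: the observed proportion of water
est_globe <- function(tosses) mean(tosses == "W")
# test at extreme values, where the answer is known
est_globe(sim_globe(p = 1, N = 11))  # must be exactly 1
est_globe(sim_globe(p = 0, N = 11))  # must be exactly 0
# explore sampling variation for one particular proportion
replicate(5, est_globe(sim_globe(p = 0.5, N = 9)))
```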
### TIDYVERSE
Instead of vectors, the tidyverse approach works best with data frames
or, more precisely, tibbles. So let's save the globe-tossing data
`W L W W W L W L W` into a tibble:
::: my-r-code
::: my-r-code-header
::: {#cnj-globe-tossing-data}
: Save globe tossing data into a tibble
:::
:::
::: my-r-code-container
```{r}
#| label: globe-tossing-data
(d <- tibble::tibble(toss = c("w", "l", "w", "w", "w", "l", "w", "l", "w")))
```
:::
:::
#### A Data Story
#### Bayesian Updating {#sec-expand_grid}
For the updating process we need to add to the data the cumulative
number of trials, `n_trials`, and the cumulative number of successes,
`n_success` (the running count of tosses with `toss == "w"`).
::: my-r-code
::: my-r-code-header
::: {#cnj-bayesian-updating-start}
: Bayesian updating preparation
:::
:::
::: my-r-code-container
```{r}
#| label: bayesian-updating-start
(
d <-
d |>
dplyr::mutate(n_trials = 1:9,
n_success = cumsum(toss == "w"))
)
```
:::
:::
The code for reproducing Figure 2.5 of the book (@fig-2-5-book-copy in
this document) is fairly complex, so I have to inspect the results step
by step. First I give a short overview of what each line does. In the
following steps I then explain each step in more detail and show the
result of the corresponding lines of code.
::: my-watch-out
::: my-watch-out-header
Parameter `k` in `lag()` changed to `default`
:::
::: my-watch-out-container
In the following listing I had to change the parameter `k` of Kurz's
version of the `lag()` call to `default`, as described in the
corresponding [help
file](https://dplyr.tidyverse.org/reference/lead-lag.html). I don't
understand why `k` was used. Maybe `k` was the name of the parameter in a
previous {**dplyr**} version?
:::
:::
::: my-r-code
::: my-r-code-header
::: {#cnj-fig-bayesian-update}
: Bayesian updating: How a model learns
:::
:::
::: my-r-code-container
```{r}
#| label: fig-bayesian-update
#| echo: true
#| fig-cap: "How a Bayesian model learns: Replicating book figure 2.5"
#| code-summary: "Code: **How a Bayesian model learns**"
## (0) starting with tibble from the previous two code chunks #####
d <- tibble::tibble(toss = c("w", "l", "w", "w", "w", "l", "w", "l", "w")) |>
dplyr::mutate(n_trials = 1:9, n_success = cumsum(toss == "w"))
## (1) create tibble from all input combinations ################
sequence_length <- 50
d |>
tidyr::expand_grid(p_water = seq(from = 0, to = 1,
length.out = sequence_length)) |>
## (2) group data by the `p-water` parameter ####################
dplyr::group_by(p_water) |>
## (3) create columns filled with the value of the previous rows #####
dplyr::mutate(lagged_n_trials = dplyr::lag(n_trials, default = 1),
lagged_n_success = dplyr::lag(n_success, default = 1)) |>
## (4) restore the original ungrouped data structure #######
dplyr::ungroup() |>
## (5) calculate prior and likelihood & store values ######
dplyr::mutate(prior = ifelse(n_trials == 1, .5,
dbinom(x = lagged_n_success,
size = lagged_n_trials,
prob = p_water)),
likelihood = dbinom(x = n_success,
size = n_trials,
prob = p_water),
strip = stringr::str_c("n = ", n_trials)) |>
## (6) normalize prior and likelihood ##########################
dplyr::group_by(n_trials) |>
dplyr::mutate(prior = prior / sum(prior),
likelihood = likelihood / sum(likelihood)) |>
## (7) plot the result ########################################
ggplot2::ggplot(ggplot2::aes(x = p_water)) +
ggplot2::geom_line(ggplot2::aes(y = prior), linetype = 2) +
ggplot2::geom_line(ggplot2::aes(y = likelihood)) +
ggplot2::scale_x_continuous("proportion water",
breaks = c(0, .5, 1)) +
ggplot2::scale_y_continuous("plausibility", breaks = NULL) +
ggplot2::theme(panel.grid = ggplot2::element_blank()) +
ggplot2::facet_wrap(~ strip, scales = "free_y") +
ggplot2::theme_bw()
```
:::
:::
##### Annotation (1): `tidyr::expand_grid()` {#sec-annotation-1-expand-grid}
`tidyr::expand_grid()` creates a tibble from all combinations of its
inputs. Its inputs are generalized vectors, in contrast to
`tidyr::expand()`, which also generates all combinations of variables
but needs a dataset as input. The range between 0 and 1 is divided into
50 grid points, and then all combinations are generated by varying the
columns from left to right (the first column varies slowest, the second
faster, and so on). This yields 450 data points ($50 \times 9$ trials).
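A tiny illustration of this column-varying order, using made-up values of my own:
```{r}
#| label: expand-grid-order-sketch
# the first column varies slowest, the last column fastest
tidyr::expand_grid(toss = c("w", "l"), p_water = c(0, 0.5, 1))
```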
::: my-r-code
::: my-r-code-header
::: {#cnj-bayesian-update-1}
: Bayesian Update: Create tibble with all Input combinations
:::
:::
::: my-r-code-container
```{r}
#| label: bayesian-update-1
tbl <- tibble::tibble(toss = c("w", "l", "w", "w", "w", "l", "w", "l", "w")) |>
dplyr::mutate(n_trials = 1:9, n_success = cumsum(toss == "w"))
sequence_length <- 50
tbl |>
tidyr::expand_grid(p_water = seq(from = 0, to = 1,
length.out = sequence_length))
```
:::
:::
##### Annotation (2): `dplyr::group_by()` {#sec-annotation-2-group_by}
At first I did not understand the line `group_by(p_water)`. Why does the
data have to be grouped when every row has a different value --- as I had
thought from a cursory inspection of the result? But it turned out that
after 50 records the parameter `p_water` repeats its values.
::: my-r-code
::: my-r-code-header
::: {#cnj-bayesian-update-2}
: Bayesian Updating: Lagged with grouping
:::
:::
::: my-r-code-container
```{r}
#| label: bayesian-update-2
tbl1 <- tbl |>
tidyr::expand_grid(p_water = seq(from = 0, to = 1,
length.out = sequence_length)) |>
dplyr::group_by(p_water) |>
dplyr::mutate(lagged_n_trials = dplyr::lag(n_trials, default = 1),
lagged_n_success = dplyr::lag(n_success, default = 1)) |>
dplyr::ungroup() |>
# add new column ID with row numbers
# and relocate it to be the first column
dplyr::mutate(ID = dplyr::row_number()) |>
dplyr::relocate(ID, .before = toss)
# show 2 records from different groups
tbl1[c(1:2, 50:52, 100:102, 150:152), ]
```
:::
:::
I want to see the differences in detail. So I will provide also the
ungrouped version.
::: my-r-code
::: my-r-code-header
::: {#cnj-bayesian-update-2a}
: Bayesian Updating: Lagged without grouping
:::
:::
::: my-r-code-container
```{r}
#| label: bayesian-update-2a
tbl2 <- tbl |>
tidyr::expand_grid(p_water = seq(from = 0, to = 1,
length.out = sequence_length)) |>
dplyr::mutate(lagged_n_trials = dplyr::lag(n_trials, default = 1),
lagged_n_success = dplyr::lag(n_success, default = 1)) |>
# add new column ID with row numbers and relocate it as first column
dplyr::mutate(ID = dplyr::row_number()) |>
dplyr::relocate(ID, .before = toss)
# show the same records without grouping
tbl2[c(1:2, 50:52, 100:102, 150:152), ]
```
:::
:::
It turned out that the two versions differ in the lagged variables after
51 records (not after 50, as I would have assumed; apparently this has
to do with the `lag()` command). The first 51 records are identical;
beginning with row number 52 there are differences in the columns
`lagged_n_trials` and `lagged_n_success`. This pattern is repeated: the
original (grouped) version always changes after 100 records, whereas the
version without grouping changes after 50 rows, starting with row 51.
##### Annotation (3): `dplyr::lag()` {#sec-annotation-3-lag}
The function `dplyr::lag()` finds the "previous" values in a vector
(time series). This is useful for comparing current values with the
values that came before them. See [Compute lagged or leading
values](https://dplyr.tidyverse.org/reference/lead-lag.html).
We need the immediately previous values to draw the prior probabilities
in each panel of the graph (the dashed line, or `linetype = 2` in
{**ggplot2**} parlance). Where no previous value exists, `dplyr::lag()`
uses the value $1$ instead (the option `default = 1`). This is also the
correct explanation for the differences starting after row 51 (and not
50).
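A quick illustration of the `default` argument, with a made-up vector of my own:
```{r}
#| label: lag-default-sketch
x <- c(1, 2, 3, 4)
dplyr::lag(x)               # NA 1 2 3 -- no previous value for the first element
dplyr::lag(x, default = 1)  # 1 1 2 3  -- the default fills the missing first value
```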
##### Annotation (4): `dplyr::ungroup()` {#sec-annotation-4-ungroup}
This is just the reversion of the grouping command `group_by(p_water)`.
##### Annotation (5): `dbinom()` {#sec-annotation-5-dbinom}
This is the core of the prior and likelihood calculation. It uses
`stats::dbinom()` to calculate the probability of an outcome with two
alternative events. `dbinom()` is the R function for the binomial
distribution, the distribution that probability theory provides for
"coin tossing" problems.
::: my-resource
::: my-resource-header
Exploring the `dbinom()` function
:::
::: my-resource-container
The [distribution zoo](https://ben18785.shinyapps.io/distribution-zoo/)
is a nice resource to explore the shapes and properties of important
Bayesian distributions. This interactive {**shiny**} app was developed
by [Ben Lambert](https://ben-lambert.com/bayesian/) and [Fergus
Cooper](https://www.cs.ox.ac.uk/people/fergus.cooper/site/).
Choose "Discrete Univariate" as the "Category of Distribution" and then
select as "Binomial" as the "Distribution Type".
See also the two blog entries [An Introduction to the Binomial
Distribution](https://www.statology.org/binomial-distribution/) and [A
Guide to dbinom, pbinom, qbinom, and rbinom in
R](https://www.statology.org/dbinom-pbinom-qbinom-rbinom-in-r/) of the
Statology website.
:::
:::
The "`d`" in `dbinom()` stands for *density*. Functions named in this
way almost always have corresponding partners that begin with "`r`" for
random samples and that begin with "`p`" for cumulative probabilities.
See for example the [help
file](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Binomial.html).
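A short sketch of this function family for the binomial case, with numbers of my own choosing:
```{r}
#| label: dbinom-family-sketch
dbinom(x = 6, size = 9, prob = 0.5)  # probability of exactly 6 water in 9 tosses
pbinom(q = 6, size = 9, prob = 0.5)  # cumulative probability of 6 or fewer water
rbinom(n = 3, size = 9, prob = 0.5)  # three random counts of water out of 9 tosses
```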
The results of the two calculations (prior and likelihood) are collected
with `dplyr::mutate()` into two newly generated columns. There is no
prior for the first trial, so it is assumed to be 0.5. The binomial
formula uses the lagged counts for the prior, whereas the likelihood
uses the current counts. These two lines provide the essential
calculations: for every trial, they match the 50 grid points --- the
assumed water proportions --- to the probability of the observed trial
outcome (`W` or `L`).
The last `dplyr::mutate()` command generates the `strip` variable,
consisting of the prefix "n =" followed by the number of trials. This
will later provide the titles for the different facets of the plot.
::: my-checklist
::: my-checklist-header
To-do: Construct labelling specification for toss results
:::
::: my-checklist-container
To get a better replication I would need to change the facet labels from
`n = n_trials` to the string of toss results up to that trial, e.g., for
$n = 3$ I would need $W, L, W$.
Kurz did not do this. I have tried it but have not succeeded so far
(2023-10-25).
:::
:::
::: my-r-code
::: my-r-code-header
::: {#cnj-bayesian.update-5}
: Bayesian Updating: Calculating prior and likelihood
:::
:::
::: my-r-code-container
```{r}
#| label: bayesian-update-5
tbl5 <- tbl1 |>
dplyr::mutate(prior = ifelse(n_trials == 1, .5,
dbinom(x = lagged_n_success,
size = lagged_n_trials,
prob = p_water)),
likelihood = dbinom(x = n_success,
size = n_trials,
prob = p_water),
strip = stringr::str_c("n = ", n_trials))
tbl5
```
:::
:::
##### Annotation (6): Normalizing {#sec-annotation-6-normalize}
The code lines in annotation 6 normalize the prior and the likelihood
after grouping the data by `n_trials`. Dividing the prior and likelihood
values by their respective sums puts them both on a probability metric.
This metric is important for comparing different probabilities.
> If you don't normalize (i.e., divide the density by the sum of the
> density), their respective heights don't match up with those in the
> text. Furthermore, it's the normalization that makes them directly
> comparable. (Kurz in section [Bayesian
> Updating](https://bookdown.org/content/4857/small-worlds-and-large-worlds.html#bayesian-updating)
> of chapter 2)
::: my-r-code
::: my-r-code-header
::: {#cnj-bayesian-update-6}
: Bayesian Updating: Normalizing prior and likelihood
:::
:::
::: my-r-code-container
```{r}
#| label: bayesian-update-6
tbl6 <- tbl5 |>
dplyr::group_by(n_trials) |>
dplyr::mutate(prior = prior / sum(prior),
likelihood = likelihood / sum(likelihood))
tbl6
```
:::
:::
##### Annotation (7): Graphical demonstration of Bayesian updating {#sec-annotation-7-ggplot}
The remainder of the code prepares the plot, using the 50 grid points in
the range from $0$ to $1$ as the x-axis and prior and likelihood as the
y-axis. To distinguish the prior from the likelihood it uses a dashed
line for the prior (`linetype = 2`) and a solid line (the default) for
the likelihood. The x-axis has three breaks ($0, 0.5, 1$), whereas the
y-axis has no breaks (`breaks = NULL`) and each facet gets its own y
scale (`scales = "free_y"`).
::: my-r-code
::: my-r-code-header
::: {#cnj-bayesian-update-7}
: Bayesian updating: Graphical demonstration
:::
:::
::: my-r-code-container
```{r}
#| label: fig-bayesian-update-7
#| fig-cap: "Graphical demonstration: 9 steps of Bayesian updating"
tbl6 |>
ggplot2::ggplot(ggplot2::aes(x = p_water)) +
ggplot2::geom_line(ggplot2::aes(y = prior),
linetype = 2) +
ggplot2::geom_line(ggplot2::aes(y = likelihood)) +
ggplot2::scale_x_continuous("proportion water", breaks = c(0, .5, 1)) +
ggplot2::scale_y_continuous("plausibility", breaks = NULL) +
ggplot2::theme(panel.grid = ggplot2::element_blank()) +
ggplot2::theme_bw() +
ggplot2::facet_wrap(~ strip, scales = "free_y")
```
:::
:::
## Components of the Model
### ORIGINAL
We observed three components of the model:
> "(1) The number of ways each conjecture could produce an observation
> (2) The accumulated number of ways each conjecture could produce the
> entire data (3) The initial plausibility of each conjectured cause of
> the data" ([McElreath, 2020, p.
> 32](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=51&annotation=R4M54D6K))
#### Variables
> "Variables are just symbols that can take on different values. In a
> scientific context, variables include things we wish to infer, such as
> proportions and rates, as well as things we might observe, the data.
> ... Unobserved variables are usually called
> `r glossary("parameter", "parameters")`." ([McElreath, 2020, p.
> 32](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=51&annotation=C7C4PCB6))
Take the globe tossing model as an example: There are three variables:
`W` and `L` (water or land) and the proportion of water and land `p`. We
observe the events of water or land, but we calculate (and do not
observe directly) the proportion of water and land. So `p` is a
parameter as defined above.
#### Definitions
> "Once we have the variables listed, we then have to define each of
> them. In defining each, we build a model that relates the variables to
> one another. Remember, the goal is to count all the ways the data
> could arise, given the assumptions. This means, as in the globe
> tossing model, that for each possible value of the unobserved
> variables, such as $p$, we need to define the relative number of
> ways---the probability---that the values of each observed variable
> could arise. And then for each unobserved variable, we need to define
> the prior plausibility of each value it could take." ([McElreath,
> 2020, p. 33](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=52&annotation=EWFRB9S7))
::: my-note
::: my-note-header
::: {#cor-expand-grid}
: Calculate the prior plausibility of the values of each observed
variable in R
:::
:::
::: my-note-container
There are different functions in R that generate all combinations of
variables supplied as vectors, factors, or data columns:
`base::expand.grid()` or, in the tidyverse, `tidyr::expand()`,
`tidyr::expand_grid()` and `tidyr::crossing()`. These functions can be
used to generate the relative number of ways (= the probability) that
the values of each observed variable could arise. We have seen a first
demonstration in @cnj-bayesian-update-1, and we will see many more
examples in later sections and chapters (a small side-by-side sketch
follows below).
:::
:::
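A quick side-by-side of these combination helpers, using toy values of my own:
```{r}
#| label: expand-helpers-sketch
# base R: a data.frame of all combinations (first column varies fastest)
base::expand.grid(marbles = 0:2, draw = c("b", "w"))
# tidyverse: tibbles of all combinations (first column varies slowest)
tidyr::expand_grid(marbles = 0:2, draw = c("b", "w"))
tidyr::crossing(marbles = 0:2, draw = c("b", "w"))
```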
##### Observed Variables
> "For the count of water W and land L, we define how plausible any
> combination of W and L would be, for a specific value of p. This is
> very much like the marble counting we did earlier in the chapter. Each
> specific value of p corresponds to a specific plausibility of the
> data, as in @fig-bayesian-update-7.
>
> So that we don't have to literally count, we can use a mathematical
> function that tells us the right plausibility. In conventional
> statistics, a distribution function assigned to an observed variable
> is usually called a `r glossary("likelihood")`. That term has special
> meaning in non-Bayesian statistics, however." ([McElreath, 2020, p.
> 33](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=52&annotation=TLDIVS8K))
For our globe tossing procedure we use, instead of counting, a
mathematical function to calculate the probability of all combinations.
> "In this case, once we add our assumptions that (1) every toss is
> independent of the other tosses and (2) the probability of W is the
> same on every toss, probability theory provides a unique answer, known
> as the `r glossary("binomial distribution")`. This is the common "coin
> tossing" distribution." ([McElreath, 2020, p.
> 33](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=52&annotation=E5HEDDW4))
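For reference, my own restatement of the binomial probability of observing $W$ water and $L$ land samples, given a probability $p$ of water on each toss:
$$
Pr(W, L \mid p) = \frac{(W + L)!}{W!L!}p^{W}(1-p)^{L}
$$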
::: my-resource
::: my-resource-container
See also @sec-annotation-5-dbinom for a resource to explore the binomial