forked from rstudio-education/hopr
-
Notifications
You must be signed in to change notification settings - Fork 42
/
loops.qmd
640 lines (464 loc) · 27.3 KB
/
loops.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
# Loops {#sec-loops}
Loops are R's method for repeating a task, which makes them a useful tool for programming simulations. This chapter will teach you how to use R's loop tools.
Let's use the `score` function to solve a real-world problem.
Your slot machine is modeled after real machines that were accused of fraud. The machines appeared to pay out 40 cents on the dollar, but the manufacturer claimed that they paid out 92 cents on the dollar. You can calculate the exact payout rate of your machine with the `score` program. The payout rate will be the expected value of the slot machine's prize.
## Expected Values
The expected value of a random event is a type of weighted average; it is the sum of each possible outcome of the event, weighted by the probability that each outcome occurs:
$$
E(x) = \sum_{i = 1}^{n}\left( x_{i} \cdot P(x_{i}) \right)
$$
You can think of the expected value as the average prize that you would observe if you played the slot machine an infinite number of times. Let's use the formula to calculate some simple expected values. Then we will apply the formula to your slot machine.
Do you remember the `die` you created in [Project 1: Weighted Dice](#sec-project-dice)?
``` r
die <- c(1, 2, 3, 4, 5, 6)
```
Each time you roll the die, it returns a value selected at random (one through six). You can find the expected value of rolling the die with the formula:
$$
E(\text{die}) = \sum_{i = 1}^{n}\left( \text{die}_{i} \cdot P(\text{die}_{i}) \right)
$$
The $\text{die}_{i}$s are the possible outcomes of rolling the die: 1, 2, 3, 4, 5, and 6; and the $P(\text{die}_{i})$'s are the probabilities associated with each of the outcomes. If your die is fair, each outcome will occur with the same probability: 1/6. So our equation simplifies to:
$$
\begin{array}{rl}
E(\text{die}) & = \sum_{i = 1}^{n}\left( \text{die}_{i} \cdot P(\text{die}_{i}) \right)\\
& = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6}\\
& = 3.5\\
\end{array}
$$
Hence, the expected value of rolling a fair die is 3.5. You may notice that this is also the average value of the die. The expected value will equal the average if every outcome has the same chance of occurring.
But what if each outcome has a different chance of occurring? For example, we weighted our dice in [Packages and Help Pages](#sec-packages-and-help) so that each die rolled 1, 2, 3, 4, and 5 with probability 1/8 and 6 with probability 3/8. You can use the same formula to calculate the expected value in these conditions:
$$
\begin{array}{rl}
E(die) & = 1 \cdot \frac{1}{8} + 2 \cdot \frac{1}{8} + 3 \cdot \frac{1}{8} + 4 \cdot \frac{1}{8} + 5 \cdot \frac{1}{8} + 6 \cdot \frac{3}{8}\\
& = 4.125\\
\end{array}
$$
Hence, the expected value of a loaded die does not equal the average value of its outcomes. If you rolled a loaded die an infinite number of times, the average outcome would be 4.125, which is higher than what you would expect from a fair die.
Notice that we did the same three things to calculate both of these expected values. We have:
- Listed out all of the possible outcomes
- Determined the *value* of each outcome (here just the value of the die)
- Calculated the probability that each outcome occurred
The expected value was then just the sum of the values in step 2 multiplied by the probabilities in step 3.
You can use these steps to calculate more sophisticated expected values. For example, you could calculate the expected value of rolling a pair of weighted dice. Let's do this step by step.
First, list out all of the possible outcomes. A total of 36 different outcomes can appear when you roll two dice. For example, you might roll (1, 1), which notates one on the first die and one on the second die. Or, you may roll (1, 2), one on the first die and two on the second. And so on. Listing out these combinations can be tedious, but R has a function that can help.
## expand.grid
The `expand.grid` function in R provides a quick way to write out every combination of the elements in *n* vectors. For example, you can list every combination of two dice. To do so, run `expand.grid` on two copies of `die`:
``` r
rolls <- expand.grid(die, die)
```
`expand.grid` will return a data frame that contains every way to pair an element from the first `die` vector with an element from the second `die` vector. This will capture all 36 possible combinations of values:
``` r
rolls
## Var1 Var2
## 1 1 1
## 2 2 1
## 3 3 1
## ...
## 34 4 6
## 35 5 6
## 36 6 6
```
You can use `expand.grid` with more than two vectors if you like. For example, you could list every combination of rolling three dice with `expand.grid(die, die, die)` and every combination of rolling four dice with `expand.grid(die, die, die, die)`, and so on. `expand.grid` will always return a data frame that contains each possible combination of *n* elements from the *n* vectors. Each combination will contain exactly one element from each vector.
You can determine the value of each roll once you've made your list of outcomes. This will be the sum of the two dice, which you can calculate using R's element-wise execution:
``` r
rolls$value <- rolls$Var1 + rolls$Var2
head(rolls, 3)
## Var1 Var2 value
## 1 1 2
## 2 1 3
## 3 1 4
```
R will match up the elements in each vector before adding them together. As a result, each element of `value` will refer to the elements of `Var1` and `Var2` that appear in the same row.
Next, you must determine the probability that each combination appears. You can calculate this with a basic rule of probability:
*The probability that* n *independent, random events all occur is equal to the product of the probabilities that each random event occurs*.
Or more succinctly:
$$
P(A \& B \& C \& ...) = P(A) \cdot P(B) \cdot P(C) \cdot ...
$$
So the probability that we roll a (1, 1) will be equal to the probability that we roll a one on the first die, 1/8, times the probability that we roll a one on the second die, 1/8:
$$
\begin{array}{rl}
P(1 \& 1) & = P(1) \cdot P(1) \\
& = \frac{1}{8} \cdot \frac{1}{8}\\
& = \frac{1}{64}
\end{array}
$$
And the probability that we roll a (1, 2) will be:
$$
\begin{array}{rl}
P(1 \& 2) & = P(1) \cdot P(2) \\
& = \frac{1}{8} \cdot \frac{1}{8}\\
& = \frac{1}{64}
\end{array}
$$
And so on.
Let me suggest a three-step process for calculating these probabilities in R. First, we can look up the probabilities of rolling the values in `Var1`. We'll do this with the lookup table that follows:
``` r
prob <- c("1" = 1/8, "2" = 1/8, "3" = 1/8, "4" = 1/8, "5" = 1/8, "6" = 3/8)
prob
## 1 2 3 4 5 6
## 0.125 0.125 0.125 0.125 0.125 0.375
```
If you subset this table by `rolls$Var1`, you will get a vector of probabilities perfectly keyed to the values of `Var1`:
``` r
rolls$Var1
## 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
prob[rolls$Var1]
## 1 2 3 4 5 6 1 2 3 4 5 6
## 0.125 0.125 0.125 0.125 0.125 0.375 0.125 0.125 0.125 0.125 0.125 0.375
## 1 2 3 4 5 6 1 2 3 4 5 6
## 0.125 0.125 0.125 0.125 0.125 0.375 0.125 0.125 0.125 0.125 0.125 0.375
## 1 2 3 4 5 6 1 2 3 4 5 6
## 0.125 0.125 0.125 0.125 0.125 0.375 0.125 0.125 0.125 0.125 0.125 0.375
rolls$prob1 <- prob[rolls$Var1]
head(rolls, 3)
## Var1 Var2 value prob1
## 1 1 2 0.125
## 2 1 3 0.125
## 3 1 4 0.125
```
Second, we can look up the probabilities of rolling the values in `Var2`:
``` r
rolls$prob2 <- prob[rolls$Var2]
head(rolls, 3)
## Var1 Var2 value prob1 prob2
## 1 1 2 0.125 0.125
## 2 1 3 0.125 0.125
## 3 1 4 0.125 0.125
```
Third, we can calculate the probability of rolling each combination by multiplying `prob1` by `prob2`:
``` r
rolls$prob <- rolls$prob1 * rolls$prob2
head(rolls, 3)
## Var1 Var2 value prob1 prob2 prob
## 1 1 2 0.125 0.125 0.015625
## 2 1 3 0.125 0.125 0.015625
## 3 1 4 0.125 0.125 0.015625
```
It is easy to calculate the expected value now that we have each outcome, the value of each outcome, and the probability of each outcome. The expected value will be the summation of the dice values multiplied by the dice probabilities:
``` r
sum(rolls$value * rolls$prob)
## 8.25
```
So the expected value of rolling two loaded dice is 8.25. If you rolled a pair of loaded dice an infinite number of times, the average sum would be 8.25. (If you are curious, the expected value of rolling a pair of fair dice is 7, which explains why 7 plays such a large role in dice games like craps.)
Now that you've warmed up, let's use our method to calculate the expected value of the slot machine prize. We will follow the same steps we just took:
- We will list out every possible outcome of playing the machine. This will be a list of every combination of three slot symbols.
- We will calculate the probability of getting each combination when you play the machine.
- We will determine the prize that we would win for each combination.
When we are finished, we will have a data set that looks like this:
``` r
## Var1 Var2 Var3 prob1 prob2 prob3 prob prize
## DD DD DD 0.03 0.03 0.03 0.000027 800
## 7 DD DD 0.03 0.03 0.03 0.000027 0
## BBB DD DD 0.06 0.03 0.03 0.000054 0
## ... and so on.
```
The expected value will then be the sum of the prizes multiplied by their probability of occuring:
$$
E(\text{prize}) = \sum_{i = 1}^{n}\left( \text{prize}_{i} \cdot P(\text{prize}_{i}) \right)
$$
Ready to begin?
::: callout
## Exercise: List the Combinations
Use `expand.grid` to make a data frame that contains every possible combination of *three* symbols from the `wheel` vector:
``` r
wheel <- c("DD", "7", "BBB", "BB", "B", "C", "0")
```
Be sure to add the argument `stringsAsFactors = FALSE` to your `expand.grid` call; otherwise, `expand.grid` will save the combinations as factors, an unfortunate choice that will disrupt the `score` function.
:::
To create a data frame of each combination of *three* symbols, you need to run `expand.grid` and give it *three* copies of `wheel`. The result will be a data frame with 343 rows, one for each unique combination of three slot symbols:
``` r
combos <- expand.grid(wheel, wheel, wheel, stringsAsFactors = FALSE)
combos
## Var1 Var2 Var3
## 1 DD DD DD
## 2 7 DD DD
## 3 BBB DD DD
## 4 BB DD DD
## 5 B DD DD
## 6 C DD DD
## ...
## 341 B 0 0
## 342 C 0 0
## 343 0 0 0
```
Now, let's calculate the probability of getting each combination. You can use the probabilities contained in the `prob` argument of `get_symbols` to do this. These probabilities determine how frequently each symbol is chosen when your slot machine generates symbols. They were calculated after observing 345 plays of the Manitoba video lottery terminals. Zeroes have the largest chance of being selected (0.52) and cherries the least (0.01):
``` r
get_symbols <- function() {
wheel <- c("DD", "7", "BBB", "BB", "B", "C", "0")
sample(wheel, size = 3, replace = TRUE,
prob = c(0.03, 0.03, 0.06, 0.1, 0.25, 0.01, 0.52)
}
```
::: callout
## Exercise: Make a Lookup Table
Isolate the previous probabilities in a lookup table. What names will you use in your table?
:::
Your names should match the input that you want to look up. In this case, the input will be the character strings that appear in `Var1`, `Var2`, and `Var3`. So your lookup table should look like this:
``` r
prob <- c("DD" = 0.03, "7" = 0.03, "BBB" = 0.06,
"BB" = 0.1, "B" = 0.25, "C" = 0.01, "0" = 0.52)
```
Now let's look up our probabilities.
::: callout
## Exercise: Lookup the Probabilities
Look up the probabilities of getting the values in `Var1`. Then add them to `combos` as a column named `prob1`. Then do the same for `Var2` (`prob2`) and `Var3` (`prob3`).
:::
Remember that you use R's selection notation to look up values in a lookup table. The values that result will be keyed to the index that you use:
``` r
combos$prob1 <- prob[combos$Var1]
combos$prob2 <- prob[combos$Var2]
combos$prob3 <- prob[combos$Var3]
head(combos, 3)
## Var1 Var2 Var3 prob1 prob2 prob3
## DD DD DD 0.03 0.03 0.03
## 7 DD DD 0.03 0.03 0.03
## BBB DD DD 0.06 0.03 0.03
```
Now how should we calculate the total probability of each combination? Our three slot symbols are all chosen independently, which means that the same rule that governed our dice probabilities governs our symbol probabilities:
$$
P(A \& B \& C \& ...) = P(A) \cdot P(B) \cdot P(C) \cdot ...
$$
::: callout
## Exercise: Calculate Probabilities for Each Combination
Calculate the overall probabilities for each combination. Save them as a column named `prob` in `combos`, then check your work.
You can check that the math worked by summing the probabilities. The probabilities should add up to one, because one of the combinations *must* appear when you play the slot machine. In other words, a combination will appear, with probability of one.
:::
You can calculate the probabilities of every possible combination in one fell swoop with some element-wise execution:
``` r
combos$prob <- combos$prob1 * combos$prob2 * combos$prob3
head(combos, 3)
## Var1 Var2 Var3 prob1 prob2 prob3 prob
## DD DD DD 0.03 0.03 0.03 0.000027
## 7 DD DD 0.03 0.03 0.03 0.000027
## BBB DD DD 0.06 0.03 0.03 0.000054
```
The sum of the probabilities is one, which suggests that our math is correct:
``` r
sum(combos$prob)
## 1
```
You only need to do one more thing before you can calculate the expected value: you must determine the prize for each combination in `combos`. You can calculate the prize with `score`. For example, we can calculate the prize for the first row of `combos` like this:
``` r
symbols <- c(combos[1, 1], combos[1, 2], combos[1, 3])
## "DD" "DD" "DD"
score(symbols)
## 800
```
However there are 343 rows, which makes for tedious work if you plan to calculate the scores manually. It will be quicker to automate this task and have R do it for you, which you can do with a `for` loop.
## for Loops
A `for` loop repeats a chunk of code many times, once for each element in a set of input. `for` loops provide a way to tell R, "Do this for every value of that." In R syntax, this looks like:
``` r
for (value in that) {
this
}
```
The `that` object should be a set of objects (often a vector of numbers or character strings). The for loop will run the code in that appears between the braces once for each member of `that`. For example, the for loop below runs `print("one run")` once for each element in a vector of character strings:
``` r
for (value in c("My", "first", "for", "loop")) {
print("one run")
}
## "one run"
## "one run"
## "one run"
## "one run"
```
The `value` symbol in a for loop acts like an argument in a function. The for loop will create an object named `value` and assign it a new value on each run of the loop. The code in your loop can access this value by calling the `value` object.
What values will the for loop assign to `value`? It will use the elements in the set that you run the loop on. `for` starts with the first element and then assigns a different element to `value` on each run of the for loop, until all of the elements have been assigned to `value`. For example, the for loop below will run `print(value)` four times and will print out one element of `c("My", "second", "for", "loop")` each time:
``` r
for (value in c("My", "second", "for", "loop")) {
print(value)
}
## "My"
## "second"
## "for"
## "loop"
```
On the first run, the for loop substituted `"My"` for `value` in `print(value)`. On the second run it substituted `"second"`, and so on until `for` had run `print(value)` once with every element in the set:
If you look at `value` after the loop runs, you will see that it still contains the value of the last element in the set:
``` r
value
## "loop"
```
I've been using the symbol `value` in my for loops, but there is nothing special about it. You can use any symbol you like in your loop to do the same thing as long as the symbol appears before `in` in the parentheses that follow `for`. For example, you could rewrite the previous loop with any of the following:
``` r
for (word in c("My", "second", "for", "loop")) {
print(word)
}
for (string in c("My", "second", "for", "loop")) {
print(string)
}
for (i in c("My", "second", "for", "loop")) {
print(i)
}
```
::: callout-warning
## Choose your symbols carefully
R will run your loop in whichever environment you call it from. This is bad news if your loop uses object names that already exist in the environment. Your loop will overwrite the existing objects with the objects that it creates. This applies to the value symbol as well.
:::
::: callout-tip
## For loops run on sets
In many programming languages, `for` loops are designed to work with integers, not sets. You give the loop a starting value and an ending value, as well as an increment to advance the value by between loops. The `for` loop then runs until the loop value exceeds the ending value.
You can recreate this effect in R by having a `for` loop execute on a set of integers, but don't lose track of the fact that R's `for` loops execute on members of a set, not sequences of integers.
:::
`for` loops are very useful in programming because they help you connect a piece of code with each element in a set. For example, we could use a `for` loop to run `score` once for each row in `combos`. However, R's `for` loops have a shortcoming that you'll want to know about before you start using them: `for` loops do not return output.
`for` loops are like Las Vegas: what happens in a `for` loop stays in a `for` loop. If you want to use the products of a `for` loop, you must write the `for` loop so that it saves its own output as it goes.
Our previous examples appeared to return output, but this was misleading. The examples worked because we called `print`, which always prints its arguments in the console (even if it is called from a function, a `for` loop, or anything else). Our `for` loops won't return anything if you remove the `print` call:
``` r
for (value in c("My", "third", "for", "loop")) {
value
}
##
```
To save output from a `for` loop, you must write the loop so that it saves its own output as it runs. You can do this by creating an empty vector or list before you run the `for` loop. Then use the `for` loop to fill up the vector or list. When the `for` loop is finished, you'll be able to access the vector or list, which will now have all of your results.
Let's see this in action. The following code creates an empty vector of length 4:
``` r
chars <- vector(length = 4)
```
The next loop will fill it with strings:
``` r
words <- c("My", "fourth", "for", "loop")
for (i in 1:4) {
chars[i] <- words[i]
}
chars
## "My" "fourth" "for" "loop"
```
This approach will usually require you to change the sets that you execute your `for` loop on. Instead of executing on a set of objects, execute on a set of integers that you can use to index both your object and your storage vector. This approach is very common in R. You'll find in practice that you use `for` loops not so much to run code, but to fill up vectors and lists with the results of code.
Let's use a `for` loop to calculate the prize for each row in `combos`. To begin, create a new column in `combos` to store the results of the `for` loop:
``` r
combos$prize <- NA
head(combos, 3)
## Var1 Var2 Var3 prob1 prob2 prob3 prob prize
## DD DD DD 0.03 0.03 0.03 0.000027 NA
## 7 DD DD 0.03 0.03 0.03 0.000027 NA
## BBB DD DD 0.06 0.03 0.03 0.000054 NA
```
The code creates a new column named prize and fills it with `NA`s. R uses its recycling rules to populate every value of the column with `NA`.
::: callout
## Exercise: Build a Loop
Construct a `for` loop that will run `score` on all 343 rows of `combos`. The loop should run `score` on the first three entries of the \_i_th row of `combos` and should store the results in the \_i_th entry of `combos$prize`.
:::
You can score the rows in `combos` with:
``` r
for (i in 1:nrow(combos)) {
symbols <- c(combos[i, 1], combos[i, 2], combos[i, 3])
combos$prize[i] <- score(symbols)
}
```
After you run the for loop, `combos$prize` will contain the correct prize for each row. This exercise also tests the `score` function; `score` appears to work correctly for every possible slot combination:
``` r
head(combos, 3)
## Var1 Var2 Var3 prob1 prob2 prob3 prob prize
## DD DD DD 0.03 0.03 0.03 0.000027 800
## 7 DD DD 0.03 0.03 0.03 0.000027 0
## BBB DD DD 0.06 0.03 0.03 0.000054 0
```
We're now ready to calculate the expected value of the prize. The expected value is the sum of `combos$prize` weighted by `combos$prob`. This is also the payout rate of the slot machine:
``` r
sum(combos$prize * combos$prob)
## 0.538014
```
Uh oh. The expected prize is about 0.54, which means our slot machine only pays 54 cents on the dollar over the long run. Does this mean that the manufacturer of the Manitoba slot machines *was* lying?
No, because we ignored an important feature of the slot machine when we wrote `score`: a diamond is wild. You can treat a `DD` as any other symbol if it increases your prize, with one exception. You cannot make a `DD` a `C` unless you already have another `C` in your symbols (it'd be too easy if every `DD` automatically earned you \$2).
The best thing about `DD`s is that their effects are cumulative. For example, consider the combination `B`, `DD`, `B`. Not only does the `DD` count as a `B`, which would earn a prize of \$10; the `DD` also doubles the prize to \$20.
Adding this behavior to our code is a little tougher than what we have done so far, but it involves all of the same principles. You can decide that your slot machine doesn't use wilds and keep the code that we have. In that case, your slot machine will have a payout rate of about 54 percent. Or, you could rewrite your code to use wilds. If you do, you will find that your slot machine has a payout rate of 93 percent, one percent higher than the manufacturer's claim. You can calculate this rate with the same method that we used in this section.
::: callout
## Exercise: Challenge
There are many ways to modify `score` that would count `DD`s as wild. If you would like to test your skill as an R programmer, try to write your own version of `score` that correctly handles diamonds.
If you would like a more modest challenge, study the following `score` code. It accounts for wild diamonds in a way that I find elegant and succinct. See if you can understand each step in the code and how it achieves its result.
:::
Here is a version of score that handles wild diamonds:
``` r
score <- function(symbols) {
diamonds <- sum(symbols == "DD")
cherries <- sum(symbols == "C")
# identify case
# since diamonds are wild, only nondiamonds
# matter for three of a kind and all bars
slots <- symbols[symbols != "DD"]
same <- length(unique(slots)) == 1
bars <- slots %in% c("B", "BB", "BBB")
# assign prize
if (diamonds == 3) {
prize <- 100
} else if (same) {
payouts <- c("7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
prize <- unname(payouts[slots[1]])
} else if (all(bars)) {
prize <- 5
} else if (cherries > 0) {
# diamonds count as cherries
# so long as there is one real cherry
prize <- c(0, 2, 5)[cherries + diamonds + 1]
} else {
prize <- 0
}
# double for each diamond
prize * 2^diamonds
}
```
::: callout
## Exercise: Calculate the Expected Value
Calculate the expected value of the slot machine when it uses the new `score` function. You can use the existing `combos` data frame, but you will need to build a `for` loop to recalculate `combos$prize`.
:::
To update the expected value, just update `combos$prize`:
``` r
for (i in 1:nrow(combos)) {
symbols <- c(combos[i, 1], combos[i, 2], combos[i, 3])
combos$prize[i] <- score(symbols)
}
```
Then recompute the expected value:
``` r
sum(combos$prize * combos$prob)
## 0.934356
```
This result vindicates the manufacturer's claim. If anything, the slot machines seem more generous than the manufacturer stated.
## while Loops
R has two companions to the `for` loop: the `while` loop and the `repeat` loop. A `while` loop reruns a chunk *while* a certain condition remains `TRUE`. To create a `while` loop, follow `while` by a condition and a chunk of code, like this:
``` r
while (condition) {
code
}
```
`while` will rerun `condition`, which should be a logical test, at the start of each loop. If `condition` evaluates to `TRUE`, `while` will run the code between its braces. If `condition` evaluates to `FALSE`, `while` will finish the loop.
Why might `condition` change from `TRUE` to `FALSE`? Presumably because the code inside your loop has changed whether the condition is still `TRUE`. If the code has no relationship to the condition, a `while` loop will run until you stop it. So be careful. You can stop a `while` loop by hitting Escape or by clicking on the stop-sign icon at the top of the RStudio console pane. The icon will appear once the loop begins to run.
Like `for` loops, `while` loops do not return a result, so you must think about what you want the loop to return and save it to an object during the loop.
You can use `while` loops to do things that take a varying number of iterations, like calculating how long it takes to go broke playing slots (as follows). However, in practice, `while` loops are much less common than `for` loops in R:
``` r
plays_till_broke <- function(start_with) {
cash <- start_with
n <- 0
while (cash > 0) {
cash <- cash - 1 + play()
n <- n + 1
}
n
}
plays_till_broke(100)
## 260
```
## repeat Loops
`repeat` loops are even more basic than `while` loops. They will repeat a chunk of code until you tell them to stop (by hitting Escape) or until they encounter the command `break`, which will stop the loop.
You can use a `repeat` loop to recreate `plays_till_broke`, my function that simulates how long it takes to lose money while playing slots:
``` r
plays_till_broke <- function(start_with) {
cash <- start_with
n <- 0
repeat {
cash <- cash - 1 + play()
n <- n + 1
if (cash <= 0) {
break
}
}
n
}
plays_till_broke(100)
## 237
```
## Summary
You can repeat tasks in R with `for`, `while`, and `repeat` loops. To use `for`, give it a chunk of code to run and a set of objects to loop through. `for` will run the code chunk once for each object. If you wish to save the output of your loop, you can assign it to an object that exists outside of the loop.
Repetition plays an important role in data science. It is the basis for simulation, as well as for estimates of variance and probability. Loops are not the only way to create repetition in R (consider `replicate` for example), but they are one of the most popular ways.
Unfortunately, loops in R can sometimes be slower than loops in other languages. As a result, R's loops get a bad rap. This reputation is not entirely deserved, but it does highlight an important issue. Speed is essential to data analysis. When your code runs fast, you can work with bigger data and do more to it before you run out of time or computational power. [Speed](#sec-speed) will teach you how to write fast `for` loops and fast code in general with R. There, you will learn to write vectorized code, a style of lightning-fast code that takes advantage of all of R's strengths.