Selected Reference Solution For Chapter 3

Exercise 3.4

Exercise 3.5

s	a	s'	r	p(s',r / s,a)
high	search	high	r_search	alpha
high	search	low	r_search	1 - alpha
low	search	high	r_search	0
low	search	high	-3	1 - beta
low	search	low	r_search	beta
high	wait	high	r_wait	1
high	wait	low	r_wait	0
low	wait	high	r_wait	0
low	wait	low	r_wait	1
low	recharge	high	0	1
low	recharge	low	r_rechar	0

Exercise 3.6

Exercise 3.7

The return at each time step is still (-1 * gamma ^ K). For episodic pole-balance, 'K' will be reset to 0 after each failure, where continuing task won't but increase whatever.

Exercise 3.9

G5 = 0 G4 = R5 + 0.5G5 = 2
G3 = R4 + 0.5G4 = 3 + 1 = 4
G2 = R3 + 0.5G3 = 6 + 2 = 8
G1 = R2 + 0.5G2 = 2 + 4 = 6
G0 = R1 + 0.5*G1 = -1 + 3 = 2

Exercise 3.10

G1 = 7 / (1-0.9) = 70
G0 = R1 + 0.9*G1 = 2 + 63 = 65

Exercise 3.11

Exercise 3.13

Exercise 3.14

Exercise 3.16

Exercise 3.17

Exercise 3.21

v*(A) = 10 + v*(A') = 10 + 0.9*(0 + 0.9 v*(1, 3)) = .... = 10 + 0.9^5 v*(A)
v*(A) = 10 / (1 - 0.9^5)

Exercise 3.22

Exercise 3.23,24,25,26

Experiments Results For Chapter 3

Grid World

WorldSize = 5, discount = 0.9, Optimal Policy


21.98	24.42	21.98	19.42	17.48
19.78	21.98	19.78	17.80	16.02
17.80	19.78	17.80	16.02	14.42
16.02	17.80	16.02	14.42	12.98
14.42	16.02	14.42	12.98	11.68

State Value _Converge Process_ Using Optimal Policy

Iteration 1 time


0.00	10.00	0.00	5.00	0.00
0.00	0.00	0.00	0.00	0.00
0.00	0.00	0.00	0.00	0.00
0.00	0.00	0.00	0.00	0.00
0.00	0.00	0.00	0.00	0.00

Iteration 10 times


14.31	15.90	14.31	13.24	11.65
12.88	14.31	12.88	11.65	10.44
11.59	12.88	11.59	10.44	8.24
10.44	11.59	10.44	8.24	7.15
5.90	10.44	5.90	7.15	5.10

Iteration 20 times


19.31	21.45	19.31	16.99	15.29
17.37	19.31	17.37	15.64	14.07
15.64	17.37	15.64	14.07	11.99
14.07	15.64	14.07	11.99	10.79
11.45	14.07	11.45	10.79	9.27

Iteration 50 times


21.86	24.29	21.86	19.29	17.36
19.68	21.86	19.68	17.71	15.94
17.71	19.68	17.71	15.94	14.29
15.94	17.71	15.94	14.29	12.86
14.29	15.94	14.29	12.86	11.58

Iteration 80 times


21.97	24.41	21.97	19.41	17.47
19.78	21.97	19.78	17.80	16.02
17.80	19.78	17.80	16.02	14.41
16.02	17.80	16.02	14.41	12.97
14.41	16.02	14.41	12.97	11.68

Iteration 100 times


21.98	24.42	21.98	19.42	17.48
19.78	21.98	19.78	17.80	16.02
17.80	19.78	17.80	16.02	14.42
16.02	17.80	16.02	14.42	12.98
14.42	16.02	14.42	12.98	11.68

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

selected-solutions.md

selected-solutions.md

Selected Reference Solution For Chapter 3

Experiments Results For Chapter 3

Grid World

Iteration 1 time

Iteration 10 times

Iteration 20 times

Iteration 50 times

Iteration 80 times

Iteration 100 times

Files

selected-solutions.md

Latest commit

History

selected-solutions.md

File metadata and controls

Selected Reference Solution For Chapter 3

Experiments Results For Chapter 3

Grid World

Iteration 1 time

Iteration 10 times

Iteration 20 times

Iteration 50 times

Iteration 80 times

Iteration 100 times