---
title: "P01. Introduction to R, part 3: solutions"
---
**A. Read in the same data as before**
This file can be downloaded from [here](data/TB_stats.txt).
```{r eval = F}
myTBdata <- read.table("TB_stats.txt", header=TRUE)
```
```{r echo = F}
myTBdata <- read.table("data/TB_stats.txt", header=TRUE)
```
**B. Plot the mortality in HIV negative against HIV positive**
Check the `plot` function help file.
```{r eval = F}
?plot
```
`plot` is a generic function: depending on the type of data you pass to it, R uses a different sub-function (you don't need to worry about how it handles this!).
```{r}
# make the plot
plot(x=myTBdata$HIV_neg_TB_mortality, y=myTBdata$HIV_pos_TB_mortality)
```
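To illustrate what "generic" means in practice, here is a minimal sketch using made-up vectors (`x`, `y` and `grp` are hypothetical and not part of the TB data). The same call to `plot()` draws a scatter plot for two numeric vectors and a bar chart of counts for a factor.

```{r}
# Made-up data, for illustration only (not from TB_stats.txt)
x <- c(1, 2, 3, 4)
y <- c(2, 4, 8, 16)
grp <- factor(c("a", "b", "a", "c"))

plot(x, y)   # two numeric vectors: scatter plot
plot(grp)    # a factor: bar chart of the counts in each level
```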
**C. Add meaningful axes labels**
```{r}
plot(x=myTBdata$HIV_neg_TB_mortality,
     y=myTBdata$HIV_pos_TB_mortality,
     xlab="Mortality in HIV negative people",
     ylab="Mortality in HIV positive people")
```
**D. Add a meaningful title**
```{r}
plot(x=myTBdata$HIV_neg_TB_mortality,
     y=myTBdata$HIV_pos_TB_mortality,
     xlab="Mortality in HIV negative people",
     ylab="Mortality in HIV positive people",
     main="Comparison of mortality in HIV negative and positive")
```
**E. Change the colour of the points to red**
```{r}
plot(x=myTBdata$HIV_neg_TB_mortality,
     y=myTBdata$HIV_pos_TB_mortality,
     xlab="Mortality in HIV negative people",
     ylab="Mortality in HIV positive people",
     main="Comparison of mortality in HIV negative and positive",
     col="red")
```
**F. It's hard to see the numbers because some are small and some very large**
Using a log scale is useful for that. You can either log the values and re-plot, or use the `log` option in `plot()`.
```{r}
plot(x=myTBdata$HIV_neg_TB_mortality,
     y=myTBdata$HIV_pos_TB_mortality,
     xlab="Mortality in HIV negative people",
     ylab="Mortality in HIV positive people",
     main="Comparison of mortality in HIV negative and positive",
     col="red",
     log="xy")
```
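The two approaches to a log scale can be compared side by side. This is a minimal sketch with made-up numbers (`neg_deaths` and `pos_deaths` are hypothetical, not taken from the TB data). Logging the values yourself labels the axes in log10 units, whereas `log="xy"` keeps the tick labels in the original units.

```{r}
# Made-up data, for illustration only
neg_deaths <- c(10, 100, 1000, 10000)
pos_deaths <- c(3, 20, 250, 4000)

# Option 1: transform the values, then plot (axes show log10 units)
plot(x=log10(neg_deaths), y=log10(pos_deaths),
     xlab="log10(HIV negative)", ylab="log10(HIV positive)")

# Option 2: let plot() draw log axes (axes show the original units)
plot(x=neg_deaths, y=pos_deaths, log="xy",
     xlab="HIV negative", ylab="HIV positive")
```

Note that values of zero cannot be drawn on a log axis, so `plot()` leaves them out with a warning.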
**G. Now let's make a different kind of plot**
Show the distribution of Total_TB_mortality in a histogram, then change the x axis label. Note: `hist()` takes the same labelling options as `plot()`.
```{r}
hist(myTBdata$Total_TB_mortality)
hist(myTBdata$Total_TB_mortality,
     xlab="Number")
```
Now add a meaningful title to the plot.
```{r}
hist(myTBdata$Total_TB_mortality,
     xlab="Number",
     main="Total TB mortality")
```
**H. Check what other aspects of the histogram you can change**
```{r eval = F}
?hist
```
Then change the colour to "blue" in the last plot.
```{r}
hist(myTBdata$Total_TB_mortality,
     xlab="Number",
     main="Total TB mortality",
     col="blue")
```
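Colour is only one of the options listed in `?hist`. As a minimal sketch on simulated data (the vector `deaths` is made up, not the TB data), `breaks` suggests how many bars to draw and `border` sets the colour of the bar outlines.

```{r}
# Simulated data, for illustration only
set.seed(1)
deaths <- rexp(200, rate = 1/1000)

h <- hist(deaths,
          breaks=20,        # a suggestion: R picks "pretty" break points near this number
          col="blue",
          border="white",   # outline colour of the bars
          xlab="Number",
          main="Simulated mortality")
```

`hist()` also returns the bin edges and counts invisibly, so `h$breaks` and `h$counts` show exactly what was drawn.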
**I. Now let's plot a histogram of mortality per 1000**
Add a title and an x axis label.
**Hint:** calculate it as in the previous practical
```{r}
# deaths per 1000 people = total deaths * 1000 / population
myTBdata[,"MortalityPer1000"] <- myTBdata[,"Total_TB_mortality"]*1000/myTBdata[,"Population"]
hist(myTBdata$MortalityPer1000,
     xlab = "Mortality per 1000",
     main = "TB mortality per 1000 population")
```
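The per-1000 calculation can be checked by hand on a few rows. This is a minimal sketch with a made-up data frame (`toy` and its values are hypothetical; only the column names follow the TB data).

```{r}
# Made-up data, for illustration only
toy <- data.frame(Population         = c(1000000, 500000, 20000000),
                  Total_TB_mortality = c(500, 1000, 4000))

# Same formula as above: deaths * 1000 / population
toy[,"MortalityPer1000"] <- toy[,"Total_TB_mortality"]*1000/toy[,"Population"]
toy
```

For the first row, 500 deaths in a population of 1,000,000 is 500 * 1000 / 1000000 = 0.5 deaths per 1000 people.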
Change the colour to something different.
**Hint:** to find more colours, run `colors()` or search online for "Colors in R".
```{r}
hist(myTBdata$MortalityPer1000,
     xlab = "Mortality per 1000",
     main = "TB mortality per 1000 population",
     col = "dodgerblue1")
```
**J. Now let's show both histograms at the same time**
You need to call `par()`, short for parameters. Setting the plot parameter `mfrow` (Multi-Figure ROW-wise) to `c(1,2)` gives 1 row and 2 columns of plots.
```{r}
par(mfrow=c(1,2))
hist(myTBdata$Total_TB_mortality,
     xlab="Number",
     main="Total TB mortality",
     col="blue")
hist(myTBdata$MortalityPer1000,
     xlab = "Mortality per 1000",
     main = "TB mortality per 1000 population",
     col = "dodgerblue1")
```
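In an interactive session, a setting made with `par()` stays in place for later plots on the same device. One common pattern, sketched here with made-up vectors (`a` and `b` are hypothetical), is to store the previous settings that `par()` returns and restore them afterwards.

```{r}
# Made-up data, for illustration only
a <- c(1, 2, 2, 3, 3, 3)
b <- c(10, 20, 20, 30)

old_par <- par(mfrow=c(1,2))   # par() returns the settings it replaced
hist(a, main="Left panel")
hist(b, main="Right panel")
par(old_par)                   # back to one plot per figure
```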
Cut and paste your plot code from I. after the call to `par()` and run it. Then resize the plot window and see what happens.
**K. Export the figure and save it as a PNG with a useful name**
**Hint:** use the Export button in the plot window
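As an alternative to the Export button, a figure can also be saved from code. This is a minimal sketch, assuming a PNG device is available on your machine. The file name and the temporary folder are hypothetical choices, and the histogram uses made-up numbers.

```{r}
# Hypothetical output location: a temporary folder, so nothing is overwritten
out_file <- file.path(tempdir(), "TB_mortality_example.png")

png(out_file, width=800, height=400)   # open a PNG graphics device
hist(c(1, 2, 2, 3, 3, 3),              # made-up data, for illustration only
     xlab="Number",
     main="Example histogram")
invisible(dev.off())                   # close the device so the file is written
```

Everything plotted between `png()` and `dev.off()` goes into the file instead of the plot window.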
**L. R has functions for many other kinds of plot**, for example:
```{r eval = F}
?barplot
?boxplot
?contour
```
stackoverflow.com also has a lot of questions and answers about most kinds of plot.
**Advanced plotting exercises**\
Make a plot where:
- x=Total_TB_mortality and y=HIV_pos_TB_mortality
- both axes are on the log scale
- colour the points
- add axis labels and a title
```{r}
plot(x=myTBdata$Total_TB_mortality,
     y=myTBdata$HIV_pos_TB_mortality,
     xlab="Total TB mortality",
     ylab="Mortality in HIV positive people",
     main="Total TB mortality vs mortality in HIV positive people",
     col="red",
     log="xy")
```
Add HIV_neg_TB_mortality on the same y axis, in a different colour.
**Hint:** use `points()`. See `?points` for information.
```{r}
plot(x=myTBdata$Total_TB_mortality,
     y=myTBdata$HIV_pos_TB_mortality,
     xlab="Total TB mortality",
     ylab="Mortality in HIV positive people",
     main="Total TB mortality vs mortality in HIV positive people",
     col="red",
     log="xy")
# points() adds to the existing plot; it does not start a new one
points(x=myTBdata$Total_TB_mortality,
       y=myTBdata$HIV_neg_TB_mortality,
       col="blue")
```
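Do you need to change the y axis label? That is, does it still make sense now that the axis shows both HIV negative and HIV positive mortality?
**Answer:** You'll need to change the y axis label, for example to "TB mortality by HIV status", because the axis now shows both groups.
Some of the points no longer fit on the graph. Why is this? `plot()` chooses the axis limits from the data in its own call, and `points()` does not rescale them. You need to alter the y limit (`ylim`), which is an option of `plot()`. What value will you choose?
**Hint:** set the upper `ylim` to the maximum value across both data sets.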
```{r}
plot(x=myTBdata$Total_TB_mortality,
     y=myTBdata$HIV_pos_TB_mortality,
     xlab="Total TB mortality",
     ylab="TB mortality by HIV status",
     main="Total TB mortality vs mortality by HIV status",
     col="red",
     log="xy",
     # lower limit of 1 because a log axis cannot start at 0
     ylim=c(1, max(myTBdata$HIV_neg_TB_mortality, myTBdata$HIV_pos_TB_mortality)))
points(x=myTBdata$Total_TB_mortality,
       y=myTBdata$HIV_neg_TB_mortality,
       col="blue")
```
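The effect of `ylim` is easier to see on a small example. This is a minimal sketch with made-up numbers (`total`, `pos` and `neg` are hypothetical vectors, not the TB data). The largest `neg` value is about ten times the largest `pos` value, so without `ylim` the top blue points would fall outside the axes.

```{r}
# Made-up data, for illustration only
total <- c(10, 100, 1000, 10000)
pos   <- c(2, 15, 120, 900)
neg   <- c(8, 85, 880, 9100)

y_max <- max(pos, neg)   # largest value across both data sets

plot(x=total, y=pos, log="xy", col="red",
     ylim=c(1, y_max),   # make room for both data sets
     xlab="Total", ylab="By group")
points(x=total, y=neg, col="blue")
```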
The plot now has 2 data sets in different colours, so it needs a legend. Check the help for `legend()` (there are a lot of options!).
**Hint:** use `x="topright"` instead of setting the x and y values for location.
**Hint:** use the option `fill` to set the colours of the legend boxes.
```{r}
plot(x=myTBdata$Total_TB_mortality,
     y=myTBdata$HIV_pos_TB_mortality,
     xlab="Total TB mortality",
     ylab="TB mortality by HIV status",
     main="Total TB mortality vs mortality by HIV status",
     col="red",
     log="xy",
     ylim=c(1, max(myTBdata$HIV_neg_TB_mortality, myTBdata$HIV_pos_TB_mortality)))
points(x=myTBdata$Total_TB_mortality,
       y=myTBdata$HIV_neg_TB_mortality,
       col="blue")
# one legend entry per data set, in the same order as the colours
legend(x="topright", legend=c("HIV pos", "HIV neg"), fill=c("red", "blue"))
```
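`fill` draws coloured boxes in the legend. An alternative, sketched here on made-up data (`total`, `pos` and `neg` are hypothetical vectors), is to pass `col` and `pch` so that the legend shows the same open circles as the plot. The legend is placed at `"topleft"` only because this toy data fills the top right corner.

```{r}
# Made-up data, for illustration only
total <- c(10, 100, 1000, 10000)
pos   <- c(2, 15, 120, 900)
neg   <- c(8, 85, 880, 9100)

plot(x=total, y=pos, log="xy", col="red",
     ylim=c(1, max(pos, neg)),
     xlab="Total", ylab="By group")
points(x=total, y=neg, col="blue")

# pch=1 is the default open circle used by plot() and points()
lg <- legend(x="topleft", legend=c("pos", "neg"),
             col=c("red", "blue"), pch=1)
```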