-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathsession5_notes.qmd
304 lines (226 loc) · 15.2 KB
/
session5_notes.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
# Reproducible research with RMarkdown
```{r setup, include=FALSE}
pacman::p_load(tidyverse)
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, message = FALSE)
csp_2020 <- read_csv("data/CSP_2020.csv")
csp_long2 <- read_csv("data/CSP_long_201520.csv")
```
## Introduction to RMarkdown
RMarkdown is a tool that is used to author high-quality documents, making it easy to communicate results efficiently. One of the main appeals of RMarkdown is that it is easy to integrate R code and output seamlessly into a document, encouraging openness and reproducibility in research.
There are a number of ways we can use RMarkdown to enhance the research process, such as:
- Generating reports to show the latest findings in a project, combining research output with interpretations. Reports can be automated within RMarkdown, to ensure outputs show the latest data.
- Keeping track of projects as an alternative to a notebook. Documents can include R code and visualisations alongside thoughts and comments on findings so far.
- Creating a collaborative document that can be shared with colleagues. The inclusion of R code in documents provides an audit trail, making it easy to carry out quality assurance and resolve discrepancies in results.
Before we begin working with RMarkdown in RStudio, we must first download and install the `rmarkdown` package as we would any other package:
```{r install and load rmarkdown, eval = FALSE}
install.packages("rmarkdown")
library(rmarkdown)
```
### Creating an RMarkdown files
RMarkdown files (`.Rmd`) are created and saved separately to the script files we have been using up to now on the course. To create a new RMarkdown file, either use the drop-down menu, following the *File -> New File -> R Markdown...* options, or using the ![new file icon](images/new_file_shortcut.png) icon and selecting *R Markdown...*.
![Rmarkdown new file](images/rmarkdown_new.png)
When creating a new RMarkdown file, we are given the option to set the title, author and date of the new document. We are also given options to select the type of document, presentation, or Shiny app we would like to create. This does not give a comprehensive list of documents available within RMarkdown and can be changed later. We will discuss output document types in more detail later in the session.
Clicking 'OK' on this window will produce an RMarkdown file (`.Rmd`) with some example code. If we do not want this, there is an option to 'Create Empty Document' on the bottom left of the window.
### Rmarkdown content
RMarkdown files contain three main types of content:
- A YAML header (this sets the global options for the document)
- Text, or syntax (this includes headings and comments)
- Code chunks containing R code
#### The YAML header
The first part of an RMarkdown script, surrounded by '`---`' is known as the **YAML header**. This sets global options for the document that will be produced by the script. YAML headers can include the title, author and date of a document, the output document type, table of contents options, and can include code to edit the appearance of text and figures.
For this course, we will just use the YAML to define the `title`, `author`, `date`, and `output` of our document:
````markdown
---
title: "Introduction to R with Tidyverse"
author: My name
date: 2024-07-15
output: html_document
---
````
There are many output document types that can be produced using RMarkdown. Some of the most common include:
- `html_document`:HTML document,`.html`
- `pdf_document`: PDF document,`.pdf`, created using a LaTeX template
- `word_document`: Microsoft word document (`.docx`)
- `odt_document`: OpenDocument text document (`.odt`), similar to Microsoft Word/Google Docs but compatible with free word processors)
- `github_document`: Github document (`.md`, markdown files that are compatible with Github and are converted to HTML when viewed there)
- `powerpoint_presentation`: Powerpoint presentation (`.pptx`)
RMarkdown can also be combined with other R packages to create books (via `bookdown`), websites (via `blogdown`) and interactive dashboards (via `flexdashboard`).
#### RMarkdown syntax
RMarkdown text, or syntax, will generally make up the majority of a RMarkdown file. This can include headers and subheadings, equations, and any other text or comments in the document. Text is formatted using **markdown syntax**. A detailed list of syntax commands are given in the RMarkdown [cheatsheet](https://rstudio.github.io/cheatsheets/html/rmarkdown.html#write-with-markdown). Common syntax commands that may be used in an RMarkdown document include:
- `*italic*`
- `**bold**`
- Add ``` `code` ``` into text
- ```# Header 1```
- ```## Header 2```
- ...
- ```###### Header 6```
- ``` [This is a link](link url)```
- ``` ![caption](image.png) ```
```
- Unordered list
- List with indent
```
```
1. Ordered list
- With indent
2. Second item
```
```Equation: $r^2 = (x - a)^2 + (y - b)^2$```
*RMarkdown equations are built using the same language as LaTeX. [See here](https://oeis.org/wiki/List_of_LaTeX_mathematical_symbols) for a list of mathematical symbols that can be used in these equations.*
#### Code chunks
Code chunks allow us to embed R code and outputs into our documents. This is one of the main draw of RMarkdown as it removes the need to copy and paste or import results from R into another document.
Code chunks are pieces of code that begin ` ```{r} `
and end ` ``` `. For example,
````{verbatim}
```{r}
1 + 1
```
````
Code chunks can be created by manually typing these wrappers, by clicking the ![code chunk icon](images/code_chunk_icon.png) icon and selecting 'R', or using the keyboard shortcut `ctrl + alt + i` on Windows, and `Command + Option + i` on Mac.
Code chunks can be given titles to make an RMarkdown script easier to navigate (these will appear under the script window where lists of subheadings appear in script files). These are added inside the opening of the code chunk: ` ```{r chunk title} `.
There are a number of chunk options to customise which code/output to display in the document. These are included in the opening of a chunk, after the title, separated by commas `,`. Some of the most common, include:
- `echo = TRUE/FALSE`: whether to display code in the output document
- `eval = TRUE/FALSE`: whether to run the code in the chunk or not
- `include = TRUE/FALSE`: whether to include anything from the chunk (code or output) in the document
- `error/warning/message = TRUE/FALSE`: whether to display error/warning/other messages in the document
**Top tip:** It may be useful to add a setup code chunk at the beginning of an RMarkdown file that loads any packages and datasets that are required for the rest of the document. These can also include universal options for future code chunks to avoid repeating the code, using the `knitr::opts_chunk$set` function.
For example:
````{verbatim}
```{r setup, include = FALSE}
# Set global options for code chunks
# Do not show any R code or messages unless specified
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
# Load in the tidyverse package
library(tidyverse)
```
````
### Compiling RMarkdown documents
Compiling RMarkdown actually requires multiple steps and programmes. Luckily for us, this process takes place in the background so we don't need to be aware of these steps happening!
Generating an output file from RMarkdown is know **knitting** a document. This process sends the `.Rmd` file to another R package `knitr` (which is installed alongside `rmarkdown`), which executes all the code chunks in the document and creates a markdown `.md` file including the code and output. This markdown file is then processed by another programme **pandoc** which converts markdown code into the finished document.
![RMarkdown to document process](images/rmarkdown_pipe.png)
To **knit** an RMarkdown file in RStudio is very simple. Either click the ![knit icon](images/knit_icon.png) icon above the RMarkdown script, or use the keyboard shortcut `ctrl + shift + k` on Windows or `Command + shift + k` on Mac. This initiates the process above and will return an output document (if there are no errors!) in the requested format to the working directory.
### Data visualisation in RMarkdown
Output such as graphs and tables can be embedded in code chunks, the code used to create them will be the same as it would be in any other R script.
:::{.callout-note}
Often, when providing output in RMarkdown, we often do not want to show the code that was used to create this. Make sure to add `echo = FALSE` to the opening of the code chunk.
:::
#### Graphs in RMarkdown
`ggplot` can be used to create graphs that are embedded within code chunks and included in an output document. For example, we could use the data from previous sections to show the relationship between Settlement Funding Assessment (SFA) and council tax total in English local authorities in 2020, colour code by regions in a scatterplot:
````{verbatim}
```{r scatterplot sfa_2020 and ct_total_2020 by region, message = FALSE}
# Load and tidy the 2020 data
read_csv("data/CSP_2020.csv") %>%
# Remove the Greater London Authority row
filter(authority != "Greater London Authority") %>%
# Convert region variable to factor
mutate(region_fct = factor(region,
levels = c("L", "NW", "NE", "YH", "WM",
"EM", "EE", "SW", "SE"))) %>%
# Create a ggplot
ggplot() +
# Scatterplot definition
geom_point(aes(x = ct_total_2020, y = sfa_2020, colour = region_fct)) +
# Add colour palette for region
scale_colour_manual(values = c("aquamarine2", "blue", "chartreuse2",
"coral", "orchid", "firebrick",
"gold3", "violetred", "grey50")) +
# Change axis/label titles
labs(x = "Council tax total (£millions)",
y = "Settlement funding assessment (£milliongs)",
colour = "Region") +
theme_light()
```
````
```{r scatterplot sfa_2020 and ct_total_2020 by region, echo = FALSE, message = FALSE}
# Load and tidy the 2020 data
read_csv("data/CSP_2020.csv") %>%
# Remove the Greater London Authority row
filter(authority != "Greater London Authority") %>%
# Convert region variable to factor
mutate(region_fct = factor(region,
levels = c("L", "NW", "NE", "YH", "WM",
"EM", "EE", "SW", "SE"))) %>%
# Create a ggplot
ggplot() +
# Scatterplot definition
geom_point(aes(x = ct_total_2020, y = sfa_2020, colour = region_fct)) +
# Add colour palette for region
scale_colour_manual(values = c("aquamarine2", "blue", "chartreuse2",
"coral", "orchid", "firebrick",
"gold3", "violetred", "grey50")) +
# Change axis/label titles
labs(x = "Council tax total (£millions)",
y = "Settlement funding assessment (£milliongs)",
colour = "Region") +
theme_light()
```
#### Tables in RMarkdown
There are a number of ways to include tables within RMarkdown which can either be entered manually, or generated using an R package. The choice of approach to creating tables depends on the format and size of the data, the amount of flexibility you would like to customise the output, the type of output document you are creating, and personal preference of how it should look.
In this course, we will look at how tables can be created using RMarkdown syntax (without the need for additional packages), and using the `kable` function within the `knitr` package.
**Manually creating tables**
Tables can be created in RMarkdown syntax, using the `|` symbol to separate columns, and dashes `-` to separate column headings from the body of the table. These are created outside of code chunks within the text. For example,
```` {verbatim}
Header 1 | Header 2 | Header 3 |
|----------|----------|----------|
| This | Is | A |
| Very | Simple | Table |
````
produces the following output:
| Header 1 | Header 2 | Header 3 |
|----------|----------|----------|
| This | Is | A |
| Very | Simple | Table |
Colons can be added to the header/body separator row of the table to control the justification of the text in each column. For example,
```` {verbatim}
| Left | Right | Center | Default |
|:-----|------:|:-------:|-----------|
| This | Is | Another | Simple |
| Table | But | It is | Justified |
````
produces the following output:
| Left | Right | Center | Default |
|:-----|------:|:-------:|-----------|
| This | Is | Another | Simple |
| Table | But | It is | Justified |
**Creating tables from data frames**
The `knitr` package that compiles RMarkdown files contains the `kable` function that can be used to create simple data tables. The `kable` function requires data to be stored as a matrix, data frame, or tibble object (although these can be easily created using the `matrix`, `data.frame` or `tibble` functions). Accessing the help file `?knitr::kable` gives a list of arguments that can be used to customise these tables.
:::{.callout-note}
As these tables are created using R functions, they must be generated within a code chunk.
:::
For example, we can create a simple data table using `kable` showing the first 6 rows of the `mtcars` dataset (a dataset pre-loaded into base R):
````{verbatim}
```{r mtcars table, echo = FALSE}
knitr::kable(head(mtcars))
```
````
which produces the following output:
```{r mtcars table, echo = FALSE}
knitr::kable(head(mtcars))
```
Where data are not already saved as an object, we need to create them first before generating a table. For example, the table we created manually earlier can be recreated using the `kable` function, by first creating a data frame with the information, and then piping it through to the function:
````{verbatim}
```{r kable table}
# Create a data frame with the table information
data.frame(col1 = c("This", "Very"),
col2 = c("Is", "Simple"),
col3 = c("A", "Table")) %>%
knitr::kable(., col.names = c("Header 1", "Header 2", "Header 3"))
```
````
```{r kable table run, echo = FALSE}
# Create a data frame with the table information
data.frame(col1 = c("This", "Very"),
col2 = c("Is", "Simple"),
col3 = c("A", "Table")) %>%
knitr::kable(., col.names = c("Header 1", "Header 2", "Header 3"))
```
**Other ways to create tables**
Although the `kable` function and RMarkdown syntax tables do not require additional R packages to be installed, they are fairly simple and do not give many options to customise the tables. For a more flexible alternative, I would recommend looking at the `flextable` package, which gives a much wider range of customisible features. The `flextable` user manual can be accessed for free from [this website](https://ardata-fr.github.io/flextable-book/).
### Exercise 8 {.unnumbered}
Create an RMarkdown file that creates a html report describing the trends in core spending power in English local authorities between 2015 and 2020. Your report should include:
- A summary table of the total spending per year per region
- A suitable visualisation showing how the total annual spending has changed over this period, compared between regions
- A short interpretation of the table and visualisation
:::{.callout-note}
You are not expected to be an expert in this data! Interpret these outputs as you would any other numeric variable measured over time.
:::