-
Notifications
You must be signed in to change notification settings - Fork 89
/
05-foundational-skills_1.Rmd
325 lines (205 loc) · 19.5 KB
/
05-foundational-skills_1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
# Getting started with R and RStudio {#c05}
**Abstract**
This chapter provides readers with instructions on installing the software necessary to complete any of the subsequent walkthrough chapters. Readers who are familiar with R and RStudio, including installing and loading packages in R, may find this to be a helpful review or may prefer to skip the chapter entirely. This chapter also covers the basics of writing and running code in RStudio. It closes with an introduction to using the {swirl} package. It is designed to help you get started using R and RStudio, assuming no prior use of either.
This chapter, the following chapter (on [Foundational Skills](#c06)), and each of the walkthrough chapters include an overview of the topics emphasized and the functions introduced in the chapter.
## Topics emphasized
For all of the topics emphasized sections, we indicate topics that are unique to each chapter. However, there is a great deal of overlap between chapters. For example, visualizing data is regularly used as a part of carrying out a data science project in education. Therefore, consider these topics emphasized to be the particular focus of the chapters they reference.
- Installing R, RStudio, and R packages
## Functions introduced
For the "functions introduced" sections, notice that some functions look different than others. For example, `pak::pak()` is different than `install.packages()`.
The `pak()` function comes from a specific *package* (which we'll discuss in great depth in this and the following chapter). If you had a hunch that this function comes from the `pak` package, you'd be correct. The `::` symbols (described more in [Chapter 6](#c06)) mean that the function comes from a particular package. That way, you'll know which package you will need if you want to use the function. Not sure what some of these terms mean quite yet? Read on in this chapter to learn more about installing and using packages.
- `install.packages()`
- `pak::pak()`
- `library()`
- `print()`
- `readr::read_csv()`
- `here::here()`
- `swirl::swirl()`
- `swirl::install_course()`
## Chapter overview
This chapter is designed to help you to get started using R and RStudio, assuming no prior use of either.
We will be covering the following topics in this chapter:
- Downloading R and RStudio
- RStudio layout and customization
- Writing and running code in RStudio
- Installing the {dataedu} package
- Exploring R with the {swirl} package
If you already have experience using R and RStudio, you may find some of the contents of this chapter to be a refresher---or as a chance to learn a new things about getting set up. If you are looking to get started with the basics of data loading and manipulation using the {tidyverse} [@wickham2019] right now, consider reading this chapter and then starting [Chapter 6](#c06), which covers foundational skills.
## Downloading R and RStudio
First, you will need to download the latest versions of R [@rcoreteam] and RStudio [@rstudio]. R is a free environment for statistical computing and graphics using the programming language R.
RStudio is a set of integrated tools created by [Posit](https://posit.co/) that allow for a more user-friendly experience for using R.
Although you will likely use RStudio as your main console and editor, you must first install R as RStudio uses R behind the scenes. Both R and RStudio are freely available, cross-platform, and open-source.
### To download R:
- Visit [CRAN](https://cran.r-project.org/) (https:[]()//cran.r-project.org/) to download R
- Find your operating system (Mac, Windows, or Linux)
- Select the "latest release" on the page for your operating system
- Download and install the application
Don't worry---you won't mess anything up if you download the wrong file.
### To download RStudio:
- Visit [Posit's website](https://posit.co/downloads/) (https[]()://posit.co/downloads/) to download RStudio
- Under the column called "RStudio Desktop", click "Download RStudio"
- Click "Download RStudio" on the RStudio Desktop page, which should automatically detect your operating system
If you have problems installing these, consider [the Data Carpentry page](https://datacarpentry.org/R-ecology-lesson/) (https[]()://datacarpentry.org/R-ecology-lesson/) and then reach out for help. Another excellent place to get help is the [Posit Community forum](https://forum.posit.co/) (https[]()://forum.posit.co/).
## RStudio layout and customization: getting to know R through RStudio
Now that we've installed both R and RStudio, we will be accessing R _through_ RStudio. One of the most reliable ways to tell if you're opening R or RStudio is to look at the icons:
```{r fig5-1, fig.cap = "Icons", echo = FALSE, fig.alt="R Icon and RStudio Icon"}
knitr::include_graphics("./man/figures/Figure 5.1.png")
```
Whenever we want to work with R, we'll open RStudio. RStudio interfaces directly with R, and is an **I**ntegrated **D**evelopment **E**nvironment (IDE). This means that RStudio comes with built-in features that make using R easier. If you'd like more information on the difference between R and RStudio, consider reading the "Getting Started" section of the _[Modern Dive](https://moderndive.com/1-getting-started.html#)_ (https[]()://moderndive.com/1-getting-started.html#) textbook [@statisticalinf].
You do not _have_ to use RStudio to access R, and many people don't!
Other IDEs that work with R include:
- [Positron](https://github.com/posit-dev/positron/releases) (https[]()://github.com/posit-dev/positron/releases)
- [Jupyter Notebook](https://jupyter.org/) (https[]()://jupyter.org/)
- [Visual Studio Code](https://code.visualstudio.com/) (https[]()://code.visualstudio.com/)
- [VIM](https://github.com/jalvesaq/Nvim-R) (https[]()://github.com/jalvesaq/Nvim-R)
- [IntelliJ IDEA](https://plugins.jetbrains.com/plugin/6632-r-language-for-intellij) (https[]()://plugins.jetbrains.com/plugin/6632-r-language-for-intellij)
- [EMACS Speaks Statistics (ESS)](https://ess.r-project.org/) (https[]()://ess.r-project.org/)
This is a non-exhaustive list and most of these options require a good deal of familiarity with a given IDE.
We bring up alternative IDEs---particularly ESS---because RStudio, as of this writing, is not fully accessible for learners who use screen readers. We have chosen to use RStudio in this text in order to standardize the experience, but we encourage you to choose the IDE that best suits your needs.
### RStudio layout
When we open RStudio for the first time, we should see something similar to this:
```{r fig5-2, fig.cap = "RStudio Layout", echo = FALSE, fig.alt="Default RStudio layout", fig.alt="RStudio Default Layout with console, environment, and files"}
knitr::include_graphics("./man/figures/Figure 5.2.png")
```
We'll refer to these three "panes" as the "Console pane", the "Environment pane", and the "Files pane".
The large square on the left is the Console pane, the square in the top right is the Environment pane, and the square in the bottom right is the Files pane.
As you work with R more, you'll find yourself using the tabs within each of the panes.
When you create a new file, such as an R script, an R Markdown file, or a Shiny app, RStudio will open a fourth pane, known as the "Source pane". The Source pane should show up as a square in the top left. You can open an `.R` script in the Source pane by going to "File", selecting "New File", and then selecting "R Script":
```{r fig5-3, fig.cap = "Creating a New R Script in RStudio", echo = FALSE, fig.alt="Creating a New Script in RStudio by going to file then R script"}
knitr::include_graphics("./man/figures/Figure 5.3.png")
```
You do not need to do anything specific with this file, but we encourage you to experiment with it if you would like.
### Customizing RStudio
One of the balances to strike in this text is a balance between best practices in your _workflow_ (how you'll use R in your projects) and your _R code_. A best practice for your _workflow_ is to ensure that you're starting with a blank slate every time you open R through RStudio.
To accomplish this, go to "Tools", and select "Global Options" from the dropdown menu.
```{r fig5-4, fig.cap = "Selecting Global Options from the Tool Dropdown Menu", echo = FALSE, fig.alt="Selecting Global Options from the Tool Dropdown Menu"}
knitr::include_graphics("./man/figures/Figure 5.4.png")
```
The "General" tab will open, with several checkboxes selected and unselected. It's important to select "Never" next to the "Save workspace to .RData on exit:" prompt. After selecting "Never", go through and check and uncheck boxes so that your General tab looks like this:
```{r fig5-5, fig.cap = "General Tab from Global Options", echo = FALSE, fig.alt="General tab from the Global Options in RStudio"}
knitr::include_graphics("./man/figures/Figure 5.5.png")
```
Last, click on the "Appearance" tab from within Global Options. From here you can select your RStudio Font, Font Size, and Theme. Go through the options and select an appearance that works best for you, and know that you can always come back and change it.
### Minimized and missing panes
If, at any point, you find that one of your panes has disappeared, one of two things has likely happened:
- A pane has been minimized
- A pane has been closed
Look at the Environment pane as an example. If the Environment pane has been minimized, we'll see something like this:
```{r fig5-6, fig.cap = "RStudio Layout with the Environment Pane Minimized", fig.alt = "RStudio layout with a minimized Environment Pane", echo = FALSE}
knitr::include_graphics("./man/figures/Figure 5.6.png")
```
You'll know that the Environment pane has been minimized because, although we can see the pane headers in the top right, you can't see the information _within_ the Environment pane. To fix this, click on the icon of two squares in the top right of the Environment pane. If you click on the icon of the large square in the top right of the Environment pane, you'll maximize the Environment pane and minimize the Files pane. This is not preferred, since in general you'll want to see all the panes at once.
If the Environment pane has somehow been closed, you can recover it by going to the "View" menu, selecting "Panes", and then selecting "Pane Layout", like so:
```{r fig5-7, fig.cap = "Accessing the Pane Layout from the View Dropdown Menu", fig.alt = "Accessing the Pane Layout from the View Dropdown Menu", echo = FALSE}
knitr::include_graphics("./man/figures/Figure 5.7.png")
```
When you select Pane Layout, you'll see this:
```{r fig5-8, fig.cap = "Pane Layout Options within RStudio", echo = FALSE, fig.alt = "Pane Layout Options within RStudio"}
knitr::include_graphics("./man/figures/Figure 5.8.png")
```
From here, you can select which tabs you'd like to appear within each pane, and you can even change where each pane appears within RStudio. If the Environment Pane had been closed, select it from the Pane Layout in order to re-open it within RStudio.
## Writing and running code in RStudio
Up to this point, you've been exploring the RStudio interface and setting up your preferences. Now, you'll shift to basic coding practices.
In order to run code in R, you need to type your code either in the Console or within an `.R` script.
When possible, create an `.R` script as you're learning, as it allows you to type all of your code, add comments, and then save your `.R` script for reference. If you work entirely in the Console, anything that you type in the Console will disappear as soon as you restart or close R, and you will not be able to reference it in the future.
### Writing code in the console
To run code in the Console, you type your code next to the `>` and hit `Enter`. Practicing running code in the Console by exploring some basic properties of coding in R.
In the Console, type `3 + 4` and hit `Enter`.
You should see the following:
```{r fig5-9, fig.cap = "Using the Console as a Calculator", echo = FALSE, fig.alt = "Adding 3 and 4 on the console"}
knitr::include_graphics("./man/figures/Figure 5.9.png")
```
You've just used R to add the numbers 3 and 4. R has returned the sum of `3 + 4` on a new line, next to `[1]`. The `[1]` tells you that there is one row of data.
You can also use R to print out text. Type the following in the Console and hit `Enter`:
```{r, eval = FALSE}
print("I am learning R")
```
You should see this in the Console:
```{r fig5-10, fig.cap = "Printing Text to the Console", fig.alt = "Printing I am learning R on the console", echo = FALSE}
knitr::include_graphics("./man/figures/Figure 5.10.png")
```
There's one error that you're likely going to come across, both when running code in the Console as well as in an R script. Explore that error now by running the following code in the Console and hitting `Enter`:
```{r, eval = FALSE}
print("This is going to cause a problem"
```
Make sure that you left off the closing parenthesis. What you'll see in the Console is:
```{r fig5-11, fig.cap = "Incomplete Parentheses Change What R Expects Next", echo = FALSE, fig.alt="Printing this is going to cause a problem with the last parantheses missing"}
knitr::include_graphics("./man/figures/Figure 5.11.png")
```
When you're missing a closing parenthesis, R is expecting you to provide more code. You'll know this because instead of seeing a carat `>` in our Console, you see a `+`, and R has not returned the expected print statement.
There are two ways to fix this problem:
- Type the closing `)` in the Console and hit `Enter`
- Hit the `Esc` key
Run this intentional error, and try each of the options above. Compare the output of each, and think about how they're different. Can you think of when you might want to use one option instead of the other?
### Writing code in an R script
There are three main ways to run code in an `.R` script:
- Highlight the line(s) of code you'd like to run and press `Ctrl` + `Enter`
- Highlight the line(s) of code you'd like to run and click the "Run" button in the `R script` pane
- To run _every_ line of code in your file you can press `Ctrl` + `Shift` + `Enter`
Create a new `.R` script, or open the one you created earlier in this chapter. Next, type in the following code and run it using each of the options listed above.
```{r eval = FALSE}
print("We're going to use R as a calculator.")
print("First up, addition!")
12 + 8
632 + 41
print("Next, subtraction!")
48 - 6
0.65 - 1.42
```
Feel free to spend some more time writing and running code within your `.R` script, or move on to the next section, where you'll add comments to our code.
### Commenting your code in R
It is considered good practice to comment your code when working in an `.R` script. Even if you are the only person working on your code, it's helpful to write yourself notes about what you were trying to do with a specific piece of code.
Moreover, writing comments in your code as you work through the examples in this book is a great way to reinforce you're learning.
And remember, comments are ignored by R when running a script, so they will not affect your code or analysis.
To comment out a line of code, place a pound sign (also called an octothorpe) `#` in front of the line of code that you want to exclude when you're running your script.
Be careful when excluding certain lines of code, especially in longer files, as it can be easy to forget where you've commented. Rather than commenting out individual lines of code, it is often better to start a new section of code to tinker with until it works as expected.
You can also write comments in line with your code, like this:
```{r eval = FALSE}
#' This will be a short code example.
#' You are not expected to know what this does,
#' nor do you need to try running it on your computer.
library(readr) # Load the readr package
library(here) # Load the here package
data <- read_csv(here("file_path", "file_name.csv")) # Save file_name.csv as data
```
If you think you'll be writing more than one line of comments, you can do a pound sign followed by a single quotation mark (`#'`). This will continue to comment out lines of text or code each time you hit `Enter`. You can delete the `#'` on a new line where you want to write code for R to run. This method is useful when you're writing a long description of what you're doing in R.
_Note: "Commenting" refers to adding in actual text comments, whereas "commenting out" refers to using the pound sign (octothorpe) in front of a line of code so that R ignores it._
_You'll also note the phrase "uncomment code", which means you should delete or omit the_ `#` _or_ `#'` _in an example._
## Installing the {dataedu} package
This next section will go over installing the {dataedu} package that's used throughout this book. This package provides readers with resources for using this book.
The package serves four main functions:
1. Mass installation of all the packages used in the book
2. Reproducible code for the walkthroughs
3. Access to the data used in each of the walkthroughs
4. The "dataedu" theme and color palette for reuse
If you feel that you need more information before you're ready to install the package, you can skip this section. You'll learn about packages, their installation, and how to load them into R in more depth in [Chapter 6](#c06).
However, if you're feeling adventurous, give it a shot by running the code below. *Please note that the {dataedu} package requires R version 3.6 or higher to run.*
```{r, eval = FALSE}
# Install {pak}
install.packages("pak")
# Install the {dataedu} package (requires R version 3.6 or higher)
pak::pak("data-edu/dataedu")
```
*A special note on {tabulizer}:*
One of the walkthroughs uses [{tabulapdf}](https://github.com/ropensci/tabulapdf), created by ROpenSci to read PDFs.
{tabulapdf} requires the installation of [RJava](https://cran.r-project.org/web/packages/rJava/index.html), which can be a tricky process on Mac computers.
Neither {tabulapdf} nor {RJava} are included in `mass_install()`. Reading through the notes on the [{tabulapdf} GitHub repository](https://github.com/ropensci/tabulapdf) if you choose to install it.
## Exploring R with the {swirl} package
If you were able to install the {dataedu} package without problems and you're eager to start exploring what R can do, supplement your learning through [{swirl}](https://swirlstats.com/students.html) (https[]()://swirlstats.com/students.html).
Install {swirl} by running the following code:
```{r, eval = FALSE}
install.packages("swirl")
```
{swirl} is a set of packages (see more on packages in [Chapter 6](#c06)) providing an interactive method for learning R by using R in the RStudio Console.
Since you've already installed R, RStudio, and the {swirl} package, you can follow the instructions on the {swirl} webpage or run the following code _in your Console pane_ to get started with a beginner-level course in {swirl}:
```{r, eval = FALSE}
library(swirl)
install_course("R_Programming_E")
swirl()
```
There are multiple courses available on {swirl}, and you can access them by installing them and then running the `swirl()` command in your console.
The authors are not affiliated with {swirl} in any way, nor is {swirl} required to progress through this text. But it is a great resource to keep on your radar!
## Conclusion
Congratulations! At this point in the book, you've installed R and RStudio, explored the RStudio IDE, and written some basic code.
At this point, you're set up to move on to [Chapter 6](#c06), where you'll do a deeper dive into projects, packages, and functions, and how those relate to your future data tasks. You'll also learn about documentation.
Alternatively, if that sounds familiar to you already, feel free to jump ahead to a walkthrough of your choosing.